A Novel Model based on Encoder-Decoder Architecture and Attention Mechanism for Automatic Abstractive Text Summarization
Authors: Hasan Aliakbarpor, Mohammadtaghi Manzouri, Amirmasoud Rahmani
Keywords: Deep learning, Abstractive summarization, Encoder-decoder architecture, Auxiliary attention mechanism, Linguistic features
Abstract:
With the expansion of the Web and the availability of large amounts of textual information, the development of automatic text summarization models, an important aspect of natural language processing, has attracted many researchers. With the growth of deep learning methods in text processing, text summarization has entered a new phase of development, and abstractive text summarization has made significant progress in recent years. Even so, the full potential of deep learning has not yet been exploited for this task, and further progress is still needed, including taking human cognition into account when designing summarization models. In this regard, this paper proposes an encoder-decoder architecture equipped with auxiliary attention. The model uses a combination of linguistic features and embedding vectors as its input. In addition, unlike previous studies, which commonly employ the attention mechanism in the decoder, it applies an auxiliary attention mechanism in the encoder to imitate human cognition in summary generation. With the proposed attention mechanism, only the most important parts of the input text, rather than the whole text, are encoded and sent to the decoder to generate the summary. The model also uses a switch with a threshold in the decoder to overcome the rare words problem. The proposed model was evaluated on the CNN/Daily Mail and DUC-2004 datasets. According to the ROUGE evaluation metric, it obtained higher ROUGE scores than other existing methods for generating abstractive summaries on both datasets.
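To make the evaluation criterion concrete, the following is a minimal sketch of how a ROUGE-N score can be computed from n-gram overlap between a generated summary and a reference summary. It is an illustration only, not the official ROUGE package and not the evaluation code used in this paper: the function names `ngrams` and `rouge_n`, the lowercasing, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N: clipped n-gram overlap between two texts.

    Assumption: lowercasing and whitespace tokenization; the official
    package offers further options such as stemming and stopword removal.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is only credited as often as it occurs in both.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    gold = "the cat lay on the mat"
    print(rouge_n(generated, gold, n=1))
    print(rouge_n(generated, gold, n=2))
```

In this toy pair, five of the six reference unigrams and three of the five reference bigrams also appear in the generated summary, so ROUGE-1 recall is 5/6 and ROUGE-2 recall is 3/5.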