On the Behavior of Pre-trained Word Embedding Variants in Deep Headline Generation from Persian Texts
Subject Areas: Electrical and Computer Engineering
Mohammad Ebrahim Shenassa 1, Behrooz Minaei-Bidgoli 2
1 -
2 - Faculty member, School of Computer Engineering
Keywords: Deep learning, sequence-to-sequence models, BERT, headline generation, benchmark dataset
Abstract:
Inspired by sequence-to-sequence models for machine translation, deep-learning-based summarization methods have been proposed. The summaries generated this way are structurally more readable and usually convey the complete meaning to the reader. These methods use embedding vectors for semantic representation, where the weights of each word vector are learned from its neighboring words over a large corpus. In static word embeddings, the vector weights are obtained by choosing a fixed proximity window around each word, whereas contextual embeddings such as BERT compute them with multi-layer transformers that attend to all the words in the text. Several studies have shown that contextual word embeddings outperform static ones, owing to the ability to fine-tune their weights for a specific natural language processing task. However, the performance of the initial, pre-trained weights of these vectors has not been investigated for headline generation from Persian texts. In this paper, we investigate the behavior of pre-trained word embedding variants, without fine-tuning, in deep headline generation from Persian texts. To train the headline generation model, we use "ElmNet", a Persian corpus containing about 350 thousand pairs of abstracts and titles of scientific papers. The results show that using the BERT model, even without fine-tuning its weights, improves the quality of the generated Persian headlines, raising the ROUGE-1 score to 42%, which is better than the other pre-trained embeddings.
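As a minimal sketch of what "without fine-tuning" means in practice, the snippet below extracts frozen contextual token vectors from a pre-trained Persian BERT; such vectors could then be fed to a sequence-to-sequence encoder-decoder for headline generation. The checkpoint name and the example text are illustrative assumptions, not the paper's exact configuration.

# Minimal sketch, not the paper's exact pipeline: obtain frozen (non-fine-tuned)
# contextual embeddings from a pre-trained Persian BERT.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "HooshvareLab/bert-base-parsbert-uncased"  # assumed ParsBERT checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME)
bert.eval()                              # inference mode: no dropout, no weight updates
for param in bert.parameters():
    param.requires_grad = False          # keep the pre-trained weights frozen

abstract_text = "..."                    # a Persian paper abstract would go here
inputs = tokenizer(abstract_text, return_tensors="pt",
                   truncation=True, max_length=512)

with torch.no_grad():
    outputs = bert(**inputs)

# Shape (batch, seq_len, hidden): one vector per token, each conditioned on the
# whole input via multi-layer self-attention, unlike static embeddings trained
# only from a fixed proximity window.
token_vectors = outputs.last_hidden_state

A static baseline (e.g., word2vec or GloVe) would instead look up each token's vector independently of its context; comparing the two kinds of frozen embeddings under the same decoder, with headline quality measured by ROUGE-1, is the experiment the abstract describes.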