A Survey on Multi-document Summarization and Domain-Oriented Approaches
Subject areas: Natural Language Processing
Mahsa Afsharizadeh 1, Hossein Ebrahimpour-Komleh 2, Ayoub Bagheri 3, Grzegorz Chrupała 4
1 - University of Kashan
2 - University of Kashan
3 - Utrecht University
4 - Tilburg University
Keywords: Multi-Document Summarization, Single Document Summarization, Extractive, Abstractive, Domain-Oriented, ROUGE
Abstract:
Before the advent of the World Wide Web, the main problem was a lack of information. Today the web confronts us with an explosive amount of information in every area of search. This excess is burdensome and hinders quick, correct decisions; this is the problem of information overload. Multi-document summarization is an important solution to this problem: it quickly produces a brief summary containing the most important information from a set of documents while preserving their main concepts. When the input documents belong to a specific domain, such as medicine or law, summarization faces additional challenges. Domain-oriented summarization methods exploit characteristics specific to that domain to generate summaries. This paper introduces the purpose of multi-document summarization systems and discusses domain-oriented approaches. It reviews the categorizations that earlier authors have proposed for multi-document summarization methods and then groups these methods into six categories: machine learning, clustering, graph, Latent Dirichlet Allocation (LDA), optimization, and deep learning. We review the methods presented in each group and compare the groups' advantages and disadvantages. We also discuss the standard datasets used in this field, evaluation measures, challenges, and recommendations.