Search Engine for Structured Event Retrieval from News Sources
Subject Areas: Electrical and computer engineering
A. Mirzaeiyan, S. Aliakbary
Keywords: Event detection, search engine, information retrieval, text mining
Abstract:
Analyzing published news content is one of the most important problems in information retrieval. Much prior research analyzes individual news articles, whereas most news events are covered in the media by several related articles. Event detection is the task of discovering and grouping documents that describe the same event. By presenting news events in an understandable structure, it also helps users navigate news collections. With the rapid growth of online news, the need for search engines that retrieve news events is greater than ever. The main assumption of event detection is that the words associated with an event appear in the same time windows and in similar documents. Accordingly, we propose a retrospective, feature-pivot method that clusters words into groups according to their semantic and temporal features. We then use these word groups to produce a time frame and a human-readable text description. The proposed method is evaluated on the All The News dataset, which consists of two hundred thousand articles published by 15 news sources in 2016, and is compared with other methods. The evaluation shows that the proposed method outperforms previous methods in terms of precision and recall.
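The feature-pivot idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each article is reduced to a day index and a set of words, it uses document co-occurrence (Jaccard overlap) as a simple stand-in for the semantic features, and the names `cluster_words`, `min_df`, and `threshold` are hypothetical. Words whose daily frequency profiles and document sets are both similar are merged into one group, and each group yields a time frame.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def word_profiles(docs):
    """Return per-word daily document counts and per-word document-id sets."""
    by_day = defaultdict(lambda: defaultdict(int))
    by_doc = defaultdict(set)
    for doc_id, (day, words) in enumerate(docs):
        for w in set(words):
            by_day[w][day] += 1
            by_doc[w].add(doc_id)
    return by_day, by_doc


def cosine(a, b):
    """Cosine similarity between two sparse day -> count profiles."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Overlap between the sets of documents that contain each word."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_words(docs, min_df=2, threshold=0.5):
    """Group words that share time windows and documents (assumed criteria)."""
    by_day, by_doc = word_profiles(docs)
    vocab = sorted(w for w in by_doc if len(by_doc[w]) >= min_df)
    parent = {w: w for w in vocab}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for u, v in combinations(vocab, 2):
        temporal = cosine(by_day[u], by_day[v])
        overlap = jaccard(by_doc[u], by_doc[v])
        if temporal * overlap >= threshold:
            parent[find(u)] = find(v)

    groups = defaultdict(list)
    for w in vocab:
        groups[find(w)].append(w)

    events = []
    for words in groups.values():
        if len(words) < 2:
            continue  # a single word is not treated as an event here
        days = [d for w in words for d in by_day[w]]
        events.append({"words": sorted(words), "start": min(days), "end": max(days)})
    return sorted(events, key=lambda e: e["start"])


# Toy corpus (invented for illustration): (day index, words in the article).
docs = [
    (0, {"earthquake", "rescue", "italy"}),
    (0, {"earthquake", "italy", "rescue"}),
    (1, {"earthquake", "rescue", "italy"}),
    (5, {"election", "ballot", "senate"}),
    (6, {"election", "ballot", "senate"}),
    (6, {"election", "ballot"}),
]
events = cluster_words(docs)
for event in events:
    print(event["start"], event["end"], ", ".join(event["words"]))
```

On this toy corpus the sketch separates the two word groups because they share neither days nor documents. The pairwise comparison is quadratic in vocabulary size, so a real system over two hundred thousand articles would need feature selection or an approximate neighbor search before clustering.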