A Search Engine for Structured Event Retrieval from News Sources

Subjects: Electrical and Computer Engineering

1 - Shahid Beheshti University
2 - Shahid Beheshti University

Submitted: Friday, 26 Rabi' al-Thani 1442. Accepted: Sunday, 8 Dhu al-Hijjah 1442. Published: Saturday, 5 Jumada al-Thani 1443.

Keywords: event detection, search engine, information retrieval, text mining

Abstract:

Analyzing the content of published news is an important problem in information retrieval. Much existing research analyzes news articles one at a time, even though most news events appear in the media repeatedly, as several related articles. Event detection is the task of discovering and grouping documents that describe the same event; by giving news reports a comprehensible structure, it helps users navigate news spaces. With the rapid and continuing growth of online news, the need for search engines that retrieve news events, and so make searching these spaces easier, is felt more than ever. The central assumption of event detection is that words related to the same real-world event are likely to appear in similar documents and similar time windows. On this basis, we propose a retrospective, feature-pivot method that groups words according to their semantic and temporal features. We then use these word groups to produce a time interval and a human-readable textual description for each event. The contributions of this work are a suitable architecture, an effective use of clustering for event retrieval, and accurate detection of event time. The proposed method was evaluated on the AllTheNews dataset, which contains approximately 200,000 articles from 15 news sources published in 2016, and was compared with other methods. The evaluation shows that the proposed method outperforms previous methods on both precision and recall.
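The idea of grouping words by temporal and semantic features and then deriving a time interval can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the features, the similarity measure, or the clustering algorithm. The sketch assumes daily document-frequency trajectories as the temporal feature, uses document co-occurrence (Jaccard overlap) as a simple stand-in for semantic similarity, and merges words with a threshold-based union-find. All function names, the equal weighting, and the `threshold` and `frac` parameters are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def trajectories(docs, n_days):
    """Map each word to its per-day document count.

    docs is a list of (day_index, set_of_words) pairs.
    """
    traj = defaultdict(lambda: [0] * n_days)
    for day, words in docs:
        for w in words:
            traj[w][day] += 1
    return dict(traj)


def doc_sets(docs):
    """Map each word to the set of document indices that contain it."""
    occ = defaultdict(set)
    for i, (_, words) in enumerate(docs):
        for w in words:
            occ[w].add(i)
    return dict(occ)


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Jaccard overlap of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_words(docs, n_days, threshold=0.5):
    """Group words whose combined temporal and co-occurrence score
    reaches the (assumed) threshold. Returns (groups, trajectories)."""
    traj = trajectories(docs, n_days)
    occ = doc_sets(docs)
    words = sorted(traj)
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, u in enumerate(words):
        for v in words[i + 1:]:
            # Equal weighting of the two signals is an arbitrary choice.
            score = 0.5 * cosine(traj[u], traj[v]) + 0.5 * jaccard(occ[u], occ[v])
            if score >= threshold:
                parent[find(u)] = find(v)

    groups = defaultdict(list)
    for w in words:
        groups[find(w)].append(w)
    # Keep only groups with more than one word as candidate events.
    return [sorted(g) for g in groups.values() if len(g) > 1], traj


def event_interval(group, traj, n_days, frac=0.5):
    """Return (first_day, last_day) where the group's summed daily
    count is at least frac of its peak."""
    total = [sum(traj[w][d] for w in group) for d in range(n_days)]
    peak = max(total)
    active = [d for d, c in enumerate(total) if c >= frac * peak]
    return active[0], active[-1]


# Toy corpus: (day index, words in the article).
docs = [
    (0, {"election", "vote"}),
    (1, {"election", "vote", "ballot"}),
    (1, {"election", "vote"}),
    (4, {"storm", "flood"}),
    (5, {"storm", "flood"}),
]
groups, traj = group_words(docs, n_days=6)
for g in groups:
    print(g, event_interval(g, traj, n_days=6))
```

On this toy corpus the election-related words and the storm-related words fall into separate groups, each with its own day range. A real system would likely replace the Jaccard stand-in with an embedding-based similarity and the pairwise loop with a scalable clustering algorithm.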

