Using Evolutionary Clustering for Topic Detection in Microblogging Considering Social Network Information

Subject areas: Electrical and Computer Engineering

Elham Sadat Alavi ¹, Hoda Mashayekhi ², Hamid Hassanpour ³, Bagher Rahimpour Kami ⁴

1 - Shahrood University of Technology
2 - Shahrood University of Technology
3 - Shahrood University of Technology
4 - Mazandaran University of Science and Technology

Received: 1398/03/02 | Accepted: 1398/05/29 | Published: 1398/12/17 (Solar Hijri calendar)

Keywords: topic detection, evolutionary clustering, social network, probabilistic model

Abstract:

Short social media texts, such as tweets, carry a great deal of information about hot topics and public opinion. Detecting and tracking topics is essential for making sense of this information. Many existing methods require the number of topics to be fixed in advance and cannot change it over time, which makes them unsuitable for growing, dynamic data. Non-parametric topic evolution models, in turn, perform poorly on short texts because of data sparsity. In this paper, we present a new evolutionary clustering model that is implicitly inspired by the distance-dependent Chinese Restaurant Process (dd-CRP). To address data sparsity, the proposed method uses social network information alongside textual similarity to improve the similarity estimate between tweets. Unlike most methods in this field, it also determines the number of clusters automatically: tweets are linked to one another with a probability proportional to their similarity, and a set of such links constitutes a topic. To speed up execution, we use a clustering-based summarization method. The method is evaluated on a real dataset collected from Twitter over two and a half months, by clustering the texts and comparing the resulting clusters. The results show that the proposed method achieves better topic coherence than the compared methods and can be used effectively for topic detection on short social media texts.
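The linking idea in the abstract (tweets connect with probability proportional to their similarity, and connected tweets form a topic) can be illustrated with a minimal sketch in the spirit of dd-CRP. This is not the authors' algorithm. The Jaccard measures, the mixing weight `alpha`, the self-link weight `self_weight`, and the `friends` field standing in for social network information are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(t1, t2, alpha=0.5):
    """Mix textual and social similarity (both measures are assumptions)."""
    text_sim = jaccard(set(t1["text"].lower().split()),
                       set(t2["text"].lower().split()))
    social_sim = jaccard(t1["friends"], t2["friends"])
    return alpha * text_sim + (1 - alpha) * social_sim


def sample_links(tweets, self_weight=0.1, alpha=0.5, rng=None):
    """Each tweet links to another tweet with probability proportional to
    their similarity, or to itself with weight self_weight (dd-CRP style)."""
    rng = rng or random.Random()
    links = []
    for i, ti in enumerate(tweets):
        weights = [self_weight if i == j else combined_similarity(ti, tj, alpha)
                   for j, tj in enumerate(tweets)]
        # Keep only candidates with positive weight before sampling.
        candidates = [(j, w) for j, w in enumerate(weights) if w > 0]
        targets = [j for j, _ in candidates]
        probs = [w for _, w in candidates]
        links.append(rng.choices(targets, weights=probs, k=1)[0])
    return links


def topics_from_links(links):
    """Topics are the connected components of the link graph (union-find)."""
    parent = list(range(len(links)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in enumerate(links):
        parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(links)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Hypothetical toy data: two unrelated groups of tweets.
tweets = [
    {"text": "earthquake hits city tonight", "friends": {"u1", "u2", "u3"}},
    {"text": "strong earthquake shakes city", "friends": {"u2", "u3"}},
    {"text": "city earthquake damage reports", "friends": {"u1", "u3"}},
    {"text": "football final match score", "friends": {"u7", "u8"}},
    {"text": "final match football fans", "friends": {"u8", "u9"}},
]

links = sample_links(tweets, rng=random.Random(0))
print(topics_from_links(links))
```

The number of topics is not passed in anywhere: it emerges from how the sampled links happen to connect. A larger `self_weight` makes self-links more likely and therefore tends to produce more, smaller topics.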

