Hashtag Recommendation in Microblogging Systems Using Topic Vectors: The Twitter Use Case

Subject areas: Electrical and Computer Engineering

1 - Urmia University
2 - Urmia University

Received: 1397/02/02; Accepted: 1397/08/05; Published: 1398/01/31

Keywords: recommender systems, hashtag recommendation, topic vector, latent Dirichlet allocation, Gibbs sampling, microblog, Twitter

Abstract:

With the introduction of Web 2.0, the static data of Web 1.0 took on a more structured form. Wikis, blogs, social networks, and social bookmarking systems are examples in which users generate content. One problem with user-generated content is its lack of uniformity, which produces heterogeneous data and makes it hard to apply algorithms and computational techniques. Web 2.0 mitigates this problem with hashtags (tags): users tag the content they publish themselves. In microblogs such as Twitter, however, this remedy still falls short, because users face a character limit (140 characters per tweet) and the length of the content may leave no room for some of the hashtag's characters. This paper uses latent Dirichlet allocation and collapsed Gibbs sampling to address hashtag recommendation in Twitter's heterogeneous environment. Hashtag recommendation was implemented on 8396744 English-language tweets, and in different experiments the 1 to 5 most relevant hashtags were recommended. Across settings, the results show precision above 20% and recall above 45%, an increase in precision from 3% to 21% and in recall from 32% to 46% compared with the most accurate method examined by the authors, hashtag recommendation with unmodified LDA.

English abstract:

The static content of Web 1.0 was replaced by structured, user-generated content in Web 2.0. Wikis, blogs, social networks, and social bookmarking systems are examples in which users generate and publish content. User-generated content produces heterogeneous data, which makes computation and algorithms hard to apply. Web 2.0 uses hashtags (tags) to address this heterogeneity: users label their own content with hashtags. This technique does not help in microblogging systems such as Twitter, because the character limit (140 characters per tweet) causes tags or words to be truncated or written in heterogeneous forms. This paper introduces a novel method based on Latent Dirichlet Allocation that represents each tweet numerically as a vector, called the topic vector (TV). The TV is also used to model users' tastes, which can improve hashtag recommendation. The proposed method was tested on 8396744 real English tweets. The top 1 to 5 hashtags are recommended for each tweet, and the results show precision above 20% and recall above 45%. With the TV, precision increases from 3% to 21% and recall from 32% to 46% relative to the best method tested by the authors.
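The abstract reports precision and recall for the top 1 to 5 recommended hashtags. As a minimal sketch of how such per-tweet figures are commonly computed (the function name `precision_recall_at_k` and the sample hashtags are hypothetical, and the paper may aggregate across tweets differently), one might write:

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    recommended: Sequence[str], actual: Set[str], k: int
) -> Tuple[float, float]:
    """Precision and recall of the top-k recommended hashtags for one tweet.

    Assumes `recommended` is ranked best-first and contains no duplicates.
    """
    top_k = list(recommended[:k])
    if not top_k or not actual:
        return 0.0, 0.0
    # A hit is a recommended hashtag that the tweet actually carries.
    hits = sum(1 for tag in top_k if tag in actual)
    return hits / len(top_k), hits / len(actual)


# Hypothetical tweet with two true hashtags and five ranked suggestions.
ranked = ["#nlp", "#python", "#ai", "#ml", "#data"]
truth = {"#nlp", "#ml"}
for k in range(1, 6):
    p, r = precision_recall_at_k(ranked, truth, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Recommending more hashtags typically raises recall while lowering precision, which is one reason results are reported over a range of k rather than at a single value.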

