Learning to Rank Persian Web Content Based on Layered Genetic Programming

Subject areas: General


Received: 1398/12/15, Accepted: 1398/12/15, Published: 1399/04/01

Keywords: learning to rank, layered genetic programming model, click-through features, Persian Web content, dotIR dataset

Abstract:

Learning to rank, an emerging approach for addressing existing challenges and improving the performance of Web search engines, is very promising and effective. At the same time, one of its serious problems is the lack of attention to the history of users' interactions with the search engine during the search process and their evaluation of the results obtained. Moreover, the very large volume of features required from documents and user queries has cast doubt on the practicality of this approach in real-world conditions. Using the click-through data model and generating click-through features is a novel solution on the basis of which, by employing the layered genetic programming model, a suitable ranking model named MGP-Rank has been proposed for English Web information retrieval. In this study, in view of the specific characteristics of the Persian language, this algorithm is localized by presenting suitable scenarios for generating its click-through features. The results of evaluating this algorithm for the Persian language on the dotIR dataset indicate its considerable capability compared with baseline ranking methods. This performance improvement is especially notable in the top part of the search result list, which users consult most often.

English abstract:

Learning to rank (L2R) has emerged as a promising approach to the existing challenges of Web search engines. However, present learning-to-rank techniques have major drawbacks. Current L2R algorithms do not take into account the search behavior of users recorded in the logs of their search sessions. In addition, machine learning is a data-intensive process that requires a large volume of data about users' queries as well as Web documents, which makes the use of L2R techniques questionable in real-world applications. Recently, a novel approach named MGP-Rank has been proposed, based on the click-through data model and the generation of click-through features. Using the layered genetic programming model, MGP-Rank has achieved noticeable performance in ranking English Web content. In this study, with respect to the specific characteristics of the Persian language, suitable scenarios are presented for generating the click-through features. In this way, a customized version of MGP-Rank is proposed for Persian Web retrieval. Evaluation of this algorithm on the dotIR dataset indicates a considerable improvement over major ranking methods. The improvement is particularly noticeable in the top part of the search result lists, which is most frequently visited by Web users.
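To make the notion of click-through features more concrete, the following is a minimal Python sketch of how per query-document features might be aggregated from a search log. The abstract does not list the features or scenarios actually used by MGP-Rank, so the log layout (`query`, `doc_id`, `rank`, `clicked`) and the three features computed here (`impressions`, `click_rate`, `mean_rank`) are illustrative assumptions, not the authors' definitions.

```python
from collections import defaultdict


def click_through_features(log):
    """Aggregate simple click statistics per (query, doc_id) pair.

    `log` is an iterable of (query, doc_id, rank, clicked) tuples, one per
    result shown to a user. This layout is a hypothetical example format.
    """
    shown = defaultdict(int)     # times the document was shown for the query
    clicks = defaultdict(int)    # times it was clicked
    rank_sum = defaultdict(int)  # sum of the ranks at which it was shown

    for query, doc_id, rank, clicked in log:
        key = (query, doc_id)
        shown[key] += 1
        rank_sum[key] += rank
        if clicked:
            clicks[key] += 1

    # Illustrative features only; the paper's feature set may differ.
    return {
        key: {
            "impressions": shown[key],
            "click_rate": clicks[key] / shown[key],
            "mean_rank": rank_sum[key] / shown[key],
        }
        for key in shown
    }


# Toy log: two documents shown for the same query across several sessions.
log = [
    ("q1", "d1", 1, True),
    ("q1", "d2", 2, False),
    ("q1", "d1", 1, False),
    ("q1", "d2", 2, True),
    ("q1", "d2", 3, True),
]
features = click_through_features(log)
```

Feature vectors of this kind could then serve as inputs to a ranking learner; how MGP-Rank combines them through layered genetic programming is not described in the abstract and is not sketched here.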

