Learning to Rank Persian Web Content Based on Multi-Layer Genetic Programming
Keywords: learning to rank, multi-layer genetic programming model, click-through features, Persian web content, dotIR dataset
Abstract:
Learning to rank is an emerging approach to addressing the challenges of web search engines and improving their performance, and it is highly promising and effective. One of its serious shortcomings, however, is that it pays little attention to the history of users' interactions with the search engine during the search process and to how users assess the returned results. In addition, the very large number of document and query features it requires casts doubt on how practical the approach is in real-world settings. Modeling click-through data and generating click-through features is a novel remedy for these problems: building on it and employing a multi-layer genetic programming model, a ranking model called MGP-Rank was proposed for English web information retrieval. In this research, that algorithm is localized for Persian by designing scenarios for generating its click-through features that take the specific characteristics of the Persian language into account. Evaluation on the dotIR dataset shows that, for Persian, the algorithm performs considerably better than baseline ranking methods. The improvement is especially notable at the top of the search results list, the part that users consult most often.
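The abstract reports gains concentrated at the top of the result list. A common way to quantify that is NDCG@k, which discounts gains at lower positions; the abstract does not name its evaluation measures, so treating NDCG@k as the metric here is an assumption. The minimal sketch below scores one query's documents with a candidate ranking function of the kind a genetic programming search might evolve, then computes NDCG@k of the resulting order. The feature names (`ctr`, `dwell`, `position`) and the `candidate` expression are hypothetical illustrations, not MGP-Rank's actual click-through features or evolved functions.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical feature vector for one (query, document) pair,
# e.g. click-through features keyed by name.
Features = Dict[str, float]


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the first k positions.

    Uses gain 2^rel - 1 and a log2(position + 1) discount (positions are 1-based).
    """
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[int], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (relevance-sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def rank_and_score(docs: List[Features], labels: List[int],
                   f: Callable[[Features], float], k: int) -> float:
    """Sort one query's documents by f (highest first) and return NDCG@k."""
    order = sorted(range(len(docs)), key=lambda i: f(docs[i]), reverse=True)
    return ndcg_at_k([labels[i] for i in order], k)


def candidate(x: Features) -> float:
    """Hypothetical evolved individual: an arithmetic expression over features."""
    return x["ctr"] * 2.0 + x["dwell"] / (1.0 + x["position"])
```

In a genetic programming setup, one plausible fitness function is `rank_and_score` averaged over the training queries; comparing NDCG@1 with NDCG@10 for the same function shows whether its advantage is concentrated at the top of the list.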