Effective Query Recommendation with Medoid-based Clustering using a Combination of Query, Click and Result Features
محورهای موضوعی : Semantic WebElham Esmaeeli-Gohari 1 , Sajjad Zarifzadeh 2
1 - Yazd University
2 - Yazd University
کلید واژه: Recommendation Systems, , Search Engine, , Clustering, , Query, , Click, ,
چکیده مقاله :
Query recommendation is now an inseparable part of web search engines. The goal of query recommendation is to help users find their intended information by suggesting similar queries that better reflect their information needs. The existing approaches often consider the similarity between queries from one aspect (e.g., similarity with respect to query text or search result) and do not take into account different lexical, syntactic and semantic templates exist in relevant queries. In this paper, we propose a novel query recommendation method that uses a comprehensive set of features to find similar queries. We combine query text and search result features with bipartite graph modeling of user clicks to measure the similarity between queries. Our method is composed of two separate offline (training) and online (test) phases. In the offline phase, it employs an efficient k-medoids algorithm to cluster queries with a tolerable processing and memory overhead. In the online phase, we devise a randomized nearest neighbor algorithm for identifying most similar queries with a low response-time. Our evaluation results on two separate datasets from AOL and Parsijoo search engines show the superiority of the proposed method in improving the precision of query recommendation, e.g., by more than 20% in terms of p@10, compared with some well-known algorithms.
Query recommendation is now an inseparable part of web search engines. The goal of query recommendation is to help users find their intended information by suggesting similar queries that better reflect their information needs. The existing approaches often consider the similarity between queries from one aspect (e.g., similarity with respect to query text or search result) and do not take into account different lexical, syntactic and semantic templates exist in relevant queries. In this paper, we propose a novel query recommendation method that uses a comprehensive set of features to find similar queries. We combine query text and search result features with bipartite graph modeling of user clicks to measure the similarity between queries. Our method is composed of two separate offline (training) and online (test) phases. In the offline phase, it employs an efficient k-medoids algorithm to cluster queries with a tolerable processing and memory overhead. In the online phase, we devise a randomized nearest neighbor algorithm for identifying most similar queries with a low response-time. Our evaluation results on two separate datasets from AOL and Parsijoo search engines show the superiority of the proposed method in improving the precision of query recommendation, e.g., by more than 20% in terms of p@10, compared with some well-known algorithms.
[1] J. Wang, J. Z. Huang, J. Guo, and Y. Lan, “Query Ranking Model for Search Engine Query Recommendation”, International Journal of Machine Learning and Cybernetics, Vol. 8, No. 3, 2015, pp. 1-20.
[2] M. Mitsui, “A Generative Framework to Query Recommendation and Evaluation”, in ACM Conference on Human Information Interaction and Retrieval (CHIIR), 2017, pp. 407-409.
[3] X. Zhu, J. Guo, X. Cheng and Y. Lan, “More than Relevance: High Utility Query Recommendation by Mining Users' Search Behaviors”, in 21st ACM International Conference on Information and Knowledge Management (CIKM), 2012, pp. 1814-1814.
[4] P. Melville and V. Sindhwani, “Recommender Systems”, Encyclopedia of Machine Learning, Springer US, pp. 829-838, 2011.
[5] Y. Liu, J. Miao, M. Zhang, S. Ma and L. Ru, “How Do Users Describe Their Information Need: Query Recommendation based on Snippet Click Model”, Expert Systems with Applications, Vol. 38, No. 11, 2011, pp. 13:847-13:856.
[6] W. Song, J. Z. Liang, X. L. Cao and S. C. Park, “An Effective Query Recommendation Approach using Semantic Strategies for Intelligent Information Retrieval”, Expert Systems with Applications, Vol. 41, No. 2, 2014, pp. 366-372.
[7] Y. Song, D. Zhou and L. W. He, “Query Suggestion by Constructing Term-transition Graphs”, in 5th ACM International Conference on Web Search and Data Mining (WSDM), 2012, pp. 353-362.
[8] Web Search Query Log Downloads. https://www.jeffhuang.com/search_query_logs.html (2018).
[9] Z. Zhang and O. Nasraoui, “Mining Search Engine Query Logs for Query Recommendation”, in 15th International Conference on World Wide Web (WWW), 2006, pp. 1039–1040.
[10] J. R. Wen, J. Y. Nie and H. J. Zhang, “Clustering User Queries of a Search Engine”, in 10th International Conference on World Wide Web (WWW), 2001, pp. 162-168.
[11] R. Baeza-Yates, C. Hurtado and M. Mendoza, “Query Recommendation using Query Logs in Search Engines”, in International Conference on Extending Database Technology (EDBT), 2004, pp. 588-596.
[12] Y. Hong, J. Vaidya and H. Lu, “Search Engine Query Clustering using Top-k Search Results”, in IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), 2011, pp. 112-119.
[13] R. Chaudhary and N. Taneja, “A Novel Approach for Query Recommendation via Query Logs”, International Journal of Scientific and Engineering Research (IJSER), Vol. 3, No. 8, 2012, pp. 1-6.
[14] L. Fitzpatrick and M. Dent, “Automatic Feedback using Past Queries: Social Searching?”, in 20th Annual International Conference on Research and Development in Information Retrieval (SIGIR), 1997, pp. 306–313.
[15] K. Sugiyama, K. Hatano and M. Yoshikawa, “Adaptive Web Search based on User Profile Constructed without any Effort from Users”, in 13th Conference on World Wide Web (WWW), 2004, pp. 675–684.
[16] B. J. Chelliah, R. Ojha, S. Semwal, P. Dobhal and C. Sahu, “Personalized Search Engine with Query Recommendation and Re-ranking”, Journal of Network Communications and Emerging Technologies (JNCET), Vol. 8, No. 4, 2018, pp. 213-217.
[17] Q. Mei, D. Zhou and K. Church, “Query Suggestion using Hitting Time”, in 17th ACM Conference on Information and Knowledge Management (CIKM), 2008, pp. 469–478.
[18] P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis and S. Vigna, “The Query-flow Graph: Model and Applications”, in 17th ACM Conference on Information and Knowledge Management (CIKM), 2008, pp. 609-618.
[19] A. Anagnostopoulos, L. Becchetti, C. Castillo and A. Gionis, “An Optimization Framework for Query Recommendation”, in 3rd ACM International Conference on Web Search and Data Mining (WSDM), 2010, pp. 161-170.
[20] L. Bai, J. Guo and X. Cheng, “Query Recommendation by Modelling the Query-flow Graph”, in Asia Information Retrieval Symposium (AIRS), 2011, pp. 137-146.
[21] F. Bonchi, R. Perego, F. Silvestri, H. Vahabi and R. Venturini, “Recommendations for the Long Tail by Term-Query Graph”, in 20th International Conference on World Wide Web (WWW), 2011, pp. 15-16.
[22] J. Wang, J. Z. Huang, D. Wu, J. Guo and Y. Lan, “An Incremental Model on Search Engine Query Recommendation”, Elsevier Neurocomputing, Vol. 218, 2016, pp. 423-431.
[23] A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. G. Simonsen and J. Y. Nie, “A Hierarchical Recurrent Encoder-decoder for Generative Context-aware Query Suggestion”, in 24th ACM International Conference on Information and Knowledge Management (CIKM), 2015, pp. 553-562.
[24] Z. Huang, B. Cautis, R. Cheng and Y. Zheng, “Kb-enabled Query Recommendation for Long-tail queries”, in 25th ACM International Conference on Information and Knowledge Management(CIKM), 2016, pp. 2107-2112.
[25] R. Cheng, Y. Zheng and J. Yan, “Entity-based Query Recommendation for Long-tail Queries”, ACM Transactions on Knowledge Discovery from Data, Vol. 1, No. 1, 2018, pp. 1-22.
[26] S. Kannan and V. Gurusamy, “Preprocessing techniques for text mining”, International Journal of Computer Science & Communication Networks, Vol. 5, No. 1, 2014, pp. 1-7.
[27] Hunspell GitHub. https://github.com/hunspell (2018).
[28] H. Taghi-Zadeh, M. H. Sadreddini, M. H. Diyanati, and A. H. Rasekh, “A New Hybrid Stemming Method for Persian Language”, Digit. Scholarsh. Humanit., Vol. 32, No. 1, 2017, pp. 209–221.
[29] Infront Webworks: Value of Organic First-page Results. https://www.infront.com/blogs/the-infront-blog/2015/6/17/value-of-first-page-google-results (2018).
[30] T. Joachims, L. Granka, B. Pan, H. Hembrooke, F. Radlinski and G. Gay, “Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search”, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2, 2007, pp. 183-190.
[31] L. Kaufman, and P.J. Rousseeuw, “Clustering by Means of Medoids in Statistical Data Analysis based on the L1–norm and Related Methods”, North-Holland, pp. 405–416, 1987.
[32] H. S. Park, and C. H. Jun, “A Simple and Fast Algorithm for K-medoids Clustering”, Elsevier Expert Systems with Applications, Vol. 36, No. 2, 2009, pp. 3336-3341.
[33] T. Shakiba, S. Zarifzadeh, and V. Derhami, “Spam Query Detection using Stream Clustering”, Springer World Wide Web (WWW), Vol. 21, No. 2, 2018, pp. 557–572.
[34] M. Fallah, S. Zarifzadeh, “Practical Detection of Click Spams using Efficient Classification-based Algorithms”, International Journal of Information and Communication Technology Research, Vol. 10, No. 2, 2018, pp. 63-71.
[35] C. D. Manning, P. Raghavan and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.