Effective solving the One-Two Gap Problem in the PageRank algorithm
محورهای موضوعی : Pervasive computingJavad Paksima 1 , Homa Khajeh 2
1 - Department of Engineering, Payame Noor University,Tehran,Iran
2 - Department of Engineering, Science and art University, Yazd, Ira
کلید واژه: One-Two Gap , PageRank , Search Engine , Web Graph,
چکیده مقاله :
One of the criteria for search engines to determine the popularity of pages is an analysis of links in the web graph, and various methods have already been presented in this regard. The PageRank algorithm is the oldest web page ranking methods based on web graph and is still used as one of the important factors of web pages on Google. Since the invention of this method, several bugs have been published and solutions have been proposed to correct them. The most important problem that is most noticed is pages without an out link or so-called suspended pages. In web graph analysis, we noticed another problem that occurs on some pages at the out degree of one, and the problem is that under conditions, the linked page score is more than the home page. This problem can generate unrealistic scores for pages, and the link chain can invalidate the web graph. In this paper, this problem has been investigated under the title "One-Two Gap", and a solution has been proposed to it. Experimental results show that fixing of the One-Two gap problem using the proposed solution. Test standard benchmark dataset, TREC2003, is applied to evaluate the proposed method. The experimental results show that our proposed method outperforms PageRank method theoretically and experimentally in the term of precision, accuracy, and sensitivity with such criteria as PD, P@n, NDCG@n, MAP, and Recall.
One of the criteria for search engines to determine the popularity of pages is an analysis of links in the web graph, and various methods have already been presented in this regard. The PageRank algorithm is the oldest web page ranking methods based on web graph and is still used as one of the important factors of web pages on Google. Since the invention of this method, several bugs have been published and solutions have been proposed to correct them. The most important problem that is most noticed is pages without an out link or so-called suspended pages. In web graph analysis, we noticed another problem that occurs on some pages at the out degree of one, and the problem is that under conditions, the linked page score is more than the home page. This problem can generate unrealistic scores for pages, and the link chain can invalidate the web graph. In this paper, this problem has been investigated under the title "One-Two Gap", and a solution has been proposed to it. Experimental results show that fixing of the One-Two gap problem using the proposed solution. Test standard benchmark dataset, TREC2003, is applied to evaluate the proposed method. The experimental results show that our proposed method outperforms PageRank method theoretically and experimentally in the term of precision, accuracy, and sensitivity with such criteria as PD, P@n, NDCG@n, MAP, and Recall.
[1] K. Purcell, J. Brenner, and L. Rainie, “Search Engine Use 2012,” 2012.
[2] Y. Zhang, B. J. Jansen, and A. Spink, “Time series analysis of a Web search engine transaction log,” Information Processing & Management, vol. 45, no. 2, pp. 230–245, 2009.
[3] R. Baeza-Yates and B. Ribeiro-Neto, “Modern information retrieval,” New York, vol. 9, p. 513, 1999.
[4] Searchmetrics, “Searchmetrics Ranking Factors 2016: Rebooting for Relevance,” 2016. [Online]. Available: http://www.searchmetrics.com/knowledge-base/ranking-factors/. [Accessed: 07-May-2017].
[5] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” World Wide Web Internet And Web Information Systems, vol. 54, no. 1999–66, pp. 1–17, 1998.
[6] G.-R. Xue, Q. Yang, H.-J. Zeng, Y. Yu, and Z. Chen, “Exploiting the hierarchical structure for link analysis,” Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 186--193, 2005.
[7] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” Journal of the ACM, vol. 46, no. 5, pp. 604–632, Sep. 1999.
[8] R. Patchmuthu, “Link analysis algorithms to handle hanging and spam pages,” 2014.
[9] L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank Citation Rankin: Bringing Order to the Web,” World Wide Web Internet And Web Information Systems, vol. 54, no. 1999–66, pp. 1–17, 1998.
[10] M. Bianchini, M. Gori, and F. Scarselli, “Inside pagerank,” ACM Transactions on Internet Technology (TOIT), vol. 5, no. 1, pp. 92–128, 2005.
[11] A.-L. Barabási and R. Albert, “Emergence of Scaling in Random Networks,” Science, vol. 286, no. October, pp. 509–512, 1999.
[12] L. Z. Xiang, “Research and Improvement of PageRank Sort Algorithm Based on Retrieval Results,” in Intelligent Computation Technology and Automation (ICICTA), 2014 7th International Conference on, 2014, pp. 468–471.
[13] Z. Dou, R. Song, J.-Y. Nie, and J.-R. Wen, “Using anchor texts with their hyperlink structure for web search,” in Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, 2009, pp. 227–234.
[14] A. N. Langville and C. D. Meyer, “A reordering for the PageRank problem,” SIAM Journal on Scientific Computing, vol. 27, no. 6, pp. 2112–2120, 2006.
[15] R. K. Patchmuthu, A. K. SINGH, and A. Mohan, “A new algorithm for detection of link spam contributed by zero-out link pages,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 24, no. 4, pp. 2106–2123, 2016.
[16] I. C. F. Ipsen and T. M. Selee, “PageRank computation, with special attention to dangling nodes,” SIAM Journal on Matrix Analysis and Applications, vol. 29, no. 4, pp. 1281–1296, 2007.
[17] X. Wang, T. Tao, J.-T. Sun, A. Shakery, and C. Zhai, “Dirichletrank: Solving the zero-one gap problem of pagerank,” ACM Transactions on Information Systems (TOIS), vol. 26, no. 2, p. 10, 2008.
[18] K. Berlt, E. S. de Moura, A. Carvalho, M. Cristo, N. Ziviani, and T. Couto, “Modeling the web as a hypergraph to compute page reputation,” Information Systems, vol. 35, no. 5, pp. 530–543, 2010.
[19] A. N. Nikolakopoulos and J. D. Garofalakis, “NCDawareRank: a novel ranking method that exploits the decomposable structure of the web,” in Proceedings of the sixth ACM international conference on Web search and data mining, 2013, pp. 143–152.
[20] A. Broder et al., “Graph structure in the web,” Computer networks, vol. 33, no. 1, pp. 309–320, 2000.
[21] D. Donato, L. Laura, S. Leonardi, and S. Millozzi, “Large scale properties of the webgraph,” The European Physical Journal B-Condensed Matter and Complex Systems, vol. 38, no. 2, pp. 239–243, 2004.
[22] A. Flammini, F. Menczer, and A. Vespignani, “The egalitarian effect of search engines,” arXiv preprint cs.CY/0511005, 2005.
[23] L. Becchetti and C. Castillo, “The distribution of PageRank follows a power-law only for particular values of the damping factor,” in Proceedings of the 15th international conference on World Wide Web, 2006, pp. 941–942.
[24] G. Pandurangan, P. Raghavan, and E. Upfal, “Using pagerank to characterize web structure,” Internet Mathematics, vol. 3, no. 1, pp. 1–20, 2006.
[25] N. Litvak, W. R. W. Scheinhardt, and Y. Volkovich, “In-Degree and PageRank of Web pages: Why do they follow similar power laws?,” arXiv preprint math/0607507, 2006.
[26] P. Zha, X. Xu, and M. Zuo, “An Efficient Improved Strategy for the PageRank Algorithm,” in 2011 International Conference on Management and Service Science, 2011, pp. 1–4.
[27] S. Setayesh, A. Harounabadi, and A. M. Rahmani, “Presentation of an Extended Version of the PageRank Algorithm to Rank Web Pages Inspired by Ant Colony Algorithm,” International Journal of Computer Applications, vol. 85, no. 17, 2014.
[28] K. Mohan and J. Kurmi, “A Technique to Improved Page Rank Algorithm in perspective to Optimized Normalization Technique,” International Journal, vol. 8, no. 3, 2017.
[29] N. L. Amy and D. M. Carl, “Google’s PageRank: The Math Behind the Search Engine,” Priceton university Press, Nol, vol. 3, pp. 335–380, 2004.
[30] W. Xing and A. Ghorbani, “Weighted PageRank algorithm,” in Proceedings of the Second Annual Conference on Communication Networks and Services Research, 2004, pp. 305–314.
[31] F. Mathieu and M. Bouklit, “The effect of the back button in a random walk: application for pagerank,” in Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, 2004, pp. 370–371.
[32] M. Zhukovskii, G. Gusev, and P. Serdyukov, “URL redirection accounting for improving link-based ranking methods,” in Advances in Information Retrieval, Springer, 2013, pp. 656–667.
[33] Z. Bahrami Bidoni, R. George, and K. Shujaee, “A Generalization of the PageRank Algorithm,” ICDS 2014, The Eighth International Conference on Digital Society, pp. 108–113, 2014.
[34] A. K. Singh, “A comparative study of page ranking algorithms for information retrieval,” International journal of electrical and computer engineering, vol. 4, pp. 469--480, 2009.
[35] A. M. Zareh Bidoki and N. Yazdani, “DistanceRank: An intelligent ranking algorithm for web pages,” Information Processing and Management, vol. 44, no. 2, pp. 877–892, 2008.
[36] B. Poblete, C. Castillo, and A. Gionis, “Dr. searcher and mr. browser: a unified hyperlink-click graph,” in Proceedings of the 17th ACM conference on Information and knowledge management, 2008, pp. 1123–1132.
[37] R. Baeza-Yates, P. Boldi, and C. Castillo, “Generalizing pagerank: Damping functions for link-based ranking algorithms,” in Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, 2006, pp. 308–315.
[38] T. Qin, T. Y. Liu, J. Xu, and H. Li, “LETOR: A benchmark collection for research on learning to rank for information retrieval,” Information Retrieval, vol. 13, pp. 346–374, 2010.
[39] T. H. Haveliwala, “Efficient computation of PageRank,” 1999.
[40] M. Arora, U. Kanjilal, and D. Varshney, “Evaluation of information retrieval: Precision and recall,” International Journal of Indian Culture and Business Management, vol. 12, no. 2, pp. 224–236, 2016.