The Surfer Model with a Hybrid Approach to Ranking the Web Pages
Subject Areas :Javad Paksima 1 , Homa Khajeh 2
1 - Department of Engineering, Science and Art University, Yazd, Iran
2 - Department of Engineering, Science and Art University, Yazd, Iran
Keywords: Ranking, Web Pages, Surfer Model, Learning Automata, Information Retrieval,
Abstract :
Users who seek results pertaining to their queries are at the first place. To meet users’ needs, thousands of webpages must be ranked. This requires an efficient algorithm to place the relevant webpages at first ranks. Regarding information retrieval, it is highly important to design a ranking algorithm to provide the results pertaining to user’s query due to the great deal of information on the World Wide Web. In this paper, a ranking method is proposed with a hybrid approach, which considers the content and connections of pages. The proposed model is a smart surfer that passes or hops from the current page to one of the externally linked pages with respect to their content. A probability, which is obtained using the learning automata along with content and links to pages, is used to select a webpage to hop. For a transition to another page, the content of pages linked to it are used. As the surfer moves about the pages, the PageRank score of a page is recursively calculated. Two standard datasets named TD2003 and TD2004 were used to evaluate and investigate the proposed method. They are the subsets of dataset LETOR3. The results indicated the superior performance of the proposed approach over other methods introduced in this area.
[1] R. Albert, H. Jeong, and A.-L. Barab&225;si. “Internet: Diameter of the world-wide web,” Nature, vol. 401, no. 6749, pp. 130–131, Sep. 1999.#
[2] S. Lawrence and C. L. Giles. “Accessibility of information on the web,” Nature, vol. 400, no. 6740, p. 107, Jul. 1999.#
[3] K. Purcell, J. Brenner, and L. Rainie. (2012, Oct.). “Search Engine Use 2012,” PEW Research Center, [Online]. p. 42, 2012 Available: http://www.pewinternet.org/2012/03/09/search-engine-use-2012/. [Sep. 1,2014].#
[4] S. E. Robertson, and S. Walker. "Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval," in Proc. 17th annual international ACM SIGIR conf. Research and development information retrieval, Springer-Verlag New York, Inc, 1994, pp. 232-241.#
[5] G. Salton and C. Buckley. “Term-weighting approaches in automatic text retrieval.” Information processing and management, vol. 24, no. 5, pp. 513–523, Dec. 1988.#
[6] M. R. Henzinger, R. Motwani, and C. Silverstein. “Challenges in web search engines.” In ACM SIGIR Forum, vol. 36, no. 2, pp. 11–22, Sep. 2002.#
[7] L. Page, S. Brin, R. Motwani, and T. Winograd. “The PageRank Citation Ranking: Bringing Order to the Web.” World Wide Web Internet Web Information System, vol. 54, no. 1999–66, pp. 1–17, Jan. 1998.#
[8] G.-R. Xue, Q. Yang, H.-J. Zeng, Y. Yu, and Z. Chen. “Exploiting the hierarchical structure for link analysis,” in Proc. 28th annual international ACM SIGIR Conf. Research and development in information retrieval, 2005, pp. 186–193.#
[9] A.-L. Barab&225;si and R. Albert, “Emergence of scaling in random networks.” Science, vol. 286, pp. 509–512, Oct. 1999.#
[10] J. M. Kleinberg. “Authoritative Sources in a Hyperlinked Environment.” Journal of the ACM (JACM), vol. 46, pp. 668–677, Sep. 1999.#
[11] F. Qiu and J. Cho. “Automatic identification of user interest for personalized search,” in Proc. 15th int. conf. World Wide Web, 2006, pp. 727–736.#
[12] G. Salton. The SMART retrieval system—experiments in automatic document processing. Amsterdam: IOS Press, 1971.#
[13] P. Ghodsnia, A. M. Z. Bidoki, and N. Yazdani. “A punishment/reward based approach to ranking,” in Proc. 2nd int. conf. Scalable information systems, 2007, p. 58.#
[14] W.-C. Peng and Y.-C. Lin. “Ranking web search results from personalized perspective,” in E-Commerce Technology, 2006. 8th IEEE Int. Conf. Enterprise Computing, E-Commerce, and E-Services, 3rd IEEE Int. Conf., 2006, p. 12.#
[15] A. Kritikopoulos, M. Sideri, and I. Varlamis, “WordRank: A Method for Ranking Web Pages Based on Content Similarity,” presented at Databases, 2007. BNCOD’07. 24th British Nat. Conf., 2007, pp. 92–100.#
[16] W. Xing and A. Ghorbani, “Weighted pagerank algorithm,” Commun. Networks and Services Research, 2004. in Proc.. 2nd Annual Conf., 2004, pp. 305–314.#
[17] Z. Gy&246;ngyi, H. Garcia-Molina, and J. Pedersen. “Combating web spam with trustrank,” in Proc. 30th Int. Conf. Very large data bases-Vol. 30, 2004, pp. 576–587.#
[18] S. Pandey, S. Roy, C. Olston, J. Cho, and S. Chakrabarti. “Shuffling a stacked deck: the case for partially randomized ranking of search engine results,” in Proc. 31st Int. Conf. Very large data bases, 2005, pp. 781–792.#
[19] E. Khodadadian, M. Ghasemzadeh, V. Derhami, and S. A. Mirsoleimani, “A novel ranking algorithm based on Reinforcement Learning.” presented at 16th CSI Int. Symposium on Artificial Intelligence and Signal Processing (AISP 2012), 2012, pp. 546–551.#
[20] V. Derhami, E. Khodadadian, M. Ghasemzadeh, and A. M. Zareh Bidoki. “Applying reinforcement learning for web pages ranking algorithms.” Applied Soft Computing, vol. 13, no. 4, pp. 1686–1692, Apr. 2013.#
[21] A. M. Zareh Bidoki and N. Yazdani. “DistanceRank: An intelligent ranking algorithm for web pages.” Information Processing and Management, vol. 44, no. 2, pp. 877–892, Mar. 2008.#
[22] R. Lempel, and S. Moran. "SALSA: the stochastic approach for link-structure analysis." ACM Transactions on Information Systems (TOIS), vol. 19, no. 2, pp. 131-160, Apr. 2001.#
[23] A. Shakery and C. Zhai. “Relevance Propagation for Topic Distillation UIUC TREC 2003 Web Track Experiments.” in TREC 2003, 2003, pp. 673–677.#
[24] A. M. Z. Bidoki and J. A. Thom. “Combination of documents features based on simulated click-through data.” in Advances in Information Retrieval, Springer Berlin Heidelberg, 2009, pp. 538–545.#
[25] L. Rodrigues and S. Jaswal. “An Efficient Page Ranking Approach Based on Hybrid Model.” presented at Adv. in Computing and Communation Engeering (ICACCE), 2015 2nd Int. Conf., 2015, pp. 693–696.#
[26] T. H. Haveliwala. “Topic-Sensitive PageRank.” in Proc. 11th Int. Conf. World Wide Web, Www2002.Org, 2002, pp. 517–526.#
[27] K. S. Narendra and M. A. L. Thathachar. Learning automata: an introduction. Courier Corporation, 2012.#
[28] S. Lakshmivarahan and M. A. L. Thathachar. “Bounds on the convergence probabilities of learning automata.” IEEE Transactions on Systems, Man, and Cybernetics, A Systems Humans, vol. 6, no. 11, pp. 756–763, Nov. 1976.#
[29] E. Billard and S. Lakshmivarahan. “Learning in multilevel games with incomplete information. I.” Systems Man, and Cybernetics Part B Cybernetics IEEE Trans., vol. 29, no. 3, pp. 329–339, Jun. 1999.#
[30] J. A. Torkestani and M. R. Meybodi. “A New Vertex Coloring Algorithm Based on Variable Aaction-Set Learning Automata.” Computing and Informatics, vol. 29, no. 3, pp. 447–466, Jan. 2012.#
[31] N. Kumar, N. Chilamkurti, and J. J. P. C. Rodrigues. “Learning automata-based opportunistic data aggregation and forwarding scheme for alert generation in vehicular ad hoc networks.” Computer Communications, vol. 39, pp. 22–32, Feb. 2014.#
[32] M. Hasanzadeh and M. R. Meybodi. “Grid resource discovery based on distributed learning automata.” Computing, vol. 96, no. 9, pp. 909–922, Sep. 2014.#
[33] S. Bhattacharyya, A. Sengupta, T. Chakraborti, A. Konar, and D. N. Tibarewala. “Automatic feature selection of motor imagery EEG signals using differential evolution and learning automata.” Medical & biological engineering & computing, vol. 52, no. 2, pp. 131–139, Feb. 2014.#
[34] J. Akbari Torkestani. “An adaptive learning to rank algorithm: Learning automata approach,” Decision Support Systems, vol. 54, no. 1, pp. 574–583, Dec. 2012.#
[35] J. A. Torkestani. “An adaptive focused web crawling algorithm based on learning automata.” Applied Intelligence, vol. 37, no. 4, pp. 586–601, Dec. 2012.#
[36] H. Beigy and M. R. Meybodi. “A mathematical framework for cellular learning automata.” Advances in Complex Systems, vol. 7, no. 03n04, pp. 295–319, Sep. 2004.#
[37] T. H. Haveliwala. (1999,Oct.). “Efficient computation of PageRank,” Technical Report, Database Group, Computer Science Department, Stanford University, [On-line]. 1999. Available: http://ilpubs.stanford.edu:8090/386/ [Dec., 1, 2014].#
[38] T. Qin, T. Y. Liu, J. Xu, and H. Li. “LETOR: A benchmark collection for research on learning to rank for information retrieval,” Information Retrieval, vol. 13, no. 4, pp. 346–374, Jan. 2010.#
[39] J. Xu, T.-Y. Liu and H. Li. “The Evaluation Tool in LETOR,” Microsoft Research Asia [Online]. Available: http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR3.0/EvaluationTool.zip [Dec. 1,2014].#
[40] T.-Y. Liu. Learning to Rank for Information retrieval, Springer Science & Business Media, 2011.#
[41] D. H. T. N. Sander Bockting. “A cross-benchmark comparison of 87 learning to rank methods.” Information processing & management, vol. 51, no. 6, pp. 757–772, Nov. 2015.#