Regional Information Center for Science and Technology (مرکز منطقه ای اطلاع رسانی علوم و فناوری)
Iranian Quarterly of Information and Communication Technology (فصلنامه فناوری اطلاعات و ارتباطات ایران)
2717-0411
10 37
2020 6 21
Learning to Rank for the Persian Web Using the Layered Genetic Programming
Learning to Rank Persian Web Content Based on Multi-Layer Genetic Programming (یادگیری رتبه ‏بندی محتوای فارسی وب بر مبنای برنامه‏ نویسی ژنتیک چند لایه)
45 70
fa
Amirhossein Keyhanipour (امیرحسین کیهانی پور)
University of Tehran
2020 3 5

Learning to rank (L2R) has emerged as a promising approach to the existing challenges of Web search engines. However, present learning to rank techniques have major drawbacks. Current L2R algorithms do not take into account the search behavior of users that is embedded in the logs of their search sessions. In addition, machine learning is a data-intensive process that requires a large volume of data about users' queries as well as Web documents. This has made the use of L2R techniques questionable in real-world applications. Recently, a novel approach named MGP-Rank has been proposed, based on the click-through data model and the generation of click-through features. Using the layered genetic programming model, MGP-Rank has achieved noticeable performance in ranking English Web content. In this study, suitable scenarios for generating click-through features are presented with respect to the specific characteristics of the Persian language. In this way, a customized version of MGP-Rank is proposed for Persian Web retrieval. The evaluation results of this algorithm on the dotIR dataset indicate a considerable improvement in comparison with major ranking methods. The improvement is most noticeable in the top part of the search result lists, which Web users visit most frequently.

Learning to rank is an emerging approach that is highly promising and effective for addressing existing challenges and improving the performance of Web search engines. At the same time, one of its serious problems is the lack of attention to the history of users' interactions with the search engine during the search process and while they evaluate the results obtained. In addition, the very large volume of features required from documents and user queries has cast doubt on the applicability of this approach under real-world conditions. Using the click-through data model and generating click-through features is a novel solution; on this basis, and by employing the layered genetic programming model, a suitable ranking model named MGP-Rank has been presented for the retrieval of English Web information. In this research, with regard to the specific characteristics of the Persian language, this algorithm is localized by presenting suitable scenarios for creating its click-through features. The results of evaluating the algorithm's performance for the Persian language on the dotIR dataset indicate its considerable capability compared with reference ranking methods. This performance improvement is especially notable in the top part of the search result list, which users visit most often.

http://jour.aicti.ir/en/Article/Download/8179