Improving Ranking Using BERT

Subject areas: electrical and computer engineering

Shekoofe Bostan ¹, Ali Mohammad Zareh Bidoki ², Mohammad Reza Pajoohan ³

1 - Faculty of Computer Engineering, Yazd University, Iran
2 - Faculty of Computer Engineering, Yazd University, Iran
3 - Faculty of Computer Engineering, Yazd University, Iran

Submitted: Sunday, 14 Dhu al-Hijjah 1444; Accepted: Wednesday, 07 Jumada al-Thani 1445; Published: Sunday, 01 Muharram 1446

Keywords: semantic vector, word embedding, ranking, deep learning

Abstract:

Efficient document ranking plays an important role in information retrieval systems in today's information age. This paper presents a new approach to document ranking that uses embedding models, with a focus on the BERT language model, to improve ranking results. The proposed approach uses word embedding methods to capture semantic representations of user queries and document content. Once the textual data is converted into semantic vectors, the relevance and similarity between queries and documents are evaluated at lower cost under the proposed ranking formulas. To improve accuracy, these formulas take several factors into account: the word embedding vectors, the position of keywords, and the effect of high-value words on ranking based on the semantic vectors. Experiments and comparative analyses were carried out to evaluate the effectiveness of the proposed formulas. The experimental results show that the proposed approach achieves higher accuracy than common ranking methods. They indicate that using embedding models and combining them in the proposed ranking formulas significantly improves ranking accuracy, up to 0.87 in the best case. This study contributes to better document ranking and shows the potential of the BERT embedding model for improving ranking performance.
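
The general idea of ranking by similarity between semantic vectors can be illustrated with a minimal sketch. The abstract does not state the proposed ranking formulas, so the code below is not the authors' method: it uses plain cosine similarity over averaged token vectors, the toy 3-dimensional vectors stand in for real BERT embeddings, and the names `cosine`, `mean_vector`, `rank_documents`, and the optional `weights` argument (a hypothetical hook for factors such as keyword position or high-value words) are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors, weights=None):
    """Weighted average of token vectors; uniform weights by default.

    `weights` is a hypothetical stand-in for per-token importance
    (for example keyword position); it is not taken from the paper.
    """
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, vectors)) / total
        for i in range(dim)
    ]


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted by descending cosine similarity to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy token vectors standing in for real embeddings (assumed values).
query_vec = mean_vector([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
doc_vecs = {
    "doc_a": mean_vector([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    "doc_b": mean_vector([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]),
    "doc_c": mean_vector([[0.0, 0.0, 1.0]]),
}
ranking = rank_documents(query_vec, doc_vecs)
print(ranking)
```

In this toy setup the document that shares both query directions ranks first and the one that shares none ranks last. In a real system the vectors would come from an embedding model such as BERT, and the similarity would be replaced by whichever ranking formula is being evaluated.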

