Semantic Word Embedding Using BERT on the Persian Web
Subject Areas: Electrical and Computer Engineering
Shekoofe Bostan 1, Ali-Mohammad Zare-Bidoki 2, Mohamad Reza Pajohan 3
1 - Yazd University
2 - Associate Professor
3 - Yazd University
Keywords: Semantic vector, word embedding, ranking, deep learning
Abstract:
Using the context and order of the words in a sentence can lead to a better understanding of that sentence. Pre-trained language models have recently achieved great success in natural language processing, and among them the BERT algorithm has become increasingly popular. This problem has not yet been investigated for the Persian language and remains a challenge in the Persian web domain. In this article, the embedding of the Persian words that form a sentence was investigated using the BERT algorithm. In the proposed approach, a model was trained on a Persian web dataset, and the final model was produced by two stages of fine-tuning with different architectures. Finally, the features of the model were extracted and evaluated in document ranking. The proposed model is more accurate than the other investigated models, outperforming the multilingual BERT model by at least one percent. Applying the fine-tuning process with the proposed structure to other existing models also improved model and embedding accuracy after each fine-tuning stage. Overall, this process improves the accuracy of Persian web ranking by around 5%.
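As an illustration of the final evaluation step, the following minimal sketch ranks documents against a query by the similarity of their embedding vectors. It is not the authors' implementation: the abstract does not state the scoring function, so cosine similarity is an assumption, and the names `cosine` and `rank_documents` and the three-dimensional toy vectors are hypothetical. In practice the vectors would be features extracted from the fine-tuned BERT model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors standing in for BERT-derived embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
docs = {
    "doc_a": [1.0, 0.1, 0.9],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.5, 0.5, 0.5],
}

ranking = rank_documents(query, docs)
print(ranking)
```

With these toy values, `doc_a` points in nearly the same direction as the query and ranks first, while `doc_b` is orthogonal to it and ranks last.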