Belief Mining of Persian Texts Based on Deep Learning with Word Sense Separation
Hossein Alikarami (Faculty of Computer Engineering, Azad University, North Tehran, Tehran, Iran)
Amir Masoud Bidgoli (Faculty of Computer Engineering, Azad University, North Tehran, Tehran, Iran)
Hamid Haj Seyyed Javadi (Shahed University)
Keywords: belief mining, natural language processing (NLP), deep learning, text mining
Abstract:
Belief mining, the classification of texts by the sentiments and opinions that users express on websites and social media, helps people, companies, and organizations make important decisions. A belief mining system analyzes people's opinions and feelings about an entity, such as a product, person, or organization, from users' comments, messages, and tweets on social media. In this paper, Persian messages, comments, and tweets drawn from social media and websites in four datasets are classified with two deep learning methods, CNN and LSTM, taking word sense into account, into positive and negative polarities on a scale from -2 to +2. The proposed method first preprocesses the data through character-to-number conversion, stop-word removal, and multi-word analysis. It then applies the CNN and LSTM algorithms with word sense separation (WSD) to classify the Persian texts, so that the intensity of sentiment is determined from the words themselves. We call the proposed models CNN_WSD and LSTM_WSD. The models are evaluated on Persian Twitter datasets and compared with other machine learning and deep learning methods (DNN, CNN, and LSTM). The method is implemented in MATLAB and Python. The proposed method reaches an accuracy of 95.8% with LSTM_WSD and 94.3% with CNN_WSD.
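The preprocessing steps named in the abstract (stop-word removal, multi-word analysis, and conversion of text to numbers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The stop-word list, the use of adjacent-word bigrams for multi-word analysis, the token-level integer encoding, and the function names `preprocess`, `build_vocab`, and `encode` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical stop-word list, for illustration only; the paper's list is not given.
STOP_WORDS = {"و", "در", "به", "از", "که", "را", "این", "بود"}

PAD_ID = 0  # reserved for padding short sequences
UNK_ID = 1  # reserved for tokens missing from the vocabulary


def preprocess(text: str) -> List[str]:
    """Split on whitespace, drop stop words, then append adjacent-word bigrams."""
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Assumption: "multi-word analysis" is approximated by bigrams of kept tokens.
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def build_vocab(docs: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer id, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for tok in doc:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [vocab.get(t, UNK_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    corpus = ["این فیلم خیلی خوب بود", "این فیلم بد بود"]
    docs = [preprocess(text) for text in corpus]
    vocab = build_vocab(docs)
    for doc in docs:
        print(encode(doc, vocab, max_len=6))
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer followed by a CNN or LSTM classifier. The label set for the -2 to +2 scale would be defined separately.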