An Efficient Sentiment Analysis Model for Crime Articles’ Comments using a Fine-tuned BERT Deep Architecture and Pre-Processing Techniques
Subject areas: Natural Language Processing
Sovon Chakraborty 1, Muhammad Borhan Uddin Talukdar 2, Portia Sikdar 3, Jia Uddin 4
1 - Department of Computer Science and Engineering, University of Liberal Arts Bangladesh, Dhaka, Bangladesh
2 - Department of Computer Science and Engineering, Daffodil International University, Savar, Bangladesh
3 - Department of Computer Science and Engineering, North Western University, Khulna, Bangladesh
4 - Department of AI and Big Data, Woosong University, Daejeon, South Korea
Keywords: BERT, BNLP, NLP, Sentiment Analysis, Bangla Sentiment Analysis
Abstract:
The prevalence of social media allows users to exchange views on a multitude of events. Public comments on widely discussed crimes can be analyzed to understand how overall public sentiment changes over time. In this paper, a specialized dataset has been developed and utilized, comprising public comments about contemporary crime events collected from various types of online platforms. The comments are manually annotated with one of three polarity values: positive, negative, or neutral. Before the data are fed to the model, pre-processing tasks are applied to eliminate the dispensable parts of each comment. In this study, a deep Bidirectional Encoder Representations from Transformers (BERT) model is fine-tuned for sentiment analysis on the pre-processed crime data. To evaluate the performance of the model, the F1 score, the ROC curve, and a heatmap are used. Experimental results demonstrate that the model achieves an F1 score of 89% on the test dataset. In addition, the proposed model outperforms other state-of-the-art machine learning and deep learning models, exhibiting higher accuracy with fewer trainable parameters. Because the model requires fewer trainable parameters, and its complexity is therefore lower than that of the other models, it is expected to be a suitable option for use in portable IoT devices.
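As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how an F1 score can be computed for the three polarity classes. The abstract does not state which averaging scheme the authors used, so macro averaging is an assumption here, and the function names and the toy labels are hypothetical, not taken from the paper's dataset.

```python
from collections import Counter

# The three polarity values named in the abstract.
LABELS = ("positive", "negative", "neutral")


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy annotations and predictions, invented purely for illustration.
toy_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
toy_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]
toy_scores = per_class_f1(toy_true, toy_pred)
toy_macro = macro_f1(toy_true, toy_pred)
```

Macro averaging weights the three classes equally regardless of how many comments fall into each, which matters when one polarity dominates the dataset; a weighted or micro average would give a different figure on the same predictions.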