Using Sentiment Analysis and Combining Classifiers for Spam Detection in Twitter
Authors: Mehdi Salkhordeh Haghighi 1, Aminolah Kermani 2
1 -
2 - Sadjad University
Keywords: Spam Detection, Twitter, Word Embedding, Convolutional Neural Network, Deep Learning, Sentiment Analysis, Ensemble Learning
Abstract :
The growing popularity of social networks, especially Twitter, has posed a new challenge to researchers: spam. Numerous approaches to dealing with spam have been presented. In this study, we attempt to enhance the accuracy of spam detection by combining one of the latest spam detection techniques with sentiment analysis. Using the word embedding technique, we give the tweet text as input to a convolutional neural network (CNN) architecture, whose output classifies the text as spam or normal. In parallel, we extract suitable features from the Twitter network and apply machine learning methods to them to obtain a separate spam prediction for the tweet. Finally, we feed the outputs of both approaches into a meta-classifier, whose output gives the final decision on whether the tweet text is spam or normal. We employ both balanced and unbalanced datasets to examine the impact of the proposed model on the two types of data. The results indicate that the proposed method increases accuracy on both datasets.
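The final combining step can be illustrated with a minimal sketch. It assumes that each base model (the CNN on the tweet text and the feature-based classifier) emits a spam probability, and that the meta-classifier is a logistic regression trained on those two probabilities. The abstract does not say which meta-classifier is used, so that choice, the function names, and the toy numbers below are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_classifier(base_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-classifier with batch gradient descent.

    base_probs: list of tuples, one spam probability per base model.
    labels:     list of 0 (normal) or 1 (spam).
    """
    n_features = len(base_probs[0])
    n_samples = len(labels)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the score
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


def predict_spam(weights, bias, x, threshold=0.5):
    """Return True if the combined score marks the tweet as spam."""
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return score >= threshold


# Hypothetical base-model outputs: (CNN probability, feature-model probability).
TRAIN_PROBS = [(0.9, 0.8), (0.8, 0.6), (0.7, 0.9),
               (0.2, 0.1), (0.3, 0.4), (0.1, 0.3)]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = normal

weights, bias = train_meta_classifier(TRAIN_PROBS, TRAIN_LABELS)
print(predict_spam(weights, bias, (0.99, 0.99)))
print(predict_spam(weights, bias, (0.01, 0.01)))
```

In a real stacking setup the meta-classifier would normally be trained on out-of-fold predictions from the base models, so that it does not learn from probabilities the base models produced on their own training data.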