Spam Detection in Twitter by Ensemble Learning Approach
Subject Areas: electrical and computer engineering
Maryam Fasihi 1, Mohammad Javad Shayegan 2, Zahra Hosieni 3, Zahra Sejdeh 4
1 - University of Science and Culture
2 - University of Science and Culture
3 - University of Science and Culture
4 - University of Science and Culture
Keywords: Neural networks, spam detection, Twitter, Autoencoder, softmax
Abstract:
Today, social networks play a crucial role in disseminating information worldwide. Twitter is one of the most popular social networks, with 500 million tweets sent daily. This popularity has led spammers to exploit the network for distributing spam posts. This paper combines machine learning methods to identify spam at the tweet level. The proposed method works in two stages: first, a stacked autoencoder extracts features; second, the features from the last layer of the stacked autoencoder are fed into a softmax layer for prediction. The method is evaluated against several popular methods on the Twitter Spam Detection corpus using accuracy, precision, recall, and F1-score. The results indicate that the proposed method achieves a detection rate of 78.1%. Overall, using hard majority voting in ensemble learning, the proposed method identifies spam tweets more accurately than the CNN, LSTM, and SCCL methods.
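The abstract names hard majority voting and four evaluation metrics without spelling them out. The following is a minimal sketch of both, assuming binary labels (1 = spam, 0 = not spam) and three base models. The function names, the model names, and the toy predictions are hypothetical illustrations, not the paper's implementation or data.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label lists by hard (majority) voting.

    predictions_per_model holds one list of labels per base model,
    all of the same length. With an odd number of models and binary
    labels, a tie cannot occur.
    """
    voted = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (hypothetical): six tweets, three base models.
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 1, 0, 0, 1]
model_b = [1, 0, 0, 1, 0, 1]
model_c = [0, 1, 1, 1, 0, 0]

ensemble = hard_vote([model_a, model_b, model_c])
scores = binary_metrics(y_true, ensemble)
print(ensemble)
print(scores)
```

Hard voting counts each model's predicted label as one vote, in contrast to soft voting, which averages predicted probabilities. In this toy run the ensemble corrects the single mistakes that each model makes on the first four tweets but inherits the error on the last tweet, where two of the three models are wrong.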