Use of conditional generative adversarial network to produce synthetic data with the aim of improving the classification of users who publish fake news
Authors : Arefeh Esmaili 1, Saeed Farzi 2
1 - University student
2 - Assistant Professor
Keywords: Fake news publisher user detection, Imbalanced datasets, Generative Adversarial Network, Graph of user interaction, Node Embedding
Abstract :
Fake news and messages have circulated in human societies for many years, and the widespread adoption of social networks has made false information easier to spread than ever before. Detecting fake news and messages has therefore become a prominent issue in the research community. It is equally important to detect the users who generate this false information and publish it on the network. This paper detects users who publish incorrect information in Persian on the Twitter social network. To this end, a system is built that combines context-user and context-network features and uses a conditional generative adversarial network (CGAN) to balance the data set. The system models the Twitter social network as a graph of user interactions and embeds each node as a feature vector using Node2vec. In several experiments, the proposed system improved on its competitors by up to 11%, 13%, 12%, and 12% in precision, recall, F-measure, and accuracy, respectively, and achieved about 99% precision in detecting users who publish fake news.
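To illustrate the graph-modeling step mentioned in the abstract, the following is a minimal sketch of how a user-interaction graph might be built and sampled with node2vec-style biased random walks. It is not the authors' implementation: the user IDs, the function names `build_interaction_graph` and `biased_walk`, and the values of the return parameter `p` and in-out parameter `q` are all assumptions chosen for illustration. Training a skip-gram model on the resulting walks to obtain the final embedding vectors is omitted.

```python
import random
from collections import defaultdict


def build_interaction_graph(interactions):
    """Build an undirected adjacency map from (user_a, user_b) pairs."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Generate one node2vec-style second-order walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay one hop away from the previous node
            else:
                weights.append(1.0 / q)  # move farther from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical interactions (e.g. retweets or mentions) between made-up users.
interactions = [("u1", "u2"), ("u2", "u3"), ("u3", "u1"), ("u3", "u4")]
graph = build_interaction_graph(interactions)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
walks = [biased_walk(graph, node, 5, p=0.5, q=2.0, rng=rng)
         for node in sorted(graph)]
for walk in walks:
    print(" -> ".join(walk))
```

In this sketch a small `p` makes the walk more likely to return to the node it just left, while a large `q` keeps it close to its starting neighborhood. Which setting suits the user-interaction graph described here is not stated in the abstract.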