Hoax Identification of Indonesian Tweeters using Ensemble Classifier
Subject areas: Machine learning
Gus Nanang Syaifuddiin 1, Rizal Arifin 2, Desriyanti Desriyanti 3, Ghulam Asrofi Buntoro 4, Zulkham Umar Rosyidin 5, Ridwan Yudha Pratama 6, Ali Selamat 7
1 - Department of Information Technology, Politeknik Negeri Madiun, Jl. Serayu No. 84 Madiun 63133, Indonesia
2 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
3 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
4 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
5 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
6 - Faculty of Engineering, Universitas Muhammadiyah Ponorogo, Jl. Budi Utomo No. 10 Ponorogo 63471, Indonesia
7 - Malaysia-Japan International Institute of Technology, Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, Kuala Lumpur 54100, Malaysia
Keywords: Hoax, Identification, Bahasa Indonesia, N-Gram, TF-IDF, Passive Aggressive Classifier
Abstract:
False information, better known as hoaxes, is common on social media. Social media is no longer used only to make friends or socialize online; some users also spread hate speech and false information through it. Hoaxes are very dangerous to social life, especially in countries with large populations and ethnically diverse cultures, such as Indonesia. Although many studies have addressed the detection of false information, accuracy and efficiency still need to be improved. To help prevent the spread of hoaxes, we built a model that identifies false information in Indonesian using an ensemble classifier that combines the n-gram method, term frequency-inverse document frequency (TF-IDF), and the passive-aggressive classifier. The model was evaluated on 5000 samples from Twitter accounts. Testing used four schemes that split the dataset into training and test data at ratios of 90:10, 80:20, 70:30, and 60:40. The test results show that our software detects hoaxes with an accuracy of 91.8%. The proposed method also improved the accuracy and precision of hoax detection compared with several previous studies. These results indicate that the proposed method can be developed further and used to detect hoaxes in Indonesian on various social media platforms.
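The pipeline named in the abstract (n-grams, TF-IDF weighting, then a passive-aggressive classifier, evaluated on a train/test split) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the word-level uni- and bigram range (`n_max=2`), the smoothed IDF form, the PA-I update rule, the values `C=1.0` and `epochs=5`, all function names, and the four English toy sentences are assumptions made only for this example and do not come from the paper or its dataset.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def fit_idf(docs, n_max=2):
    """Smoothed inverse document frequency for every n-gram seen in docs."""
    df = Counter()
    for doc in docs:
        df.update(set(word_ngrams(doc, n_max)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf_vector(doc, idf, n_max=2):
    """Map a text to an L2-normalised sparse TF-IDF vector (a dict)."""
    counts = Counter(g for g in word_ngrams(doc, n_max) if g in idf)
    vec = {g: tf * idf[g] for g, tf in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm > 0 else {}


def dot(w, x):
    """Sparse dot product between weight dict w and feature dict x."""
    return sum(w.get(g, 0.0) * v for g, v in x.items())


def train_passive_aggressive(X, y, C=1.0, epochs=5):
    """Binary PA-I training; labels are +1 (hoax) or -1 (not hoax)."""
    w = {}
    for _ in range(epochs):
        for x, label in zip(X, y):
            loss = max(0.0, 1.0 - label * dot(w, x))  # hinge loss
            sq_norm = sum(v * v for v in x.values())
            if loss > 0.0 and sq_norm > 0.0:
                tau = min(C, loss / sq_norm)  # step size capped by C
                for g, v in x.items():
                    w[g] = w.get(g, 0.0) + tau * label * v
    return w


def predict(w, x):
    """Return +1 (hoax) or -1 (not hoax)."""
    return 1 if dot(w, x) >= 0.0 else -1


def split(data, train_ratio):
    """Split a list into (train, test) parts, e.g. train_ratio=0.8 for 80:20."""
    cut = round(len(data) * train_ratio)
    return data[:cut], data[cut:]


# Invented toy sentences (placeholders, NOT taken from the paper's dataset).
train_texts = [
    "share now secret cure banned by doctors",
    "city council approves new bus route",
    "share now secret trick banned by banks",
    "city library opens new reading room",
]
train_labels = [1, -1, 1, -1]

idf = fit_idf(train_texts)
X_train = [tfidf_vector(t, idf) for t in train_texts]
w = train_passive_aggressive(X_train, train_labels)

for text in ["share now secret plan banned", "city opens new bus route"]:
    print(text, "->", predict(w, tfidf_vector(text, idf)))
```

The passive-aggressive update leaves the weights unchanged when an example is already classified with sufficient margin and otherwise moves them just far enough to correct the error, limited by `C`. To mimic the four evaluation schemes, `split` would be called with `train_ratio` set to 0.9, 0.8, 0.7, and 0.6 on a shuffled, labeled dataset.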