Implementation of Machine Learning Algorithms for Customer Churn Prediction
Subject areas: Machine learning
Manal Loukili 1, Fayçal Messaoudi 2, Raouya El Youbi 3
1 - National School of Applied Sciences, Sidi Mohamed Ben Abdellah University, Fez, Morocco
2 - National School of Business and Management, Sidi Mohamed Ben Abdellah University, Fez, Morocco
3 - National School of Applied Sciences, Sidi Mohamed Ben Abdellah University, Fez, Morocco
Keywords: Machine Learning, Churn Prediction, Consumer Behavior, Bagging SVM, k-NN, Random Forest
Abstract:
Churn prediction is one of the most critical issues in the telecommunications industry, and the remarkable progress made in machine learning and artificial intelligence has considerably improved the ability to predict it. In this context, we propose a process consisting of six stages. The first stage was data pre-processing, followed by feature analysis and, in the third stage, feature selection. The data was then divided into two parts: the training set and the test set. For prediction, three widely used models were adopted, namely random forest, k-nearest neighbor, and support vector machine (SVM). In addition, we used cross-validation on the training set to tune hyperparameters and to avoid overfitting. The results obtained on the test set were then evaluated using the confusion matrix and the area under the ROC curve (AUC). Finally, we found that all of the models used gave high accuracy values (over 79%). The highest AUC score, 84%, is achieved by the SVM and by the bagging classifiers, an ensemble method that surpasses them.
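The stages described in the abstract (train/test split, cross-validated hyperparameter tuning of random forest, k-NN, and SVM, a bagging ensemble, and evaluation with the confusion matrix and AUC) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the synthetic data from `make_classification` stands in for a churn table, the hyperparameter grids and the number of bagged estimators are illustrative assumptions, and building the bagging ensemble on top of the tuned SVM is one plausible reading of the "Bagging SVM" keyword.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic, imbalanced binary data stands in for a real churn table.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.75, 0.25], random_state=0)

# Split into a training set and a held-out test set, preserving the churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The three models named in the abstract, each with a small illustrative grid.
# k-NN and SVM are distance-based, so they are scaled inside a pipeline.
candidates = {
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [4, None]}),
    "knn": (make_pipeline(StandardScaler(), KNeighborsClassifier()),
            {"kneighborsclassifier__n_neighbors": [5, 15]}),
    "svm": (make_pipeline(StandardScaler(), SVC()),
            {"svc__C": [0.1, 1.0, 10.0]}),
}


def positive_scores(model, features):
    """Return a continuous score for the positive (churn) class."""
    if hasattr(model, "decision_function"):
        return model.decision_function(features)
    return model.predict_proba(features)[:, 1]


def evaluate(model):
    """Confusion matrix and AUC on the held-out test set."""
    cm = confusion_matrix(y_test, model.predict(X_test))
    auc = roc_auc_score(y_test, positive_scores(model, X_test))
    return cm, auc


results, tuned = {}, {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validation on the training set only, for hyperparameter tuning.
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=5)
    search.fit(X_train, y_train)
    tuned[name] = search.best_estimator_
    results[name] = evaluate(search.best_estimator_)

# Bagging ensemble of SVMs that reuses the tuned SVM configuration.
bagged_svm = BaggingClassifier(clone(tuned["svm"]), n_estimators=10, random_state=0)
bagged_svm.fit(X_train, y_train)
results["bagging_svm"] = evaluate(bagged_svm)

for name, (cm, auc) in results.items():
    print(name, cm.tolist(), round(auc, 3))
```

Tuning happens inside `GridSearchCV` on the training set, so the test set is touched only once per model in `evaluate`, which mirrors the separation between tuning and final evaluation that the abstract describes. The scores printed for the synthetic data are not comparable to the figures reported in the abstract.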