Breast Cancer Classification Approaches - A Comparative Analysis
Subject Areas: Machine learning
Mohan Kumar 1, Sunil Kumar Khatri 2, Masoud Mohammadian 3
1 - Amity Institute of Information Technology, Amity University, Noida (UP), India
2 - Amity University, Tashkent
3 - Doctorate
Keywords: Artificial Intelligence, Machine Learning, Wisconsin Breast Cancer Diagnostic (WBCD) dataset, k-nearest neighbors (k-NN), Support Vector Classifier, Logistic Regression, ExtraTree-decision, Random-Forest
Abstract:
Breast cancer is a difficult disease to treat because it weakens the patient's immune system. Cells turn cancerous for various reasons and spread quickly, to the detriment of normal cells. Identifying specific immune signals for a range of cancers has therefore attracted considerable interest recently, and several methods for predicting cancer from proteomic datasets and peptides have been reported in the literature in recent years. Accurately classifying and categorizing breast cancer subtypes is a vital task, and computerized systems built on machine learning and artificial intelligence can save substantial time while reducing the likelihood of errors. Using the Wisconsin Breast Cancer Diagnostic (WBCD) dataset, this study evaluates the performance of several classification methods: support vector classifier (SVC), extra-trees classifier (ETC), k-nearest neighbors (KNN), logistic regression (LR), and random forest (RF). Breast cancer can be detected and diagnosed from a variety of measurements in the data, which are discussed in detail in the article. The goal is to determine how well each algorithm performs in terms of precision, recall, and accuracy. The classification threshold was varied for each algorithm, and SVM turned out to be very promising.
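As a rough illustration of the evaluation criteria named in the abstract, the following minimal sketch computes precision, recall, and accuracy for a binary classifier at several classification thresholds. The labels and scores are hypothetical toy values (1 = malignant, 0 = benign), not data or results from the study, and the function names are illustrative assumptions rather than the authors' code.

```python
def confusion_counts(y_true, scores, threshold):
    """Return (TP, FP, TN, FN), predicting class 1 when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics_at(y_true, scores, threshold):
    """Compute precision, recall, and accuracy at one threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical toy labels and classifier scores (assumed, not from the WBCD dataset).
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0]
SCORES = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.75):
    m = metrics_at(Y_TRUE, SCORES, threshold)
    print(f"threshold={threshold:.2f} "
          f"precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} "
          f"accuracy={m['accuracy']:.2f}")
```

On this toy data, raising the threshold trades recall for precision while accuracy stays the same, which is one reason to report all three metrics rather than accuracy alone.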