Feature Selection and Cancer Classification Based on Microarray Data Using Multi-Objective Cuckoo Search Algorithm
Subject Areas: Electrical and Computer Engineering
Kh. Kamari 1, F. Rashidi 2, A. Khalili 3
1 -
2 -
3 - Hormozgan University
Keywords: feature selection, instance selection, microarray, multi-objective cuckoo search algorithm, fuzzy clustering
Abstract:
Microarray datasets play an important role in identifying and classifying cancer tissues. A major concern in cancer research is the small number of microarray samples available, which complicates classifier design. Moreover, the large number of features in microarrays makes feature selection and classification even more challenging for such datasets. Not all of these features contribute to the classification task, and some even impede performance. Hence, an appropriate gene selection method can significantly improve the performance of cancer classification. In this paper, a modified multi-objective cuckoo search algorithm is used for feature selection and sample selection to find the best available solutions. To accelerate the optimization process and avoid becoming trapped in local optima, new heuristic approaches are added to the original algorithm. The proposed algorithm is applied to six cancer datasets and its results are compared with those of other existing methods. The results show that the proposed method achieves higher accuracy and validity than existing approaches and is able to select a small subset of informative genes that increases classification accuracy.
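The multi-objective aspect of gene selection can be illustrated with a small sketch. The abstract does not state the exact objectives, so the example below assumes two that are common in this setting: classification error and number of selected genes, both to be minimized. The function names and the candidate values are hypothetical and are not taken from the paper; the code only shows how non-dominated (Pareto-optimal) gene subsets would be separated from dominated ones, not the proposed cuckoo search itself.

```python
from typing import List, Tuple

# Assumed objectives (not from the paper): (classification error, number of genes).
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b on every objective and better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (error, gene count) pairs for four candidate gene subsets.
candidates = [(0.10, 40), (0.08, 55), (0.10, 25), (0.15, 60)]
front = pareto_front(candidates)
print(front)
```

In this toy case, the subset with 40 genes is dominated by the one that reaches the same error with 25 genes, and the subset with 60 genes is dominated because another subset is better on both objectives. The remaining two subsets represent a trade-off between accuracy and subset size.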