Feature Selection and Classification of Cancer Cells Based on Microarray Data Using a Multi-Objective Cuckoo Search Algorithm
Authors: Khadijeh Kamari 1, Farzan Rashidi 2, Abdollah Khalili 3
1 - University of Hormozgan
2 - University of Hormozgan
3 - University of Hormozgan
Keywords: feature selection, instance selection, data mining, microarray, multi-objective cuckoo search algorithm, fuzzy clustering
Abstract:
Microarray data play an effective role in classifying and diagnosing types of cancerous tissue. In cancer research, however, the relatively small number of samples compared with the very large number of genes causes problems such as reduced classifier performance, higher computational cost, and greater complexity in classifying cancer cells. A suitable way to improve classifier performance is to remove irrelevant genes and to select appropriate samples for training the classifiers. This paper proposes a hybrid model, based on the multi-objective cuckoo search optimization algorithm and fuzzy clustering, for classifying microarray data. The binary version of the multi-objective cuckoo search algorithm is used to select disease-related features, and its continuous version is used to select a suitable number of samples for training the classifiers. To speed up the optimization process and to keep the algorithm from becoming trapped in local optima, new heuristic strategies are also added to the algorithm. To evaluate the proposed model, several simulations were run on six cancer datasets, and the results were compared with those reported in other papers. The results show that in many cases the proposed model, by selecting a smaller set of discriminative genes than other methods, improves classifier performance.
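As a rough illustration of the two ideas the abstract combines, the sketch below shows how a continuous search position might be turned into a binary gene mask, and how candidate solutions could be compared under two objectives. It is not the authors' implementation: the sigmoid transfer function, the choice of objectives (classification error and number of selected genes, both minimized), and the names `binarize`, `dominates`, and `pareto_front` are assumptions made for this example, since the abstract does not specify them.

```python
import math
import random


def binarize(position, rng):
    """Map a continuous position to a 0/1 gene mask (assumed sigmoid transfer)."""
    mask = []
    for x in position:
        x = max(-60.0, min(60.0, x))          # clamp to keep math.exp in range
        prob = 1.0 / (1.0 + math.exp(-x))     # probability that the gene is kept
        mask.append(1 if rng.random() < prob else 0)
    return mask


def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better


def pareto_front(points):
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates as (classification error, number of selected genes).
candidates = [(0.10, 40), (0.08, 55), (0.10, 25), (0.20, 10), (0.12, 30)]
front = pareto_front(candidates)
print(front)

# A hypothetical four-gene position turned into a mask with a seeded generator.
print(binarize([-2.0, 0.0, 3.0, 1.5], random.Random(0)))
```

In a multi-objective setting there is usually no single best solution, so a filter of this kind returns a set of trade-offs between accuracy and subset size from which a final gene subset can be chosen.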