Multi-Label Feature Selection Using a Hybrid Approach Based on the Particle Swarm Optimization Algorithm
Authors: Azar Rafiei 1, Parham Moradi 2, Abdolbaghi Ghaderzadeh 3
1 - Islamic Azad University, Sanandaj Branch
2 - University of Kurdistan
3 - Islamic Azad University, Sanandaj Branch
Keywords: feature selection, multi-label classification, local search strategy, swarm intelligence, particle swarm optimization
Abstract:
Multi-label classification is an important problem in machine learning, and the performance of multi-label classification algorithms degrades sharply as the dimensionality of the problem grows. Feature selection is one of the main approaches to dimensionality reduction in multi-label problems. Multi-label feature selection is an NP-hard problem, and a number of approaches based on swarm intelligence and evolutionary algorithms have been proposed for it. Higher dimensionality enlarges the search space and consequently reduces both the effectiveness and the convergence speed of these algorithms. This paper presents a hybrid swarm-intelligence approach to multi-label feature selection based on the binary particle swarm optimization algorithm combined with a local search strategy. To accelerate convergence, the local search strategy partitions the features into two groups according to their redundancy and their relevance to the problem output. The first group contains features that are highly similar to the class labels and less similar to the other features; the second group contains the redundant and weakly relevant features. On this basis, a local operator is added to the particle swarm optimization algorithm that prunes the irrelevant and redundant features from each candidate solution. Applying this operator speeds up the convergence of the proposed algorithm compared with other algorithms proposed in this area. The performance of the proposed method was compared with the best-known feature selection methods on several benchmark datasets. The experimental results show that the proposed method achieves good accuracy.
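The mechanism described above — a binary PSO whose particles are feature masks, refined each iteration by a local operator that adds relevant, non-redundant features and drops redundant, weakly relevant ones — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the relevance and redundancy measures (absolute correlation here), the fitness function, the PSO constants, and all function names are assumptions chosen for brevity, and a single binary label stands in for the multi-label target.

```python
import numpy as np

rng = np.random.default_rng(0)

def relevance(X, y):
    # Stand-in relevance measure (assumption): |Pearson correlation|
    # of each feature column with a single binary label vector.
    Xc, yc = X - X.mean(0), y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)

def redundancy(X):
    # Stand-in redundancy measure: mean absolute pairwise correlation
    # of each feature with all the other features.
    C = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(C, 0.0)
    return C.mean(1)

def local_refine(mask, rel, red, k=2):
    # Local operator: turn on the k best unselected features (relevant,
    # non-redundant) and turn off the k worst selected ones.
    mask = mask.copy()
    score = rel - red                       # high = useful, low = redundant
    unsel, sel = np.where(mask == 0)[0], np.where(mask == 1)[0]
    if unsel.size:
        mask[unsel[np.argsort(-score[unsel])[:k]]] = 1
    if sel.size > k:
        mask[sel[np.argsort(score[sel])[:k]]] = 0
    return mask

def bpso_select(X, y, n_particles=10, iters=30):
    # Binary PSO over feature masks with the local operator applied
    # to every particle after each velocity/position update.
    n = X.shape[1]
    rel, red = relevance(X, y), redundancy(X)

    def fitness(m):
        # Toy filter fitness: reward relevance, penalize redundancy and size.
        if m.sum() == 0:
            return -1.0
        return rel[m == 1].mean() - red[m == 1].mean() - 0.01 * m.sum()

    pos = (rng.random((n_particles, n)) < 0.5).astype(int)
    vel = np.zeros((n_particles, n))
    pbest = pos.copy()
    pfit = np.array([fitness(m) for m in pos])
    gbest = pbest[pfit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, n))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = (rng.random((n_particles, n)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        pos = np.array([local_refine(m, rel, red) for m in pos])
        fit = np.array([fitness(m) for m in pos])
        improved = fit > pfit
        pbest[improved], pfit[improved] = pos[improved], fit[improved]
        gbest = pbest[pfit.argmax()].copy()
    return gbest  # binary mask of selected features
```

Because the operator deterministically pushes each particle toward the relevant/non-redundant feature group, fewer iterations are spent exploring masks dominated by redundant features, which is the source of the convergence speed-up the abstract claims.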