Self-Training Semi-Supervised Classification Based on Neighborhood Construction
Subject areas: Electrical and Computer Engineering
Mona Emadi 1, Jafar Tanha 2, Mohammad Ebrahim Shiri 3, Mehdi Hosseinzadeh Aghdam 4
1 - Islamic Azad University, Borujerd Branch, Department of Computer Engineering
2 - University of Tabriz, Department of Electrical and Electronics Engineering
3 - Amirkabir University of Technology, Department of Computer Science
4 - University of Bonab, Department of Computer Engineering
Keywords: epsilon-neighborhood algorithm (DBSCAN), self-training algorithm, semi-supervised classification, support vector machine
Abstract:
Using unlabeled data in semi-supervised self-training can significantly improve the accuracy of a supervised classifier, but in some cases it can reduce classification accuracy substantially. One reason for such degradation is the incorrect labeling of unlabeled data. In this paper, we propose a method for labeling unlabeled data with high confidence. The base classifier in the proposed algorithm is the support vector machine. In this method, labels are assigned only to the set of unlabeled points that lie closer to the decision boundary than a specified threshold. These points are called informative data. Adding informative data to the training set has a considerable effect on reaching the optimal decision boundary, provided that their labels are predicted correctly. The epsilon-neighborhood algorithm (DBSCAN) is used to discover the labeling structure in the data space. Comparative experiments on UCI datasets show that the proposed method achieves higher accuracy for self-training semi-supervised classification than some previous work.
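The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so everything here is an assumption. A fixed linear decision function `f(x) = w.x + b` stands in for a trained support vector machine, a plain epsilon-neighborhood query stands in for full DBSCAN, and the acceptance rule (the predicted label must agree with every labeled point within `eps`) is one hypothetical way to use neighborhood structure for confident labeling. All function and parameter names are illustrative.

```python
import math


def decision_value(w, b, x):
    """Signed score of a linear classifier f(x) = w.x + b (stand-in for a trained SVM)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def informative_points(w, b, unlabeled, threshold):
    """Indices of unlabeled points with |f(x)| below the threshold, i.e. near the boundary."""
    return [i for i, x in enumerate(unlabeled)
            if abs(decision_value(w, b, x)) < threshold]


def eps_neighbors(x, points, eps):
    """Indices of points within Euclidean distance eps of x (epsilon-neighborhood query)."""
    return [j for j, p in enumerate(points) if math.dist(x, p) <= eps]


def confident_labels(w, b, labeled, labels, unlabeled, threshold, eps):
    """Map index -> label for informative points whose predicted label
    agrees with all labeled points in their epsilon-neighborhood.
    Points with no labeled neighbor are left unlabeled (assumed rule)."""
    accepted = {}
    for i in informative_points(w, b, unlabeled, threshold):
        x = unlabeled[i]
        predicted = 1 if decision_value(w, b, x) >= 0 else -1
        neighbor_labels = [labels[j] for j in eps_neighbors(x, labeled, eps)]
        if neighbor_labels and all(l == predicted for l in neighbor_labels):
            accepted[i] = predicted
    return accepted


# Toy data: the decision boundary is the vertical line x1 = 0.
w, b = (1.0, 0.0), 0.0
labeled = [(-1.0, 0.0), (1.0, 0.0)]
labels = [-1, 1]
unlabeled = [(0.4, 0.0), (-0.3, 0.2), (3.0, 0.0), (0.1, 5.0)]

result = confident_labels(w, b, labeled, labels, unlabeled,
                          threshold=0.5, eps=0.8)
print(result)
```

In this toy run, the third point is skipped because it lies far from the boundary, and the fourth is skipped because it has no labeled neighbor within `eps`. In a self-training loop, the accepted points would be added to the training set and the classifier retrained, repeating until no further points are accepted.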