Deep Extreme Learning Machine: An Ensemble Incremental Learning Approach for Data Stream Classification
Subject areas: Electrical and Computer Engineering
1 - Sadjad University, Mashhad, Faculty of Computer Engineering and Information Technology
2 - Semnan University, Faculty of Electrical and Computer Engineering
Keywords: data streams, concept drift, extreme learning machine, incremental learning
Abstract:
Streaming data arrives sequentially, at high speed and in large volume. Its distribution is non-stationary and may change over time. Because such data matters in important domains such as the Internet of Things, speeding up the analysis of big data streams and raising its throughput has become an active research topic. The proposed method integrates online ensemble learning into an improved extreme learning machine (ELM) to classify data streams. Because the approach is incremental, only one data block is learned at a time, with no need to access the samples of previous blocks. Following the AdaBoost approach, the base classifiers are weighted, and each one is retained or removed according to the quality of its predictions. The method also detects concept drift from classifier accuracy, which eases model adaptation and improves performance. Experiments on standard benchmark datasets show that the proposed method achieves, on average, a specificity of 0.90, a sensitivity of 0.69, and an accuracy of 0.87, a significant difference from two competing methods.
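The block-wise loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the weighting formula, the drift rule, or the ensemble size, so the sketch assumes the classic AdaBoost vote weight 0.5 * ln((1 - err) / err), binary labels, a drift flag raised when block accuracy drops by more than a hypothetical `drift_drop`, and a hypothetical `max_size` cap. The base learner `train_centroid` is a simple stand-in, not an extreme learning machine, and every name in the block is illustrative.

```python
import math


def adaboost_weight(error, eps=1e-10):
    """AdaBoost-style vote weight: 0.5 * ln((1 - err) / err)."""
    error = min(max(error, eps), 1.0 - eps)  # keep the logarithm finite
    return 0.5 * math.log((1.0 - error) / error)


def block_error(predict, block):
    """Fraction of (x, y) pairs in the block that predict() gets wrong."""
    wrong = sum(1 for x, y in block if predict(x) != y)
    return wrong / len(block)


def train_centroid(block):
    """Stand-in base learner (NOT an ELM): nearest class mean on 1-D inputs."""
    sums, counts = {}, {}
    for x, y in block:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


class BlockEnsemble:
    """Weighted ensemble updated one block at a time (assumed design)."""

    def __init__(self, drift_drop=0.2, max_size=5):
        self.members = []             # list of [predict_fn, weight]
        self.prev_acc = None          # ensemble accuracy on the previous block
        self.drift_drop = drift_drop  # hypothetical drift threshold
        self.max_size = max_size      # hypothetical cap on ensemble size

    def predict(self, x):
        """Weighted vote over the current members."""
        votes = {}
        for predict, weight in self.members:
            label = predict(x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get) if votes else None

    def process_block(self, block):
        """Test-then-train on one block; returns (accuracy, drift_flag)."""
        acc, drift = None, False
        if self.members:
            acc = 1.0 - block_error(self.predict, block)
            # Accuracy-based drift check: a sharp drop against the last block.
            drift = (self.prev_acc is not None
                     and self.prev_acc - acc > self.drift_drop)
            # Re-weight members on the newest block and drop those that are
            # no better than chance (weight <= 0 for binary labels).
            kept = []
            for predict, _ in self.members:
                weight = adaboost_weight(block_error(predict, block))
                if weight > 0:
                    kept.append([predict, weight])
            self.members = kept
            self.prev_acc = acc
        # Learn a new member from the current block only; earlier blocks are
        # never revisited. Its weight comes from its training error, which
        # is optimistic.
        new_member = train_centroid(block)
        self.members.append(
            [new_member, adaboost_weight(block_error(new_member, block))])
        self.members.sort(key=lambda member: member[1], reverse=True)
        self.members = self.members[:self.max_size]
        return acc, drift
```

Calling `process_block` once per incoming block of `(x, y)` pairs returns the pre-update ensemble accuracy on that block and a drift flag. When the labels of a concept flip between blocks, the stale members receive non-positive weights and are removed, so the member trained on the newest block decides the vote.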