A New Method Based on Non-negative Matrix Factorization for Dimensionality Reduction
Authors: Mehdi Hosseinzadeh Aghdam 1, Morteza Analoui 2, Jafar Tanha 3
1 - University of Bonab, Faculty of Engineering
2 - Iran University of Science and Technology, School of Computer Engineering
3 - University of Tabriz, Faculty of Electrical and Computer Engineering
Keywords: dimensionality reduction, non-negative matrix factorization, Frobenius norm, update rules, text clustering
Abstract:
Machine learning has seen extensive use over the past decades owing to its wide range of applications. In most machine learning tasks, such as clustering and classification, the data are high-dimensional, so dimensionality reduction methods are essential. Non-negative matrix factorization reduces dimensionality by extracting semantic features from high-dimensional data, but it only models how each individual feature vector is represented in the factor matrices and ignores the relationships between feature vectors. These relationships among feature vectors yield a factorization that is better suited to machine learning applications. In this paper, a method based on non-negative matrix factorization is proposed for dimensionality reduction that imposes constraints on each pair of feature vectors using distance-based measures. The proposed method uses the Frobenius norm as the cost function to derive the update rules. Experimental results on several datasets show that the proposed multiplicative update rules converge quickly and yield better results than competing algorithms.
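Since the abstract does not spell out the exact pairwise-constraint term of the proposed method, the following is only a minimal Python sketch of the unconstrained baseline it builds on: the classic multiplicative update rules that minimize the Frobenius-norm cost ||X - WH||_F^2. The function name nmf_multiplicative and the parameters n_iter, eps, and seed are illustrative choices, not part of the paper's formulation.

import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    # Minimal sketch (assumption): classic multiplicative updates that
    # minimize the Frobenius-norm cost ||X - W H||_F^2 for non-negative X.
    # The proposed method's pairwise distance-based constraints are NOT
    # reproduced here.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))   # non-negative basis matrix
    H = rng.random((k, n))   # non-negative coefficient matrix
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H); eps avoids division by zero
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # W <- W * (X H^T) / (W H H^T)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Usage: reduce a toy non-negative data matrix to k = 5 latent features.
X = np.abs(np.random.default_rng(1).normal(size=(100, 40)))
W, H = nmf_multiplicative(X, k=5)
print("reconstruction error:", np.linalg.norm(X - W @ H, "fro"))

Because both updates only multiply non-negative quantities, W and H stay non-negative throughout, which is what makes this scheme a valid starting point for constrained variants such as the one the paper proposes.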