ارائه یک الگوریتم خوشهبندی مبتنی بر چگالی با قابلیت کشف خوشههای با چگالی متفاوت در پایگاه دادههای مکانی
محورهای موضوعی : مهندسی برق و کامپیوترعلی زاده ده بالایی 1 , علیرضا باقری 2 , حامد افشار 3
1 - دانشگاه صنعتی امیرکبیر
2 - دانشگاه صنعتی امیرکبیر
3 - دانشگاه صنعتی امیرکبیر
کلید واژه: چگالی متفاوت خوشهبندی مبتنی بر چگالی دادهکاوی مکانی DBSCAN,
چکیده مقاله :
خوشهبندی یکی از تکنیکهای مهم کشف دانش در پایگاه دادههای مکانی است. الگوریتمهای خوشهبندی مبتنی بر چگالی یکی از روشهای اصلی برای خوشهبندی در دادهکاوی هستند. الگوریتم DBSCAN پایه روشهای خوشهبندی مبتنی بر چگالی است که علیرغم مزایایی که دارد دارای مشکلاتی نظیر سختبودن تعیین پارامترهای ورودی و عدم توانایی کشف خوشههای با چگالی متفاوت نیز است. در این مقاله الگوریتمی ارائه شده که برخلاف الگوریتم DBSCAN، قابلیت تشخیص خوشههای با چگالی متفاوت را دارد. این الگوریتم همچنین خوشههای تودرتو و چسبیده به هم را نیز به خوبی تشخیص میدهد. ایده الگوریتم پیشنهادی به این صورت است که ابتدا با استفاده از تکنیکی چگالیهای مختلف مجموعه داده را تشخیص داده و برای هر چگالی یک شعاع Eps تعیین میکند. سپس الگوریتم DBSCAN جهت اعمال بر روی مجموعه داده، با پارامترهای به دست آمده تطبیق داده میشود. الگوریتم پیشنهادی بر روی مجموعه دادههای استاندارد و مصنوعی تست شده است و نتایج به دست آمده با نتایج حاصل از الگوریتم DBSCAN و پنج بهبود الگوریتم DBSCAN شامل: VDBSCAN، VMDBSCAN، LDBSCAN، DVBSCAN و MDDBSCAN که همگی برای رفع مشکل تغییرات چگالی الگوریتم DBSCAN ارائه شدهاند، بر اساس معیارهای ارزیابی روشهای خوشهبندی مقایسه شدهاند. نتایج ارزیابیها نشان میدهد که الگوریتم پیشنهادی از دقت بالا و درصد خطای پایینی برخوردار بوده و نتایج بهتری نسبت به سایر الگوریتمها داشته است.
Clustering is one of the important techniques for knowledge discovery in spatial databases. density-based clustering algorithms are one of the main clustering methods in data mining. DBSCAN which is the base of density-based clustering algorithms, besides its benefits suffers from some issues such as difficulty in determining appropriate values for input parameters and inability to detect clusters with different densities. In this paper, we introduce a new clustering algorithm which unlike DBSCAN algorithm, can detect clusters with different densities. This algorithm also detects nested clusters and clusters sticking together. The idea of the proposed algorithm is as follows. First, we detect the different densities of the dataset by using a technique and Eps parameter is computed for each density. Then DBSCAN algorithm is adapted with the computed parameters to apply on the dataset. The experimental results which are obtained by running the suggested algorithm on standard and synthetic datasets by using well-known clustering assessment criteria are compared to the results of DBSCAN algorithm and some of its variants including VDBSCAN, VMDBSCAN, LDBSCAN, DVBSCAN and MDDBSCAN. All these algorithms have been introduced to solve the problem of multi-density data sets. The results show that the suggested algorithm has higher accuracy and lower error rate in comparison to the other algorithms.
[1] R. H. Guting, "An introduction to spatial database systems," the the International J. on Very Large Data Bases, vol. 3, no. 4, pp. 357-399, Oct. 1994.
[2] J. Mennis and D. Guo, "Spatial data mining and geographic knowledge discovery-an introduction," Computers, Environment and Urban Systems, vol. 33, no. 6, pp. 403-408, Nov. 2009.
[3] C. J. Matheus, P. K. Chan, and G. Piatetsky-Shapiro, "Systems for knowledge discovery in databases," IEEE Trans. on Knowledge and Data Engineering, vol. 5, no. 6, pp. 903-913, Dec. 1993.
[4] N. Soni and A. Ganatra, "Comparative study of several clustering algorithms," International J. of Advanced Computer Research, vol. 2, no. 6, pp. 37-42, Dec. 2012.
[5] T. Liu, C. Rosenberg, and H. A. Rowley, "Clustering billions of images with large scale nearest neighbor search," in Proc. IEEE Workshop on Applications of Computer Vision, WACV'07, pp. 28-28, Austin, TX, USA, 21-22 Feb. 2007.
[6] M. E. Celebi, Y. A. Aslandogan, and P. R. Bergstresser, "Mining biomedical images with density-based clustering," in Proc. Int. Conf. on Information Technology: Coding and Computing, ITCC'05, pp. 163-168, Las Vegas, NV, USA, 4-6 Apr. 2005.
[7] M. Celik, F. Dadaser-Celik, and A. Dokuz, "Anomaly detection in temperature data using DBSCAN algorithm," in Proc. Int. Symp. on Innovations in Intelligent Systems and Applications, INISTA'11, pp. 91-95, Istanbul, Turkey, 15-18 Jun. 2011.
[8] A. A. Abbasi and M. Younis, "A survey on clustering algorithms for wireless sensor networks," Computer Communications, vol. 30, no. 14, pp. 2826-2841, Oct. 2007.
[9] N. Soni and A. Ganatra, "Categorization of several clustering algorithms from different perspective: a review," International J. of Advanced Research in Computer Science and Software Engineering, vol. 2, no. 8, pp. 63-68, Aug. 2012.
[10] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. of the 2nd Int. Conf. on Knowledge Discovery and Data Mining, KDD'06, pp. 226-231, Portland, OR, USA, 2-4 Aug. 1996.
[11] M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, "Optics: ordering points to identify the clustering structure," in Proc. of the 1999 ACM Int. Conf. on Management of data, SIGMOD'99, pp. 49-60, 31 May-3 Jun. 1999.
[12] P. Liu, D. Zhou, and N. Wu, "VDBSCAN: varied density based spatial clustering of applications with noise," in Proc. Int. Conf. on Service Systems and Service Management, 4 pp., Chengdu, China, 9-11 Jun. 2007.
[13] A. Ram, S. Jalal, A. S. Jalal, and M. Kumar, "A density based algorithm for discovering density varied clusters in large spatial databases," Int. J. of Computer Applications, vol. 3, Article 1, 4 pp., 2010.
[14] A. Fahim, A. Salem, F. Torkey, and M. Ramadan, "Density clustering based on radius of data (DCBRD)," International Scholarly and Scientific Research & Innovation, vol. 2, no. 10, pp. 3463-3469, 2008.
[15] L. Duan, L. Xu, F. Guo, J. Lee, and B. Yan, "A local-density based spatial clustering algorithm with noise," Information Systems, vol. 32, no. 7, pp. 978-986, Nov. 2007.
[16] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, "LOF: identifying density-based local outliers," in Proc. of the 1999 ACM Int. Conf. on Management of data, SIGMOD'99, pp. 93-104, 15-18 May 2000.
[17] M. N. Gaonkar and K. Sawant, "AutoEpsDBSCAN: DBSCAN with Eps automatic for large dataset," International J. on Advanced Computer Theory and Engineering, vol. 2, no. 2, pp. 2319-2526, Aug. 2013.
[18] B. Borah and D. K. Bhattacharyya, "DDSC: a density differentiated spatial clustering technique," J. of Computers, vol. 3, no. 2, pp. 72-79, Feb. 2008.
[19] W. Ashour and S. Sunoallah, "Multi density DBSCAN," in Intelligent Data Engineering and Automated Learning-IDEAL, Ed: Springer, pp. 446-453, 2011.
[20] M. T. Elbatta, R. M. Bolbol, and W. M. Ashour, "A vibration method for discovering density varied clusters," International Scholarly Research Notices, vol. 2012, Article ID 723516, 8 pp., 2011.
[21] S. Mitra and J. Nandy, "KDDClus: a simple method for multi-density clustering," in Proc. of the Int. Workshop on Soft Computing Applications and Knowledge Discovery, SKAD'11, pp. 72-76, Moscow, Russia, 25 Jun. 2011.
[22] R. K. Prasad and R. Sarmah, "Variable density spatial data clustering," in Proc. 2nd Int. Conf. on Computer and Communication Technology, ICCCT'11, vol. ???, pp. 53-58, 25 Jun. 2011.
[23] S. Vijayalakshmi and M. Punithavalli, "Improved varied density based spatial clustering algorithm with noise," in Proc. IEEE Int. Conf. on Computational Intelligence and Computing Research, ICCIC'10, 4 pp., Coimbatore, India, 28-29 Dec. 2010.
[24] S. Wang, Y. Liu, and B. Shen, "MDBSCAN: multi-level density based spatial clustering of applications with noise," in Proc. of the the 11th Int. Knowledge Management in Organizations Conf. on the Changing Face of Knowledge Management Impacting Society, p. 24, Hagen, Germany, 25-28 Jul. 2016.
[25] W. Atwa and K. Li, "Constraint-based clustering algorithm for multi-density data and arbitrary shapes," in Proc. Industrial Conf. on Data Mining, ICDM'17, pp. 78-92, 2017.
[26] S. Louhichi, M. Gzara, and H. Ben-Abdallah, "Unsupervised varied density based clustering algorithm using spline," Pattern Recognition Letters, vol. 93, no. 8, pp. 48-57, Jul. 2017.
[27] A. Amini, H. Saboohi, T. Herawan, and T. Ying Wah, "MuDi-Stream: a multi density clustering algorithm for evolving data stream," J. of Network and Computer Applications, vol. 59, no. 3, pp. 370-385, Jan. 2016.
[28] C. Fahy, S. Yang, and M. Augusto Gongora, "Finding multi-density clusters in non-stationary data streams using an ant colony with adaptive parameters," in Proc. IEEE Congress on Evolutionary Computation, pp. 673-680, San Sebastian, Spain, 5-8 Jun. 2017.
[29] T. N. Tran, K. Drab, and M. Daszykowski, "Revised DBSCAN algorithm to cluster data with dense adjacent clusters," Chemometrics and Intelligent Laboratory Systems, vol. 120, no. 1, pp. 92-96, Jan. 2013.
[30] S. Zhou, A. Zhou, W. Jin, Y. Fan, and W. Qian, "FDBSCAN: a fast DBSCAN algorithm," Journal of Software, vol. 11, no. 6, pp. 735-744, Jun. 2000.
[31] J. H. Peter and A. Antonysamy, "An optimised density based clustering algorithm," International J. of Computer Applications, vol. 6, no. 9, pp. 20-25, Sept. 2010.
[32] C. F. Tsai, C. T. Wu, and S. Chen, "GF-DBSCAN; a new efficient and effective data clustering technique for large databases," in Proc. WSEAS Int. Conf. Mathematics and Computers in Science and Engineering, pp. 231-236, Hangzhou, China, 20-22 May 2009.
[33] X. Wang and H. J. Hamilton, "DBRS: a density-based spatial clustering method with random sampling," in Proc. Pacific-Asia Conf. on Knowledge Discovery and Data Mining, PAKDD'03, pp. 563-575, 2003.
[34] J. Sander, M. Ester, H. P. Kriegel, and X. Xu, "Density-based clustering in spatial databases: the algorithm gdbscan and its applications," Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 169-194, Jun. 1998.
[35] D. Birant and A. Kut, "ST-DBSCAN: an algorithm for clustering spatial-temporal data," Data & Knowledge Engineering, vol. 60, no. 1, pp. 208-221, Jan. 2007.
[36] ع. زاده ده بالایی، ع. ر. باقری و ح. افشار، ":IMD-DBSCAN یک الگوریتم خوشهبندی مبتنی بر چگالی افزایشی با قابلیت افزودن خوشههای با چگالی متفاوت در محیطهای انباره داده،" مجموعه مقالات بيستمين کنفرانس ملی سالانه انجمن کامپیوتر ایران، صص. 34-28، مشهد، 14-12 اسفند 1393.
[37] J. H. Friedman, J. L. Bentley, and R. A. Finkel, "An algorithm for finding best matches in logarithmic expected time," ACM Trans. on Mathematical Software, vol. 3, no. 3, pp. 209-226, Sept. 1977.
[38] , Clustering Datasets, [Online]. http://cs.joensuu.fi/sipu/datasets/
[39] W. M. Rand, "Objective criteria for the evaluation of clustering methods," J. of the American Statistical Association, vol. 66, no. 336, pp. 846-850, Dec. 1971.
[40] P. Jaccard, "Distribution of the alpine flora in the dranse's basin and some neighbouring regions," Bulletin de la Societe Vaudoise des Sciences Naturelles, vol. 37, no. 1, pp. 241-272, Jan. 1901.
[41] E. B. Fowlkes and C. L. Mallows, "A method for comparing two hierarchical clusterings," J. of the American Statistical Association, vol. 78, 383, pp. 553-569, Sept. 1983.
[42] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. of the Fifth Berkeley Symp. on Mathematical Statistics and Probability, vol. 1, no. 14, pp. 281-297, Jun. 1967.
[43] P. Berkhin, "A survey of clustering data mining techniques," in Grouping Multidimensional Data, pp. 25-71, Springer Berlin Heidelberg, 2006.
[44] J. Krumm and A. J. Brush, MSR GPS Privacy Dataset 2009, http://research.microsoft.com /~jckrumm/GPSData2009.