Using BIRCH Clustering and the Chemical Reaction Optimization Algorithm to Detect Fraud in Healthcare
Subject areas: Electrical and Computer Engineering
Majid Abdolrazzagh-Nezhad 1, Mehdi Kherad 2
1 - Bozorgmehr University of Qaenat
2 - University of Qom
Keywords: chemical reaction optimization algorithm, healthcare, BIRCH clustering, fraud detection
Abstract:
Because of the scale of its financial transactions and the breadth of its services, the healthcare sector is an ideal target for fraud. Although several fraud detection methods exist, identifying fraudulent data remains a challenge for healthcare providers. In this paper, the BIRCH algorithm, a hierarchical clustering algorithm, is combined for the first time with the chemical reaction optimization (CRO) algorithm. BIRCH has linear time complexity, can cluster large volumes of data, and can identify outliers. CRO is a recent metaheuristic inspired by real-world chemical reactions; it explores the search space with a dynamic population of molecules using four operators: on-wall ineffective collision, decomposition, inter-molecular ineffective collision, and synthesis. The improved BIRCH-CRO clustering algorithm removes the internal global clustering step of classic BIRCH and determines optimal values for its main parameters. Compared with other unsupervised algorithms proposed in this domain, it reduces computation time and improves the accuracy and precision of detecting fraudulent healthcare data. The proposed algorithm can also work with online and large-scale data and, according to the results obtained, performs well.
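The idea of running BIRCH without its global clustering step and treating sparse subclusters as suspicious can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `Birch` as a stand-in, uses synthetic two-dimensional data, and the helper name `birch_outlier_flags` and its `min_size` cutoff are hypothetical. The CRO search itself is not shown; `threshold` and `branching_factor` are presumably the kind of parameters such an optimizer would tune.

```python
import numpy as np
from sklearn.cluster import Birch


def birch_outlier_flags(X, threshold, branching_factor=50, min_size=5):
    """Flag points that land in small BIRCH subclusters (hypothetical helper).

    n_clusters=None tells scikit-learn to skip the final global
    clustering step and return the CF-tree subclusters as they are.
    """
    model = Birch(
        threshold=threshold,
        branching_factor=branching_factor,
        n_clusters=None,
    )
    labels = model.fit_predict(X)
    # Number of points assigned to each subcluster.
    sizes = np.bincount(labels)
    # A point is flagged when its subcluster holds fewer than min_size points.
    return sizes[labels] < min_size


# Synthetic data (an assumption for illustration): one dense group of
# ordinary records plus three isolated records far from the group.
rng = np.random.default_rng(0)
ordinary = rng.normal(0.0, 0.2, size=(300, 2))
isolated = np.array([[10.0, 10.0], [-10.0, 8.0], [9.0, -11.0]])
X = np.vstack([ordinary, isolated])

flags = birch_outlier_flags(X, threshold=0.5)
print("flagged isolated records:", flags[-3:])
print("flagged ordinary records:", int(flags[:300].sum()))
```

In this sketch the result depends on `threshold`: a value that is too large lets distant points be absorbed into a big subcluster, while a value that is too small fragments the dense group. That sensitivity is one reason a parameter search of the kind the abstract describes is useful.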