Using BIRCH Clustering and the Chemical Reaction Optimization Algorithm for Fraud Detection in Healthcare
Authors: Majid Abdolrazzagh-Nezhad 1, Mehdi Kherad 2
1 - Bozorgmehr University of Qaenat
2 - University of Qom
Keywords: chemical reaction optimization algorithm, healthcare, BIRCH clustering, fraud detection
Abstract:
Because of the scale of its financial operations and the breadth of its use, the healthcare sector is an ideal target for fraud, and despite the various solutions proposed, identifying fraudulent data remains a challenge for healthcare service providers. In this paper, for the first time, BIRCH, a hierarchical clustering algorithm, is combined with the chemical reaction optimization (CRO) algorithm. BIRCH has linear time complexity, can handle large volumes of data, and can identify outliers. CRO is a recent metaheuristic inspired by real-world chemical reactions; it explores the search space with a dynamic population of molecules through four operators: on-wall collision, decomposition, inter-molecular collision, and synthesis. By removing the internal global clustering phase of classical BIRCH and optimally setting its main parameters, the improved BIRCH-CRO clustering algorithm detects fraudulent healthcare data faster and more accurately than other unsupervised algorithms proposed in this domain. The proposed algorithm can also work with online and high-volume data, and according to the results obtained it performs well.
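To make the BIRCH side of the abstract concrete, the following is a minimal sketch of how clustering features (CF) summarize data in a single pass and how sparsely populated subclusters can be treated as outlier candidates. It is not the authors' BIRCH-CRO implementation: it uses a flat list of subclusters instead of a CF-tree, skips the global clustering phase, and takes `threshold` and `min_size` as fixed inputs. The names `CF`, `flat_birch`, `flag_outliers`, and `min_size` are hypothetical and chosen only for illustration.

```python
import math


class CF:
    """Clustering feature (N, LS, SS) summarizing one subcluster."""

    def __init__(self, dim):
        self.n = 0               # number of points absorbed
        self.ls = [0.0] * dim    # linear sum of the points, per dimension
        self.ss = 0.0            # sum of squared norms of the points

    def add(self, p):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.ss += sum(x * x for x in p)

    def centroid(self):
        return [a / self.n for a in self.ls]

    def radius_if_added(self, p):
        """Radius the subcluster would have after absorbing p."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, p)]
        ss = self.ss + sum(x * x for x in p)
        # R^2 = SS/N - ||LS/N||^2, clamped against rounding error
        r2 = ss / n - sum((a / n) ** 2 for a in ls)
        return math.sqrt(max(r2, 0.0))


def flat_birch(points, threshold):
    """One pass over the data: absorb each point into the nearest
    subcluster if its radius stays within threshold, else open a new one.
    Assumption: a flat list replaces the CF-tree of real BIRCH."""
    subclusters, labels = [], []
    for p in points:
        best, best_d = None, float("inf")
        for i, cf in enumerate(subclusters):
            d = math.dist(cf.centroid(), p)
            if d < best_d:
                best, best_d = i, d
        if best is not None and subclusters[best].radius_if_added(p) <= threshold:
            subclusters[best].add(p)
            labels.append(best)
        else:
            cf = CF(len(p))
            cf.add(p)
            subclusters.append(cf)
            labels.append(len(subclusters) - 1)
    return subclusters, labels


def flag_outliers(subclusters, labels, min_size):
    """Mark points whose subcluster holds fewer than min_size points.
    Assumption: subcluster size is used as the outlier criterion."""
    return [subclusters[label].n < min_size for label in labels]
```

In this sketch the result depends strongly on `threshold`, which illustrates why tuning such parameters with an optimizer like CRO, as the abstract describes, could matter; how the paper actually encodes and scores parameter settings is not stated here.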