Privacy Preserving Big Data Mining: Association Rule Hiding
Authors: Golnar Assadat Afzali 1, Shahriar Mohammadi 2
1 - K. N. Toosi
2 - Faculty of Industrial Engineering, K. N. Toosi University of Technology, Iran
Keywords: Big Data, Association Rule, Privacy Preserving, Anonymization, Data Mining
Abstract:
Data repositories contain sensitive information that must be protected from unauthorized access, and existing data mining techniques can pose a privacy threat to such data. Association rule mining is one of the most important data mining techniques; it tries to discover relationships between seemingly unrelated items in a database. Association rule hiding is a research area within privacy preserving data mining (PPDM) that addresses the problem of hiding sensitive rules within the data. Much research has been done in this area, but most of it focuses on reducing the undesired side effects of deleting sensitive association rules in static databases. In the age of big data, however, we are confronted with dynamic databases into which new data may arrive at any time. Most existing techniques are therefore impractical in this setting and must be adapted to such high-volume databases. In this paper, a data anonymization technique is used for association rule hiding, and parallelization and scalability features are embedded in the proposed model in order to speed up the big data mining process. Instead of removing some instances of an existing important association rule, generalization is used to anonymize items at an appropriate level. As a result, important association rules can still be updated, if necessary, as new data arrive. We conducted experiments on three datasets to evaluate the performance of the proposed model in comparison with Max-Min2 and HSCRIL. Experimental results show that the information loss of the proposed model is lower than that of existing approaches in this area, and that the model can be executed in parallel to reduce execution time.
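The idea of hiding a sensitive rule by generalizing items, rather than deleting transactions, can be illustrated with a small sketch. This is not the authors' model: it assumes a hypothetical one-level item taxonomy (`TAXONOMY`), toy transactions, and hypothetical helper names (`generalize`, `support_confidence`, `parallel_counts`). Support and confidence are counted by splitting the data into partitions, counting each partition concurrently and merging the partial counts, which mirrors the map/reduce structure of parallel mining; threads are used only to show that structure, and a real big data deployment would run the partitions on a cluster framework such as Hadoop MapReduce.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical one-level taxonomy (assumption): specific item -> general category.
TAXONOMY = {
    "skim_milk": "milk",
    "whole_milk": "milk",
    "rye_bread": "bread",
    "wheat_bread": "bread",
}


def count_partition(transactions, itemsets):
    """Map step: count the transactions in one partition that contain each itemset."""
    counts = Counter()
    for t in transactions:
        for s in itemsets:
            if s <= t:  # itemset is a subset of the transaction
                counts[s] += 1
    return counts


def parallel_counts(transactions, itemsets, n_parts=2):
    """Partition the data, count each partition concurrently, then merge (reduce)."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = pool.map(lambda p: count_partition(p, itemsets), parts)
        return reduce(lambda a, b: a + b, partials, Counter())


def support_confidence(transactions, lhs, rhs, n_parts=2):
    """Support and confidence of the rule lhs -> rhs."""
    lhs = frozenset(lhs)
    rule = lhs | frozenset(rhs)
    counts = parallel_counts(transactions, [lhs, rule], n_parts)
    support = counts[rule] / len(transactions)
    confidence = counts[rule] / counts[lhs] if counts[lhs] else 0.0
    return support, confidence


def generalize(transactions, items, taxonomy=TAXONOMY):
    """Replace the given specific items by their parent category instead of deleting them."""
    return [
        frozenset(taxonomy.get(i, i) if i in items else i for i in t)
        for t in transactions
    ]


# Toy example data (assumption, not from the paper's experiments).
data = [frozenset(t) for t in (
    {"skim_milk", "rye_bread"},
    {"skim_milk", "rye_bread", "eggs"},
    {"skim_milk", "wheat_bread"},
    {"whole_milk", "rye_bread"},
)]
print(support_confidence(data, {"skim_milk"}, {"rye_bread"}))
hidden = generalize(data, {"skim_milk", "rye_bread"})
print(support_confidence(hidden, {"skim_milk"}, {"rye_bread"}))
print(support_confidence(hidden, {"milk"}, {"bread"}))
```

On this toy data the sensitive rule skim_milk → rye_bread has support 0.5 and confidence 2/3 before generalization and disappears (support and confidence 0) afterwards, while the more general rule milk → bread keeps support 0.5 and confidence 2/3, so no transaction is deleted and the generalized knowledge remains available for later updates.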
References:
[1] Chen, C.L.P., & Zhang, Ch. (2014). Data Intensive Applications, Challenges, Techniques, and Technologies: A Survey on Big Data. Information Sciences, Vol. 275, pp. 314-347.
[2] Kwon, O., Lee, N., & Shin, B. (2014). Data Quality Management, Data Usage Experience and Acquisition Intention of Big Data Analytics. International Journal of Information Management, Vol. 34, No. 3, pp. 387-394.
[3] Cuzzocrea, A., Leung, C.K.S., & MacKinnon, R.K. (2014). Mining Constrained Frequent Item-Sets from Distributed Uncertain Data. Future Generation Computer Systems, Vol. 37, pp. 117-126.
[4] Zhang, X., Liu, Ch., Nepal, S., Yang, Ch., Dou, W., & Chen, J. (2014). A Hybrid Approach for Scalable Sub-Tree Anonymization over Big Data using MapReduce on Cloud. Journal of Computer and System Sciences, Vol. 80, No. 5, pp. 1008-1020.
[5] Li, Y., Chen, M., Li, Q., & Zhen, W. (2012). Enabling Multilevel Trust in Privacy Preserving Data Mining. IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 9, pp. 1589-1612.
[6] Wu, Y.H., Chiang, C., & Chen, A.L.P. (2007). Hiding Sensitive Association Rules with Limited Side Effects. IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 1, pp. 29-42.
[7] Gkoulalas-Divanis, A., & Verykios, V.S. (2009). Exact Knowledge Hiding through Database Extension. IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 5, pp. 699-713.
[8] Le, H.Q., Arch-int, S., Nguyen, H.X., & Arch-int, N. (2013). Association Rule Hiding in Risk Management for Retail Supply Chain Collaboration. Computers in Industry, Vol. 64, No. 4, pp. 776-784.
[9] Li, Y.Ch., Yeh, J.Sh., & Chang, Ch. (2007). MCIF: An Effective Sanitization Algorithm for Hiding Sensitive Patterns on Data Mining. Advanced Engineering Informatics, Vol. 21, No. 3, pp. 269-280.
[10] Keshavamurthy, B.N., Toshniwal, D., & Eshwar, B.K. (2012). Hiding Co-Occurring Prioritized Sensitive Patterns over Distributed Progressive Sequential Data Streams. Journal of Network and Computer Applications, Vol. 35, No. 3, pp. 1116-1129.
[11] Wu, X., Zhu, X., Wu, G., & Ding, W. (2013). Data Mining with Big Data. IEEE Transactions on Knowledge and Data Engineering, Vol. 26, No. 1, pp. 97-107.
[12] Nergiz, M.E., & Gok, M.Z. (2014). Hybrid K-Anonymity. Computers & Security, Vol. 44, pp. 51-63.
[13] Li, B., Erdin, E., Gunes, M.H., Bebis, G., & Shipley, T. (2013). An Overview of Anonymity Technology Usage. Computer Communications, Vol. 36, No. 12, pp. 1269-1283.
[14] Monreale, A., Andrienko, G., Andrienko, N., Giannotti, F., Pedreschi, D., Rinzivillo, S., & Wrobel, S. (2010). Movement Data Anonymity through Generalization. Transactions on Data Privacy, Vol. 3, No. 2.
[15] Kisilevich, S., Rokach, L., Elovici, Y., & Shapira, B. (2010). Efficient Multidimensional Suppression for K-Anonymity. IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 3, pp. 334-347.
[16] Zhang, G., Yang, Y., Liu, X., & Chen, J. (2010). A Time-Series Pattern Based Noise Generation Strategy for Privacy Protection in Cloud Computing. International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 458-465.
[17] Wang, H. (2013). Quality Measurement for Association Rule Hiding. AASRI Procedia, Vol. 5, pp. 228-234.
[18] Moustakides, G.V., & Verykios, V.S. (2008). A MaxMin Approach for Hiding Frequent Item Sets. Data & Knowledge Engineering, Vol. 65, No. 1, pp. 75-89.
[19] Wang, Sh., Parikh, B., & Jafari, A. (2007). Hiding Informative Association Rule Sets. Expert Systems with Applications, Vol. 33, No. 2, pp. 316-323.
[20] Wang, Ch., Tseng, Sh., & Hong, T. (2006). Flexible Online Association Rule Mining Based on Multidimensional Pattern Relations. Information Sciences, Vol. 167, No. 12, pp. 1752-1780.
[21] Dasseni, E., Verykios, V.S., Elmagarmid, A.K., & Bertino, E. (2001). Hiding Association Rules by Using Confidence and Support. Information Hiding, Lecture Notes in Computer Science, Vol. 2137, pp. 369-383.
[22] Jung, K., Park, S., Cho, S., & Park, S. (2014). A Novel Privacy Preserving Association Rule Mining using Hadoop. The Third International Conference on Data Analytics, pp. 131-137.
[23] Xu, L., Jiang, C., Wang, J., Yuan, J., & Ren, Y. (2014). Information Security in Big Data: Privacy and Data Mining. IEEE Access, Vol. 2, pp. 1149-1176.
[24] Borgelt, Ch., & Kruse, R. (2002). Induction of Association Rules: Apriori Implementation. Physica-Verlag, Heidelberg, pp. 395-400.
[25] Borgelt, Ch. (2005). An Implementation of the FP-Growth Algorithm. Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations, pp. 1-5.