Preserving Data Clustering with Expectation Maximization Algorithm
Subject Areas: Data Mining
Leila Jafar Tafreshi 1, Farzin Yaghmaee 2
1 - Semnan University
2 - Semnan University, Electrical & Computer Engineering Department
Keywords: Privacy Preserving, Clustering, Data Mining, Expectation Maximization Algorithm
Abstract:
Data mining and knowledge discovery are important technologies for business and research. Despite their benefits in areas such as marketing, business and medical analysis, data mining techniques can also create new threats to privacy and information security. For this reason, a class of methods called privacy preserving data mining (PPDM) has been developed. Research in this field aims to develop techniques that can be applied to databases without violating the privacy of individuals. In this work we introduce a new approach that uses fuzzy logic to protect sensitive information in databases with both numerical and categorical attributes. We map a database into a new one that conceals private information while preserving its value for mining. In the proposed method, we apply fuzzy membership functions (MFs) such as Gaussian, P-shaped, Sigmoid, S-shaped and Z-shaped to the private data. We then cluster the modified datasets with the Expectation Maximization (EM) algorithm. Our experimental results show that using fuzzy logic to preserve data privacy yields valid clustering results while protecting sensitive information. The accuracy of the clustering algorithm on the fuzzified data is approximately equal to its accuracy on the original data and is better than that of state-of-the-art methods in this field.
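The two stages described in the abstract, mapping a private attribute through a fuzzy membership function and then clustering the mapped values with EM, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the membership functions follow their common textbook definitions, the age data and the parameters (10, 80) are hypothetical, and the hand-written two-component, one-dimensional EM routine is an assumption made only to keep the example self-contained.

```python
import math
import random


def gaussian_mf(x, mean, sigma):
    """Gaussian membership: 1 at `mean`, decaying towards 0 away from it."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))


def sigmoid_mf(x, slope, center):
    """Sigmoid membership: 0.5 at `center`, steeper for larger `slope`."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))


def s_shaped_mf(x, a, b):
    """S-shaped membership: 0 below a, 1 above b, quadratic spline between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= (a + b) / 2:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2


def z_shaped_mf(x, a, b):
    """Z-shaped membership: mirror image of the S-shaped function."""
    return 1.0 - s_shaped_mf(x, a, b)


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_clusters(values, iterations=50):
    """EM for a 1-D mixture of two Gaussians; returns a 0/1 label per value."""
    n = len(values)
    center = sum(values) / n
    spread = max(sum((v - center) ** 2 for v in values) / n, 1e-6)
    means = [min(values), max(values)]  # component 0 starts at the low end
    variances = [spread, spread]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], variances[k])
                    for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var_k = sum(r[k] * (v - means[k]) ** 2
                        for r, v in zip(resp, values)) / nk
            variances[k] = max(var_k, 1e-6)  # floor avoids a collapsed component
            weights[k] = nk / n
    return [0 if r[0] >= r[1] else 1 for r in resp]


# Hypothetical private attribute: ages drawn from two well-separated groups.
random.seed(0)
ages = ([random.gauss(25, 3) for _ in range(30)]
        + [random.gauss(60, 3) for _ in range(30)])

# Publish membership degrees in [0, 1] instead of the raw ages.
fuzzy_ages = [s_shaped_mf(age, 10, 80) for age in ages]

labels_original = em_two_clusters(ages)
labels_fuzzy = em_two_clusters(fuzzy_ages)
agreement = sum(a == b for a, b in zip(labels_original, labels_fuzzy)) / len(ages)
print(f"cluster agreement: {agreement:.2f}")
```

The agreement score is the fraction of records that receive the same cluster label before and after the mapping, which is one simple way to check that clustering utility survives the transformation. The S-shaped function is used in the demo because it is monotonic on the chosen range and therefore keeps the ordering of the values; the other functions are included only to show their form.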