南京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (6): 1050–.

• • 上一篇    下一篇

基于KD树最优投影划分的k匿名算法

汪小寒1,2,罗永龙1,2*,江叶峰1,赵传信1,2,吴文莉1,郭良敏1,2   

  • 出版日期:2016-11-21 发布日期:2016-11-21
  • 作者简介:1.安徽师范大学数学计算机科学学院,芜湖,241003; 2.安徽师范大学网络与信息安全工程研究中心,芜湖,241003
  • 基金资助:
    基金项目:国家自然科学基金(61672039,61370050),安徽省自然科学基金(1508085QF133) 收稿日期:2016-09-18 *通讯联系人,E­mail:ylluo@ustc.edu.cn

An optimal projection partition k­anonymity algorithm of KD ­tree

Wang Xiaohan1,2,Luo Yonglong1,2*,Jiang Yefeng1,Zhao Chuanxin1,2,Wu Wenli1,Guo Liangmin1,2   

  • Online:2016-11-21 Published:2016-11-21
  • About author: 1.College of Mathematics and Computer Science,Anhui Normal University,Wuhu,241003,China; 2.Engineering Technology Research Center of Network and Information Security, Anhui Normal University,Wuhu,241003,China

摘要: 针对现有数据发布隐私保护保护算法中的“局部最优”划分问题,提出了一种基于KD树最优投影划分的k匿名算法.首先,在全局范围内对每一个属性维度进行遍历,根据投影距离方差值衡量每个维度的离散度,并确定最优维度;然后,在最优属性维度上,计算其划分系数值,并确定最优划分点.进一步引入一种改进的KD树结构,与传统的KD树结点是一个数据点不同,新设计的KD树中的每个结点均是一个集合.用经过划分点并垂直于最优维度的超平面将一个结点分成两部分,分别作为其左、右孩子结点.最后通过理论分析证明了本文算法的正确性,用实验比较和验证了算法的性能,实验结果显示所提算法平均概化范围减小10%~22%,能够实现更优的划分和更好的数据集可用性.

Abstract: Data should be provided to relevant decision makers and researchers.Privacy protection should be carried out before the release because the data contains personal sensitive and privacy information.k­anonymity algorithm is an important algorithm in privacy protection.And partition is a significant method in k­anonymity mechanism.For sake of solving the problem of “local optimization” in the existing privacy preserving algorithm for data publishing,a k­anonymity algorithm based on optimized projection partition of KD­tree is proposed in this paper.Firstly,the optimized attribute dimension is determined by the dispersion degree according to the projection distance variance value in the global domain.Then,in the optimal attribute dimension,the value of the partition coefficient is calculated,and the optimal partition point is determined.Furthermore,an improved KD­tree structure is introduced in this paper.And this structure can solve the problem efficiently.The new designed KD­tree node is a collection rather than a data point in traditional KD­tree.A KD­tree node is divided into two parts as its left child node and its right child node by the hyper plane which pass through the dividing point and is perpendicular to the optimal dimensional.Finally,the validity of the proposed algorithm is verified by theoretical analysis and the performance of the algorithm is demonstrated by comparing experiments.The experimental results show that the proposed algorithm can reduce the average generalization range by 22% to 10%,which can achieve a better division and better data set availability.

[1] 张啸剑,孟小峰.面向数据发布和分析的差分隐私保护.计算机学报,2014,37(4):927-949.(Zhang X J,Meng X F.Differential privacy in data publication and analysis.Chiese Journal of Computers,2014,37(4):927-949.) [2] Sweeney L.k­anonymity:A model for protecting privacy.International Journal of Uncertainty,Fuzziness and Knowledge­Based Systems,2002,10(5):557-570. [3] Sweeney L.Achieving k­anonymity privacy protection using generalization and suppression.International Journal of Uncertainty,Fuzziness and Knowledge­Based Systems,2002,10(5):571-588. [4] Samarati P,Sweeney L.Protecting privacy when disclosing information:k­anonymity and its enforcement through generalization and suppression.Technical report,SRI International,1998. [5] Newton E M,Sweeney L,Malin B.Preserving privacy by de­identifying face images.IEEE transactions on Knowledge and Data Engineering,2005,17(2):232-243. [6] Machanavajjhala A,Kifer D,Gehrke J,et al.l­diversity:Privacy beyond k­anonymity.ACM Transactions on Knowledge Discovery from Data(TKDD),2007,1(1):1-12. [7] Li N,Li T,Venkatasubramanian S.t­closeness:Privacy beyond k­anonymity and l­diversity.In:IEEE 23rd International Conference on Data Engineering(ICDE 2007).Istanbul,Turkey:IEEE,2007:106-115. [8] Wong R C W,Li J,Fu A W C,et al.(α,k)­anonymity:an enhanced k­anonymity model for privacy preserving data publishing.In:Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Philadelphia,PA,USA:ACM,2006:754-759. [9] Andrews M,Wilfong G,Zhang L.Analysis of k­anonymity algorithms for streaming location data.In:2015 IEEE Conference on Computer Communications Workshops(INFOCOM WKSHPS).Hong Kong,China:IEEE,2015:1-6. [10] Di Castro D,Lewin­Eytan L,Maarek Y,et al.Enforcing k­anonymity in web mail auditing.In:Proceedings of the 9th ACM International Conference on Web Search and Data Mining.San Francisco,CA,USA:ACM,2016:327-336. [11] Song D,Sim J,Park K,et al.A privacy­preserving continuous location monitoring system for location­based services.International Journal of Distributed Sensor Networks,2015,11(2):1-10.Article ID:815613.. [12] Ni W,Gu M,Chen X.Location privacy­preserving k nearest neighbor query under user’s preference.Knowledge­Based Systems,2016,103:19-27. [13] Bhat T P,Karthik C,Chandrasekaran K.A privacy preserved data mining approach based on k­partite graph theory.Procedia Computer Science,2015,54:422-430. [14] Liu C G,Liu I,Yao W S,et al.K­anonymity against neighborhood attacks in weighted social networks.Security and Communication Networks,2015,8(18):3864-3882. [15] Sarjapur K,Suma V,Christa S,et al.Big data management system for personal privacy using SW and SDF.Information Systems Design and Intelligent Applications.Springer India,2016:757-763. [16] 冯登国,张 敏,李 昊.大数据安全与隐私保护.计算机学报,2014,37(1):246-258.(Feng D G,Zhang M,Li H.Big data security and privacy protection.Chinese Journal of Computers.2014,37(1):246-258.) [17] 周水庚,李 丰,陶宇飞等.面向数据库应用的隐私保护研究综述.计算机学报,2009,32(5):847-861.(Zhou S G,Li F,Tao Y F,et al.Privacy preservation in database applications.Chinese Journal of Computers,2009,32(5):847-861.) [18] LeFevre K,DeWitt D J,Ramakrishnan R.Mondrian multidimensional k­anonymity.In:Proceedings of the 22nd International Conference on Data Engineering(ICDE’06).Atlanta,GA,USA:IEEE,2006:25-35. [19] 吴英杰,唐庆明,倪巍伟等.基于取整划分函数的k匿名算法.软件学报,2012,8:2138-2148.(Wu Y J,Tang Q M,Ni W W,et al.Algorithm for k­anonymity based on rounded partition function.Journal of Software,2012,8:2138-2148.) [20] 王 超,杨 静,张健沛等.基于投影区域密度划分的k匿名算法.通信学报,2015,8:125-134.(Wang C,Yang J,Zhang J P,et al.Algorithm for k­anonymity based on projection area density partition.Journal of Communications,2015,8:125-134.) [21] Herranz J,Nin J,Solé M.KD­trees and the real disclosure risks of large statistical databases.Information Fusion,2012,13(4):260-273.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!