Journal of Nanjing University (Natural Sciences) ›› 2017, Vol. 53 ›› Issue (2): 368.
Jia Peiling1, Fan Jiancong1,2*, Peng Yanjun1,2
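Abstract: Compared with other clustering algorithms, clustering by fast search and find of density peaks (DPC) achieves good clustering results with only a few parameters. However, when a single cluster contains multiple density peaks, its results are unsatisfactory. To address this problem, a DPC algorithm based on cluster boundary partitioning, B-DPC, is proposed. The improved algorithm first cleans the data set using a new noise-removal criterion and then calls DPC to perform an initial clustering. Finally, it searches for the boundary samples of neighboring clusters and, based on the number and proportion of those boundary samples, performs a second clustering on the initial result. Experiments show that B-DPC handles the multiple-density-peak problem well, can discover clusters of arbitrary shape, and is insensitive to noise.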
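For orientation, the baseline DPC step that B-DPC calls for its initial clustering can be sketched as follows. This is a minimal illustration of the standard density-peaks procedure with a cutoff kernel, not the authors' B-DPC code: the function name `dpc`, the parameters `d_c` and `n_clusters`, and the choice to always treat the densest point as a center are assumptions made here. The noise-removal criterion and the boundary-based second clustering are not specified in the abstract and are not shown.

```python
import math


def dpc(points, d_c, n_clusters):
    """Baseline density-peaks clustering sketch (cutoff kernel).

    points: list of coordinate tuples; d_c: cutoff distance (assumed name);
    n_clusters: number of centers to pick (assumed to be given by the user).
    Returns (labels, rho, delta).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that comes earlier in the
    # density order; the densest point gets its largest distance instead.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers are the points with the largest rho * delta. Assumption of
    # this sketch: the densest point is always kept as a center, so every
    # chain of parents ends at a labeled point.
    gamma = [rho[i] * delta[i] for i in range(n)]
    gamma[order[0]] = math.inf
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Each remaining point inherits the label of its nearest denser point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    labels, rho, delta = dpc(blob_a + blob_b, d_c=1.2, n_clusters=2)
    print(labels)
```

Because every non-center point simply follows its nearest denser neighbor, a cluster that has two density peaks tends to be split into two groups when both peaks are selected as centers. That is the failure case the abstract says B-DPC addresses by inspecting boundary samples between neighboring clusters.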