南京大学学报(自然科学版) ›› 2017, Vol. 53 ›› Issue (4): 791–.

• • 上一篇    下一篇

密度峰值聚类相关问题的研究

杨 洁1,2,王国胤1*,庞紫玲1   

  • 出版日期:2017-08-03 发布日期:2017-08-03
  • 作者简介:1.重庆邮电大学计算智能重庆市重点实验室,重庆,400065;2.遵义师范学院物理与电子科学学院,遵义,563002
  • 基金资助:
    基金项目:国家自然科学基金(61572091),重庆市研究生科研创新项目(CYB16106),高端人才项目(RC2016005),省级重点学科(黔学位办[2013]18号)
    收稿日期:2017-06-11
    *通讯联系人,E-mail:wanggy@cqupt.edu.cn

Relative researches of clustering by fast search and find of density peaks

Yang Jie1,2,Wang Guoyin1*,Pang Ziling1   

  • Online:2017-08-03 Published:2017-08-03
  • About author:1.Chongqing Key Laboratory of Computational Intelligence,Chongqing University of
    Posts and Telecommunications,Chongqing,400065,China;
    2.School of Physics and Electronic Science,Zunyi Normal College,Zunyi,563002,China

摘要: 相对于其他的密度聚类算法,密度峰值聚类(Density Peaks Clustering,DPC)算法思想简洁新颖,所需参数少,不需要进行迭代求解,而且具有可扩展性.但是,DPC仍然具有一定缺陷,例如存在截断阈值dc的定义模糊以及选取中心点失效等问题.在阐述了DPC的算法思想和原理的基础上,分析了DPC算法的缺陷,然后从多个改进的角度对其相关研究工作进行了综述.通过分析DPC与相关理论(数据场、图论、粒计算等)的联系,针对密度峰值的缺点,提出了基于粒计算的DPC算法改进框架,其中包括由细到粗、由细到粗和双向变粒度这三种机制以及基于网格粒化的密度峰值算法框架.最后对DPC今后的研究工作进行了展望,包括动态密度峰值聚类、利用密度峰值研究网络拓扑、处理复杂任务以及改进其他聚类等,希望为DPC的进一步研究提供新思想

Abstract: Compared with other clustering algorithms,density peaks clustering(DPC) algorithm possess a novel and concise idea,which requires fewer parameters and iteration,besides it is scalable.However,there are also some defects,such as the definition of dc and selection of the centers.In this paper,the principle of DPC is introduced and the defects of DPC are analyzed firstly.Based on the related key problems,main theoretical researches are surveyed in various views.Besides,the relationship between DPC and relevant theory is also analyzed,i.e.data field,graph theory and granular computing.Especially,to address these issue of DPC,three mechanisms as follows:from fine to coarse,from coarse to from,and bi-directional variable granularity,which are presented based on the advantages of granular computing.The mechanism of DPC based on mesh granulation is also presented.Finally,the existing challenge problems are analyzed,and the research directions of DPC in the future are explored,including dynamic density peak clustering,the research of network topology by DPC,complex tasks solving by DPC,and the improvement of other clustering algorithms using DPC,et al,which provide valuable reference for researchers and data analysts who are concerned with clustering.

[1] Jain A K,Murty M N,Flynn P J.Data clustering:A review.ACM Computing Surveys,1999,31(3):264-323.
[2]  Fu L M,Medico E.FLAME,a novel fuzzy clustering method for the analysis of DNA microarray data.BMC Bioinformatics,2007,8:3.
[3]  Ryu T W.Discovery of characteristic knowledge in databases using cluster analysis and genetic programming.Ph.D.Dissertation.Houston,TX,USA:University of Houston,1998.
[4]  Jain A K.Data clustering:50 years beyond K-means.Pattern Recognition Letters,2010,31(8):651-666.
[5]  Hartigan J A,Wong M A.A K-means clustering algorithm.Applied Statistics,1979,28(1):100-108.
[6]  Ester B,Kriegel H P,Sander J,et al.A density-based algorithm for discovering clusters in large spatial databases with noise.In:Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.Portland,OR,USA:AAAI,1996:226-231.
[7]  Sydow J,Lerch F,Staber U.Planning for path dependence?The case of a network in the Berlin-Brandenburg optics cluster.Economic Geography,2010,86(2):173-195.
[8]  马 帅,王腾蛟,唐世渭等.一种基于参考点和密度的快速聚类算法.软件学报,2003,14(6):1089-1095.(Ma S,Wang T J,Tang S W,et al.A fast clustering algorithm based on reference and density.Journal of Software,2003,14(6):1089-1095.)
[9]  Rodriguez A,Laio A.Clustering by fast search and find of density peaks.Science,2014,344(6191):1492-1496.
[10]  Park H S,Jun C H.A simple and fast algorithm for K-medoids clustering.Expert Systems with Applications,2009,36(2):3336-3341.
[11]  Saragih J M,Lucey S,Cohn J F.Deformable model fitting by regularized landmark mean-shift.International Journal of Computer Vision,2011,91(2):200-215.
[12]  Sun K,Geng X R,Ji L Y.Exemplar component analysis:A fast band selection method for hyperspectral imagery.IEEE Geoscience and Remote Sensing Letters,2015,12(5):998-1002.
[13]  Chen Y W,Lai D H,Qi H,et al.A new method to estimate ages of facial image for large database.Multimedia Tools and Applications,2016,75(5):2877-2895.
[14]  Wu H,Wan Y.Clustering assisted fundamental matrix estimation.arXiv:1504.03409,2015.
[15]  Dean K M,Davis L M,Lubbeck J L,et al.High-speed multiparameter photo physical analyses of fluorophore libraries.Analytical Chemistry,2015,87(10):5026-5030.
[16]  刘颖莹,刘培玉,王智昊等.一种基于密度峰值发现的文本聚类算法.山东大学学报(理学版),2016,51(1):65-70.(Liu Y Y,Liu P Y,Wang Z H,et al.A text clustering algorithm based on find of density peaks.Journal of Shandong University(Natural Science)2016,51(1):65-70.) 
[17]  Zhang Y,Xia Y Q,Liu Y,et al.Clustering sentences with density peaks for multi-document summarization.In:Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.Denver,CO,USA:ACL,2015:1262-1267.
[18]  Wang M M,Zuo W L,Wang Y.An improved density peaks-based clustering method for social circle discovery in social networks.Neurocomputing,2016,179:219-227.
[19]  何 伟,邢孟道.基于密度峰值搜索的全极化SAR图像分类.系统工程与电子技术,2016,38(1):60-63.(He W,Xing M D.Classification method for POLSAR images based on find of density peak.Systems Engineering and Electronics,2016,38(1):60-63.)
[20]  Wang S L,Wang D K,Li C Y,et al.Clustering by fast search and find of density peaks with data field.Chinese Journal of Electronics,2016,25(3):397-402.
[21]  Qiu T,Yang K F,Li C Y,et al.A physically inspired clustering algorithm:To evolve like particles.arXiv:1412.5902,2014.
[22]  Xie J Y,Gao H C,Xie W X,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors.Information Sciences,2016,354:19-40.
[23]  谢娟英,高红超,谢维信.K近邻优化的密度峰值快速搜索聚类算法.中国科学:信息科学,2016,46(2):258-280.(Xie J Y,Gao H C,Xie W X.K-nearest neighbors optimized clustering algorithm by fast search and finding the density peaks of a dataset.Scientia Sinica(Informationis),2016,46(2):258-280.)
[24]  Du M J,Ding S F,Jia H J.Study on density peaks clustering based on K-nearest neighbors and principal component analysis.Knowledge-Based Systems,2016,99:135-145.
[25]  Xu J,Wang G,Li T,et al.Fat node leading tree for data stream clustering with density peaks.Knowledge-Based Systems,2017,120:99-117.
[26]  Wang G Y,Xu J,Zhang Q H,et al.Multi-granularity intelligent information processing.In:Yao Y,Hu Q,Yu H,et al.Rough Sets,Fuzzy Sets,Data Mining,and Granular Computing.Cham,Germany:Springer,2015:36-48.
[27]  Zhang W K,Li J.Extended fast search clustering algorithm:Widely density clusters,no density peaks.arXiv:1505.05610,2015.
[28]  张文开.基于密度的层次聚类算法研究.硕士学位论文.合肥:中国科学技术大学,2015.(Zhang W K.Research on density-based Hierarchical clustering algorithm.Master Dissertation.Hefei:University of Science & Technology of China,2015.)
[29]  Mehmood R,Bie R F,Dawood H,et al.Fuzzy clustering by fast search and find of density peaks.In:Proceedings of 2015 International Conference on Identification,Information,and Knowledge in the Internet of Things.Beijing,China:IEEE,2015:258-261.
[30]  Xu X H,Ju Y S,Liang Y L,et al.Manifold density peaks clustering algorithm.In:Proceedings of the 3rd International Conference on Advanced Cloud and Big Data.Yangzhou,China:IEEE,2015:311-318.
[31]  Zhang Q C,Zhu C S,Yang L T,et al.An incremental CFS algorithm for clustering large data in industrial internet of things.IEEE Transactions on Industrial Informatics,2017,13(3):1.
[32]  Zhang Y F,Chen S M,Yu G.Efficient distributed density peaks for clustering large data sets in MapReduce.IEEE Transactions on Knowledge and Data engineering,2016,28(12):3218-3230.
[33]  Wang S L,Gan W Y,Li D Y,et al.Data field for hierarchical clustering.International Journal of Data Warehousing and Mining,2011,7(4):43-63.
[34]  Li D R,Wang S L,Li D Y.Data field.In:Li D R,Wang S L,Li D Y.Spatial Data Mining:Theory and Application.Springer Berlin Heidelberg,2015:175-185.
[35]  淦文燕,李德毅,王建民.一种基于数据场的层次聚类方法.电子学报,2006,34(2):258-262.(Gan W Y,Li D Y,Wang J M.An hierarchical clustering method based on data fields.Acta Electronica Sinica,2006,34(2):258-262.)
[36]  Tenenbaum J B,de Silva V,Langford J C.A global geometric framework for nonlinear dimensionality reduction.Science,2000,290(5500):2319-2323.

[37]  Frey B J,Dueck D.Clustering by passing messages between data points.Science,2007,315(5814):972-976.
[38]  Xu J,Wang G Y.Leading tree in DPCLUS and its impact on building hierarchies.arXiv:1506.03879,2015.
[39]  Yao Y Y.Artificial intelligence perspectives on granular computing.In:Pedrycz W,Chen S M.Granular Computing and Intelligent Systems.Springer Berlin Heidelberg,2011:17-34.
[40]  Yao Y Y.Perspectives of granular computing.In:Proceedings of 2005 IEEE International Conference on Granular Computing.Beijing,China:IEEE,2005,1:85-90.
[41]  Gómez D,Zarrazola E,Yez J,et al.A divide-and-link algorithm for hierarchical clustering in networks.Information Sciences,2015,316:308-328.
[42]  Chen L.Topological structure in visual perception.Science,1982,218(4573):699-700.
[43]  Skowron A,Jankowski A,Dutta S.Toward problem solving support based on big data and domain knowledge:Interactive granular computing and adaptive judgement.In:Japkowicz N,Stefanowski J.Big Data Analysis:New Algorithms for A New Society.Springer Berlin Heidelberg,2016:49-90.
[44]  Hinneburg A,Keim D A.Optimal grid-clustering:Towards breaking the curse of dimensionality in high-dimensional clustering.In:Proceedings of the 25th International Conference on Very Large Data Base.San Francisco,CA,USA:Morgan Kaufmann Publishers Inc.,1999:506-517.
[45]  Simon H A,Ando A.Aggregation of variables in dynamic systems.Econometrica,1961,29(2):111-138.
[46]  Liu Y C,Li D Y,He W,et al.Granular computing based on Gaussian cloud transfor-mation.Fundamenta Informaticae,2013,127(1-4):385-398.
 

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!