南京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (4): 724.
华佳林1*,朱 杰1,2,于 剑1
Hua Jialin1*,Zhu Jie1,2,Yu Jian1
摘要: 利用数据点的密度堆积起来的山脉能反映数据的结构,从而催生了山峰聚类(Mountains Clustering).遗憾的是,目前的山峰聚类算法深受数据分布结构的影响.提出一个新的聚类方法,称为分割-合并聚类算法(divisionjoin clustering framework,DJCF),它能发现由密度堆积的整个山脉中所有的山峰,然后将这些山峰根据彼此之间的关系进行合并,得到的结果对应最终的聚类.通过由两个阶段组成的一个流程,DJCF算法能对任何形状和分布的数据进行聚类.算法第一个阶段的目的是将数据集分割成多个划分(partition),真正的类由若干个划分组合而成.在这个阶段中利用K-近邻(KNN)设计了一种密度计算方式,然后将新密度计算方式运用到Clusterdp算法中,使用了新密度计算方式的Clusterdp算法能更准确地找到数据集的划分.算法的第二个阶段是将找出来的划分根据彼此之间的关系组合成最终的聚类.在人工数据和实际数据中的实验验证了该算法的简单和有效性.
[1] Jain A K,Murty M N,Flynn P J.Data clustering:Areview.ACM Computing Surveys(CSUR),1999,31(3):264-323. [2] Manning C E.Fractal clustering of metamorphic veins.Geology,1994,22(4):335-338. [3] Wang J,Wu X,Zhang C.Support vector machines based on Kmeans clustering for realtime business intelligence systems.International Journal of Business Intelligence and Data Mining,2005,1(1):54-64. [4] Szolovits P.Artificial intelligence in medicine.Boulder Colorado:Westview Press,1982,25-60. [5] McQuitty L L.Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies.Educational and Psychological Measurement,1957,17:207-229. [6] Bezdek J C.Pattern recognition with fuzzy objective function algorithms.Springer Science & Business Media,2013,24-29. [7] Krishna K,Murty M N.Genetic Kmeans algorithm.IEEE Transactions onSystems,Man,and Cybernetics,Part B:Cybernetics,1999,29(3):433-439. [8] Yager R R,Filev D P.Essentials of fuzzy modeling and control.New York,1994,388. [9] Velthuizen R P,Hall L O,Clarke L P,et al.An investigation of mountain method clustering for large data sets.Pattern Recognition,1997,30(7):1121-1135. [10] Yang M S,Wu K L.A modified mountain clustering algorithm.Pattern Analysis and Applications,2005,8(1-2):125-138. [11] Rodriguez A,Laio A.Clustering by fast search and find of density peaks.Science,2014,344(6191):1492-1496. [12] Liu M,Jiang X,Kot A C.A multiprototype clustering algorithm.Pattern Recognition,2009,42(5):689-698. [13] Luo T,Zhong C,Li H,et al.A multiprototype clustering algorithm based on minimum spanning tree.In:2010 the 7th International Conference on Fuzzy Systems and Knowledge Discovery(FSKD).Yantai,China:IEEE,2010,4:1602-1607. [14] Chen Y W,Du J X.A new method for classifying chinese text based on semantic topics and desity peaks.International Journal of Applied Mathematics and Machine Learning,2014,1(1):35-54. [15] Kobak D,Brendel W,Constantinidis C,et al.Demixed principal component analysis of population activity in higher cortical areas reveals independent representation of task parameters.arXiv preprint arXiv:1410.6031,2014. [16] Decelle A,RicciTersenghi F.Solving the inverse Ising problem by meanfield methods in a clustered phase space with many states.arXiv preprint arXiv:1501.03034,2015. [17] Wu H,Wan Y.Clustering assisted fundamental matrix estimation.arXiv preprint arXiv:1504.03409,2015. [18] Zhang W,Li J.Extended fast search clustering algorithm:Widely density clusters,no density peaks.arXiv preprint arXiv:1505.05610,2015. [19] Wang S,Wang D,Li C,et al.Comment on “Clustering by fast search and find of density peaks”.arXiv preprint arXiv:1501.04267,2015. [20] Johnson R A,Wichern D W.Clustering,distance methods,and ordination.Applied Multivariate Statistical Analysis,1998,4:726-99. [21] Pandit S,Gupta S.A comparative study on distance measuring approaches for clustering.International Journal of Research in Computer Science,2011,2(1):29-31. [22] Croft W B.Clustering large files of documents using the singlelink method.Journal of the American Society for Information Science,1977,28(6):341-344. [23] Defays D.An efficient algorithm for a complete link method.The Computer Journal,1977,20(4):364-366. [24] Fu L,Medico E.FLAME,a novel fuzzy clustering method for the analysis of DNA microarray data.BMC Bioinformatics,2007,8(1):3. [25] Gionis A,Mannila H,Tsaparas P.Clustering aggregation.ACM Transactions on Knowledge Discovery from Data(TKDD),2007,1(1):4. [26] Zahn C T.Graphtheoretical methods for detecting and describing gestalt clusters.IEEE Transactions on Computers,1971,100(1):68-86. [27] Jain A K,Law M H C.Data clustering:A user’s dilemma.Pattern Recognition and Machine Intelligence.Springer Berlin Heidelberg,2005:1-10. [28] Lichman M.UCI Machine Learning Repository.http://archive.ics.uci.edu/ml.Irvine,CA:University of California,School of Information and Computer Science,2013. |
No related articles found! |
|