南京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (4): 724–.

• • 上一篇    下一篇

一种分割-合并聚类算法

华佳林1*,朱 杰1,2,于 剑1   

  • 出版日期:2016-07-24 发布日期:2016-07-24
  • 作者简介:1.北京交通大学计算机与信息技术学院,交通数据分析与挖掘北京市重点实验室,北京,100044;2.中央司法警官学院信息管理系,保定,071000
  • 基金资助:
    基金项目:国家自然科学基金(61033013,61370129,61375062,61300072,61105056,61402462),高等学校博士学科点专项科研基金(20120009110006),河北省教育厅青年基金(QN2015099),河北省社会科学基金(HB15TQ013) 收稿日期:2016-04-27 *通讯联系人,E­mail:nhma0004@126.com

A new division­join clustering algorithm

Hua Jialin1*,Zhu Jie1,2,Yu Jian1   

  • Online:2016-07-24 Published:2016-07-24
  • About author: 1.Beijing Key Laboratory of Traffic Data Analysis and Mining,School of Computer and Information Technology,Beijing Jiaotong University,Beijing,100044,China;2.Department of Information Management,The Central Institute for Correctional Police,Baoding,071000,China

摘要: 利用数据点的密度堆积起来的山脉能反映数据的结构,从而催生了山峰聚类(Mountains Clustering).遗憾的是,目前的山峰聚类算法深受数据分布结构的影响.提出一个新的聚类方法,称为分割-合并聚类算法(division­join clustering framework,DJCF),它能发现由密度堆积的整个山脉中所有的山峰,然后将这些山峰根据彼此之间的关系进行合并,得到的结果对应最终的聚类.通过由两个阶段组成的一个流程,DJCF算法能对任何形状和分布的数据进行聚类.算法第一个阶段的目的是将数据集分割成多个划分(partition),真正的类由若干个划分组合而成.在这个阶段中利用K-近邻(KNN)设计了一种密度计算方式,然后将新密度计算方式运用到Cluster­dp算法中,使用了新密度计算方式的Cluster­dp算法能更准确地找到数据集的划分.算法的第二个阶段是将找出来的划分根据彼此之间的关系组合成最终的聚类.在人工数据和实际数据中的实验验证了该算法的简单和有效性.

Abstract: The mountain,which heaps up by densities of data points,intuitively reflects the structures of data set.Unfortunately,the previous mountain­based clustering methods suffer from the layout of the hills.We proposed a new clustering framework,the division­join clustering framework(DJCF).It can find out almost all of the hills and merge them into the mountains which correspond to the clusters.Through two phases,the DJCF algorithm can recognize clusters regardless of their shape and the layout of the data points.The aim of the first phase is to divid the data set into some partitions which the real clusters are consisted of.In this phase,we take advantage of K­nearest neighbor to design a new way to compute density and arm the Cluster­dp algorithm with the new density.As a result,it can be more accurate to get the partitions of the data set.In the second phase,the partitions are merged into the resulted clusters depending on the relations of partitions.Experiments on artificial data sets and real data sets demonstrate our new approach’s simplicity and effectiveness.

[1] Jain A K,Murty M N,Flynn P J.Data clustering:Areview.ACM Computing Surveys(CSUR),1999,31(3):264-323. [2] Manning C E.Fractal clustering of metamorphic veins.Geology,1994,22(4):335-338. [3] Wang J,Wu X,Zhang C.Support vector machines based on K­means clustering for real­time business intelligence systems.International Journal of Business Intelligence and Data Mining,2005,1(1):54-64. [4] Szolovits P.Artificial intelligence in medicine.Boulder Colorado:Westview Press,1982,25-60. [5] McQuitty L L.Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies.Educational and Psychological Measurement,1957,17:207-229. [6] Bezdek J C.Pattern recognition with fuzzy objective function algorithms.Springer Science & Business Media,2013,24-29. [7] Krishna K,Murty M N.Genetic K­means algorithm.IEEE Transactions onSystems,Man,and Cybernetics,Part B:Cybernetics,1999,29(3):433-439. [8] Yager R R,Filev D P.Essentials of fuzzy modeling and control.New York,1994,388. [9] Velthuizen R P,Hall L O,Clarke L P,et al.An investigation of mountain method clustering for large data sets.Pattern Recognition,1997,30(7):1121-1135. [10] Yang M S,Wu K L.A modified mountain clustering algorithm.Pattern Analysis and Applications,2005,8(1-2):125-138. [11] Rodriguez A,Laio A.Clustering by fast search and find of density peaks.Science,2014,344(6191):1492-1496. [12] Liu M,Jiang X,Kot A C.A multi­prototype clustering algorithm.Pattern Recognition,2009,42(5):689-698. [13] Luo T,Zhong C,Li H,et al.A multi­prototype clustering algorithm based on minimum spanning tree.In:2010 the 7th International Conference on Fuzzy Systems and Knowledge Discovery(FSKD).Yantai,China:IEEE,2010,4:1602-1607. [14] Chen Y W,Du J X.A new method for classifying chinese text based on semantic topics and desity peaks.International Journal of Applied Mathematics and Machine Learning,2014,1(1):35-54. [15] Kobak D,Brendel W,Constantinidis C,et al.Demixed principal component analysis of population activity in higher cortical areas reveals independent representation of task parameters.arXiv preprint arXiv:1410.6031,2014. [16] Decelle A,Ricci­Tersenghi F.Solving the inverse Ising problem by mean­field methods in a clustered phase space with many states.arXiv preprint arXiv:1501.03034,2015. [17] Wu H,Wan Y.Clustering assisted fundamental matrix estimation.arXiv preprint arXiv:1504.03409,2015. [18] Zhang W,Li J.Extended fast search clustering algorithm:Widely density clusters,no density peaks.arXiv preprint arXiv:1505.05610,2015. [19] Wang S,Wang D,Li C,et al.Comment on “Clustering by fast search and find of density peaks”.arXiv preprint arXiv:1501.04267,2015. [20] Johnson R A,Wichern D W.Clustering,distance methods,and ordination.Applied Multivariate Statistical Analysis,1998,4:726-99. [21] Pandit S,Gupta S.A comparative study on distance measuring approaches for clustering.International Journal of Research in Computer Science,2011,2(1):29-31. [22] Croft W B.Clustering large files of documents using the single­link method.Journal of the American Society for Information Science,1977,28(6):341-344. [23] Defays D.An efficient algorithm for a complete link method.The Computer Journal,1977,20(4):364-366. [24] Fu L,Medico E.FLAME,a novel fuzzy clustering method for the analysis of DNA microarray data.BMC Bioinformatics,2007,8(1):3. [25] Gionis A,Mannila H,Tsaparas P.Clustering aggregation.ACM Transactions on Knowledge Discovery from Data(TKDD),2007,1(1):4. [26] Zahn C T.Graph­theoretical methods for detecting and describing gestalt clusters.IEEE Transactions on Computers,1971,100(1):68-86. [27] Jain A K,Law M H C.Data clustering:A user’s dilemma.Pattern Recognition and Machine Intelligence.Springer Berlin Heidelberg,2005:1-10. [28] Lichman M.UCI Machine Learning Repository.http://archive.ics.uci.edu/ml.Irvine,CA:University of California,School of Information and Computer Science,2013.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!