南京大学学报(自然科学版) ›› 2018, Vol. 54 ›› Issue (1): 48.
程明畅1,刘友波2,张程嘉2,马铁丰1*
Cheng Mingchang1,Liu Youbo2,Zhang Chengjia2,Ma Tiefeng1*
摘要: K-means算法是应用最广泛的聚类算法之一,但存在明显缺陷:对初始值敏感,还需给定类的数目.层次K-means算法提出将多次k取固定值的K-means运算所得到的中心点作为类的代表,并通过对这些中心点进行层次聚类来得到更好的初始聚类中心,然而在中心的融合过程中并没有有效利用类的几何信息.从类的几何特征入手,提出一种基于类的分位数半径的动态K-means算法(QRD K-means).此算法在层次K-means的基础上令每次K-means运算的k值变动起来,且又引入了分位数半径的概念,用样本点到类中心距离的分位数作为类的半径,将样本点间的关系简化为各个类的分位数半径与类中心的关系.通过中心点间距离与分位数半径大小的比较对中心点进行融合形成新类,从而快速给出良好的聚类结果,同时也确定了类的数目.在仿真实验中,通过与不同算法在时间和分类精确度上的比较分析,也证明该方法快速有效.
[1] 金建国.聚类方法综述.计算机科学,2014,41(11A):288-293.(Jin J G.Review of clustering method.Computer Science,2014,41(11A):288-293.) [2] Huang Z X.Extensions to the k-means algorithm for clustering large data sets with categorical values.Data Mining and Knowledge Discovery,1998,2(3):283-304. [3] 苏锦旗,薛惠锋,詹海亮.基于划分的K-均值初始聚类中心优化算法.微电子学与计算机,2009,26(1):8-11.(Su J Q,Xue H F,Zhan H L.K-means initial clustering center optimal algorithm based on partitioning.Microelectronics & Computer,2009,26(1):8-11.) [4] Li J H,Song S J,Zhang L Y,et al.Robust K-median and k-means clustering algorithms for incomplete data.Mathematical Problems in Engineering,2017,Article ID:4321928,doi:10.1155/2016/4321928. [5] Munir M U,Javed M Y,Khan S A.A hierarchical k-means clustering based fingerprint quality classification.Neurocomputing,2012,85:62-67. [6] Liao K Y,Liu G Z,Xiao L,et al.A sample-based hierarchical adaptive K-means clustering method for large-scale video retrieval.Knowledge-Based Systems,2013,49:123-133. [7] MacQueen J B.Some methods for classification and analysis of multivariate observations ∥ Proceeding of the 5th Berkeley Symposium on Mathematical Statistics and Probability.Berkeley,CA,USA:University of California Press,1967:281-297. [8] Ruspini E H.A new approach to clustering.Information and Control,1969,15(1):22-32. [9] Arthur D,Vassilvitskii S.k-means++:The advantages of careful seeding ∥ Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms.Pennsylvania,PA,USA:SIAM,2007:1027-1035. [10] Elkan C.Using the triangle inequality to accelerate k-means ∥ Proceedings of the 12th International Conference on International Conference on Machine Learning.Menlo Park,CA,USA:AAAI Press,2003:147-153. [11] Hamerly G.Making k-means even faster ∥ Proceedings of the 2010 SIAM International Conference on Data Mining.Pennsylvania,PA,USA:SIAM,2010:130-140. [12] Likas A,Vlassis N,Verbeek J J.The global k-means clustering algorithm.Pattern Recognition,2003,36(2):451-461. [13] Bhatia S K.Adaptive k-means clustering ∥ Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference.Menlo Park,CA,USA:AAAI Press,2004:695-699. [14] Mashor M Y.Hybrid training algorithm for RBF network.International Journal of the Computer,the Internet and Management,2000,8(2):50-65. [15] Pelleg D,Moore A W.X-means:Extending K-means with efficient estimation of the number of clusters ∥ Proceedings of the 17th Interna-tional Conference on Machine Learning.San Francisco,CA,USA:Morgan Kaufmann,2000:727-734. [16] Kulis B,Jordan M I.Revisiting k-means:New algorithms via bayesian nonparametrics ∥ Proceedings of the 29th International Conference on Machine Learning.Edinburgh,United Kingdom:ICML,2012:513-520. [17] Arai K,Barakbah A R.Hierarchical k-means:An algorithm for centroids initialization for K-means.Reports of the Faculty of Science and Engineering,Saga University,2007,36(1):25-31. |
No related articles found! |
|