南京大学学报(自然科学版) ›› 2021, Vol. 57 ›› Issue (5): 775784.doi: 10.13232/j.cnki.jnju.2021.05.007
• • 上一篇
Na Li, Youxiang Duan(), Qifeng Sun, Nan Shen
摘要:
针对聚类算法常见的难以确定参数、难以适应各种形状的数据集、在提高算法普适性时时间复杂度增大的问题,提出一种新的聚类算法:结合数据集全局和局部的特征寻找样本点距离的突变位置,通过计算样本点的簇内最小距离实现凸球型数据集的聚类;在此基础上提出子簇连结性强弱的概念,依据两个容易确定的参数进行子簇合并来适应各种形状的数据集.将该算法与DBSCAN (Density?Based Spatial Clustering of Applications with Noise)等多种聚类算法在四种经典数据集上比较,结果表明,该算法适用于类簇形状复杂的数据集,在同等聚类能力的算法中计算速度更快,且具有参数少、易确定的优点,在综合性能上表现优秀.
中图分类号:
1 | Xu D K,Tian Y J. A comprehensive survey of clustering algorithms. Annals of Data Science,2015,2(2):165-193. |
2 | Zhao L H,Li Y. Study on urban road network traffic district division based on clustering analysis∥2018 Chinese Automation Congress. Xi'an,China:IEEE,2018:3556-3560. |
3 | Liu J,Chen Y C,Zou D P,et al. Study on segmented pricing method of electric vehicles charging service fee based on clustering algorithms∥2019 11th International Conference on Intelligent Human?Machine Systems and Cybernetics. Hangzhou,China:IEEE,2019:76-79. |
4 | Ramírez?Rozo T J,García?álvarez J C,Castellanos?Domínguez C G. Infrared thermal image segmentation using expectation?maximization?based clustering∥2012 17th Symposium of Image,Signal Processing and Artificial Vision. Medellin,CO,USA:IEEE,2012:223-226. |
5 | Bruse J L,Zuluaga M A,Khushnood A,et al. Detecting clinically meaningful shape clusters in medical image data:Metrics analysis for hierarchical clustering applied to healthy and pathological aortic arches. IEEE Transactions on Biomedical Engineering,2017,64(10):2373-2383. |
6 | Wan C Y,Ye M Q,Yao C W,et al. Brain MR image segmentation based on Gaussian filtering and improved FCM clustering algorithm∥2017 10th International Congress on Image and Signal Processing,BioMedical Engineering and Informatics. Shanghai,China:IEEE,2017:1-5. |
7 | 许进文. 数据挖掘中聚类分析算法及应用研究. 计算机光盘软件与应用,2013,16(6):176-177. |
Xu J W. Reaserch on clustering analysis algorithm and application in data mining. Computer CD Software and Application,2013,16(6):176-177. | |
8 | Ester M,Kriegel H P,Sander J,et al. A density?based algorithm for discovering clusters in large spatial databases with noise∥Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland,OR,USA:AAAI Press,1996:226-231. |
9 | 周爱武,于亚飞. K?Means聚类算法的研究. 计算机技术与发展,2011,21(2):62-65. |
Zhou A W,Yu Y F. The research about clustering algorithm of K?Means. Computer Technology and Development,2011,21(2):62-65. | |
10 | Zhang T,Ramakrishnan R,Livny M. BIRCH:An efficient data clustering method for very large databases. ACM SIGMOD Record,1996,25(2):103-114. |
11 | 杨黎刚,苏宏业,张英等. 基于SOM聚类的数据挖掘方法及其应用研究. 计算机工程与科学,2007,29(8):133-136. |
Yang L G,Su H Y,Zhang Y,et al. A method of data mining based on SOM clustering and its application. Computer Engineering and Science,2007,29(8):133-136. | |
12 | 王凯南,金立左. 基于高斯混合模型的EM算法改进与优化. 工业控制计算机,2017,30(5):115-116,118. |
Wang K N,Jin L Z. Improvement and optimization of EM algorithm based on Gaussian mixture model. Industrial Control Computer,2017,30(5):115-116,118. | |
13 | Zelnik?Manor L,Perona P. Self?tuning spectral clustering∥Proceedings of the 17th International Conference on Neural Information Processing Systems. Cambridge,MA,USA:MIT Press,2004:1601-1608. |
14 | Xie J Y,Zhou Y,Ding L J. Local standard deviation spectral clustering∥2018 IEEE International Conference on Big Data and Smart Computing. Shanghai,China:IEEE,2018:242-250. |
15 | Rodriguez A,Laio A. Clustering by fast search and find of density peaks. Science,2014,344(6191):1492-1496. |
16 | Xie J Y,Girshick R,Farhadi A. Unsupervised deep embedding for clustering analysis∥Proceedings of the 33rd International Conference on International Conference on Machine Learning. New York,NY,USA:ACM,2016(48):478-487. |
17 | 张亚平. 谱聚类算法及其应用研究. 硕士学位论文. 太原:中北大学,2014. |
Zhang Y P. Spectral clustering algorithm and its application research. Master Dissertation. Taiyuan:North China University,2014. | |
18 | Gionis A,Mannila H,Tsaparas P. Clustering aggregation. ACM Transactions on Knowledge Discovery from Data,2007,1(1):4. |
19 | Botev Z I,Grotowski J F,Kroese D P. Kernel density estimation via diffusion. Annals of Statistics,2010,38(5):2916-2957. |
20 | 董晓君,程春玲. 基于核密度估计的K?CFSFDP聚类算法. 计算机科学,2018,45(11):244-248. |
Dong X J,Cheng C L. K?CFSFDP clustering algorithm based on kernel density estimation. Computer Science,2018,45(11):244-248. | |
21 | 王光,林国宇. 改进的自适应参数DBSCAN聚类算法. 计算机工程与应用,2020,56(14):45-51. |
Wang G,Lin G Y. Improved adaptive parameter DBSCAN clustering algorithm. Computer Engineering and Application,2020,56(14):45-51. | |
22 | Tung C W. Prediction of pupylation sites using the composition of k?spaced amino acid pairs. Journal of Theoretical Biology,2013(336):11-17. |
23 | Yang X H,Zhu Q P,Huang Y J,et al. Parameter?free Laplacian centrality peaks clustering. Pattern Recognition Letters,2017(100):167-173. |
24 | McDaid A F,Greene D,Hurley N. Normalized mutual information to evaluate overlapping community finding algorithms. 2013,arXiv:1110. 2515. |
25 | Steinley D. Properties of the Hubert?arable adjusted rand index. Psychological Methods,2004,9(3):386-396. |
26 | 任彧,顾成成. 基于HOG特征和SVM的手势识别. 科技通报,2011,27(2):211-214. |
Ren Y,Gu C C. Hand gesture recognition based on HOG characters and SVM. Bulletin of Science and Technology,2011,27(2):211-214. |
[1] | 乔宇鑫, 葛洪伟. 自适应样本加权的多视图聚类算法[J]. 南京大学学报(自然科学版), 2021, 57(4): 544-550. |
[2] | 杜淑颖, 侯海薇, 丁世飞. 基于多层次特征的深度集成聚类算法[J]. 南京大学学报(自然科学版), 2021, 57(4): 575-581. |
[3] | 许梦菲, 梁亚静, 臧慧, 李磊. 聚⁃4⁃甲基⁃1⁃戊烯中空纤维膜式人工肺组件的氧气传质性能的研究[J]. 南京大学学报(自然科学版), 2021, 57(4): 641-647. |
[4] | 叶子琪, 蒋小峰, 汤其阳, 李梅. 聚乙烯微塑料对蚕豆幼苗的毒性效应[J]. 南京大学学报(自然科学版), 2021, 57(3): 385-392. |
[5] | 杨宗彩, 肖琳. 电化学氧化改善剩余污泥脱水性能的研究[J]. 南京大学学报(自然科学版), 2021, 57(3): 445-450. |
[6] | 王颖俐, 魏玲. 基于改进的区间损失函数聚合法的三支决策[J]. 南京大学学报(自然科学版), 2021, 57(3): 493-501. |
[7] | 周小亮, 吴东洋, 曹磊, 王玉鹏, 业宁. 基于修剪树的优化聚类中心算法[J]. 南京大学学报(自然科学版), 2021, 57(2): 167-176. |
[8] | 陈迪, 刘惊雷. 基于乘法更新规则的k⁃means与谱聚类的联合学习[J]. 南京大学学报(自然科学版), 2021, 57(2): 177-188. |
[9] | 邵长龙, 孙统风, 丁世飞. 基于信息熵加权的聚类集成算法[J]. 南京大学学报(自然科学版), 2021, 57(2): 189-196. |
[10] | 夏菁, 丁世飞. 基于低秩稀疏约束的自权重多视角子空间聚类[J]. 南京大学学报(自然科学版), 2020, 56(6): 862-869. |
[11] | 王丽娟,丁世飞,丁玲. 基于迁移学习的软子空间聚类算法[J]. 南京大学学报(自然科学版), 2020, 56(4): 515-523. |
[12] | 陈俊芬,赵佳成,韩洁,翟俊海. 基于深度特征表示的Softmax聚类算法[J]. 南京大学学报(自然科学版), 2020, 56(4): 533-540. |
[13] | 王露,王士同. 改进模糊聚类在医疗卫生数据的Takagi⁃Sugeno模糊模型[J]. 南京大学学报(自然科学版), 2020, 56(2): 186-196. |
[14] | 杨红鑫,杨绪兵,张福全,业巧林. 半监督平面聚类算法设计[J]. 南京大学学报(自然科学版), 2020, 56(1): 9-18. |
[15] | 柴变芳,魏春丽,曹欣雨,王建岭. 面向网络结构发现的批量主动学习算法[J]. 南京大学学报(自然科学版), 2019, 55(6): 1020-1029. |
|