Journal of Nanjing University (Natural Science), 2019, Vol. 55, Issue (1): 143-153. doi: 10.13232/j.cnki.jnju.2019.01.015
Lu Shentao1,2, Ge Hongwei1,2*, Zhou Jing2
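Abstract: Travel-Time based Hierarchical Clustering (TTHC) is a new potential-based clustering algorithm. Although it clusters well, it requires the number of clusters to be set manually, and it assigns samples according to similarity alone, ignoring the influence of distance and potential. To address these problems, a travel-time potential clustering algorithm that determines cluster centers automatically is proposed. First, the potential and similarity of each data point are computed; the parent node of each data point is then determined from the similarity, which gives the distance between each data point and its parent node. Next, a composite value is computed from the similarity and distance between each data point and its parent node together with the data point's potential, and the cluster centers are determined automatically from this composite value. Finally, each remaining data point is assigned to the cluster of the data point that has a lower potential than it and the highest similarity to it, which yields the clustering result. The new algorithm is compared with TTHC; experimental results on synthetic and real datasets show that it not only determines the number of clusters automatically but also, by using a better assignment mechanism, produces better clustering results.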
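The pipeline outlined in the abstract (potential, parent node, composite value, automatic centers, assignment) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so every concrete choice below is an assumption made for illustration and not the authors' definition: the potential is a gravity-like sum of inverse distances, the similarity is the stand-in `1 / (1 + distance)` instead of the travel-time measure, the composite value is `|potential| * distance / similarity`, and centers are cut at the largest gap in the sorted composite values. The names `cluster_sketch` and `delta` are hypothetical.

```python
import numpy as np

def cluster_sketch(X, delta=1e-3):
    """Illustrative sketch only; all formulas are assumptions (needs >= 2 points)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # ASSUMED potential: negative sum of inverse distances, clipped at delta.
    inv = 1.0 / np.maximum(D, delta)
    np.fill_diagonal(inv, 0.0)
    phi = -inv.sum(axis=1)

    # ASSUMED similarity: a simple stand-in, not the travel-time measure.
    S = 1.0 / (1.0 + D)

    # Visit points from lowest to highest potential; the parent of a point is
    # the most similar point among those visited before it.
    order = np.argsort(phi, kind="stable")
    parent = np.full(n, -1)
    d_par = np.zeros(n)
    s_par = np.zeros(n)
    for rank in range(1, n):
        i = order[rank]
        cand = order[:rank]
        j = cand[np.argmax(S[i, cand])]
        parent[i], d_par[i], s_par[i] = j, D[i, j], S[i, j]

    # The lowest-potential point has no parent; give it the extreme values.
    root = order[0]
    d_par[root] = d_par.max()
    s_par[root] = s_par[order[1:]].min()

    # ASSUMED composite value and ASSUMED center rule (largest gap).
    score = np.abs(phi) * d_par / s_par
    ranked = np.argsort(-score, kind="stable")
    gaps = score[ranked][:-1] - score[ranked][1:]
    k = int(np.argmax(gaps)) + 1

    is_center = np.zeros(n, dtype=bool)
    is_center[ranked[:k]] = True
    is_center[root] = True  # the root has no parent, so it must start a cluster
    centers = np.flatnonzero(is_center)

    # Remaining points inherit the label of their parent; parents are always
    # visited earlier because they have lower potential rank.
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `cluster_sketch(X)` on an array of shape `(n_points, n_features)` returns one label per point and the indices of the chosen centers. Visiting points in order of increasing potential guarantees that a point's parent is labeled before the point itself, so the assignment step needs only a single pass.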