南京大学学报(自然科学版) ›› 2019, Vol. 55 ›› Issue (1): 143–153.doi: 10.13232/j.cnki.jnju.2019.01.015

• • 上一篇    下一篇

自动确定聚类中心的移动时间势能聚类算法

陆慎涛1,2,葛洪伟1,2*,周 竞2   

  1. 1.轻工过程先进控制教育部重点实验室,江南大学,无锡,214122;2.江南大学物联网工程学院,无锡,214122
  • 接受日期:2018-10-26 出版日期:2019-02-01 发布日期:2019-01-26
  • 通讯作者: 葛洪伟, E-mail:ghw8601@163.com E-mail:ghw8601@163.com
  • 基金资助:
    国家自然科学基金(61305017),江苏省普通高校研究生科研创新计划(KYLX15_1169)

Travel-time based potential clustering by automatic determination of cluster centers

Lu Shentao1,2,Ge Hongwei1,2*,Zhou Jing2   

  1. 1. Ministry of Education Key Laboratory of Advanced Process Control for Light Industry,Jiangnan University,Wuxi,214122,China; 2. School of Internet of Things,Jiangnan University,Wuxi,214122,China
  • Accepted:2018-10-26 Online:2019-02-01 Published:2019-01-26
  • Contact: Ge Hongwei, E-mail:ghw8601@163.com E-mail:ghw8601@163.com

摘要: 移动时间层次聚类(Travel-Time based Hierarchical Clustering,TTHC)是一种新的势能聚类算法,尽管具有较好的聚类效果,但是该算法需要人工设定聚类数目,而且在分配样本的时候仅根据相似度,忽略了距离和势能的影响. 针对以上问题,提出一种自动确定聚类中心的移动时间势能聚类算法. 首先计算每个数据点的势能和相似度,然后根据相似度确定数据点的父节点,得到数据点与父节点的距离;然后,根据数据点与父节点的相似度、距离和数据点的势能得到综合考量值,根据综合考量值自动确定聚类中心;最后,将剩余数据点分配到比其势能小且与其相似度最大的数据点所属类簇,得到聚类结果. 将新算法与TTHC算法进行比较,在人工数据集和真实数据集上的实验结果表明,新算法不仅能够自动确定聚类数目,而且采用了更优的分配机制,可以产生更好的聚类结果.

关键词: 聚 类, TTHC, 移动时间, 自动确定聚类数目

Abstract: TTHC(Travel-Time based Hierarchical Clustering)is a new potential clustering algorithm. Although it has a good clustering effect,the algorithm needs to set the number of clusters manually,and only allocates samples based on similarity,ignoring the influence of distance and potential values. In response to the above problems,a new algorithm that can automatically determine the clustering centers is proposed. Firstly,the new algorithm calculates the potential values and similarity of each data point,and then determines the parent node of the data point according to similarity,so we can obtain the distance from the parent node. Secondly,according to the similarity and distance between data points and parent nodes and the potential values of data points,the comprehensive consideration value is obtained. The clustering center is automatically determined according to the comprehensive consideration value. Finally,the remaining data points are assigned to clusters whose potential values are smaller and similarity is the largest,and a clustering result is obtained. Comparing the new algorithm with the TTHC algorithm,the experimental results on synthetic datasets and real datasets show that the new algorithm can not only automatically determine the number of clusters,but also adopt a better distribution mechanism and thus has better clustering results. In addition,the new algorithm shows better performance than two other potential clustering algorithms.

Key words: clustering, TTHC, travel-time, automatically clustering

中图分类号: 

  • TP391.4
[1] 代 明,钟才明,庞永明等. 基于数据集属性相似性的聚类算法推荐. 南京大学学报(自然科学),2016,52(5):908-917.(Dai M,Zhong C M,Pang Y M,et al. Clustering algorithm recommendation based on dataset attributes similarity. Nanjing University(Natural Sciences),2016,52(5):908-917.).
[2] 董利梅,赵 红,杨文元. 基于稀疏聚类的无监督特征选择. 南京大学学报(自然科学),2018,54(1):107-115.(Dong L M,Zhao H,Yang W Y. Unsupervised feature selection via sparse representation clustering. Journal of Nanjing University(Natural Science),2018,54(1):107-115.)
[3] Kumar K M,Reddy A R M. A fast DBSCAN clustering algorithm by accelerating neighbor searching using Groups method. Pattern Recognition,2016,58:39-48.
[4] Thong P H,Son L H. A novel automatic picture fuzzy clustering method based on particle swarm optimization and picture composite cardinality. Knowledge-Based Systems,2016,109:48-60.
[5] Min L,Yu T,Wu X H,et al. C-DEVA:Detection,evaluation,visualization and annotation of clusters from biological networks. Biosystems,2016,150:78-86.
[6] 刘 铭,刘秉权,刘远超. 面向信息检索的快速聚类算法. 计算机研究与发展,2013,50(7):1452-1463.(Liu M,Liu B Q,Liu Y C. A fast clustering algorithm for information retrieval. Journal of Computer Research and Development,2013,50(7):1452-1463.)
[7] 程明畅,刘友波,张程嘉等. 基于分位数半径的动态K-means算法. 南京大学学报(自然科学),2018,54(1):48-55.(Cheng M C,Liu Y B,Zhang C J,et al. Dynamic K-means algorithm based on quantile radius. Journal of Nanjing University(Natural Science),2018,54(1):48-55.)
[8] Karypis G,Han E H,Kumar V. Chameleon:Hierarchical clustering using dynamic modeling. Computer,2002,32(8):68-75.
[9] 关超华,陈泳丹,陈慧岩等. 基于改进DBSCAN算法的激光雷达车辆探测方法. 北京理工大学学报,2010,30(6):732-736.(Guan C H,Chen Y D,Chen H Y,et al. Improved DBSCAN clustering algorithm based vehicle detection using a vehicle-mounted laser scanner. Transactions of Beijing Institute of Technology,2010,30(6):732-736.)
[10] Rodriguez A,Laio A. Clustering by fast search and find of density peaks. Science,2014,344(6191):1492-1496.
[11] Lu Y G,Wan Y. Clustering by sorting potential values(CSPV):A novel potential-based clustering method. Pattern Recognition,2012,45(9):3512-3522.
[12] Lu Y G,Wan Y. PHA:A fast potential-based hierarchical agglomerative clustering method. Pattern Recognition,2013,46(5):1227-1239.
[13] Lu Y G,Hou X L,Chen X R. A novel travel-time based similarity measure for hierarchical clustering. Neurocomputing,2016,173:3-8.
[14] Monien B,Sudborough I H. Min cut is NP-complete for edge weighted trees. Theoretical Computer Science,1988,58(1-3):209-229.
[15] Fowlkes E B,Mallows C L. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association,1983,78(383):553-569.
[16] Lodhi H,Saunders C,Shawe-Taylor J,et al. Text classification using string kernels. Journal of Machine Learning Research,2002,2(3):419-444.
[17] Santos J M,Embrechts M. On the use of the adjusted rand index as a metric for evaluating supervised classification ∥ Alippi C,Polycarpou M,Panayiotou C,et al. Artificial Neural Networks-ICANN 2009. Springer Berlin Heidelberg,2009:175-184.
[1]  郑丽容1,洪志令2*.  HSEC:基于聚类的启发式选择性集成[J]. 南京大学学报(自然科学版), 2018, 54(1): 116-.
[2]  王灿伟1,2*.  
基于主题提取的海量微博情感分析
[J]. 南京大学学报(自然科学版), 2017, 53(3): 549-.
[3] 贾培灵1,樊建聪1,2*,彭延军1,2. 一种基于簇边界的密度峰值点快速搜索聚类算法[J]. 南京大学学报(自然科学版), 2017, 53(2): 368-.
[4] 华佳林1*,朱 杰1,2,于 剑1 . 一种分割-合并聚类算法[J]. 南京大学学报(自然科学版), 2016, 52(4): 724-.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 林 銮,陆武萍,唐朝生,赵红崴,冷 挺,李胜杰. 基于计算机图像处理技术的松散砂性土微观结构定量分析方法[J]. 南京大学学报(自然科学版), 2018, 54(6): 1064 -1074 .
[2] 段新春,施 斌,孙梦雅,魏广庆,顾 凯,冯晨曦. FBG蒸发式湿度计研制及其响应特性研究[J]. 南京大学学报(自然科学版), 2018, 54(6): 1075 -1084 .
[3] 梅世嘉,施 斌,曹鼎峰,魏广庆,张 岩,郝 瑞. 基于AHFO方法的Green-Ampt模型K0取值试验研究[J]. 南京大学学报(自然科学版), 2018, 54(6): 1085 -1094 .
[4] 卢 毅,于 军,龚绪龙,王宝军,魏广庆,季峻峰. 基于DFOS的连云港第四纪地层地面沉降监测分析[J]. 南京大学学报(自然科学版), 2018, 54(6): 1114 -1123 .
[5] 胡 淼,王开军,李海超,陈黎飞. 模糊树节点的随机森林与异常点检测[J]. 南京大学学报(自然科学版), 2018, 54(6): 1141 -1151 .
[6] 洪思思,曹辰捷,王 喆*,李冬冬. 基于矩阵的AdaBoost多视角学习[J]. 南京大学学报(自然科学版), 2018, 54(6): 1152 -1160 .
[7] 魏 桐,童向荣. 基于加权启发式搜索的鲁棒性信任路径生成[J]. 南京大学学报(自然科学版), 2018, 54(6): 1161 -1170 .
[8] 秦 娅, 申国伟, 赵文波, 陈艳平. 基于深度神经网络的网络安全实体识别方法[J]. 南京大学学报(自然科学版), 2019, 55(1): 29 -40 .
[9] 仲昭朝, 邹 婷, 唐惠炜, 庄 重, 张 臻. 铜胁迫对蚕豆根尖细胞凋亡及线粒体功能的影响[J]. 南京大学学报(自然科学版), 2019, 55(1): 154 -160 .
[10] 李嘉明, 邹 勇, 郑 浩, 魏钟波, 杨柳燕, 缪爱军. 养殖塘生态系统重金属污染状况与风险评价[J]. 南京大学学报(自然科学版), 2019, 55(2): 272 -281 .