南京大学学报(自然科学版) ›› 2021, Vol. 57 ›› Issue (2): 167–176.doi: 10.13232/j.cnki.jnju.2021.02.001

• •    

基于修剪树的优化聚类中心算法

周小亮, 吴东洋(), 曹磊, 王玉鹏, 业宁   

  1. 南京林业大学信息科学技术学院,南京,210037
  • 收稿日期:2020-10-19 出版日期:2021-03-23 发布日期:2021-03-23
  • 通讯作者: 吴东洋 E-mail:eassun2000@sina.com
  • 作者简介:E⁃mail:eassun2000@sina.com
  • 基金资助:
    国家重点研发计划(2016YFD0600101);江苏省住房和城乡建设厅计划(2016ZD44);江苏省研究生科研与实践创新计划(KYCX20_0894);汕尾市省级科技创新项目(2018D2002)

Optimized clustering center based on trimmed tree

Xiaoliang Zhou, Dongyang Wu(), Lei Cao, Yupeng Wang, Ning Ye   

  1. School of Information Science and Technology,Nanjing Forestry University,Nanjing,210037,China
  • Received:2020-10-19 Online:2021-03-23 Published:2021-03-23
  • Contact: Dongyang Wu E-mail:eassun2000@sina.com

摘要:

针对传统聚类算法存在样本形状及孤立点敏感的问题,提出基于修剪树的优化聚类中心(Optimized Clustering Center Based on Trimmed Tree,OCT)算法.该算法自适应地寻找裁剪尺寸来修剪并分割最小生成树为森林,获取森林全部叶子结点并再次构造最小生成树,根据预设簇数n,修剪最小生成树的n-1条最长边,得到包含n棵树的森林,计算森林中每棵树的质心并将其置为初始类簇聚类中心.在仿真数据集和真实数据集上的测试结果表明,OTC算法的平均识别率分别为98.8%和95.7%,平均耗时为57 ms和10.53 ms.

关键词: 最小生成树, 质心偏量, 样本偏量, 聚类分析

Abstract:

To solve the problem of sample shape and outlier sensitivity in traditional clustering algorithm,an optimized clustering center algorithm,OCT (Optimized Clustering Center Based on Trimmed Tree) based on pruning tree is proposed. This algorithm adaptively searches the cutting size to prune and divide the minimum spanning tree into a forest,obtains all the leaf nodes of the forest and constructs the minimum spanning tree again. According to the preset number of clusters n,the OCT algorithm trims the longest n-1 edge of the minimum spanning tree to get the forest containing n trees,then calculates centroid of each tree in the forest and sets it as the initial cluster center. The experimental results on the simulation dataset and the real dataset show that the average recognition rate of OCT algorithm is 98.8% and 95.7%,and the average time?consuming is 57 ms and 10.53 ms.

Key words: minimum spanning tree, centroid offset, sample offset, clustering analysis

中图分类号: 

  • TP181

图1

修剪边数量与Q的关系"

图2

LineBlobs数据集中η和μ与识别率的关系"

图3

five_cluster数据集中η和μ与识别率的关系"

表1

实验使用的人工数据集的信息"

DatasetNumber of samplesAttributesClusters
five_cluster431,67,891,280,33125
fourty25,25,…,25240
LineBlobs118,75,3323
Sticks117,123,150,12224
Eyes56,82,10023
ThreeCircles601,1001,200123
Twomoons501,100122

图4

k?means,DBSCAN和OCT算法在five_cluster数据集上的聚类结果"

图5

k?means,DBSCAN和OCT算法在fourty数据集上的聚类结果"

图6

k?means,DBSCAN和OCT算法在LineBlobs数据集上的聚类结果"

图7

k?means,DBSCAN和OCT算法在Sticks数据集上的聚类结果"

图8

k?means,DBSCAN和OCT算法在Eyes数据集上的聚类结果"

图9

k?means,DBSCAN和OCT算法在ThreeCircles数据集上的聚类结果"

图10

k?means,DBSCAN和OCT算法在Twomoons数据集上的聚类结果"

表2

k?means,DBSCAN和OCT算法在人工数据集上的聚类精度对比"

Datasetk?meansDBSCANOCT
AVG73.86%95.95%98.8%
five_cluster99.5%93.5%99.6%
fourty87.34%100%100%
LineBlobs74.44%100%100%
Sticks63.44%100%100%
Eyes85.29%78.15%92.01%
ThreeCircles33.67%100%100%
Twomoons73.37%100%100%

图11

k?means,DBSCAN和OCT算法在人工数据集上的聚类时间比较"

图12

测试实验中使用的叶片数据集"

表3

植物叶片数据集"

DatasetNumber of samplesAttributesClusters
BBQ30,30,1033
QUG10,10,1033
GQN30,10,1073
BQG30,10,1043

表4

k?means,DBSCAN和OCT算法在四种叶片数据集上的聚类精度比较"

Datasetk?means(%)DBSCAN(%)OCT(%)
AVG75.175%90.425%95.7%
BBQ70%85.7%90%
QUG96.7%100%100%
GQN76%96%100%
BQG58%80%92.8%

图13

k?means,DBSCAN和OCT算法在四种叶片数据集上的聚类时间对比"

1 Hartigan J A,Wong M A. Algorithm AS 136:a k?means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics),1979,28(1):100-108.
2 李正兵,罗斌,翟素兰等. 基于关联图划分的k?means算法. 计算机工程与应用,2013,49(21):141-144,151.
Li Z B,Luo B,Zhai S L,et al. k?means algorithm based on partition of correlational graph. Computer Engineering and Applications,2013,49(21):141-144,151.
3 姚跃华,史秀岭. 一种优化初始中心的k?means粗糙聚类算法. 计算机工程与应用,2010,46(34):130-132.
Yao Y H,Shi X L. k?means rough clustering algorithm based on optimized initial center. Computer Engineering and Applications,2010,46(34):130-132.
4 赵兴旺,梁吉业. 一种基于信息熵的混合数据属性加权聚类算法. 计算机研究与发展,2016,53(5):1018-1028.
Zhao X W,Liang J Y. An attribute weighted clustering algorithm for mixed data based on information entropy. Journal of Computer Research and Development,2016,53(5):1018-1028.
5 周水庚,周傲英,曹晶. 基于数据分区的DBSCAN算法. 计算机研究与发展,2000,37(10):1153-1159.
Zhou S G,Zhou A Y,Cao J. A data?partitioning?based DBSCAN algorithm. Journal of Computer Research and Development,2000,37(10):1153-1159.
6 倪巍伟,陈耿,孙志挥. 一种基于数据垂直划分的分布式密度聚类算法. 计算机研究与发展,2007,44(9):1612-1617.
Ni W W,Chen G,Sun Z H. Anefficient density?based clustering algorithm for vertically partitioned distributed datasets. Journal ofComputer Research and Development,2007,44(9):1612-1617.
7 胡健,朱海湾,毛伊敏. 基于自适应蜂群优化的DBSCAN聚类算法. 计算机工程与应用,2019,55(14):105-114. (Hu J,Zhu H W,Mao Y M. DBSCAN clustering algorithm based on adaptive bee colony
optimization. Computer Engineering and Applications,2019,55(14):105-114.)
8 冯振华,钱雪忠,赵娜娜. Greedy DBSCAN:一种针对多密度聚类的DBSCAN改进算法. 计算机应用研究,2016,33(9):2693-2696,2700. (Feng Z H,Qian X Z,Zhao N N. Greedy DBSCAN:an
improved DBSCAN algorithm on multi?density
clustering. Application Research of Computers,2016,33(9):2693-2696,2700.)
9 徐沁,罗斌. 结合mean?shift与MST的k?means聚类算法. 计算机工程,2013,39(12):204-210.
Xu Q,Luo B. k?means clustering algorithm combined with mean?shift and minimum spanning tree. Computer Engineering,2013,39(12):204-210.
10 Ye J. Single?valued neutrosophic minimum spanning tree and its clustering method. Journal of Intelligent Systems,2014,23(3):311-324.
11 徐晨凯,高茂庭. 改进的最小生成树自适应分层聚类算法. 计算机工程与应用,2014,50(22):149-153.
Xu C K,Gao M T. Improved adaptive hierarchical clustering algorithm based on minimum spanning tree. Computer Engineering and Applications,2014,50(22):149-153.
12 贾瑞玉,李振. 基于最小生成树的层次k?means聚类算法. 微电子学与计算机,2016,33(3):86-88,93. (Jia R Y,Li Z. The level of K?means clustering algorithm based on the minimum spanning tree.
Microelectronics & Computer,2016,33(3):86-88,93.)
13 Jothi R,Mohanty S K,Ojha A. Fast approximate minimum spanning tree based clustering algorithm. Neurocomputing,2018,272:542-557.
14 Li J,Wang X C,Wang X L. A scaled?MST?based clustering algorithm and application on image segmentation. Journal of Intelligent Information Systems,2020,54(3):501-525.
15 Mishra G,Mohanty S K. A fast hybrid clustering technique based on local nearest neighbor using minimum spanning tree. Expert Systems with Applications,2019,132:28-43.
16 李蓟涛,梁永全. 基于最小生成树的分割区域密度聚类算法. 计算机辅助设计与图形学学报,2019,31(9):1628-1635. (Li J T,Liang Y Q. Segmentation region density clustering algorithm based on minimum spanning tree. Journal of Computer?Aided
Design & Computer Graphics,2019,31(9):1628-1635.)
17 Mishra G,Mohanty S K. A minimum spanning tree based partitioning and merging technique for clustering heterogeneous data sets. Journal of Intelligent Information Systems,2020,55(3):587-606.
[1] 徐秀芳, 徐 森, 花小朋, 徐 静, 皋 军, 安 晶. 一种基于t-分布随机近邻嵌入的文本聚类方法[J]. 南京大学学报(自然科学版), 2019, 55(2): 264-271.
[2]  孙梦梦1,唐旭清1,2*.  基于粒度空间的最小生成树分类算法[J]. 南京大学学报(自然科学版), 2017, 53(5): 963-.
[3] 李飞江1*,成红红2,钱宇华1. 全粒度聚类算法[J]. 南京大学学报(自然科学版), 2014, 50(4): 505-.
[4]  陈永彬1,张琢1,2**
.  智能单粒子优化算法在聚类分析中的应用*
[J]. 南京大学学报(自然科学版), 2011, 47(5): 578-584.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!