Journal of Nanjing University (Natural Science), 2019, Vol. 55, Issue (4): 546–552. doi: 10.13232/j.cnki.jnju.2019.04.004



Three-way clustering based on sample's stability

Xin Yang1, Hong Shi1, Pingxin Wang2, Gang Xu3

  1. School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang, 212003, China
  2. School of Science, Jiangsu University of Science and Technology, Zhenjiang, 212003, China
  3. School of Naval Architecture and Ocean Engineering, Jiangsu University of Science and Technology, Zhenjiang, 212003, China
  • Received: 2019-05-22 Online: 2019-07-30 Published: 2019-07-23
  • Contact: Pingxin Wang E-mail: wangpingxin@just.edu.cn
  • Supported by:
    Major Project of Natural Science Research of Jiangsu Higher Education Institutions (18KJA1300); Natural Science Research Project of Jiangsu Higher Education Institutions (15KJB110004)

Abstract:

Two-way clustering requires clusters with sharp boundaries: each object either belongs to a cluster or does not. In many practical problems, however, an object can stand in one of three relationships to a cluster: it definitely belongs, it definitely does not belong, or its membership cannot be determined. To address this limitation, three-way clustering represents each cluster by a pair of sets rather than a single set. The pair induces three regions (the core region, the fringe region and the trivial region), which lets the method handle objects whose membership is uncertain. This paper proposes a three-way clustering algorithm based on sample stability. Clustering ensemble results are first used to compute the stability of each sample, and a threshold on stability then divides the universe into two parts, the cluster core and the cluster halo. Samples in the cluster core are assigned to the core regions of the clusters by a traditional hard clustering algorithm. Each sample in the cluster halo is assigned to the fringe region of the corresponding cluster by comparing its distances to the centers of the cluster core regions. Together these steps yield the core region and the fringe region of every cluster, so a three-way clustering is formed. Experimental results on UCI datasets show that the method reveals the structure of the clusters more clearly.

Key words: clustering ensemble, stability, two-way clustering, three-way clustering
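The pipeline described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the base clusterer is a plain k-means with random initialisation, the stability score (the mean of |2p - 1| over co-association frequencies p) is a simple stand-in for the measure used in the paper, and the threshold is supplied by hand. The function names `kmeans`, `build_ensemble`, `sample_stability` and `three_way_clustering` are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd's k-means with random initial centers; returns (labels, centers)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            members = X[labels == c]
            if len(members):                 # keep the old center if a cluster empties
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def build_ensemble(X, k, n_runs, seed=0):
    """Run k-means n_runs times; returns an (n_runs, n_samples) label matrix."""
    rng = np.random.default_rng(seed)
    return np.array([kmeans(X, k, rng)[0] for _ in range(n_runs)])


def sample_stability(label_matrix):
    """ASSUMED stability score: how consistently each sample is grouped with,
    or separated from, every other sample across the ensemble."""
    label_matrix = np.asarray(label_matrix)
    n = label_matrix.shape[1]
    same = label_matrix[:, :, None] == label_matrix[:, None, :]
    co = same.mean(axis=0)                   # co-association frequency p_ij
    certainty = np.abs(2.0 * co - 1.0)       # 1 when p_ij is 0 or 1, 0 when p_ij is 0.5
    np.fill_diagonal(certainty, 0.0)
    return certainty.sum(axis=1) / (n - 1)


def three_way_clustering(X, stability, k, threshold, seed=0):
    """Split samples into core/halo by stability, cluster the core, then place
    each halo sample in the fringe of the nearest core center."""
    rng = np.random.default_rng(seed)
    stability = np.asarray(stability)
    core_idx = np.flatnonzero(stability >= threshold)
    halo_idx = np.flatnonzero(stability < threshold)
    core_labels, centers = kmeans(X[core_idx], k, rng)
    halo_dists = np.linalg.norm(X[halo_idx][:, None, :] - centers[None, :, :], axis=2)
    halo_labels = halo_dists.argmin(axis=1)
    core = [set(core_idx[core_labels == c].tolist()) for c in range(k)]
    fringe = [set(halo_idx[halo_labels == c].tolist()) for c in range(k)]
    return core, fringe
```

A typical call would be `stability = sample_stability(build_ensemble(X, k, n_runs=20))` followed by `three_way_clustering(X, stability, k, threshold)`. Objects that fall in neither the core region nor the fringe region of a cluster make up its trivial region. The sketch assumes the cluster core contains at least `k` samples.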

CLC Number: TP391

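Table 1

Datasets used in the experiments

| Datasets | Number of samples | Number of dimensions | Number of categories |
|---|---|---|---|
| Bank | 1372 | 4 | 2 |
| Glass | 214 | 9 | 6 |
| Wine | 178 | 13 | 3 |
| Congressional | 435 | 16 | 2 |
| Breast | 106 | 9 | 6 |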

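Table 2

Experimental results on UCI datasets

| Datasets | Algorithm | DBI | AS | ACC |
|---|---|---|---|---|
| Bank | k-means | 1.1913 | 0.5000 | 0.5758 |
| Bank | Three-k-means | 1.1772 | 0.5079 | 0.5751 |
| Glass | k-means | 0.9625 | 0.5325 | 0.5981 |
| Glass | Three-k-means | 0.9252 | 0.6129 | 0.6774 |
| Wine | k-means | 1.3053 | 0.4763 | 0.9550 |
| Wine | Three-k-means | 1.2430 | 0.5121 | 0.9704 |
| Congressional | k-means | 1.4865 | 0.4407 | 0.8666 |
| Congressional | Three-k-means | 1.3889 | 0.4723 | 0.8812 |
| Breast | k-means | 0.8826 | 0.5644 | 0.7735 |
| Breast | Three-k-means | 0.7288 | 0.6817 | 0.7945 |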
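For readers who want to reproduce indices of the kind reported in the results table, the sketch below computes them for a hard labeling. It assumes that DBI is the Davies-Bouldin index (lower is better), that AS is the average silhouette index (higher is better) and that ACC is accuracy under the best matching of cluster labels to class labels. This page does not state how the paper applies these indices to a three-way result, so the function names and the brute-force label matching are illustrative assumptions only.

```python
import itertools

import numpy as np


def davies_bouldin(X, labels):
    """Davies-Bouldin index: mean over clusters of the worst scatter/separation ratio."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)            # a cluster is never compared with itself
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()


def average_silhouette(X, labels):
    """Mean silhouette coefficient; samples in singleton clusters score 0."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue
        a = dist[i, own].sum() / (n_own - 1)  # the self-distance contributes 0
        b = min(dist[i, labels == c].mean() for c in ids if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one mapping of clusters to classes.
    Brute force over permutations, so only suitable for a handful of clusters;
    assumes there are no more clusters than classes."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(pred_labels)
    best = 0.0
    for perm in itertools.permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        mapped = np.array([mapping[p] for p in pred_labels])
        best = max(best, float((mapped == true_labels).mean()))
    return best
```

For the six-class datasets in the experiments the permutation search visits at most 720 mappings, which is still cheap. With more classes an assignment solver would be the usual replacement.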