Journal of Nanjing University (Natural Sciences), 2022, Vol. 58, Issue (4): 570-583. doi: 10.13232/j.cnki.jnju.2022.04.002
Ding Zhang, Youlong Yang, Liqin Sun
Abstract:
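Semi-supervised clustering ensembles use pairwise constraints to improve the accuracy of clustering ensembles, but their performance drops markedly in high-dimensional spaces, and when only a few pairwise constraints are available, clustering performance is hard to improve. To address these problems, a novel semi-supervised clustering ensemble algorithm, WSCEC (Weighted Semi-supervised Clustering Ensemble Algorithm Based on Extended Constraint Projection), is proposed. First, several clustering algorithms are applied to the feature space of the data, and random subspaces are then used for dimensionality reduction to limit the influence of redundant features. Second, the original constraint set is extended using the k nearest or farthest samples of each constraint pair together with the transitive relations among constraints, and constraint projection maps the original data space to a low-dimensional space that satisfies as many constraints as possible. Finally, a weighting strategy for clustering solutions assigns an appropriate weight to each solution to reduce the influence of low-quality ones. Experimental results on several datasets demonstrate the effectiveness of the proposed algorithm.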
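The constraint-extension step can be illustrated with a small sketch. The abstract does not spell out the exact nearest/farthest rule, so the code assumes a simplified version: each endpoint of a must-link pair is must-linked to its k nearest samples, the k nearest samples of one endpoint of a cannot-link pair are cannot-linked to the other endpoint, and the result is closed under transitivity (must-link behaves as an equivalence relation, and a cannot-link between two samples carries over to their whole must-link groups). The names `knn` and `extend_constraints` are hypothetical; this is not the WSCEC implementation.

```python
import numpy as np


def knn(X, i, k):
    """Indices of the k samples closest to X[i] (Euclidean), excluding i itself."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf
    return [int(v) for v in np.argsort(d)[:k]]


def extend_constraints(X, must_link, cannot_link, k=2):
    """Hypothetical constraint extension: k-NN expansion, then transitive closure.

    Assumed rule (not taken from the paper): for a must-link (a, b), each endpoint
    is must-linked to its k nearest samples; for a cannot-link (a, b), the k nearest
    samples of a are cannot-linked to b and vice versa.
    """
    n = len(X)
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps the trees shallow
            u = parent[u]
        return u

    # 1) Expand must-links through nearest neighbours; merge with union-find.
    for a, b in must_link:
        parent[find(a)] = find(b)
        for u in (a, b):
            for v in knn(X, u, k):
                parent[find(v)] = find(u)

    # 2) Expand cannot-links through nearest neighbours of each endpoint.
    cl = set()
    for a, b in cannot_link:
        cl.add((a, b))
        cl.update((v, b) for v in knn(X, a, k) if v != b)
        cl.update((a, v) for v in knn(X, b, k) if v != a)

    # 3) Transitive closure: every pair inside a must-link group is must-linked.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    ml_ext = {(i, j) for g in groups.values() for i in g for j in g if i < j}

    # A cannot-link between two samples extends to their entire groups.
    cl_ext = set()
    for a, b in cl:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # conflicting pair: this sketch lets the must-link evidence win
        for i in groups[ra]:
            for j in groups[rb]:
                cl_ext.add((min(i, j), max(i, j)))
    return ml_ext, cl_ext
```

For six 1-D points forming two groups, `[0, 1, 3]` and `[10, 11, 13]`, a single must-link `(0, 1)` and a single cannot-link `(1, 4)` with `k=2` grow into three must-links inside the first group and all nine cannot-links across the two groups. This shows why extension can help when only a few constraints are given.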
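The dimensionality-reduction part can be sketched as a random subspace draw followed by a constraint projection. The projection below uses one common formulation, assumed here rather than taken from the paper. It builds S_C and S_M, the average outer products of the difference vectors over cannot-link and must-link pairs. It then keeps the top eigenvectors of S_C - beta * S_M as the columns of an orthonormal projection W, so that projected cannot-link pairs lie far apart and must-link pairs lie close together. `random_subspace`, `constraint_projection` and the trade-off parameter `beta` are illustrative names.

```python
import numpy as np


def random_subspace(X, n_features, rng):
    """Random subspace method: keep a random subset of the feature columns."""
    cols = rng.choice(X.shape[1], size=n_features, replace=False)
    return X[:, cols], cols


def constraint_projection(X, must_link, cannot_link, dim, beta=1.0):
    """Project X onto `dim` directions that spread cannot-link pairs apart while
    pulling must-link pairs together (assumed eigen-decomposition formulation)."""
    d = X.shape[1]

    def pair_scatter(pairs):
        # Average outer product of the difference vectors of the given pairs.
        S = np.zeros((d, d))
        for i, j in pairs:
            diff = (X[i] - X[j])[:, None]
            S += diff @ diff.T
        return S / max(len(pairs), 1)

    M = pair_scatter(cannot_link) - beta * pair_scatter(must_link)
    vals, vecs = np.linalg.eigh(M)  # M is symmetric, so eigh applies
    W = vecs[:, np.argsort(vals)[::-1][:dim]]  # eigenvectors of the `dim` largest eigenvalues
    return X @ W, W
```

In an ensemble, each member would draw its own subspace, project it with the (extended) constraints and then be clustered, for example with k-means; the sketch leaves that last step out. Because W comes from a symmetric eigenproblem, its columns are orthonormal, and the sign of each column is arbitrary.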
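The abstract does not say how the weight of each clustering solution is computed. A simple stand-in, assumed here, is the fraction of pairwise constraints a solution satisfies. The weighted solutions are then combined in a weighted co-association matrix, and the consensus partition is read off as connected components of the pairs whose weighted co-clustering frequency exceeds 0.5. All function names are hypothetical, and this consensus function is a simplification, not the one used by WSCEC.

```python
import numpy as np


def constraint_weight(labels, must_link, cannot_link):
    """Hypothetical weight: fraction of pairwise constraints the partition satisfies."""
    ok = sum(labels[i] == labels[j] for i, j in must_link)
    ok += sum(labels[i] != labels[j] for i, j in cannot_link)
    total = len(must_link) + len(cannot_link)
    return ok / total if total else 1.0


def weighted_coassociation(partitions, weights):
    """Weighted frequency with which each pair of samples shares a cluster."""
    n = len(partitions[0])
    C = np.zeros((n, n))
    for labels, w in zip(partitions, weights):
        labels = np.asarray(labels)
        C += w * (labels[:, None] == labels[None, :])
    total = float(sum(weights))
    if total == 0:
        raise ValueError("all clustering solutions received zero weight")
    return C / total


def consensus(C, threshold=0.5):
    """Connected components of the graph with an edge wherever C > threshold."""
    n = len(C)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search from `start`
            u = stack.pop()
            for v in np.flatnonzero(C[u] > threshold):
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```

With this weighting, a solution that violates most constraints contributes little to the co-association matrix, which is the effect the weighting strategy is meant to have: low-quality solutions are down-weighted instead of being counted equally.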