Journal of Nanjing University (Natural Science) ›› 2021, Vol. 57 ›› Issue (2): 177–188. doi: 10.13232/j.cnki.jnju.2021.02.002



Joint learning of k-means and spectral clustering based on multiplication update rule

Di Chen, Jinglei Liu

  1. School of Computer and Control Engineering, Yantai University, Yantai 264005, China
  • Received: 2020-10-09  Online: 2021-03-23  Published: 2021-03-23
  • Corresponding author: Jinglei Liu, E-mail: jinglei_liu@sina.com
  • Supported by: the National Natural Science Foundation of China (61572419)


Abstract:

k-means and spectral clustering are two of the most widely used clustering techniques. k-means is a matrix-factorization-based method that minimizes the reconstruction error in the data space, while spectral clustering is a graph-based method that preserves pairwise similarities between points in the data space and the feature space. To exploit the advantages of both techniques, a joint learning method of k-means and spectral clustering based on the multiplication update rule is proposed. The method combines k-means and spectral clustering into a unified clustering model that optimizes the objectives of k-means and spectral clustering simultaneously in a single optimization. In addition, to solve the resulting optimization problem, an iterative algorithm that alternately updates the cluster center matrix C and the cluster indicator matrix Y is designed based on the multiplication update rule. The correctness and convergence of the proposed algorithm are proved theoretically. Finally, experiments on typical datasets show that the proposed joint learning algorithm improves both clustering accuracy (ACC) and normalized mutual information (NMI).

Key words: k-means clustering, spectral clustering, joint learning method, multiplication update rule, correctness and convergence
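The abstract credits the optimization to the multiplication (i.e. multiplicative) update rule but gives no formulas, and the paper's exact updates for the cluster center C and the cluster indicator Y are not reproduced on this page. As a point of reference only, the sketch below shows the classical Lee-Seung multiplicative updates for a nonnegative factorization X ≈ UV, the scheme that such joint algorithms build on; every name and detail here is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def multiplicative_nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates minimizing ||X - U V||_F^2 with U, V >= 0.
    X is assumed entrywise nonnegative; k is the number of factors (clusters).
    Illustrative sketch only, not the paper's KMSC update rules."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps
    V = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # V <- V * (U^T X) / (U^T U V): a nonnegative ratio keeps V nonnegative
        V *= (U.T @ X) / (U.T @ U @ V + eps)
        # U <- U * (X V^T) / (U V V^T)
        U *= (X @ V.T) / (U @ V @ V.T + eps)
    return U, V
```

Each step multiplies the current factor by a nonnegative ratio, so nonnegativity is preserved automatically and the Frobenius objective does not increase; this kind of monotonicity argument is what convergence proofs for multiplicative updates typically rest on.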

CLC number:

  • TP391

Table 1

Symbols used in this study

Symbol            | Description
X ∈ R^(m×n)       | Data matrix
C                 | Cluster center matrix
G, Y              | Cluster indicator matrices
L                 | Laplacian matrix
A, B              | Two classes of data points
Z                 | Weighted adjacency matrix
W                 | Similarity matrix
D                 | Degree matrix
M                 | Cluster center matrix in the original data space
U, V              | Nonnegative matrices
Tr(·)             | Trace of a matrix
X_ij              | The (i, j)-th element of X
c                 | Number of classes in the dataset
n                 | Number of data samples
m                 | Data dimensionality
σ                 | Scale parameter
λ                 | Regularization parameter
‖X‖_F² = Tr(XᵀX)  | Squared Frobenius norm of X
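Table 1 lists the graph quantities W, D and L together with the scale parameter σ, but this page does not spell out how they are built from X. A common construction, assumed here, is a Gaussian (heat-kernel) similarity with scale σ and the unnormalized Laplacian L = D − W; the following sketch is illustrative only and is not the paper's code.

```python
import numpy as np

def build_graph(X, sigma=1.0):
    """Illustrative construction of W, D and L from a data matrix X
    (m x n, samples as columns, as in Table 1), assuming a Gaussian similarity."""
    sq = np.sum(X ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    d2 = np.maximum(d2, 0.0)                           # guard against round-off
    W = np.exp(-d2 / (2.0 * sigma ** 2))               # similarity matrix W
    np.fill_diagonal(W, 0.0)                           # no self-similarity
    D = np.diag(W.sum(axis=1))                         # degree matrix D
    L = D - W                                          # unnormalized graph Laplacian L
    return W, D, L
```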

Figure 1

Convergence curves of k-means and KMSC on COIL20

Figure 2

Convergence curves of k-means and KMSC on TDT2

Table 2

Key statistics of the COIL20 dataset

Dataset | Samples | Dimensionality | Classes
COIL20  | 1440    | 1024           | 20

Figure 3

Sample images from the COIL20 database

Table 3

Clustering results on the COIL20 dataset: accuracy (ACC) and normalized mutual information (NMI)

  P |              ACC (%)              |              NMI (%)
    |k-means     SC    PCA    NMF   KMSC|k-means     SC    PCA    NMF   KMSC
  4 |  69.27  83.92  84.69  75.59  90.31|  62.57  76.13  76.38  65.35  89.06
  6 |  63.29  76.23  77.70  73.92  89.63|  65.00  74.77  75.54  71.49  89.65
  8 |  63.72  73.00  72.44  71.86  87.81|  68.61  76.39  73.76  71.99  89.92
 10 |  59.72  72.85  70.83  69.42  84.20|  66.84  75.90  73.60  72.62  88.18
 12 |  59.36  73.13  67.03  69.59  81.84|  68.76  77.82  72.75  73.98  88.18
 14 |  59.07  71.34  66.59  66.41  79.67|  69.27  78.04  73.78  72.80  88.46
 16 |  56.39  69.05  64.99  65.76  80.51|  70.34  79.36  74.82  73.43  88.54
 18 |  55.69  71.91  65.20  64.51  78.13|  70.34  79.36  74.82  73.43  88.54
 20 |  55.27  68.40  62.58  63.00  77.91|  70.55  77.96  73.56  72.83  88.99
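Tables 3, 5 and 7 report ACC and NMI in percent. The authors' exact evaluation code is not given on this page; the standard definitions, assumed below, match predicted clusters to ground-truth classes with the Hungarian algorithm for ACC and use scikit-learn's normalized mutual information for NMI.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of samples correctly labeled under the best one-to-one
    mapping between predicted clusters and ground-truth classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    # Contingency counts: rows = predicted clusters, columns = true classes
    count = np.array([[np.sum((y_pred == cl) & (y_true == c)) for c in classes]
                      for cl in clusters])
    row, col = linear_sum_assignment(-count)   # Hungarian matching, maximizes matches
    return count[row, col].sum() / y_true.size

# Both metrics are shown in percent in the tables:
# acc = 100 * clustering_accuracy(y_true, y_pred)
# nmi = 100 * normalized_mutual_info_score(y_true, y_pred)
```

Because of the matching step, ACC is invariant to how cluster indices happen to be numbered, so predicted clusters can be scored directly against the ground-truth classes.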

Table 4

Key statistics of the TDT2 dataset

Dataset | Samples | Dimensionality | Classes
TDT2    | 9394    | 36771          | 30

Table 5

Clustering results on the TDT2 dataset: accuracy (ACC) and normalized mutual information (NMI)

  P |              ACC (%)              |              NMI (%)
    |k-means     SC    SVD    NMF   KMSC|k-means     SC    SVD    NMF   KMSC
  5 |  65.35  83.64  82.70  95.50  99.65|  78.10  74.90  76.80  92.70  98.91
 10 |  68.50  88.20  68.20  83.60  91.40|  73.10  69.20  69.20  82.40  89.44
 15 |  64.90  82.10  65.30  79.90  93.40|  74.00  71.80  71.80  82.00  88.00
 20 |  63.90  79.00  63.40  76.30  91.20|  75.70  71.50  71.50  80.60  85.90
 25 |  61.50  74.30  60.80  75.00  88.60|  74.60  70.90  70.90  79.00  83.90
 30 |  61.20  71.20  65.90  71.90  88.60|  74.70  74.70  74.70  65.13  86.21

Table 6

Statistics of the two UCI datasets

Dataset | Samples | Classes | Attributes | Class sizes
Iris    | 150     | 3       | 4          | 50, 50, 50
Vehicle | 846     | 4       | 18         | 199, 217, 218, 212

Table 7

Clustering results on the UCI datasets: accuracy (ACC) and normalized mutual information (NMI)

Dataset |               ACC (%)                |               NMI (%)
        | k-means     SC    NMF  k-means++  KMSC | k-means     SC    NMF  k-means++  KMSC
Iris    |   89.33  89.33  69.33      66.67  90.67|   75.14  74.50  79.15      52.23  79.60
Vehicle |   45.27  41.61  39.13      40.19  45.74|   18.15  16.32  12.30      14.19  20.46

Figure 4

Performance of KMSC versus the regularization parameter λ on the COIL20 (a) and TDT2 (b) datasets
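Figure 4 shows how KMSC behaves as the regularization parameter λ varies. A sensitivity curve of this kind can be produced with a simple grid sweep; the sketch below assumes a hypothetical kmsc(X, c, lam) routine standing in for the paper's algorithm (which is not provided on this page) and scores each λ by NMI.

```python
from sklearn.metrics import normalized_mutual_info_score

def sweep_lambda(X, y_true, c, kmsc, lambdas=(1e-3, 1e-2, 1e-1, 1.0, 10.0, 100.0)):
    """Grid sweep over the regularization parameter lambda.
    `kmsc` is a hypothetical callable returning predicted labels for (X, c, lam);
    the candidate lambda values are illustrative, not the paper's grid."""
    scores = {}
    for lam in lambdas:
        y_pred = kmsc(X, c, lam)                              # run the joint algorithm
        scores[lam] = normalized_mutual_info_score(y_true, y_pred)
    return scores
```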
