Journal of Nanjing University (Natural Science) ›› 2023, Vol. 59 ›› Issue (3): 388–397. doi: 10.13232/j.cnki.jnju.2023.03.003


Diversity-induced multi-view clustering in latent embedded space

Yifan Zhang 1,2, Ting Li 1,2, Hongwei Ge 1,2

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
  2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
  • Received: 2023-03-01  Online: 2023-05-31  Published: 2023-06-09
  • Contact: Hongwei Ge, E-mail: ghw8601@163.com
  • Supported by: National Natural Science Foundation of China (61806006); Jiangsu Province Graduate Innovation Program (KYLX16_0718)



Abstract:

Multi-view subspace clustering is currently widely studied in pattern recognition and machine learning. Most previous multi-view clustering algorithms partition multi-view data in its original feature space, so their efficacy implicitly depends heavily on the quality of the original feature representation. Moreover, different views carry view-specific information about the same object, and how to exploit these views to recover the latent diversity information is particularly important for the subsequent clustering. To address these problems, this paper proposes Diversity-induced Multi-view Clustering in Latent Embedded Space (DiMCLES), which uses view-specific projection matrices to recover a latent embedded space from the multi-view data. To account for the diversity information between different views, an empirical Hilbert-Schmidt Independence Criterion (HSIC) is adopted to constrain the view-specific projection matrices. Latent embedding learning, diversity learning, global similarity learning and clustering indicator learning are integrated into a unified framework, and an alternating optimization scheme is designed to solve the resulting problem efficiently. Experiments on several real-world multi-view datasets verify the superiority of the proposed approach.
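For concreteness, the empirical HSIC between two representations $Z_1, Z_2 \in \mathbb{R}^{n \times d}$ is standardly computed as $\operatorname{HSIC}(Z_1, Z_2) = (n-1)^{-2}\operatorname{tr}(K_1 C K_2 C)$, where $K_i$ are Gram matrices and $C = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the centering matrix. A minimal Python sketch with linear kernels follows; it is illustrative only, since the kernel choice and exact weighting used by DiMCLES are not stated on this page.

import numpy as np

def empirical_hsic(Z1, Z2):
    """Empirical Hilbert-Schmidt Independence Criterion between two
    sample matrices of shape (n_samples, dim). Linear kernels are an
    illustrative choice; any kernel yields Gram matrices K and L."""
    n = Z1.shape[0]
    K = Z1 @ Z1.T                               # Gram matrix of Z1
    L = Z2 @ Z2.T                               # Gram matrix of Z2
    C = np.eye(n) - np.full((n, n), 1.0 / n)    # centering matrix
    return np.trace(K @ C @ L @ C) / (n - 1) ** 2

# Toy usage: two independent random "views" should yield a small value.
rng = np.random.default_rng(0)
print(empirical_hsic(rng.standard_normal((100, 10)),
                     rng.standard_normal((100, 10))))

Minimizing the pairwise HSIC between the view-specific projections pushes the views toward capturing complementary rather than redundant structure, which is the diversity-inducing role described in the abstract.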

Key words: subspace clustering, diversity, latent embedded space, Hilbert-Schmidt Independence Criterion

CLC number: TP391.41

Figure 1

Schematic diagram of the latent embedded space
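The figure illustrates the core modeling idea. As a hedged sketch, following the latent multi-view subspace clustering formulation this line of work (e.g., LMSC, MCLES) builds on (the authors' exact notation may differ), each view is modeled as a view-specific projection of a shared latent embedding:

$$X^{(v)} \approx P^{(v)} H, \qquad v = 1, \dots, V,$$

where $X^{(v)} \in \mathbb{R}^{d_v \times n}$ stacks the $n$ samples of view $v$, $P^{(v)} \in \mathbb{R}^{d_v \times d}$ is the view-specific projection matrix, and $H \in \mathbb{R}^{d \times n}$ is the latent embedding shared across all views.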

Table 1

Description of the datasets used in the experiments

Dataset    3Sources   Notting-Hill   Yale   MSRCv1   ORL   BBCSport
Classes    6          5              15     7        40    5
Samples    169        550            165    210      400   544
Views      3          3              3      4        3     2

Figure 2

Convergence curves of DiMCLES, LMSC and MCLES on the six benchmark datasets

Figure 3

Parameter analysis of α, β, γ, λ and d in terms of ACC on the six benchmark datasets
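This page does not spell out the roles of α, β, γ, λ and d; by analogy with the unified framework described in the abstract, they plausibly weight the latent embedding, diversity, global similarity and clustering indicator terms, with d the dimension of the latent embedded space. A purely illustrative sketch of such a weighted objective, stated as an assumption rather than the authors' exact formulation:

$$\min_{\{P^{(v)}\},\, H,\, S,\, F}\; \sum_{v=1}^{V} \bigl\|X^{(v)} - P^{(v)} H\bigr\|_F^2 + \alpha \sum_{u \neq v} \operatorname{HSIC}\bigl(P^{(u)}, P^{(v)}\bigr) + \beta\, \Phi(H, S) + \gamma\, \Psi(S, F) + \lambda\, \Omega(S),$$

where $\Phi$ would link the embedding $H$ to a global similarity matrix $S$, $\Psi$ would tie $S$ to a clustering indicator $F$, and $\Omega$ is a regularizer; $\Phi$, $\Psi$ and $\Omega$ are placeholder names introduced here for illustration.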

Figure 4

t-SNE visualization results on the MSRCv1 and 3Sources datasets
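A visualization like this is typically produced by projecting the learned representation to two dimensions with t-SNE and coloring points by cluster assignment. A minimal, generic scikit-learn sketch follows, with synthetic stand-ins for the learned embedding and labels; it is not the authors' plotting code.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
H = rng.standard_normal((210, 30))   # stand-in for a learned latent embedding
labels = rng.integers(0, 7, 210)     # stand-in cluster assignments

coords = TSNE(n_components=2, random_state=0).fit_transform(H)
plt.scatter(coords[:, 0], coords[:, 1], c=labels, s=10, cmap="tab10")
plt.title("t-SNE of the latent embedding")
plt.show()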

Table 2

Clustering performance of different algorithms on the 3Sources dataset (mean %, standard deviation in parentheses)

Method        ACC             NMI             PUR             RI
DiMCLES       77.21% (1.41)   67.28% (2.24)   78.42% (1.41)   83.43% (1.96)
SCbest        69.70% (2.10)   60.80% (1.50)   79.70% (0.70)   84.90% (0.40)
ConPCA        66.30% (1.30)   59.00% (2.20)   72.50% (1.80)   77.50% (0.50)
Co-Reg        55.10% (0.30)   48.60% (0.20)   67.70% (0.20)   76.70% (0.10)
Co-Training   59.80% (0.50)   55.30% (0.50)   74.10% (0.50)   80.70% (0.20)
Min-Dis       52.70% (0.70)   48.30% (0.60)   68.80% (0.50)   76.40% (0.40)
RMSC          54.20% (1.40)   50.50% (1.20)   68.70% (0.90)   76.20% (0.60)
LMSC          71.00% (1.00)   64.90% (1.50)   80.00% (1.10)   83.30% (0.70)
MVGL          34.90% (0.00)   12.10% (0.00)   41.40% (0.00)   36.00% (0.00)
MCLES         66.00% (2.10)   61.90% (2.00)   77.00% (1.50)   83.00% (1.20)
LSRMSC        70.20% (1.80)   64.26% (1.90)   71.31% (1.40)   82.64% (1.30)

Table 3

Clustering performance of different algorithms on the Notting-Hill dataset

Method        ACC             NMI             PUR             RI
DiMCLES       93.45% (0.00)   88.27% (0.00)   93.45% (0.00)   95.62% (0.00)
SCbest        80.00% (0.00)   61.60% (0.00)   80.00% (0.00)   87.10% (0.00)
ConPCA        74.90% (0.30)   64.00% (0.60)   77.20% (0.20)   86.50% (0.10)
Co-Reg        78.80% (0.40)   72.80% (0.22)   67.68% (0.21)   76.67% (0.09)
Co-Training   81.30% (0.50)   76.40% (0.80)   84.30% (0.30)   91.30% (0.20)
Min-Dis       79.80% (0.60)   72.10% (0.40)   82.50% (0.40)   90.10% (0.20)
RMSC          75.70% (6.00)   72.70% (3.10)   82.20% (3.10)   88.40% (2.20)
LMSC          81.00% (5.80)   68.60% (6.50)   81.00% (5.80)   86.20% (4.30)
MVGL          90.50% (0.00)   81.20% (0.00)   90.50% (0.00)   93.30% (0.00)
MCLES         83.40% (3.30)   79.70% (2.20)   85.80% (2.40)   92.30% (1.00)
LSRMSC        86.55% (2.20)   82.23% (1.50)   86.55% (2.20)   93.95% (0.80)

Table 4

Clustering performance of different algorithms on the Yale dataset

Method        ACC             NMI             PUR             RI
DiMCLES       71.00% (1.42)   73.52% (1.37)   71.36% (1.52)   93.89% (0.53)
SCbest        63.18% (3.46)   65.07% (2.56)   63.67% (3.41)   92.50% (0.43)
ConPCA        56.38% (3.71)   60.88% (2.81)   57.50% (3.48)   92.48% (0.42)
Co-Reg        59.56% (0.55)   63.62% (0.41)   60.65% (0.48)   93.10% (0.02)
Co-Training   62.23% (0.39)   65.61% (0.49)   62.87% (0.51)   93.45% (0.21)
Min-Dis       59.74% (0.66)   63.03% (0.48)   60.26% (0.70)   92.82% (0.13)
RMSC          56.25% (4.26)   52.42% (3.73)   55.11% (3.55)   93.03% (0.20)
LMSC          66.73% (1.76)   68.69% (1.55)   67.06% (1.69)   93.57% (0.33)
MVGL          63.03% (0.00)   63.81% (0.00)   64.24% (0.00)   92.44% (0.00)
MCLES         70.02% (1.18)   71.74% (1.60)   70.12% (1.35)   93.54% (0.64)
LSRMSC        69.55% (1.30)   72.12% (1.50)   69.64% (1.40)   93.50% (0.50)

Table 5

Clustering performance of different algorithms on the MSRCv1 dataset

Method        ACC             NMI             PUR             RI
DiMCLES       89.52% (0.00)   82.42% (0.00)   89.52% (0.00)   94.71% (0.00)
SCbest        69.45% (1.86)   53.55% (1.51)   69.45% (1.86)   87.20% (0.21)
ConPCA        61.53% (0.30)   50.61% (0.91)   64.72% (0.82)   85.30% (0.22)
Co-Reg        62.33% (0.57)   51.04% (0.36)   64.48% (0.48)   85.13% (0.11)
Co-Training   69.81% (0.99)   61.65% (0.64)   71.79% (0.71)   91.15% (0.24)
Min-Dis       59.23% (0.71)   51.74% (0.46)   60.75% (0.67)   85.18% (0.18)
RMSC          29.98% (1.89)   28.19% (1.38)   28.26% (1.63)   79.39% (0.19)
LMSC          67.43% (5.91)   57.76% (6.06)   69.00% (6.24)   86.42% (2.43)
MVGL          67.14% (0.00)   57.75% (0.00)   70.48% (0.00)   86.27% (0.00)
MCLES         87.44% (0.40)   79.23% (0.89)   87.59% (0.40)   93.74% (0.24)
LSRMSC        80.98% (0.10)   72.11% (0.20)   80.98% (0.10)   91.57% (0.00)

Table 6

Clustering performance of different algorithms on the ORL dataset

Method        ACC             NMI             PUR             RI
DiMCLES       77.50% (1.51)   89.91% (0.52)   81.70% (1.00)   98.58% (0.16)
SCbest        77.35% (2.61)   89.10% (1.02)   80.25% (2.05)   98.17% (0.12)
ConPCA        65.30% (2.51)   80.29% (1.39)   69.00% (2.33)   97.77% (0.11)
Co-Reg        69.21% (0.37)   83.76% (0.17)   72.94% (0.28)   98.00% (0.00)
Co-Training   75.39% (0.58)   88.13% (0.31)   78.79% (0.50)   98.42% (0.00)
Min-Dis       72.59% (0.62)   86.17% (0.30)   76.25% (0.51)   98.18% (0.00)
RMSC          76.03% (2.59)   72.00% (2.09)   73.87% (1.69)   98.39% (0.00)
LMSC          80.13% (3.33)   89.66% (2.04)   83.79% (2.93)   98.81% (0.21)
MVGL          73.50% (0.00)   86.51% (0.00)   79.50% (0.00)   97.07% (0.00)
MCLES         79.73% (0.40)   89.02% (1.20)   84.02% (1.81)   98.52% (0.18)
LSRMSC        82.31% (2.20)   89.88% (0.70)   84.00% (2.30)   98.56% (0.10)

Table 7

Clustering performance of different algorithms on the BBCSport dataset

Method        ACC              NMI              PUR              RI
DiMCLES       88.60% (0.00)    85.68% (0.00)    88.60% (0.00)    94.96% (0.00)
SCbest        84.53% (0.12)    67.17% (0.18)    84.53% (0.12)    88.81% (0.00)
ConPCA        NA               NA               NA               NA
Co-Reg        69.28% (0.70)    53.75% (0.21)    73.48% (0.33)    85.13% (0.09)
Co-Training   69.79% (0.39)    56.57% (0.17)    76.01% (0.20)    91.16% (0.14)
Min-Dis       85.07% (0.87)    78.43% (0.55)    87.15% (0.46)    92.64% (0.28)
RMSC          77.37% (0.98)    76.45% (1.17)    75.97% (1.06)    92.38% (0.79)
LMSC          85.12% (12.03)   74.48% (13.56)   85.60% (10.53)   94.75% (0.12)
MVGL          41.91% (0.00)    8.80% (0.00)     42.28% (0.00)    33.34% (0.00)
MCLES         87.28% (0.32)    80.01% (1.14)    87.28% (0.32)    93.89% (0.43)
LSRMSC        88.54% (0.00)    83.12% (0.00)    88.54% (0.00)    94.75% (0.00)
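Tables 2–7 report ACC, NMI, purity (PUR) and the Rand index (RI). For readers reproducing such comparisons, a generic sketch of how ACC and PUR are typically computed follows (NMI and RI are available directly in scikit-learn); this reflects standard practice, not the authors' evaluation code, and it assumes integer labels in {0, ..., k-1}.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: accuracy under the best one-to-one mapping between cluster
    labels and ground-truth labels, found with the Hungarian algorithm."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1
    rows, cols = linear_sum_assignment(-cost)   # maximize matched pairs
    return cost[rows, cols].sum() / len(y_true)

def purity(y_true, y_pred):
    """PUR: each predicted cluster is credited with its majority class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return sum(np.bincount(y_true[y_pred == c]).max()
               for c in np.unique(y_pred)) / len(y_true)

# Toy usage; NMI and RI come directly from scikit-learn.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])
print(clustering_accuracy(y_true, y_pred))            # 1.0
print(purity(y_true, y_pred))                         # 1.0
print(normalized_mutual_info_score(y_true, y_pred))   # 1.0
print(rand_score(y_true, y_pred))                     # 1.0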