Journal of Nanjing University (Natural Science), 2020, Vol. 56, Issue (4): 515–523. doi: 10.13232/j.cnki.jnju.2020.04.009


Soft subspace clustering algorithm based on transfer learning

Lijuan Wang1,2, Shifei Ding1, Ling Ding1

  1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
  2. School of Information and Electrical Engineering, Xuzhou College of Industrial Technology, Xuzhou 221400, China
  • Received: 2020-06-20  Online: 2020-07-30  Published: 2020-08-06
  • Contact: Shifei Ding  E-mail: dingsf@cumt.edu.cn
  • Supported by: National Natural Science Foundation of China (61672522); the Qing Lan Project of Jiangsu Higher Education Institutions (2020)

Abstract:

With the advent of the big data era, high-dimensional data are everywhere. Clustering is a technique for analyzing and describing data and grouping them according to some measure of similarity. Traditional clustering algorithms are often unable to handle high-dimensional data effectively. Soft subspace clustering assigns feature weights to describe the uncertainty of samples belonging to different clusters; however, the accuracy and efficiency of existing soft subspace clustering algorithms degrade considerably when the data are incomplete or the information is inaccurate. Starting from these problems, this paper proposes an improved soft subspace clustering algorithm. To counter incomplete and insufficient data, transfer learning is introduced to weaken the impact of the limited amount of data on the clustering analysis, and the concept of information entropy is introduced to determine the feature weights of the high-dimensional data. Experiments show that combining transfer learning with information entropy effectively improves the precision and accuracy of the soft subspace clustering algorithm.

Key words: subspace clustering, transfer learning, information entropy, high-dimensional data
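Since only the abstract is reproduced on this page, the following is a minimal sketch of the kind of procedure it describes: an entropy-weighted soft subspace clustering loop (in the spirit of EWKM-style algorithms) with an optional transfer term that pulls the current-domain cluster centers toward centers learned on a source ("history") domain. All names and parameters here (gamma, lam, source_centers, n_iter) are illustrative assumptions, not the authors' exact TSC formulation.

import numpy as np

def entropy_weighted_ssc(X, n_clusters, gamma=1.0, lam=0.0,
                         source_centers=None, n_iter=50, seed=0):
    """Sketch of entropy-weighted soft subspace clustering with an
    optional transfer term (hypothetical, for illustration only).

    X              : (n_samples, n_features) data matrix
    gamma          : entropy regularization strength for the feature weights
    lam            : strength of the transfer term pulling centers toward
                     source_centers learned on the history data
    source_centers : (n_clusters, n_features) source-domain centers, or None
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    weights = np.full((n_clusters, d), 1.0 / d)   # per-cluster feature weights

    for _ in range(n_iter):
        # 1) assign each sample to the cluster with the smallest weighted distance
        sq = (X[:, None, :] - centers[None, :, :]) ** 2          # (n, k, d)
        dist = np.einsum('nkd,kd->nk', sq, weights)
        labels = dist.argmin(axis=1)

        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) == 0:
                continue
            # 2) update centers; the transfer term shrinks them toward the source centers
            if source_centers is not None and lam > 0:
                centers[k] = (members.sum(axis=0) + lam * source_centers[k]) / (len(members) + lam)
            else:
                centers[k] = members.mean(axis=0)
            # 3) update feature weights with the entropy (softmax) rule
            D = ((members - centers[k]) ** 2).sum(axis=0)        # per-feature dispersion
            w = np.exp(-(D - D.min()) / gamma)
            weights[k] = w / w.sum()

    return labels, centers, weights

In the setting the abstract describes, one would presumably first cluster the history data, keep the resulting centers, and then cluster the (possibly incomplete) current data with those centers passed as source_centers; lam would then control how strongly knowledge is transferred, and gamma how concentrated the feature weights become.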

CLC number: TP301

Figure 1  Flowchart of the WFCM algorithm

Figure 2  Knowledge transfer

Table 1  Details of the UCI data sets

No.  Name        Samples N  Dimensions D  Clusters C
1    Iris        150        4             3
2    Wine        178        13            3
3    Vehicle     208        18            4
4    Australian  690        14            2
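For reference, Iris and Wine are available directly from scikit-learn, while Vehicle and Australian would have to be fetched from the UCI repository. The sketch below shows only one hypothetical way to build a "history"/"current" split of the kind the result tables refer to (Xhistory, Xcurrent-all, Xcurrent-lost); the actual split procedure used in the paper is not described on this page, and whether "lost" means removed samples or removed attribute values is an assumption here.

import numpy as np
from sklearn.datasets import load_iris

def history_current_split(X, y, history_frac=0.5, lost_frac=0.3, seed=0):
    """Hypothetical split into a source ('history') set, a target ('current')
    set, and a reduced 'lost' variant of the current set with some samples
    randomly removed, mimicking the incomplete-data scenario in the abstract."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(history_frac * len(X))
    hist_idx, curr_idx = idx[:cut], idx[cut:]
    keep = rng.random(len(curr_idx)) > lost_frac       # randomly drop a fraction
    return (X[hist_idx], y[hist_idx],                  # Xhistory
            X[curr_idx], y[curr_idx],                  # Xcurrent-all
            X[curr_idx[keep]], y[curr_idx[keep]])      # Xcurrent-lost

iris = load_iris()
Xh, yh, Xc_all, yc_all, Xc_lost, yc_lost = history_current_split(iris.data, iris.target)
print(Xh.shape, Xc_all.shape, Xc_lost.shape)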

Table 2  Clustering results on Xhistory (RI index)

Data set    EWKM    ESSC    FSC
Iris        0.8523  0.8720  0.8423
Wine        0.8415  0.8975  0.8358
Vehicle     0.3747  0.5261  0.3854
Australian  0.7552  0.7123  0.7348

Table 3  Clustering results on Xhistory (NMI index)

Data set    EWKM    ESSC    FSC
Iris        0.7523  0.7441  0.7105
Wine        0.7015  0.7025  0.7158
Vehicle     0.1042  0.1225  0.1156
Australian  0.4835  0.3454  0.3855

Table 4  Clustering results on Xcurrent (RI index)

                       Xcurrent-all                    Xcurrent-lost
Data set    EWKM    ESSC    FSC     TSC      EWKM    ESSC    FSC     TSC
Iris        0.6235  0.6358  0.6135  0.7852   0.5442  0.5317  0.5423  0.7561
Wine        0.6552  0.6884  0.6451  0.7245   0.6075  0.6023  0.5997  0.8024
Vehicle     0.6578  0.6021  0.6077  0.8245   0.3871  0.3561  0.3534  0.6122
Australian  0.6021  0.5975  0.5988  0.7846   0.4223  0.4125  0.4108  0.7241

Table 5  Clustering results on Xcurrent (NMI index)

                       Xcurrent-all                    Xcurrent-lost
Data set    EWKM    ESSC    FSC     TSC      EWKM    ESSC    FSC     TSC
Iris        0.5247  0.5365  0.5286  0.6807   0.4275  0.4562  0.3925  0.5803
Wine        0.6218  0.6452  0.6102  0.7534   0.5842  0.6231  0.5714  0.6744
Vehicle     0.1204  0.1078  0.1107  0.1608   0.0214  0.0451  0.0168  0.1256
Australian  0.2536  0.2237  0.2496  0.4532   0.2453  0.1431  0.1087  0.3998
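Tables 2-5 evaluate the clusterings with the Rand index (RI) and normalized mutual information (NMI). A minimal sketch of how these two scores can be computed with scikit-learn, given ground-truth labels y_true and predicted cluster labels y_pred from any of the compared algorithms (the function name is illustrative):

from sklearn.metrics import rand_score, normalized_mutual_info_score

def evaluate_clustering(y_true, y_pred):
    # y_true: ground-truth class labels; y_pred: cluster assignments
    # (both 1-D integer sequences of the same length)
    return {
        "RI": rand_score(y_true, y_pred),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
    }

print(evaluate_clustering([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2]))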