Journal of Nanjing University (Natural Science), 2020, Vol. 56, Issue (4): 515-523. doi: 10.13232/j.cnki.jnju.2020.04.009
Lijuan Wang1,2, Shifei Ding1, Ling Ding1
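Abstract:
With the arrival of the big data era, large volumes of high-dimensional data have become ubiquitous. Clustering is a technique for analyzing and describing data and grouping it according to some measure of similarity. Traditional clustering algorithms often cannot cluster high-dimensional data effectively. Soft subspace clustering performs clustering by assigning weights that describe the uncertainty with which a sample belongs to different clusters; however, when the data are incomplete or the information is inaccurate, the accuracy and efficiency of existing soft subspace clustering methods suffer considerably. Starting from the problems that soft subspace clustering faces, this paper proposes an improved soft subspace clustering algorithm. To address incomplete and insufficient data, transfer learning is introduced to reduce the impact of insufficient data on the cluster analysis. The concept of information entropy is also introduced and used to determine the weights of high-dimensional data. Experiments show that combining transfer learning with information entropy effectively improves the precision and accuracy of the soft subspace clustering algorithm.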
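The idea of using information entropy to set feature weights can be illustrated with a minimal sketch. It assumes the well-known entropy-weighted k-means style update, in which each cluster's feature weights are a softmax of the negative within-cluster dispersion. This is not necessarily the update used in this paper, and the transfer-learning component is not modeled here. The function name `entropy_feature_weights`, the parameter `gamma`, and the toy data are hypothetical, introduced only for illustration.

```python
import numpy as np

def entropy_feature_weights(X, labels, centers, gamma=1.0):
    """Per-cluster feature weights from within-cluster dispersion.

    Assumed update (entropy-weighted k-means style, not taken from the paper):
        w[k, j] = exp(-D[k, j] / gamma) / sum_t exp(-D[k, t] / gamma)
    where D[k, j] is the squared deviation of feature j inside cluster k.
    gamma must be positive; larger values give flatter weights.
    """
    n_clusters, n_features = centers.shape
    weights = np.empty((n_clusters, n_features))
    for k in range(n_clusters):
        members = X[labels == k]
        # Dispersion of each feature around the cluster center.
        dispersion = ((members - centers[k]) ** 2).sum(axis=0)
        # Shift by the minimum so exp() cannot underflow for every feature.
        logits = -(dispersion - dispersion.min()) / gamma
        w = np.exp(logits)
        weights[k] = w / w.sum()
    return weights

# Toy data: cluster 0 is tight on feature 0 and spread on feature 1;
# cluster 1 is equally tight on both features.
X = np.array([[0.0, 0.0], [0.0, 4.0], [10.0, 1.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 2.0], [10.0, 1.0]])
W = entropy_feature_weights(X, labels, centers, gamma=1.0)
print(W)
```

In this sketch, a feature along which a cluster is compact receives a large weight for that cluster, while a widely scattered feature is down-weighted, which is the sense in which each cluster lives in its own soft subspace.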