Journal of Nanjing University (Natural Sciences) ›› 2017, Vol. 53 ›› Issue (3): 525–.


 Reliability-based regularized weighted soft k-means algorithm for subspace clustering

 Li Xinyu1, Xu Guiyun1, Ren Shijin2*, Yang Maoyun1,2

  • Online: 2017-05-30 Published: 2017-05-30
  • About author: 1. School of Mechanical and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221116, China;
    2. School of Computer Science & Technology, Jiangsu Normal University, Xuzhou, 221116, China
  • Supported by:
     National Natural Science Foundation of China (60974056)
    Received: 2016-12-01
    *Corresponding author, E-mail: sjren_phd@163.com

Abstract:  Subspace clustering has been widely applied in fields that involve clustering high-dimensional data and has attracted growing attention from machine learning researchers. It is a cluster analysis technique with built-in feature selection: by selecting a subset of salient features, it clusters on a low-dimensional representation of the high-dimensional data, which yields better performance in practice and has made it a popular approach to high-dimensional clustering. Compared with hard clustering, soft clustering can give more meaningful partitions of complex data. This paper extends k-means clustering and presents a novel reliability-based regularized weighted soft k-means clustering algorithm (RRWSKM). The method computes the contribution of each dimension to each cluster and thereby finds the subsets of salient dimensions relevant to different clusters. It can also identify data patterns accurately by tuning model parameters. Dimension weight entropy and partition entropy are incorporated into the objective function as regularization terms, both to avoid overfitting and to encourage more dimensions to contribute to identifying the clusters. To improve robustness, a data reliability measure is used to determine the initial dimension weights, which improves the reliability and performance of the algorithm. Because the optimization problem of RRWSKM is non-convex, it is solved with iterative update formulas. Experiments on several real-world data sets show that the proposed method effectively uncovers low-dimensional representations of high-dimensional data, achieves better clustering performance than other subspace clustering methods, and is well suited to clustering high-dimensional data.
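The abstract does not state the RRWSKM objective function or its update formulas, so the following is only a generic sketch of the family of methods it describes, not the authors' algorithm. It assumes an objective of the form: weighted within-cluster dispersion, plus `gamma` times the dimension weight entropy, plus `eta` times the partition entropy, with memberships and per-cluster weights each summing to one. The function name `soft_subspace_kmeans`, the parameters `gamma` and `eta`, and the uniform initial weights (a placeholder for the reliability-based initialization, which the abstract does not specify) are all assumptions made for illustration.

```python
import numpy as np


def _softmax_rows(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_subspace_kmeans(X, k, gamma=1.0, eta=1.0, n_iter=50, seed=0):
    """Entropy-regularized weighted soft k-means (illustrative sketch).

    Assumed objective (NOT taken from the paper):
        sum_i sum_c u_ic * sum_j w_cj * (x_ij - v_cj)^2
        + gamma * sum_c sum_j w_cj * log(w_cj)
        + eta   * sum_i sum_c u_ic * log(u_ic)
    subject to sum_c u_ic = 1 and sum_j w_cj = 1.
    Returns memberships U (n, k), centers V (k, d), weights W (k, d).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initial centers: k distinct data points chosen at random.
    V = X[rng.choice(n, size=k, replace=False)].astype(float)
    # Uniform weights: a placeholder for the reliability-based initialization.
    W = np.full((k, d), 1.0 / d)

    for _ in range(n_iter):
        # Squared deviations of every point from every center, shape (n, k, d).
        sq = (X[:, None, :] - V[None, :, :]) ** 2
        # Soft memberships: u_ic proportional to exp(-weighted distance / eta).
        dist = np.einsum("nkd,kd->nk", sq, W)
        U = _softmax_rows(-dist / eta)
        # Centers: membership-weighted means (small epsilon avoids 0 division).
        V = (U.T @ X) / (U.sum(axis=0)[:, None] + 1e-12)
        # Dimension weights: w_cj proportional to exp(-dispersion_cj / gamma).
        sq = (X[:, None, :] - V[None, :, :]) ** 2
        D = np.einsum("nk,nkd->kd", U, sq)
        W = _softmax_rows(-D / gamma)
    return U, V, W
```

In this sketch a larger `gamma` spreads weight over more dimensions and a larger `eta` makes the memberships softer, which mirrors the role the abstract assigns to the two entropy terms. Each row of `W` gives the per-dimension contributions for one cluster, so the dimensions with the largest weights form that cluster's salient subset.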
