Journal of Nanjing University (Natural Science) ›› 2017, Vol. 53 ›› Issue (4): 775–.



 Unsupervised feature selection method via dependence maximization and sparse representation

Li Chan, Yang Wenyuan*, Zhao Hong

  • Online: 2017-08-03  Published: 2017-08-03  Received: 2017-06-09
  • About author: Key Laboratory of Granular Computing, Minnan Normal University, Zhangzhou 363000, China
  • Foundation item: National Natural Science Foundation of China (61379049, 61379089)
  • *Corresponding author. E-mail: yangwy@xmu.edu.cn


Abstract: Unsupervised feature selection is an important and challenging task in high-dimensional data analysis. Traditional unsupervised feature selection algorithms select features by preserving the manifold structure or the correlations among features, but they do not directly consider how strongly the selected features depend on the original data. In contrast, we consider the dependence between the original data and the low-dimensional data obtained by projection, and propose a measurement principle under which features that perform well are those that depend on the original data. First, a projection matrix is computed by maximizing this dependence, so that the projected data retain as much of the characteristic information of the original data as possible; this achieves dimensionality reduction of the original data. Then, sparse representation is combined with this projection to perform feature selection, yielding a new unsupervised feature selection algorithm, termed unsupervised feature selection via Dependence Maximization and Sparse Representation (DMSR). Finally, experiments are carried out on four public data sets, comparing DMSR with three existing unsupervised feature selection algorithms. Results on two evaluation metrics, clustering accuracy and mutual information, show that the proposed DMSR algorithm is effective.
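The abstract only outlines the two stages of DMSR and does not give the paper's exact objective or solver. The minimal sketch below shows how the two ingredients it names could fit together, assuming linear kernels for the HSIC dependence measure of ref [8] (as in the MDDM derivation of ref [10]) and the iteratively reweighted l2,1-norm solver of ref [19] for the sparse-representation step; all function names, defaults, and the two-stage split are illustrative assumptions, not the authors' code.

# Illustrative sketch only: linear-kernel HSIC (refs [8, 10]) for the
# projection step and the reweighted l2,1-norm solver of ref [19] for the
# sparse step. NOT the authors' released code; names/defaults are assumed.
import numpy as np

def hsic(K, L):
    # Empirical HSIC estimate (ref [8]): (n-1)^{-2} tr(K H L H),
    # where H = I - (1/n) 1 1^T centers the kernel matrices.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def dependence_max_projection(X, d):
    # Maximize linear-kernel HSIC between the projected data X P and the
    # original X over orthonormal P; with linear kernels this reduces to
    # taking the top-d eigenvectors of A = X^T H X (cf. ref [10]).
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    A = X.T @ H @ X
    w, V = np.linalg.eigh(A)                  # eigenvalues in ascending order
    return V[:, np.argsort(w)[::-1][:d]]      # m x d projection matrix

def l21_feature_scores(X, Y, lam=0.1, n_iter=30, eps=1e-8):
    # Solve min_W ||X W - Y||_F^2 + lam * ||W||_{2,1} with the iteratively
    # reweighted scheme of ref [19]; row l2-norms of W score the features.
    W = np.linalg.lstsq(X, Y, rcond=None)[0]  # least-squares warm start
    for _ in range(n_iter):
        d = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
        W = np.linalg.solve(X.T @ X + lam * np.diag(d), X.T @ Y)
    return np.linalg.norm(W, axis=1)          # larger norm = more relevant

# Hypothetical usage: rank 50 features of random data, keep the top 10.
X = np.random.randn(100, 50)
X -= X.mean(axis=0)                           # center, as HSIC assumes
P = dependence_max_projection(X, d=5)
dep = hsic((X @ P) @ (X @ P).T, X @ X.T)      # dependence retained by X P
scores = l21_feature_scores(X, X @ P)
selected = np.argsort(scores)[::-1][:10]

In the paper itself the two stages may well be coupled into a single objective; the split above is purely for exposition, and the projection dimension d, regularization weight lam, and iteration count would need tuning per data set.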

[1] Belkin M, Niyogi P. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003, 15(6): 1373-1396.
[2] Cao D Y, Wang Q, Zhang X G. Ensemble classification method based on sparse reconstruction residuals and random forest. Journal of Nanjing University (Natural Sciences), 2016, 52(6): 1127-1132. (in Chinese)
[3] Wang S P, Pedrycz W, Zhu Q X, et al. Subspace learning for unsupervised feature selection via matrix factorization. Pattern Recognition, 2015, 48(1): 10-19.
[4] Chapelle O, Scholkopf B, Zien A. Semi-supervised learning. IEEE Transactions on Neural Networks, 2009, 20(3): 542.
[5] Lee Rodgers J, Nicewander W A. Thirteen ways to look at the correlation coefficient. The American Statistician, 1988, 42(1): 59-66.
[6] Mitchell D, Bridge R. A test of Chargaff's second rule. Biochemical and Biophysical Research Communications, 2006, 340(1): 90-94.
[7] Cover T M, Thomas J A. Elements of Information Theory. 2nd Edition. New York: Wiley, 2006, 792.
[8] Gretton A, Bousquet O, Smola A, et al. Measuring statistical dependence with Hilbert-Schmidt norms. In: Jain S, Simon H U, Tomita E, eds. Algorithmic Learning Theory. Berlin: Springer, 2005: 63-77.
[9] Gretton A, Fukumizu K, Teo C H, et al. A kernel statistical test of independence. In: Proceedings of the 20th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2007: 585-592.
[10] Zhang Y, Zhou Z H. Multilabel dimensionality reduction via dependence maximization. ACM Transactions on Knowledge Discovery from Data (TKDD), 2010, 4(3): 14.
[11] Li Z C, Yang Y, Liu J, et al. Unsupervised feature selection using nonnegative spectral analysis. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada: AAAI Press, 2012: 1026-1032.
[12] Dy J G. Unsupervised feature selection. In: Liu H, Motoda H, eds. Computational Methods of Feature Selection. Boca Raton, FL, USA: Chapman & Hall/CRC, 2008: 19-39.
[13] Xie J Y, Qu Y N, Wang M Z. Unsupervised feature selection algorithms based on density peaks. Journal of Nanjing University (Natural Sciences), 2016, 52(4): 735-745. (in Chinese)
[14] Cai D, Zhang C Y, He X F. Unsupervised feature selection for multi-cluster data. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington DC, USA: ACM, 2010: 333-342.
[15] He X F, Cai D, Niyogi P. Laplacian score for feature selection. In: Proceedings of the 18th International Conference on Neural Information Processing Systems. Vancouver, Canada: MIT Press, 2005: 507-514.
[16] Zhao Z, Liu H. Spectral feature selection for supervised and unsupervised learning. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, OR, USA: ACM, 2007: 1151-1157.
[17] Zhu P F, Zuo W M, Zhang L, et al. Unsupervised feature selection by regularized self-representation. Pattern Recognition, 2015, 48(2): 438-446.
[18] Song L, Smola A, Gretton A, et al. A dependence maximization view of clustering. In: Proceedings of the 24th International Conference on Machine Learning. Corvallis, OR, USA: ACM, 2007: 815-822.
[19] Nie F P, Huang H, Cai X, et al. Efficient and robust feature selection via joint l2,1-norms minimization. In: Proceedings of the 23rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc., 2010: 1813-1821.
[20] Hou C P, Nie F P, Li X L, et al. Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Transactions on Cybernetics, 2014, 44(6): 793-804.
[21] Feature Selection Datasets. http://featureselection.asu.edu/old/datasets.php.
[22] Publications & Codes. http://www.escience.cn/people/fpnie/papers.html.