Feature selection aims to select a relevant feature subset from the original features, which can facilitate data clustering, classification, and retrieval. Most existing unsupervised feature selection algorithms establish a mathematical model by casting high-dimensional data into a low-dimensional space, and the score of each feature is computed independently to select the top-ranked features. In this paper, we propose an unsupervised feature selection method via sparse representation clustering. First, the high-dimensional data are mapped into a low-dimensional space by the Laplacian eigenmaps of manifold learning: we construct a nearest-neighbor graph over the samples, and the low-dimensional space is found by embedding this graph, so the manifold structure of the original dataset is preserved. Second, we obtain the "flat" embedded matrix, which measures the importance of each feature and differentiates the contribution of each feature to each cluster. We construct an objective function that fits the high-dimensional space to the low-dimensional embedding, and the Least Angle Regression algorithm is used to solve this optimization problem. We perform L1-norm sparse regression to estimate the importance of all features jointly, instead of evaluating the contribution of each feature separately, and the top-ranked features are obtained according to their final scores. Experimental results on six real-life datasets show that the proposed algorithm achieves good clustering accuracy and mutual information, effectively selects the important features, and performs well in dimension reduction. In addition, the proposed algorithm outperforms several typical feature selection algorithms in the experiments.
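The following is a minimal sketch of the pipeline described above, assuming scikit-learn: SpectralEmbedding builds the nearest-neighbor graph and computes the Laplacian-eigenmaps embedding, and LassoLars performs the L1-norm sparse regression solved by Least Angle Regression. The function name, parameter defaults, and the max-absolute-weight scoring rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.linear_model import LassoLars

def embedding_feature_scores(X, n_components=5, n_neighbors=5, alpha=0.01):
    """Score each feature of X (n_samples x n_features) by sparse regression
    onto a Laplacian-eigenmaps embedding (illustrative, not the paper's code)."""
    # Step 1: nearest-neighbor graph + Laplacian eigenmaps; the embedding
    # preserves the manifold structure of the original dataset.
    Y = SpectralEmbedding(n_components=n_components,
                          affinity="nearest_neighbors",
                          n_neighbors=n_neighbors).fit_transform(X)

    # Step 2: regress each embedding dimension on the original features with
    # an L1-penalized model; LassoLars solves the Lasso via Least Angle
    # Regression, so only a sparse subset of features gets nonzero weight.
    coefs = np.zeros((n_components, X.shape[1]))
    for k in range(n_components):
        coefs[k] = LassoLars(alpha=alpha).fit(X, Y[:, k]).coef_

    # Step 3: aggregate the per-dimension weights into one final score per
    # feature (here, the maximum absolute weight across dimensions).
    return np.abs(coefs).max(axis=0)

# Usage: rank features by their final scores and keep the top m.
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 40))      # stand-in for a real dataset
scores = embedding_feature_scores(X)
top_m = np.argsort(scores)[::-1][:10]   # indices of the 10 top-ranked features
```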
Dong Limei, Zhao Hong*, Yang Wenyuan.
Unsupervised feature selection via sparse representation clustering[J]. Journal of Nanjing University (Natural Sciences), 2018, 54(1): 107.