
[1] Dong Limei, Zhao Hong*, Yang Wenyuan. Unsupervised feature selection via sparse representation clustering[J]. Journal of Nanjing University (Natural Sciences), 2018, 54(1): 107. [doi:10.13232/j.cnki.jnju.2018.01.012]

 Unsupervised feature selection via sparse representation clustering

Journal of Nanjing University (Natural Sciences) [ISSN: 0469-5097 / CN: 32-1169/N]

Volume:
54
Issue:
No. 1, 2018
Page:
107
Publication date:
2018-02-01

Article Info

Title:
 Unsupervised feature selection via sparse representation clustering
Author(s):
 Dong Limei, Zhao Hong*, Yang Wenyuan
Laboratory of Granular Computing, Minnan Normal University, Zhangzhou, 363000, China
Keywords:
unsupervised feature selection; manifold learning; Laplacian eigenmaps; sparse regression
CLC number:
TP311
DOI:
10.13232/j.cnki.jnju.2018.01.012
Document code:
A
Abstract:
Feature selection chooses a relevant subset of the original features, which facilitates data clustering, classification and retrieval. Most existing unsupervised feature selection algorithms map high-dimensional data into a low-dimensional space, compute a score for each feature independently, and select the top-ranked features. In this paper, we propose an unsupervised feature selection algorithm based on sparse representation clustering. First, the high-dimensional data are mapped into a low-dimensional space using Laplacian eigenmaps from manifold learning: a nearest-neighbor graph is constructed from the samples, and the low-dimensional space is found by embedding this graph, so that the manifold structure of the original dataset is preserved. Second, the resulting sample embedding matrix serves as an indicator of feature importance, distinguishing the contribution of each feature to each cluster, and an objective function is constructed by fitting the low-dimensional space from the high-dimensional space. Finally, because this objective is essentially a regression problem, which is commonly solved with the Least Angle Regression algorithm, we perform L1-norm sparse regression to compute a score for each feature, rather than evaluating each feature's contribution separately, and select the top-scoring features. Experimental results on six real-life datasets show that the proposed algorithm achieves good clustering accuracy and mutual information, effectively selects important features, performs well in dimension reduction, and outperforms the compared feature selection algorithms.
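The three steps described in the abstract (nearest-neighbor graph, Laplacian eigenmaps embedding, L1-regularized regression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several choices are assumptions made only for illustration: the 0/1 graph weights, the standardization of the data and the embedding, scoring each feature by its largest absolute coefficient across embedding dimensions, and the default values of `k` and `alpha`. The abstract names Least Angle Regression as the usual solver; to keep the sketch dependency-free, a plain coordinate-descent lasso is swapped in here. All function names are hypothetical.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric 0/1 adjacency matrix of the k-nearest-neighbor graph."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                       # a sample is not its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]
    n = X.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                          # keep an edge if either side lists it


def laplacian_eigenmaps(W, n_components):
    """Embed samples with the smallest non-trivial eigenvectors of the graph Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                    # eigenvalues in ascending order
    # Drop the first (trivial) eigenvector and map back to solutions of L y = lambda D y.
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]


def lasso_cd(X, y, alpha, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Assumption: used here in place of Least Angle Regression.
    """
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                                       # residual y - Xw
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]                        # remove feature j from the fit
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            r -= X[:, j] * w[j]                        # add the updated feature j back
    return w


def select_features(X, n_components, n_selected, k=10, alpha=0.05):
    """Return indices of the top-scoring features and all feature scores."""
    Y = laplacian_eigenmaps(knn_graph(X, k), n_components)
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    Ys = (Y - Y.mean(axis=0)) / (Y.std(axis=0) + 1e-12)
    coefs = np.column_stack(
        [lasso_cd(Xs, Ys[:, c], alpha) for c in range(n_components)]
    )
    scores = np.abs(coefs).max(axis=1)                 # assumed scoring rule
    return np.argsort(-scores)[:n_selected], scores


# Toy data: two clusters that differ only in the first two of six features.
rng = np.random.default_rng(0)
labels = np.repeat([-1.0, 1.0], 30)
X = rng.normal(size=(60, 6))
X[:, :2] += 2.0 * labels[:, None]
selected, scores = select_features(X, n_components=1, n_selected=2)
print(selected, np.round(scores, 3))
```

On this toy data the two cluster-defining columns would be expected to receive the largest scores, since the embedding dimension behaves like a cluster indicator and the sparse regression drives the coefficients of the noise columns toward zero.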

Memo

Funding: National Natural Science Foundation of China (61703196)
Received: 2017-12-21
*Corresponding author, E-mail: hongzhaocn@163.com
Last Update: 2018-01-31