Journal of Nanjing University (Natural Science) ›› 2018, Vol. 54 ›› Issue (1): 107.
Dong Limei, Zhao Hong*, Yang Wenyuan
Abstract: Feature selection chooses a relevant subset of features from the full feature set, facilitating tasks such as clustering, classification, and retrieval. Existing unsupervised feature selection algorithms map high-dimensional data into a low-dimensional space, compute a score for each feature, and select the top-ranked features. This paper proposes an unsupervised feature selection algorithm based on sparse clustering. First, following the feature-mapping idea of manifold learning, data in the high-dimensional space are mapped into a low-dimensional space: a nearest-neighbor graph is constructed over the samples, and the low-dimensional space is found through graph embedding, so that the reduced space preserves the manifold structure of the original dataset. Second, the resulting sample embedding matrix expresses the importance of the features and serves as an indicator of each feature's contribution to every cluster; an objective function is constructed by fitting the high-dimensional space with the low-dimensional one. Finally, the objective is in essence a regression problem, commonly solved with the least angle regression (LARS) algorithm; sparse regression under the L1 norm computes a score for each feature, and the top-scoring features are selected. Experimental results on six real-world datasets show that the algorithm achieves good clustering accuracy and mutual information, effectively selects the important features, performs well at dimensionality reduction, and outperforms the compared algorithms.