南京大学学报(自然科学版), 2023, Vol. 59, Issue (2): 263–272. doi: 10.13232/j.cnki.jnju.2023.02.009


基于伪标签回归和流形正则化的无监督特征选择算法

宋雨, 肖玉柱, 宋学力

  1. 长安大学理学院,西安,710064
  • 收稿日期:2023-01-09 出版日期:2023-03-31 发布日期:2023-04-07
  • 通讯作者: 肖玉柱,宋学力 E-mail:yuzhuxiao@chd.edu.cn
  • Funding:
    The Fundamental Research Funds for the Central Universities, Chang'an University (310812163504)

An unsupervised feature selection algorithm based on pseudo-label regression and manifold regularization

Yu Song, Yuzhu Xiao, Xueli Song

  1. School of Science,Chang'an University,Xi'an,710064,China
  • Received:2023-01-09 Online:2023-03-31 Published:2023-04-07
  • Contact: Yuzhu Xiao, Xueli Song E-mail:yuzhuxiao@chd.edu.cn

摘要:

无监督特征选择是无标签高维数据预处理过程中一种有效的数据降维技术,然而大多数无监督特征选择算法忽略了数据样本本身的类簇结构特性,选择具有低判别性信息的特征.基于此,提出一种基于伪标签回归和流形正则化的无监督特征选择算法.具体地,联合伪标签回归和最大化类间散度来保证算法在迭代过程中学习伪标签,同时,自适应学习数据样本之间的局部几何结构,获得更加精准的标签信息和结构信息,进而选择具有高判别性且能保持数据流形结构的特征.在四个公开数据集上的对比实验表明,提出算法的特征选择结果优于现有的一些无监督特征选择算法.

关键词: 无监督特征选择算法, 判别信息, 伪标签回归, 最大化类间散度, 流形正则化

Abstract:

Unsupervised feature selection is an effective dimensionality reduction technique in the preprocessing of unlabeled high-dimensional data. However, most unsupervised feature selection algorithms ignore the cluster structure of the data samples and therefore select features with low discriminant information. This paper proposes an unsupervised feature selection algorithm based on pseudo-label regression and manifold regularization. Specifically, it combines pseudo-label regression with the maximization of inter-class divergence to ensure that the algorithm learns pseudo-labels during the iterative process. At the same time, it adaptively learns the local geometric structure among the data samples to obtain more accurate label and structure information, and then selects features that are highly discriminative and preserve the manifold structure of the data. Experiments on four public datasets show that the proposed algorithm outperforms several existing unsupervised feature selection algorithms.

Key words: unsupervised feature selection algorithm, discriminant information, pseudo-label regression, inter-class divergence maximization, manifold regularization

CLC number: TP391
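For intuition, the kind of objective the abstract describes — pseudo-label regression coupled with inter-class divergence maximization, an adaptively learned manifold (graph) regularizer, and row-sparse feature weights — can be sketched in a generic form. The formulation below is an illustrative assumption, not the paper's exact model; the symbols W, F, S, S_b(F) and the trade-off parameters α, β, γ are introduced here only for exposition.

```latex
% Generic sketch (assumed form) of a pseudo-label regression + manifold
% regularization objective with inter-class divergence maximization and
% L2,1-norm row sparsity on the feature weights.
\begin{equation}
\begin{aligned}
\min_{W,\,F,\,S}\;\;
  & \lVert X^{\mathsf{T}} W - F \rVert_F^{2}
    + \alpha\,\mathrm{Tr}\bigl(F^{\mathsf{T}} L_S F\bigr)
    - \beta\,\mathrm{Tr}\bigl(W^{\mathsf{T}} S_b(F)\, W\bigr)
    + \gamma\,\lVert W \rVert_{2,1} \\
\text{s.t.}\;\;
  & F \ge 0,\quad F^{\mathsf{T}} F = I,\quad
    S \ge 0,\quad S\mathbf{1} = \mathbf{1}
\end{aligned}
\end{equation}
```

Here X ∈ R^{d×n} is the data matrix, W ∈ R^{d×c} a feature-weight matrix whose row norms rank the d features, F ∈ R^{n×c} a nonnegative (approximately orthogonal) pseudo-label matrix, L_S the graph Laplacian of the adaptively learned similarity matrix S, S_b(F) a between-class scatter matrix built from the current pseudo-labels, and α, β, γ the three regularization parameters examined in Figure 4.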

Table 1  The four public datasets used in the experiments

| Datasets    | Number of samples | Features | Classes | Number of selected features |
|-------------|-------------------|----------|---------|-----------------------------|
| dermatology | 366               | 34       | 6       | 8, 10, …, 24                |
| control     | 600               | 60       | 6       | 16, 20, …, 48               |
| JAFFE       | 213               | 256      | 10      | 20, 40, …, 180              |
| ATT40       | 400               | 1024     | 40      | 20, 40, …, 180              |
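The evaluation protocol implied by Table 1 and Figures 1–2 — keep a given number of top-ranked features, cluster with k-means, and report clustering accuracy (ACC) and normalized mutual information (NMI) — can be sketched as follows. This is a minimal illustrative script, not the authors' code; the ranking vector `feature_scores` (e.g., the row norms of a learned weight matrix) and the number of k-means restarts are assumptions.

```python
# Minimal sketch (assumed protocol, not the authors' code) of evaluating a set of
# selected features with k-means, ACC (best-map matching) and NMI, as in Figures 1-2.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """ACC: map predicted clusters to true classes with the Hungarian algorithm."""
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    # cost[i, j] = -(number of samples in cluster i that belong to class j)
    cost = np.array([[-np.sum((y_pred == c) & (y_true == k)) for k in classes]
                     for c in clusters])
    row, col = linear_sum_assignment(cost)
    return -cost[row, col].sum() / len(y_true)


def evaluate_selection(X, y, feature_scores, n_selected, n_clusters, n_runs=20):
    """Keep the top-scored features, run k-means n_runs times, return mean/std ACC and NMI."""
    # feature_scores is an assumed ranking criterion, e.g. row norms of a learned W
    idx = np.argsort(feature_scores)[::-1][:n_selected]
    X_sel = X[:, idx]
    acc, nmi = [], []
    for seed in range(n_runs):
        pred = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_sel)
        acc.append(clustering_accuracy(y, pred))
        nmi.append(normalized_mutual_info_score(y, pred))
    return np.mean(acc), np.std(acc), np.mean(nmi), np.std(nmi)
```

Under a protocol of this kind, a mean±std entry in Tables 2–3 would correspond to repeated k-means runs at the best-performing number of selected features for each algorithm.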

Figure 1  Best clustering accuracy of each algorithm with different numbers of selected features

Figure 2  Best normalized mutual information of each algorithm with different numbers of selected features

Table 2  Best clustering accuracy of each algorithm on the four datasets

| Methods | dermatology   | control       | JAFFE         | ATT40         |
|---------|---------------|---------------|---------------|---------------|
| Ours    | 0.9645±0.0540 | 0.9467±0.0691 | 0.8967±0.0223 | 0.6075±0.0230 |
| AMRSR   | 0.9399±0.0744 | 0.9517±0.0550 | 0.8826±0.0199 | 0.6000±0.0314 |
| MRSR    | 0.9044±0.0267 | 0.8533±0.0590 | 0.8498±0.0268 | 0.5825±0.0239 |
| RNE     | 0.7531±0.0346 | 0.7833±0.0457 | 0.7114±0.0152 | 0.4913±0.0246 |
| MCFS    | 0.8743±0.0149 | 0.7065±0.0442 | 0.7155±0.0212 | 0.5330±0.0285 |
| JELSR   | 0.7222±0.0187 | 0.6700±0.0512 | 0.7000±0.0176 | 0.4937±0.0154 |
| EGCFS   | 0.9563±0.0320 | 0.8667±0.0414 | 0.8779±0.0196 | 0.6075±0.0243 |
| k-means | 0.6926±0.0122 | 0.6236±0.0446 | 0.6902±0.0182 | 0.5211±0.0145 |

Table 3  Best normalized mutual information of each algorithm on the four datasets

| Methods | dermatology   | control       | JAFFE         | ATT40         |
|---------|---------------|---------------|---------------|---------------|
| Ours    | 0.9379±0.0221 | 0.8938±0.0332 | 0.8908±0.0146 | 0.7867±0.0136 |
| AMRSR   | 0.9224±0.0163 | 0.8182±0.0500 | 0.8713±0.0153 | 0.7887±0.0222 |
| MRSR    | 0.9322±0.0299 | 0.7951±0.0254 | 0.7951±0.0221 | 0.7645±0.0177 |
| RNE     | 0.8488±0.0488 | 0.7488±0.0266 | 0.7798±0.0430 | 0.7200±0.0123 |
| MCFS    | 0.7344±0.0199 | 0.7774±0.0215 | 0.7643±0.0411 | 0.7504±0.0125 |
| JELSR   | 0.7815±0.0580 | 0.7815±0.0176 | 0.7793±0.0363 | 0.7265±0.0147 |
| EGCFS   | 0.9322±0.0314 | 0.7758±0.0106 | 0.8549±0.0389 | 0.7885±0.0160 |
| k-means | 0.8298±0.0212 | 0.7002±0.0114 | 0.7416±0.0164 | 0.7457±0.0175 |

Figure 3  Two example results of the PRMR algorithm on the ATT40 dataset

Figure 4  Influence of the three regularization parameters on the PRMR algorithm

Figure 5  Convergence of the PRMR algorithm on the ATT40 dataset

1 蒋胜利. 高维数据的特征选择与特征提取研究. 博士学位论文. 西安:西安电子科技大学,2011.
Jiang S L. Research on feature selection and feature extraction of high-dimensional data. Ph.D. Dissertation. Xi'an:Xidian University,2011.
2 林书亮. 联合L2,1范数正则约束的特征选择方法. 科技与企业,2013(24):383-384.
Lin S L. A feature selection method with a joint L2,1-norm regularization constraint. Technology and Enterprise,2013(24):383-384.
3 Liang S Q, Xu Q, Zhu P F,et al. Unsupervised feature selection by manifold regularized self-representation∥Proceedings of 2017 IEEE International Conference on Image Processing. Beijing,China:IEEE,2017:2398-2402.
4 方威. 自适应图正则非负矩阵分解聚类算法的研究. 硕士学位论文. 扬州:扬州大学,2021.
Fang W. Research on clustering algorithm of adaptive graph regularized non-negative matrix factorization. Master Dissertation. Yangzhou:Yangzhou University,2021.
5 章永来,周耀鉴. 聚类算法综述. 计算机应用,2019,39(7):1869-1882.
Zhang Y L, Zhou Y J. Review of clustering algorithms. Journal of Computer Applications,2019,39(7):1869-1882.
6 杜世强. 基于维数约简的无监督聚类算法研究. 博士学位论文. 兰州:兰州大学,2017.
Du S Q. Unsupervised clustering algorithm based on dimension reduction. Ph.D. Dissertation. Lanzhou:Lanzhou University,2017.
7 汪志远. 无监督特征选择方法研究. 硕士学位论文. 太原:太原理工大学,2020.
Wang Z Y. Research on unsupervised feature selection. Master Dissertation. Taiyuan:Taiyuan University of Technology,2020.
8 Li Z C, Yang Y, Liu J,et al. Unsupervised feature selection using nonnegative spectral analysis∥Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto,Canada:AAAI,2012,26(1):1026-1032.
9 Yang Y, Shen H T, Ma Z G,et al. L2,1-norm regularized discriminative feature selection for unsupervised learning∥Proceedings of the 22nd International Joint Conference on Artificial Intelligence. Barcelona,Spain:AAAI,2011:1589-1594.
10 Cai D, Zhang C Y, He X F. Unsupervised feature selection for multi-cluster data∥Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington,DC,USA:ACM,2010:333-342.
11 Hou C P, Nie F P, Yi D Y,et al. Feature selection via joint embedding learning and sparse regression∥Proceedings of the 22nd International Joint Conference on Artificial Intelligence. Barcelona,Spain:AAAI,2011:1324-1329.
12 Tang C, Liu X W, Li M M,et al. Robust unsupervised feature selection via dual self-representation and manifold regularization. Knowledge-Based Systems,2018(145):109-120.
13 Du L, Shen Y D. Unsupervised feature selection with adaptive structure learning∥Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Sydney,Australia:ACM,2015:209-218.
14 Zhang R, Zhang Y X, Li X L. Unsupervised feature selection via adaptive graph learning and constraint. IEEE Transactions on Neural Networks and Learning Systems,2022,33(3):1355-1362.
15 Zhu P F, Zhu W C, Hu Q H,et al. Subspace clustering guided unsupervised feature selection. Pattern Recognition,2017(66):364-374.
16 盛超,宋鹏,郑文明,等. 基于子空间学习和伪标签回归的无监督特征选择. 信号处理,2021,37(9):1701-1708.
Sheng C, Song P, Zheng W M,et al. Subspace learning and virtual label regression based unsupervised feature selection. Journal of Signal Processing,2021,37(9):1701-1708.
17 周志华. 机器学习. 北京:清华大学出版社,2016:1-425.
Zhou Z H. Machine learning. Beijing:Tsinghua University Press,2016:1-425.
18 Liu X W, Wang L, Zhang J,et al. Global and local structure preservation for feature selection. IEEE Transactions on Neural Networks and Learning Systems,2014,25(6):1083-1095.
19 Nie F P, Huang H, Cai X,et al. Efficient and robust feature selection via joint L2,1-norms minimization∥Proceedings of the 23rd International Conference on Neural Information Processing Systems. Vancouver,Canada:Curran Associates Inc.,2010:1813-1821.
20 丛思安,王星星. K-means算法研究综述. 电子技术与软件工程,2018(17):155-156.
Cong S A, Wang X X. A review of K-means algorithm research. Electronic Technology & Software Engineering,2018(17):155-156.