南京大学学报(自然科学版), 2022, Vol. 58, Issue 1: 60–70. doi: 10.13232/j.cnki.jnju.2022.01.007


基于多粒度一致性邻域的多标记特征选择

卢舜1,2, 林耀进1,2, 吴镒潾1,2, 包丰浩1,2, 王晨曦1,2

  1. 闽南师范大学计算机学院, 漳州, 363000
    2. 福建省数据科学与智能应用高校重点实验室, 闽南师范大学, 漳州, 363000
  • 收稿日期: 2021-06-16 出版日期: 2022-01-30 发布日期: 2022-02-22
  • 通讯作者: 林耀进 E-mail: zzlinyaojin@163.com
  • 基金资助: 国家自然科学基金 (National Natural Science Foundation of China, 62076116); 福建省自然科学基金 (Natural Science Foundation of Fujian Province, 2021J02049)

Multi-label feature selection based on multi-granularity consistent neighborhood

Shun Lu1,2, Yaojin Lin1,2, Yilin Wu1,2, Fenghao Bao1,2, Chenxi Wang1,2

  1. School of Computer Science, Minnan Normal University, Zhangzhou, 363000, China
    2. Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou, 363000, China
  • Received: 2021-06-16 Online: 2022-01-30 Published: 2022-02-22
  • Contact: Yaojin Lin E-mail: zzlinyaojin@163.com

摘要:

多标记学习广泛应用于图像分类、疾病诊断等领域,然而特征的高维性给多标记分类算法带来时间负担、过拟合和性能低等问题.基于多粒度邻域一致性设计相应的多标记特征选择算法:首先利用标记空间和特征空间邻域一致性来粒化所有样本,并基于多粒度邻域一致性观点定义新的多标记邻域信息熵和多标记邻域互信息;其次,基于邻域互信息构建一个评价候选特征质量的目标函数用于评价每个特征的重要性;最后通过多个指标验证了所提算法的有效性.

关键词: 多标记学习, 特征选择, 多粒度, 邻域一致性

Abstract:

Multi-label learning is widely used in image classification, disease diagnosis and other fields. However, the high dimensionality of the feature space burdens multi-label classification algorithms with long running times, overfitting and degraded performance. In this paper, a multi-label feature selection algorithm is designed based on multi-granularity neighborhood consistency. First, all samples are granulated using the neighborhood consistency of the label space and the feature space, and new multi-label neighborhood information entropy and multi-label neighborhood mutual information are defined from the viewpoint of multi-granularity neighborhood consistency. Then, an objective function based on the new multi-label neighborhood mutual information is constructed to evaluate the quality, and hence the importance, of each candidate feature. Finally, the effectiveness of the proposed algorithm is verified with several evaluation criteria.

Key words: multi-label learning, feature selection, multi-granularity, neighborhood consistency

中图分类号 (CLC number): TP181
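
The abstract above outlines three steps: granulating the samples with neighborhood consistency in the feature and label spaces, defining multi-label neighborhood information entropy and mutual information on those granules, and ranking candidate features with a mutual-information-based objective. The Python sketch below illustrates a generic neighborhood-mutual-information feature ranking of this kind. It is an illustration only, not the authors' MFSNC: the fixed radii delta_x and delta_y, the Euclidean distances, and the plain ranking by score are assumptions that stand in for the paper's multi-granularity consistent neighborhoods and its exact objective function.

```python
# Illustrative sketch only (NOT the authors' exact MFSNC): rank features by a
# neighborhood mutual information score between each feature and the label
# space. Fixed radii delta_x / delta_y and Euclidean distances stand in for the
# paper's multi-granularity consistent neighborhoods and its objective function.
import numpy as np

def pairwise_dist(Z):
    """Euclidean distance matrix between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0))

def neighborhoods(D, delta):
    """Boolean matrix: entry (i, j) is True when x_j lies in the delta-neighborhood of x_i."""
    return D <= delta

def neighborhood_entropy(N):
    """NH = -(1/n) * sum_i log(|neighborhood(x_i)| / n)."""
    n = N.shape[0]
    return -np.mean(np.log(N.sum(axis=1) / n))

def nmi_feature_labels(x, Y, delta_x=0.1, delta_y=0.5):
    """Neighborhood mutual information between a single feature x and the label
    matrix Y: NMI = NH(x) + NH(Y) - NH(x, Y), with the joint neighborhood taken
    as the intersection of the feature-space and label-space neighborhoods."""
    Nx = neighborhoods(pairwise_dist(x[:, None]), delta_x)
    Ny = neighborhoods(pairwise_dist(Y), delta_y)   # delta_y < 1 keeps only identical label sets
    return (neighborhood_entropy(Nx) + neighborhood_entropy(Ny)
            - neighborhood_entropy(Nx & Ny))

def rank_features(X, Y, delta_x=0.1, delta_y=0.5):
    """Rank features by decreasing neighborhood mutual information with the labels."""
    scores = np.array([nmi_feature_labels(X[:, f], Y, delta_x, delta_y)
                       for f in range(X.shape[1])])
    return np.argsort(-scores), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 20))                     # 200 samples, 20 features scaled to [0, 1]
    Y = (rng.random((200, 5)) < 0.3).astype(int)  # 5 binary labels
    order, scores = rank_features(X, Y)
    print("feature ranking:", order[:10])
```

In the paper's setting the neighborhood sizes would be driven by the multi-granularity consistency of each sample's neighborhood rather than by fixed constants; the fixed-radius version is kept here only to make the entropy and mutual information definitions concrete.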

Figure 1. An example of a multi-label image

Figure 2. The margin of x
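
The figure itself is not reproduced here. In neighborhood-based feature selection, the margin of a sample x is commonly taken as the gap between the distance to its nearest neighbor with a different label set ("nearest miss") and the distance to its nearest neighbor with the same label set ("nearest hit"), and it is often used to set an adaptive neighborhood radius. The sketch below assumes that common definition; it is not a definition quoted from the paper.

```python
# Hedged sketch of one common notion of a sample's margin in neighborhood-based
# feature selection: distance to the nearest sample with a different label set
# ("nearest miss") minus distance to the nearest sample with the same label set
# ("nearest hit"). This is an assumption about what Figure 2 depicts.
import numpy as np

def sample_margin(i, X, Y):
    """Margin of sample x_i given feature matrix X and binary label matrix Y."""
    d = np.linalg.norm(X - X[i], axis=1)
    same = np.all(Y == Y[i], axis=1)          # samples sharing x_i's full label set
    others = np.arange(len(X)) != i
    hit_d = d[same & others]
    miss_d = d[~same & others]
    nearest_hit = hit_d.min() if hit_d.size else np.inf
    nearest_miss = miss_d.min() if miss_d.size else np.inf
    return nearest_miss - nearest_hit

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((50, 8))
    Y = (rng.random((50, 3)) < 0.4).astype(int)
    print("margin of x_0:", sample_margin(0, X, Y))
```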

Table 1. Description of the datasets used in the experiments

Dataset      Samples   Features   Labels   Training samples   Test samples
Arts         5000      462        26       2000               3000
Computer     5000      681        33       2000               3000
Health       5000      612        32       2000               3000
Recreation   5000      606        22       2000               3000
Reference    5000      793        33       2000               3000
Scene        2407      294        6        1211               1196
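
Each row of Table 1 corresponds to a numeric feature matrix and a binary label matrix split into training and test parts. Purely as an illustration of the shapes involved (the file formats and loaders are not part of this page), the Scene row maps to arrays like these:

```python
# Purely illustrative: the Scene row of Table 1 as array shapes. The arrays
# below are placeholders, not loaded data.
import numpy as np

X_train = np.zeros((1211, 294))              # 1211 training samples x 294 features
Y_train = np.zeros((1211, 6), dtype=int)     # 6 binary labels (1 = label assigned)
X_test  = np.zeros((1196, 294))              # 1196 test samples
Y_test  = np.zeros((1196, 6), dtype=int)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
```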

Figure 3. Experimental results of MFSNC and the compared algorithms on the Arts dataset under four evaluation metrics

Figure 4. Experimental results of MFSNC and the compared algorithms on the Computer dataset under four evaluation metrics

Figure 5. Experimental results of MFSNC and the compared algorithms on the Health dataset under four evaluation metrics

Figure 6. Experimental results of MFSNC and the compared algorithms on the Recreation dataset under four evaluation metrics

Figure 7. Experimental results of MFSNC and the compared algorithms on the Reference dataset under four evaluation metrics

Figure 8. Experimental results of MFSNC and the compared algorithms on the Scene dataset under four evaluation metrics

Table 2. Comparative evaluation of the six feature selection algorithms on the Arts dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.4991    0.0612    5.5040    0.1542
MDDMspc      0.5072    0.0607    5.4740    0.1521
MDDMproj     0.4943    0.0612    5.5553    0.1555
PMU          0.4944    0.0615    5.4917    0.1527
RF-ML        0.4823    0.0627    5.4853    0.1540
MFSNC        0.5275    0.0597    5.2873    0.1452
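
Tables 2–7 report four standard multi-label criteria: Average Precision (AP, larger is better), Hamming Loss (HL), Coverage (CV) and Ranking Loss (RL, all smaller is better). Assuming scikit-learn is available, they can be computed roughly as below; note that scikit-learn's coverage_error is defined to be one larger than the coverage value usually reported in the multi-label literature, so one is subtracted here.

```python
# Hedged sketch: computing the four criteria of Tables 2-7 with scikit-learn.
# y_true is the binary label matrix, y_score the per-label confidence output of
# a multi-label classifier, and y_pred its thresholded binary predictions.
import numpy as np
from sklearn.metrics import (hamming_loss, coverage_error, label_ranking_loss,
                             label_ranking_average_precision_score)

rng = np.random.default_rng(0)
y_true = (rng.random((100, 6)) < 0.3).astype(int)          # toy ground truth
y_true[np.arange(100), rng.integers(0, 6, size=100)] = 1   # ensure >= 1 label per sample
y_score = rng.random((100, 6))                             # toy confidence scores
y_pred = (y_score >= 0.5).astype(int)                      # toy binary predictions

ap = label_ranking_average_precision_score(y_true, y_score)   # AP, larger is better
hl = hamming_loss(y_true, y_pred)                              # HL, smaller is better
cv = coverage_error(y_true, y_score) - 1    # scikit-learn's value is literature CV + 1
rl = label_ranking_loss(y_true, y_score)                       # RL, smaller is better
print(f"AP={ap:.4f}  HL={hl:.4f}  CV={cv:.4f}  RL={rl:.4f}")
```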

Table 3. Comparative evaluation of the six feature selection algorithms on the Computer dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.6391    0.0401    4.3740    0.0910
MDDMspc      0.6345    0.0406    4.3987    0.0916
MDDMproj     0.6284    0.0406    4.4437    0.0934
PMU          0.6276    0.0413    4.5013    0.0941
RF-ML        0.6285    0.0421    5.4853    0.0931
MFSNC        0.6332    0.0392    4.3313    0.0896

Table 4. Comparative evaluation of the six feature selection algorithms on the Health dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.6670    0.0442    3.5553    0.0681
MDDMspc      0.6585    0.0445    3.4973    0.0665
MDDMproj     0.6482    0.0458    3.6250    0.0699
PMU          0.6276    0.0443    3.4000    0.0636
RF-ML        0.6285    0.0465    3.4257    0.0643
MFSNC        0.7202    0.0398    3.1183    0.0567

Table 5. Comparative evaluation of the six feature selection algorithms on the Recreation dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.4613    0.0604    5.1547    0.1936
MDDMspc      0.4738    0.0620    4.8987    0.1826
MDDMproj     0.4665    0.0616    4.9763    0.1872
PMU          0.4357    0.0634    5.1480    0.1957
RF-ML        0.4465    0.0630    5.0860    0.1917
MFSNC        0.5252    0.0584    4.8267    0.1775

Table 6. Comparative evaluation of the six feature selection algorithms on the Reference dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.6234    0.0296    3.4313    0.0889
MDDMspc      0.6126    0.0322    3.4390    0.0888
MDDMproj     0.6106    0.0311    3.4460    0.0889
PMU          0.6169    0.0306    3.3660    0.0868
RF-ML        0.6151    0.0345    3.3270    0.0856
MFSNC        0.6414    0.0286    3.2760    0.0839

Table 7. Comparative evaluation of the six feature selection algorithms on the Scene dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.8351    0.0984    0.5936    0.0976
MDDMspc      0.8313    0.1028    0.6212    0.1036
MDDMproj     0.8383    0.1040    0.6003    0.0990
PMU          0.8277    0.1052    0.6355    0.1006
RF-ML        0.7933    0.1200    0.7575    0.1307
MFSNC        0.8431    0.0962    0.6028    0.0996

Table 8. Average ranks of the compared algorithms under the four evaluation criteria

Algorithm    AP       HL       CV       RL
MLNB         2.500    2.250    3.833    3.917
MDDMspc      3.000    3.583    3.333    3.333
MDDMproj     4.167    3.830    4.500    4.583
PMU          5.000    4.500    4.000    4.000
RF-ML        5.000    5.833    4.000    3.833
MFSNC        1.333    1.000    1.333    1.333

Table 9. Friedman statistics under different evaluation metrics (k = 6, N = 6)

Metric    F_F       Critical value (α = 0.10)
AP        8.2353    2.0922
HL        2.6911    2.0922
CV        2.8358    2.0922
RL        2.9046    2.0922
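
The F_F values in Table 9 follow the usual Friedman test with the Iman-Davenport F correction for comparing k = 6 algorithms over N = 6 datasets. The sketch below reproduces the AP entry from the average ranks in Table 8; the small gap to the reported 8.2353 comes from the rounding of the ranks.

```python
# Hedged sketch of the Friedman / Iman-Davenport statistic behind Table 9,
# computed from the Table 8 average ranks for AP (k = 6 algorithms, N = 6 datasets).
import numpy as np

def iman_davenport_F(avg_ranks, N):
    """chi2_F = 12N/(k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4);
    F_F = (N - 1) * chi2_F / (N * (k - 1) - chi2_F)."""
    R = np.asarray(avg_ranks, dtype=float)
    k = R.size
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    return (N - 1) * chi2 / (N * (k - 1) - chi2)

ap_ranks = [2.500, 3.000, 4.167, 5.000, 5.000, 1.333]  # MLNB, MDDMspc, MDDMproj, PMU, RF-ML, MFSNC
print(round(iman_davenport_F(ap_ranks, N=6), 4))       # ~8.24; Table 9 reports 8.2353
                                                       # (gap comes from rank rounding)
```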

Figure 9. Performance differences between MFSNC and the other algorithms compared via the Bonferroni-Dunn test
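
Figure 9's Bonferroni-Dunn comparison hinges on a critical difference (CD) computed from the average ranks in Table 8. The sketch below assumes the test is run at α = 0.10 with the standard critical value q ≈ 2.326 for k = 6; this q value is taken from standard published tables (Demšar), not from the paper.

```python
# Hedged sketch of the Bonferroni-Dunn critical difference (CD) behind Figure 9.
# q_alpha = 2.326 is the standard two-tailed value for k = 6 at alpha = 0.10
# (taken from Demsar's published tables, not from this paper).
import math

def critical_difference(q_alpha, k, N):
    """CD = q_alpha * sqrt(k * (k + 1) / (6 * N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

cd = critical_difference(q_alpha=2.326, k=6, N=6)
print(round(cd, 3))   # about 2.51: algorithms whose average rank differs from
                      # MFSNC's by more than this are significantly different
```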
