南京大学学报(自然科学版), 2022, Vol. 58, Issue 1: 60–70. doi: 10.13232/j.cnki.jnju.2022.01.007


基于多粒度一致性邻域的多标记特征选择

卢舜1,2, 林耀进1,2, 吴镒潾1,2, 包丰浩1,2, 王晨曦1,2

  1. 闽南师范大学计算机学院, 漳州, 363000
    2. 福建省数据科学与智能应用高校重点实验室, 闽南师范大学, 漳州, 363000
  • 收稿日期: 2021-06-16 出版日期: 2022-01-30 发布日期: 2022-02-22
  • 通讯作者: 林耀进 E-mail: zzlinyaojin@163.com
  • 基金资助: 国家自然科学基金 (National Natural Science Foundation of China, 62076116); 福建省自然科学基金 (Natural Science Foundation of Fujian Province, 2021J02049)

Multi-label feature selection based on multi-granularity consistent neighborhood

Shun Lu1,2, Yaojin Lin1,2, Yilin Wu1,2, Fenghao Bao1,2, Chenxi Wang1,2

  1. School of Computer Science, Minnan Normal University, Zhangzhou, 363000, China
    2. Key Laboratory of Data Science and Intelligence Application, Minnan Normal University, Zhangzhou, 363000, China
  • Received: 2021-06-16 Online: 2022-01-30 Published: 2022-02-22
  • Contact: Yaojin Lin E-mail: zzlinyaojin@163.com

摘要:

多标记学习广泛应用于图像分类、疾病诊断等领域,然而特征的高维性给多标记分类算法带来时间负担、过拟合和性能低等问题.基于多粒度邻域一致性设计相应的多标记特征选择算法:首先利用标记空间和特征空间邻域一致性来粒化所有样本,并基于多粒度邻域一致性观点定义新的多标记邻域信息熵和多标记邻域互信息;其次,基于邻域互信息构建一个评价候选特征质量的目标函数用于评价每个特征的重要性;最后通过多个指标验证了所提算法的有效性.

关键词: 多标记学习, 特征选择, 多粒度, 邻域一致性

Abstract:

Multi-label learning is widely used in image classification, disease diagnosis and other fields. However, the high dimensionality of the feature space burdens multi-label classification algorithms with long running times, overfitting and degraded performance. In this paper, a multi-label feature selection algorithm is designed based on multi-granularity neighborhood consistency. First, all samples are granulated using the neighborhood consistency of the label space and the feature space, and new multi-label neighborhood information entropy and multi-label neighborhood mutual information are defined from the viewpoint of multi-granularity neighborhood consistency. Then, an objective function based on the new multi-label neighborhood mutual information is constructed to evaluate the quality, and hence the importance, of each candidate feature. Finally, the effectiveness of the proposed algorithm is verified with several evaluation criteria.

Key words: multi-label learning, feature selection, multi-granularity, neighborhood consistency

中图分类号 (CLC number): TP181
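
The abstract above outlines three steps: granulating the samples with neighborhood consistency in the feature and label spaces, defining multi-label neighborhood information entropy and mutual information on those granules, and ranking candidate features with a mutual-information-based objective. The Python sketch below illustrates a generic neighborhood-mutual-information feature ranking of this kind. It is an illustration only, not the authors' MFSNC: the fixed radii delta_x and delta_y, the Euclidean distances, and the plain ranking by score are assumptions that stand in for the paper's multi-granularity consistent neighborhoods and its exact objective function.

```python
# Illustrative sketch only (NOT the authors' exact MFSNC): rank features by a
# neighborhood mutual information score between each feature and the label
# space. Fixed radii delta_x / delta_y and Euclidean distances stand in for the
# paper's multi-granularity consistent neighborhoods and its objective function.
import numpy as np

def pairwise_dist(Z):
    """Euclidean distance matrix between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    return np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0))

def neighborhoods(D, delta):
    """Boolean matrix: entry (i, j) is True when x_j lies in the delta-neighborhood of x_i."""
    return D <= delta

def neighborhood_entropy(N):
    """NH = -(1/n) * sum_i log(|neighborhood(x_i)| / n)."""
    n = N.shape[0]
    return -np.mean(np.log(N.sum(axis=1) / n))

def nmi_feature_labels(x, Y, delta_x=0.1, delta_y=0.5):
    """Neighborhood mutual information between a single feature x and the label
    matrix Y: NMI = NH(x) + NH(Y) - NH(x, Y), with the joint neighborhood taken
    as the intersection of the feature-space and label-space neighborhoods."""
    Nx = neighborhoods(pairwise_dist(x[:, None]), delta_x)
    Ny = neighborhoods(pairwise_dist(Y), delta_y)   # delta_y < 1 keeps only identical label sets
    return (neighborhood_entropy(Nx) + neighborhood_entropy(Ny)
            - neighborhood_entropy(Nx & Ny))

def rank_features(X, Y, delta_x=0.1, delta_y=0.5):
    """Rank features by decreasing neighborhood mutual information with the labels."""
    scores = np.array([nmi_feature_labels(X[:, f], Y, delta_x, delta_y)
                       for f in range(X.shape[1])])
    return np.argsort(-scores), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 20))                     # 200 samples, 20 features scaled to [0, 1]
    Y = (rng.random((200, 5)) < 0.3).astype(int)  # 5 binary labels
    order, scores = rank_features(X, Y)
    print("feature ranking:", order[:10])
```

In the paper's setting the neighborhood sizes would be driven by the multi-granularity consistency of each sample's neighborhood rather than by fixed constants; the fixed-radius version is kept here only to make the entropy and mutual information definitions concrete.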

Figure 1. An example of a multi-label image

Figure 2. The margin of x
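
The figure itself is not reproduced here. In neighborhood-based feature selection, the margin of a sample x is commonly taken as the gap between the distance to its nearest neighbor with a different label set ("nearest miss") and the distance to its nearest neighbor with the same label set ("nearest hit"), and it is often used to set an adaptive neighborhood radius. The sketch below assumes that common definition; it is not a definition quoted from the paper.

```python
# Hedged sketch of one common notion of a sample's margin in neighborhood-based
# feature selection: distance to the nearest sample with a different label set
# ("nearest miss") minus distance to the nearest sample with the same label set
# ("nearest hit"). This is an assumption about what Figure 2 depicts.
import numpy as np

def sample_margin(i, X, Y):
    """Margin of sample x_i given feature matrix X and binary label matrix Y."""
    d = np.linalg.norm(X - X[i], axis=1)
    same = np.all(Y == Y[i], axis=1)          # samples sharing x_i's full label set
    others = np.arange(len(X)) != i
    hit_d = d[same & others]
    miss_d = d[~same & others]
    nearest_hit = hit_d.min() if hit_d.size else np.inf
    nearest_miss = miss_d.min() if miss_d.size else np.inf
    return nearest_miss - nearest_hit

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((50, 8))
    Y = (rng.random((50, 3)) < 0.4).astype(int)
    print("margin of x_0:", sample_margin(0, X, Y))
```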

Table 1. Description of the datasets used in the experiments

Dataset      Samples   Features   Labels   Training samples   Test samples
Arts         5000      462        26       2000               3000
Computer     5000      681        33       2000               3000
Health       5000      612        32       2000               3000
Recreation   5000      606        22       2000               3000
Reference    5000      793        33       2000               3000
Scene        2407      294        6        1211               1196
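
Each row of Table 1 corresponds to a numeric feature matrix and a binary label matrix split into training and test parts. Purely as an illustration of the shapes involved (the file formats and loaders are not part of this page), the Scene row maps to arrays like these:

```python
# Purely illustrative: the Scene row of Table 1 as array shapes. The arrays
# below are placeholders, not loaded data.
import numpy as np

X_train = np.zeros((1211, 294))              # 1211 training samples x 294 features
Y_train = np.zeros((1211, 6), dtype=int)     # 6 binary labels (1 = label assigned)
X_test  = np.zeros((1196, 294))              # 1196 test samples
Y_test  = np.zeros((1196, 6), dtype=int)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
```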

Figure 3. Experimental results of MFSNC and the compared algorithms on the Arts dataset under four evaluation metrics

Figure 4. Experimental results of MFSNC and the compared algorithms on the Computer dataset under four evaluation metrics

Figure 5. Experimental results of MFSNC and the compared algorithms on the Health dataset under four evaluation metrics

Figure 6. Experimental results of MFSNC and the compared algorithms on the Recreation dataset under four evaluation metrics

Figure 7. Experimental results of MFSNC and the compared algorithms on the Reference dataset under four evaluation metrics

Figure 8. Experimental results of MFSNC and the compared algorithms on the Scene dataset under four evaluation metrics

Table 2. Comparative evaluation of the six feature selection algorithms on the Arts dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.4991    0.0612    5.5040    0.1542
MDDMspc      0.5072    0.0607    5.4740    0.1521
MDDMproj     0.4943    0.0612    5.5553    0.1555
PMU          0.4944    0.0615    5.4917    0.1527
RF-ML        0.4823    0.0627    5.4853    0.1540
MFSNC        0.5275    0.0597    5.2873    0.1452
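
Tables 2–7 report four standard multi-label criteria: Average Precision (AP, larger is better), Hamming Loss (HL), Coverage (CV) and Ranking Loss (RL, all smaller is better). Assuming scikit-learn is available, they can be computed roughly as below; note that scikit-learn's coverage_error is defined to be one larger than the coverage value usually reported in the multi-label literature, so one is subtracted here.

```python
# Hedged sketch: computing the four criteria of Tables 2-7 with scikit-learn.
# y_true is the binary label matrix, y_score the per-label confidence output of
# a multi-label classifier, and y_pred its thresholded binary predictions.
import numpy as np
from sklearn.metrics import (hamming_loss, coverage_error, label_ranking_loss,
                             label_ranking_average_precision_score)

rng = np.random.default_rng(0)
y_true = (rng.random((100, 6)) < 0.3).astype(int)          # toy ground truth
y_true[np.arange(100), rng.integers(0, 6, size=100)] = 1   # ensure >= 1 label per sample
y_score = rng.random((100, 6))                             # toy confidence scores
y_pred = (y_score >= 0.5).astype(int)                      # toy binary predictions

ap = label_ranking_average_precision_score(y_true, y_score)   # AP, larger is better
hl = hamming_loss(y_true, y_pred)                              # HL, smaller is better
cv = coverage_error(y_true, y_score) - 1    # scikit-learn's value is literature CV + 1
rl = label_ranking_loss(y_true, y_score)                       # RL, smaller is better
print(f"AP={ap:.4f}  HL={hl:.4f}  CV={cv:.4f}  RL={rl:.4f}")
```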

Table 3. Comparative evaluation of the six feature selection algorithms on the Computer dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.6391    0.0401    4.3740    0.0910
MDDMspc      0.6345    0.0406    4.3987    0.0916
MDDMproj     0.6284    0.0406    4.4437    0.0934
PMU          0.6276    0.0413    4.5013    0.0941
RF-ML        0.6285    0.0421    5.4853    0.0931
MFSNC        0.6332    0.0392    4.3313    0.0896

Table 4. Comparative evaluation of the six feature selection algorithms on the Health dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.6670    0.0442    3.5553    0.0681
MDDMspc      0.6585    0.0445    3.4973    0.0665
MDDMproj     0.6482    0.0458    3.6250    0.0699
PMU          0.6276    0.0443    3.4000    0.0636
RF-ML        0.6285    0.0465    3.4257    0.0643
MFSNC        0.7202    0.0398    3.1183    0.0567

Table 5. Comparative evaluation of the six feature selection algorithms on the Recreation dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.4613    0.0604    5.1547    0.1936
MDDMspc      0.4738    0.0620    4.8987    0.1826
MDDMproj     0.4665    0.0616    4.9763    0.1872
PMU          0.4357    0.0634    5.1480    0.1957
RF-ML        0.4465    0.0630    5.0860    0.1917
MFSNC        0.5252    0.0584    4.8267    0.1775

Table 6. Comparative evaluation of the six feature selection algorithms on the Reference dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.6234    0.0296    3.4313    0.0889
MDDMspc      0.6126    0.0322    3.4390    0.0888
MDDMproj     0.6106    0.0311    3.4460    0.0889
PMU          0.6169    0.0306    3.3660    0.0868
RF-ML        0.6151    0.0345    3.3270    0.0856
MFSNC        0.6414    0.0286    3.2760    0.0839

Table 7. Comparative evaluation of the six feature selection algorithms on the Scene dataset

Algorithm    AP (↑)    HL (↓)    CV (↓)    RL (↓)
MLNB         0.8351    0.0984    0.5936    0.0976
MDDMspc      0.8313    0.1028    0.6212    0.1036
MDDMproj     0.8383    0.1040    0.6003    0.0990
PMU          0.8277    0.1052    0.6355    0.1006
RF-ML        0.7933    0.1200    0.7575    0.1307
MFSNC        0.8431    0.0962    0.6028    0.0996

Table 8. Average ranks of the compared algorithms under the four evaluation criteria

Algorithm    AP       HL       CV       RL
MLNB         2.500    2.250    3.833    3.917
MDDMspc      3.000    3.583    3.333    3.333
MDDMproj     4.167    3.830    4.500    4.583
PMU          5.000    4.500    4.000    4.000
RF-ML        5.000    5.833    4.000    3.833
MFSNC        1.333    1.000    1.333    1.333

Table 9. Friedman statistics under different evaluation metrics (k = 6, N = 6)

Metric    F_F       Critical value (α = 0.10)
AP        8.2353    2.0922
HL        2.6911    2.0922
CV        2.8358    2.0922
RL        2.9046    2.0922
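
The F_F values in Table 9 follow the usual Friedman test with the Iman-Davenport F correction for comparing k = 6 algorithms over N = 6 datasets. The sketch below reproduces the AP entry from the average ranks in Table 8; the small gap to the reported 8.2353 comes from the rounding of the ranks.

```python
# Hedged sketch of the Friedman / Iman-Davenport statistic behind Table 9,
# computed from the Table 8 average ranks for AP (k = 6 algorithms, N = 6 datasets).
import numpy as np

def iman_davenport_F(avg_ranks, N):
    """chi2_F = 12N/(k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4);
    F_F = (N - 1) * chi2_F / (N * (k - 1) - chi2_F)."""
    R = np.asarray(avg_ranks, dtype=float)
    k = R.size
    chi2 = 12.0 * N / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
    return (N - 1) * chi2 / (N * (k - 1) - chi2)

ap_ranks = [2.500, 3.000, 4.167, 5.000, 5.000, 1.333]  # MLNB, MDDMspc, MDDMproj, PMU, RF-ML, MFSNC
print(round(iman_davenport_F(ap_ranks, N=6), 4))       # ~8.24; Table 9 reports 8.2353
                                                       # (gap comes from rank rounding)
```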

Figure 9. Performance differences between MFSNC and the other algorithms compared via the Bonferroni-Dunn test
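
Figure 9's Bonferroni-Dunn comparison hinges on a critical difference (CD) computed from the average ranks in Table 8. The sketch below assumes the test is run at α = 0.10 with the standard critical value q ≈ 2.326 for k = 6; this q value is taken from standard published tables (Demšar), not from the paper.

```python
# Hedged sketch of the Bonferroni-Dunn critical difference (CD) behind Figure 9.
# q_alpha = 2.326 is the standard two-tailed value for k = 6 at alpha = 0.10
# (taken from Demsar's published tables, not from this paper).
import math

def critical_difference(q_alpha, k, N):
    """CD = q_alpha * sqrt(k * (k + 1) / (6 * N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

cd = critical_difference(q_alpha=2.326, k=6, N=6)
print(round(cd, 3))   # about 2.51: algorithms whose average rank differs from
                      # MFSNC's by more than this are significantly different
```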
