Streaming multi-label feature selection based on neighborhood interaction gain information

Journal of Nanjing University (Natural Sciences) ›› 2020, Vol. 56 ›› Issue (1): 30–40. doi: 10.13232/j.cnki.jnju.2020.01.004

Chaoyi Chen¹, Yaojin Lin¹,², Li Tang¹,², Chenxi Wang¹,²

1. School of Computer Science, Minnan Normal University, Zhangzhou, 363000, China
2. Key Laboratory of Data Science and Intelligence Application, Department of Education of Fujian Province, Zhangzhou, 363000, China

Received: 2019-08-14 Online: 2020-01-30 Published: 2020-01-10
Contact: Yaojin Lin E-mail: zzlinyaojin@163.com
Funding:
National Natural Science Foundation of China (61672272); Natural Science Foundation of Fujian Province (2018J01548); Science and Technology Project of the Education Department of Fujian Province (JT180318)

Abstract:

Existing multi-label feature selection methods generally assume that the feature space is fixed and known in advance. In many practical applications, however, features are produced during extraction and must be screened in real time. To address this, a streaming multi-label feature selection algorithm based on neighborhood interaction gain information is proposed. First, two strategies for evaluating features, online correlation analysis and online redundancy analysis, are built on multi-label neighborhood mutual information and neighborhood interaction gain information. Second, an objective function for streaming multi-label feature selection is constructed from neighborhood interaction gain information. Finally, experiments on six multi-label datasets under four evaluation criteria demonstrate the effectiveness and stability of the algorithm.

Key words: online stream features, multi-label learning, neighborhood entropy, neighborhood interaction gain information
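The quantities named in the abstract can be illustrated with a small sketch. It is not the authors' formulation: it assumes the commonly used definition of neighborhood mutual information, NMI = -(1/n) Σ log(|A_i|·|B_i| / (n·|A_i ∩ B_i|)), a Chebyshev-distance neighborhood of radius `delta`, a single label column whose neighborhood is "same label value", and the usual three-way interaction gain NMI({f, s}; y) - NMI(f; y) - NMI(s; y). The function names and the toy data are hypothetical.

```python
import math

def neighborhoods(columns, delta):
    """For each sample i, the set of samples within `delta` of i on every given column
    (Chebyshev distance). This neighborhood definition is an assumption of the sketch."""
    n = len(columns[0])
    return [
        {j for j in range(n) if all(abs(col[i] - col[j]) <= delta for col in columns)}
        for i in range(n)
    ]

def neighborhood_mutual_information(nbr_a, nbr_b):
    """NMI = -(1/n) * sum_i log(|A_i| * |B_i| / (n * |A_i & B_i|))."""
    n = len(nbr_a)
    total = 0.0
    for a, b in zip(nbr_a, nbr_b):
        # Sample i always belongs to both of its neighborhoods, so the intersection is non-empty.
        total += math.log(len(a) * len(b) / (n * len(a & b)))
    return -total / n

def interaction_gain(f, s, y, delta):
    """Assumed form of the interaction gain: NMI({f, s}; y) - NMI(f; y) - NMI(s; y)."""
    label_nbr = neighborhoods([y], 0.0)  # assumption: label neighbors share the same label value
    joint = neighborhood_mutual_information(neighborhoods([f, s], delta), label_nbr)
    single_f = neighborhood_mutual_information(neighborhoods([f], delta), label_nbr)
    single_s = neighborhood_mutual_information(neighborhoods([s], delta), label_nbr)
    return joint - single_f - single_s

# Toy XOR data: neither feature is informative alone, but together they determine y.
f = [0, 0, 1, 1]
s = [0, 1, 0, 1]
y = [0, 1, 1, 0]
gain = interaction_gain(f, s, y, delta=0.5)
print(round(gain, 4))  # 0.6931, i.e. log(2)
```

On this toy input each feature has zero neighborhood mutual information with the label, while the pair recovers the full label entropy, so the gain is positive. A positive gain is the kind of signal an online analysis could use to keep a newly arrived feature that looks irrelevant on its own; how the paper weighs it against redundancy is not shown here.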

CLC number: TP391

Chaoyi Chen, Yaojin Lin, Li Tang, Chenxi Wang. Streaming multi-label feature selection based on neighborhood interaction gain information[J]. Journal of Nanjing University (Natural Sciences), 2020, 56(1): 30–40.

Dataset	Samples	Features	Labels	Training samples	Test samples
Arts(A)	5000	462	26	2000	3000
Birds(B)	645	260	19	322	323
Business(C)	5000	438	30	2000	3000
Education(D)	5000	550	33	2000	3000
Emotions(E)	593	72	6	391	202
Yeast(F)	2417	103	14	1499	918

Average precision (AP; higher is better)
Dataset	MLNB	MDDM-spc	MDDM-proj	PMU	RF-ML	SMFS
Average	0.6520	0.6316	0.6311	0.6397	0.6499	0.6707
A	0.4991	0.4735	0.4669	0.4917	0.4834	0.5319
B	0.5052	0.4818	0.4564	0.5082	0.5263	0.5199
C	0.8713	0.8639	0.8633	0.8698	0.8742	0.8762
D	0.5478	0.4441	0.4824	0.4798	0.5114	0.5557
E	0.7529	0.7772	0.7683	0.7399	0.7566	0.7871
F	0.7355	0.7488	0.7490	0.7488	0.7476	0.7533

Ranking loss (RL; lower is better)
Dataset	MLNB	MDDM-spc	MDDM-proj	PMU	RF-ML	SMFS
Average	0.1508	0.1548	0.1583	0.1540	0.1489	0.1405
A	0.1542	0.1631	0.1662	0.1584	0.1576	0.1418
B	0.2237	0.2441	0.2608	0.2042	0.2093	0.2160
C	0.0419	0.0456	0.0465	0.0439	0.0427	0.0410
D	0.0922	0.1138	0.1089	0.1099	0.1016	0.0923
E	0.2055	0.1825	0.1904	0.2301	0.2040	0.1749
F	0.1871	0.1797	0.1768	0.1774	0.1781	0.1768

Hamming loss (HL; lower is better)
Dataset	MLNB	MDDM-spc	MDDM-proj	PMU	RF-ML	SMFS
Average	0.1054	0.1006	0.1055	0.1053	0.1028	0.0983
A	0.0612	0.0621	0.0622	0.0607	0.0614	0.0593
B	0.0494	0.0521	0.0554	0.0486	0.0486	0.0500
C	0.0283	0.0286	0.0286	0.0280	0.0278	0.0274
D	0.0405	0.0446	0.0443	0.0442	0.0427	0.0409
E	0.2450	0.2153	0.2417	0.2475	0.2318	0.2137
F	0.2080	0.2010	0.2010	0.2025	0.2047	0.1983

Micro-averaged F1 (Mi-F1; higher is better)
Dataset	MLNB	MDDM-spc	MDDM-proj	PMU	RF-ML	SMFS
Average	0.3911	0.3550	0.3363	0.3720	0.3640	0.4364
A	0.1093	0.0565	0.0471	0.1279	0.0925	0.2345
B	0.1653	0.1489	0.0761	0.2359	0.1235	0.1769
C	0.6792	0.6679	0.6704	0.6798	0.6813	0.6927
D	0.2070	0.0041	0.0041	0.0023	0.0856	0.2314
E	0.5811	0.6319	0.5867	0.5690	0.5849	0.6597
F	0.6045	0.6208	0.6334	0.6171	0.6163	0.6232

Dataset	AP (SMFS)	AP (OM-NRS)	RL (SMFS)	RL (OM-NRS)	HL (SMFS)	HL (OM-NRS)
Average	0.6707	0.6562	0.1405	0.1408	0.0983	0.0989
A	0.5319	0.5217	0.1419	0.1440	0.0593	0.0606
B	0.5199	0.4842	0.2160	0.2190	0.0500	0.0518
C	0.8762	0.8760	0.0410	0.0411	0.0274	0.0274
D	0.5557	0.5398	0.0923	0.0912	0.0409	0.0408
E	0.7871	0.7608	0.1749	0.1765	0.2137	0.2104
F	0.7533	0.7545	0.1768	0.1732	0.1983	0.2021

Evaluation metric	F_F	Critical value (α=0.1)
AP	3.9821	2.0922
RL	2.2914	2.0922
HL	2.5911	2.0922
Mi-F1	2.4687	2.0922
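A statistic of this kind can be recomputed from the per-dataset scores. The sketch below assumes that F_F is the F-distributed (Iman-Davenport) form of the Friedman statistic computed from average ranks, which the page does not state explicitly. With k = 6 algorithms and N = 6 datasets this form has (k-1, (k-1)(N-1)) = (5, 25) degrees of freedom, which is consistent with the listed critical value. The input is the AP table above (the one whose SMFS average is 0.6707); the function names are hypothetical.

```python
def average_ranks(scores, higher_is_better=True):
    """Rank algorithms within each dataset (1 = best), giving tied scores the mean of
    their positions, and return the mean rank of each algorithm over all datasets."""
    k = len(scores[0])
    sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j], reverse=higher_is_better)
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            tied_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
            for j in order[pos:end + 1]:
                sums[j] += tied_rank
            pos = end + 1
    return [total / len(scores) for total in sums]

def friedman_f(scores, higher_is_better=True):
    """Friedman chi-square from average ranks, converted to the F-distributed form
    (assumed here to be what the table calls F_F)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores, higher_is_better)
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4)
    return (n - 1) * chi2 / (n * (k - 1) - chi2)

# AP values for datasets A-F; columns: MLNB, MDDM-spc, MDDM-proj, PMU, RF-ML, SMFS.
ap_scores = [
    [0.4991, 0.4735, 0.4669, 0.4917, 0.4834, 0.5319],
    [0.5052, 0.4818, 0.4564, 0.5082, 0.5263, 0.5199],
    [0.8713, 0.8639, 0.8633, 0.8698, 0.8742, 0.8762],
    [0.5478, 0.4441, 0.4824, 0.4798, 0.5114, 0.5557],
    [0.7529, 0.7772, 0.7683, 0.7399, 0.7566, 0.7871],
    [0.7355, 0.7488, 0.7490, 0.7488, 0.7476, 0.7533],
]
ap_ranks = average_ranks(ap_scores)
ap_f = friedman_f(ap_scores)
print([round(r, 4) for r in ap_ranks], round(ap_f, 4))
```

On the rounded table values this gives an F_F of roughly 3.99 for AP, close to but not identical with the listed 3.9821; the gap may come from rounding of the published scores or from a different treatment of ties. Either value exceeds 2.0922, so under these assumptions the hypothesis that all six algorithms perform equally on AP would be rejected at α=0.1.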