Journal of Nanjing University (Natural Science), 2021, Vol. 57, Issue 1: 121–129. doi: 10.13232/j.cnki.jnju.2021.01.013


Cost-sensitive feature selection for interval-valued data

Qiong Liu1,2, Jianhua Dai1,2, Jiaolong Chen1,2

  1. Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha 410081, China
  2. College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
  • Received: 2020-09-04 Online: 2021-01-21 Published: 2021-01-21
  • Contact: Jianhua Dai, E-mail: jhdai@hunnu.edu.cn
  • Funding: National Natural Science Foundation of China (61976089); Hunan Provincial Science and Technology Program (2018TP1018)

Abstract:

Feature selection is a research hotspot in data analysis for interval-valued information systems. However, existing feature selection methods for interval-valued data rarely consider the test cost and misclassification cost of the data. To address this problem, the concept of an interval-valued neighborhood is first defined using neighborhood rough set theory, and the entropy structure is redefined on the basis of this neighborhood. Next, a cost-sensitive function for interval-valued information systems is constructed. Finally, a cost-sensitive feature selection method for interval-valued data is proposed. Comparative experiments verify the rationality and effectiveness of the proposed method.

Key words: feature selection, interval-valued data, interval neighborhood rough set, cost-sensitive

CLC number:

  • TP18

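Table 1

An example of an interval-valued decision system

| U  | a1           | a2           | a3           | d |
|----|--------------|--------------|--------------|---|
| x1 | [22.5, 35.5] | [0, 2]       | [0.15, 0.3]  | 1 |
| x2 | [19, 32]     | [0.6, 3.2]   | [0.15, 0.24] | 1 |
| x3 | [25.5, 63]   | [0.84, 4.97] | [0.14, 0.22] | 1 |
| x4 | [19.2, 31]   | [0, 14.7]    | [0.09, 0.17] | 2 |
| x5 | [13.7, 25]   | [1, 15]      | [0.05, 0.28] | 2 |
| x6 | [13, 20.5]   | [0, 0.4]     | [0, 0]       | 2 |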

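Table 2

Neighborhoods

| U  | E_a1(x)                  | E_a2(x)      | E_a3(x)                  |
|----|--------------------------|--------------|--------------------------|
| x1 | {x1, x2, x3, x4, x5, x6} | {x1, x2, x3} | {x1, x6}                 |
| x2 | {x1, x2, x3, x4, x5, x6} | {x1, x2, x3} | {x2, x6}                 |
| x3 | {x1, x2, x3, x4}         | {x1, x2, x3} | {x3, x6}                 |
| x4 | {x1, x2, x3, x4, x5, x6} | {x4, x5}     | {x4, x6}                 |
| x5 | {x1, x2, x4, x5, x6}     | {x4, x5}     | {x5, x6}                 |
| x6 | {x1, x2, x4, x5, x6}     | {x6}         | {x1, x2, x3, x4, x5, x6} |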
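The abstract says an entropy structure is built on the interval-valued neighborhoods, but this page does not give its formula. As a rough illustration only, the sketch below takes the neighborhoods listed in Table 2 and the decision column of Table 1 and evaluates a standard neighborhood conditional entropy, H(d | B) = -(1/|U|) Σ log2(|n_B(x) ∩ [x]_d| / |n_B(x)|). Both this formula and the rule that a subset's neighborhood is the intersection of the single-attribute neighborhoods are assumptions borrowed from common neighborhood rough set practice, not the paper's stated definitions; the names `NEIGHBORHOODS`, `DECISION`, `subset_neighborhood`, and `conditional_entropy` are hypothetical.

```python
import math

# Neighborhoods per attribute, transcribed from Table 2 (object i stands for x_i).
ALL = {1, 2, 3, 4, 5, 6}
NEIGHBORHOODS = {
    "a1": {1: ALL, 2: ALL, 3: {1, 2, 3, 4}, 4: ALL,
           5: {1, 2, 4, 5, 6}, 6: {1, 2, 4, 5, 6}},
    "a2": {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {1, 2, 3},
           4: {4, 5}, 5: {4, 5}, 6: {6}},
    "a3": {1: {1, 6}, 2: {2, 6}, 3: {3, 6},
           4: {4, 6}, 5: {5, 6}, 6: ALL},
}

# Decision column d, transcribed from Table 1.
DECISION = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}


def subset_neighborhood(attrs, x):
    """Neighborhood of x under an attribute subset (assumed: intersection)."""
    result = set(ALL)
    for a in attrs:
        result &= NEIGHBORHOODS[a][x]
    return result


def conditional_entropy(attrs):
    """Assumed neighborhood conditional entropy of d given the attribute subset."""
    total = 0.0
    for x, d in DECISION.items():
        nbh = subset_neighborhood(attrs, x)
        same_class = {y for y in nbh if DECISION[y] == d}
        total += math.log2(len(same_class) / len(nbh))
    return abs(total) / len(DECISION)


if __name__ == "__main__":
    for attrs in (["a1"], ["a2"], ["a3"], ["a1", "a3"]):
        print(attrs, round(conditional_entropy(attrs), 4))
```

Under these assumptions a2 gives zero entropy, because every a2 neighborhood in Table 2 contains objects of a single decision class. A cost-sensitive selector would then trade such an uncertainty score against the cost of measuring each attribute.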

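Table 3

Description of the data sets

| Dataset  | Data type  | Samples | Attributes | Classes |
|----------|------------|---------|------------|---------|
| Face     | interval   | 12      | 6          | 9       |
| Fish     | interval   | 27      | 13         | 4       |
| Glass    | real value | 316     | 9          | 6       |
| Libras   | real value | 500     | 90         | 15      |
| Lung     | real value | 178     | 7129       | 2       |
| Sonar    | real value | 198     | 60         | 2       |
| Spectf   | real value | 208     | 44         | 2       |
| Water    | interval   | 360     | 49         | 2       |
| Waveform | real value | 96      | 40         | 3       |
| Wine     | real value | 214     | 13         | 3       |
| Wpbc     | real value | 267     | 32         | 2       |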

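Table 4

Comparison of the average total cost of the three methods

| Dataset  | TTC: NFS  | TTC: RARCE | TTC: CSIVARA | AMC: NFS  | AMC: RARCE | AMC: CSIVARA | ATC: NFS  | ATC: RARCE | ATC: CSIVARA |
|----------|-----------|------------|--------------|-----------|------------|--------------|-----------|------------|--------------|
| Face     | 274.40    | 274.40     | 96.70        | 0.00      | 0.00       | 0.00         | 274.40    | 274.40     | 96.70        |
| Fish     | 382.50    | 172.20     | 87.40        | 0.00      | 3.33×10^3  | 0.00         | 382.50    | 3.50×10^3  | 87.40        |
| Glass    | 318.30    | 185.50     | 318.30       | 3.89×10^4 | 6.13×10^4  | 3.89×10^4    | 3.92×10^4 | 6.15×10^4  | 3.92×10^4    |
| Libras   | 2.51×10^3 | 116.00     | 1.25×10^3    | 6.24×10^4 | 4.66×10^5  | 6.74×10^4    | 6.49×10^4 | 4.66×10^5  | 6.86×10^4    |
| Lung     | 2.10×10^5 | 154.00     | 99.00        | 0.00      | 0.00       | 0.00         | 2.10×10^5 | 154.00     | 99.00        |
| Sonar    | 1.57×10^3 | 105.00     | 1.072×10^3   | 1.26×10^5 | 9.86×10^4  | 1.09×10^5    | 1.27×10^5 | 9.87×10^4  | 1.10×10^5    |
| Spectf   | 1.18×10^3 | 247.90     | 595.00       | 1.29×10^3 | 1.79×10^4  | 936.00       | 2.47×10^3 | 1.81×10^4  | 1531.00      |
| Water    | 1.38×10^3 | 183.60     | 193.50       | 0.00      | 3.23×10^3  | 0.00         | 1.38×10^3 | 3.41×10^3  | 193.50       |
| Waveform | 1.07×10^3 | 139.00     | 1.07×10^3    | 4.38×10^3 | 1.41×10^6  | 4.56×10^3    | 5.45×10^3 | 1.41×10^6  | 5.63×10^3    |
| Wine     | 337.20    | 206.70     | 245.50       | 674.20    | 9.44×10^3  | 674.20       | 1.01×10^3 | 9.64×10^3  | 919.70       |
| Wpbc     | 1.12×10^3 | 191.00     | 661.00       | 2.65×10^5 | 1.96×10^5  | 1.82×10^5    | 2.65×10^5 | 1.96×10^5  | 1.82×10^5    |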
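The figures in Table 4 suggest that the three cost columns are related additively: for each method, ATC appears to equal TTC plus AMC up to rounding. The page does not state this relation, nor does it expand the abbreviations. Reading TTC as total test cost, AMC as average misclassification cost, and ATC as average total cost is an assumption based on the abstract and the table caption. The sketch below checks the additive reading against the CSIVARA columns of a few rows; `ROWS`, `average_total_cost`, and `is_consistent` are hypothetical names.

```python
import math

# (TTC, AMC, ATC) for the CSIVARA columns, transcribed from Table 4.
ROWS = {
    "Face": (96.70, 0.00, 96.70),
    "Fish": (87.40, 0.00, 87.40),
    "Libras": (1.25e3, 6.74e4, 6.86e4),
    "Spectf": (595.00, 936.00, 1531.00),
    "Water": (193.50, 0.00, 193.50),
    "Wine": (245.50, 674.20, 919.70),
}


def average_total_cost(ttc, amc):
    """Assumed relation: total cost = test cost + misclassification cost."""
    return ttc + amc


def is_consistent(ttc, amc, atc, rel_tol=0.01):
    """True if the reported ATC matches TTC + AMC within a rounding tolerance."""
    return math.isclose(average_total_cost(ttc, amc), atc, rel_tol=rel_tol)


if __name__ == "__main__":
    for name, (ttc, amc, atc) in ROWS.items():
        print(name, is_consistent(ttc, amc, atc))
```

The 1% tolerance absorbs the three-significant-digit rounding of entries printed in scientific notation, such as the Libras row.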

Figure 1

Average classification accuracy with the KNN (k=3) classifier
