Journal of Nanjing University (Natural Sciences), 2019, Vol. 55, Issue (4): 601–608. doi: 10.13232/j.cnki.jnju.2019.04.010



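Cross-validation strategy in attribute reduction based on decision cost

Longbo Zhang1, Zhiyuan Li1, Xibei Yang2, Yibo Wang3

  1. Kewen College, Jiangsu Normal University, Xuzhou, 221116, China
  2. School of Computer, Jiangsu University of Science and Technology, Zhenjiang, 212003, China
  3. School of Computer Science and Engineering, Southeast University, Nanjing, 211189, China
  • Received: 2019-05-28  Published: 2019-07-30  Online: 2019-07-23
  • Corresponding author: Xibei Yang  E-mail: zhenjiangyangxibei@163.com
  • Funding:
    National Natural Science Foundation of China (61572242, 61502211, 61503160)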

Abstract:

Attribute reduction is a core problem in rough set theory; its purpose is to remove redundant attributes and obtain a reduct with better generalization performance. In decision-theoretic rough sets, the decision cost is frequently used as the constraint of attribute reduction. However, although a reduct obtained by a decision-cost-based algorithm can effectively reduce the total decision cost on the training set, it may fail to do so on the test set. To solve this problem, a new attribute reduction algorithm that uses the decision cost as its constraint is designed on the basis of cross-validation. Experimental results over eight UCI data sets show that, compared with the traditional decision-cost-based algorithm, the proposed algorithm not only reduces the total decision cost on both the training and the test sets, but also finds attribute subsets that bring better classification performance.

Key words: decision-theoretic rough set, attribute reduction, cross-validation, cost-sensitive

CLC number: TP3
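The abstract states only that the proposed algorithm applies the idea of cross-validation to attribute reduction constrained by decision cost; the authors' exact procedure is not given on this page. The Python sketch below shows one hypothetical way such a strategy could work and is not the authors' algorithm. All of the following are assumptions: attributes are discrete; equivalence classes (granules) are learned on the training folds only; a test object whose granule never occurs in training falls back to the training class distribution; every decision class receives its own three-way decision with thresholds α and β derived from the cost matrix of Table 1; and a subset is accepted once its mean test-fold decision cost is no higher than that of the full attribute set. Names such as `cv_greedy_reduct` are illustrative.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Cost matrix keyed by (action, state): action in {"P", "B", "N"}; state is "P"
# if x belongs to the decision class and "N" otherwise (layout of Table 1).
Cost = Dict[Tuple[str, str], float]
Rows = Sequence[Sequence[Hashable]]


def thresholds(lam: Cost) -> Tuple[float, float]:
    """Thresholds (alpha, beta) of Yao's decision-theoretic rough set model."""
    a_num = lam["P", "N"] - lam["B", "N"]
    alpha = a_num / (a_num + lam["B", "P"] - lam["P", "P"])
    b_num = lam["B", "N"] - lam["N", "N"]
    beta = b_num / (b_num + lam["N", "P"] - lam["B", "P"])
    return alpha, beta


def fold_cost(rows: Rows, labels: Sequence[Hashable], attrs: List[int],
              train: List[int], test: List[int], lam: Cost) -> float:
    """Realized decision cost on `test`, with granules learned from `train` only."""
    alpha, beta = thresholds(lam)
    classes = list(dict.fromkeys(labels))  # distinct labels, first-seen order
    granules: Dict[tuple, Counter] = defaultdict(Counter)
    for i in train:
        granules[tuple(rows[i][a] for a in attrs)][labels[i]] += 1
    # Hypothetical fallback: unseen granules use the training class distribution.
    prior = Counter(labels[i] for i in train)
    total = 0.0
    for i in test:
        counts = granules.get(tuple(rows[i][a] for a in attrs), prior)
        size = sum(counts.values())
        for d in classes:  # assumption: one three-way decision per decision class
            p = counts[d] / size
            action = "P" if p >= alpha else ("N" if p <= beta else "B")
            total += lam[action, "P" if labels[i] == d else "N"]
    return total


def cv_decision_cost(rows: Rows, labels: Sequence[Hashable], attrs: List[int],
                     lam: Cost, k: int = 5) -> float:
    """Mean test-fold decision cost; fold f holds indices i with i % k == f
    (assumes the rows are already shuffled)."""
    n = len(rows)
    total = 0.0
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        total += fold_cost(rows, labels, attrs, train, test, lam)
    return total / k


def cv_greedy_reduct(rows: Rows, labels: Sequence[Hashable], lam: Cost,
                     k: int = 5) -> List[int]:
    """Forward greedy search: add the attribute with the lowest cross-validated
    cost until the subset's cost is no higher than that of all attributes."""
    all_attrs = list(range(len(rows[0])))
    target = cv_decision_cost(rows, labels, all_attrs, lam, k)
    reduct: List[int] = []
    while cv_decision_cost(rows, labels, reduct, lam, k) > target:
        rest = [a for a in all_attrs if a not in reduct]
        best = min(rest, key=lambda a: cv_decision_cost(rows, labels, reduct + [a], lam, k))
        reduct.append(best)
    return reduct


# Hypothetical losses giving alpha = 0.6 and beta = 0.2.
LAM = {("P", "P"): 0.0, ("B", "P"): 2.0, ("N", "P"): 6.0,
       ("P", "N"): 4.0, ("B", "N"): 1.0, ("N", "N"): 0.0}
# Toy data: attribute 0 equals the label; attributes 1 and 2 are uninformative.
rows = [(i % 2, (i // 2) % 2, (i // 4) % 2) for i in range(40)]
labels = [i % 2 for i in range(40)]
print(cv_greedy_reduct(rows, labels, LAM))  # [0]
```

The design choice the sketch illustrates is scoring candidate attributes by decision cost on held-out folds instead of on the training data, so a subset is rewarded only if its three-way decisions also hold on objects it was not built from. In practice the folds would be drawn at random rather than by index.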

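Table 1 Cost matrix in decision-theoretic rough sets

| Action | X | ~X |
|---|---|---|
| a_P | λ_PP | λ_PN |
| a_B | λ_BP | λ_BN |
| a_N | λ_NP | λ_NN |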
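The losses in Table 1 are those of Yao's decision-theoretic rough set model: λ_·P applies when x ∈ X and λ_·N when x ∈ ~X, for the actions a_P (accept into the positive region), a_B (defer to the boundary region) and a_N (reject into the negative region). For reference, the standard minimum-expected-loss derivation of that model (not reproduced from the body of this paper) gives the expected losses, the thresholds and the decision rules below, assuming λ_PP ≤ λ_BP < λ_NP, λ_NN ≤ λ_BN < λ_PN, and (λ_PN − λ_BN)(λ_NP − λ_BP) > (λ_BP − λ_PP)(λ_BN − λ_NN), which is equivalent to α > β.

```latex
\begin{aligned}
R(a_P \mid [x]) &= \lambda_{PP}\,\Pr(X \mid [x]) + \lambda_{PN}\,\Pr({\sim}X \mid [x]),\\
R(a_B \mid [x]) &= \lambda_{BP}\,\Pr(X \mid [x]) + \lambda_{BN}\,\Pr({\sim}X \mid [x]),\\
R(a_N \mid [x]) &= \lambda_{NP}\,\Pr(X \mid [x]) + \lambda_{NN}\,\Pr({\sim}X \mid [x]),
  \qquad \Pr({\sim}X \mid [x]) = 1 - \Pr(X \mid [x]),\\[4pt]
\alpha &= \frac{\lambda_{PN}-\lambda_{BN}}{(\lambda_{PN}-\lambda_{BN})+(\lambda_{BP}-\lambda_{PP})},
\qquad
\beta = \frac{\lambda_{BN}-\lambda_{NN}}{(\lambda_{BN}-\lambda_{NN})+(\lambda_{NP}-\lambda_{BP})},\\[4pt]
&\text{(P)}\ \Pr(X \mid [x]) \ge \alpha \Rightarrow x \in \mathrm{POS}(X),\quad
\text{(B)}\ \beta < \Pr(X \mid [x]) < \alpha \Rightarrow x \in \mathrm{BND}(X),\quad
\text{(N)}\ \Pr(X \mid [x]) \le \beta \Rightarrow x \in \mathrm{NEG}(X).
\end{aligned}
```

Rule (P) follows from R(a_P | [x]) ≤ R(a_B | [x]) and rule (N) from R(a_N | [x]) ≤ R(a_B | [x]). For example, with the hypothetical losses λ_PP = λ_NN = 0, λ_BP = 2, λ_BN = 1, λ_NP = 6 and λ_PN = 4, the thresholds are α = 3/(3 + 2) = 0.6 and β = 1/(1 + 4) = 0.2.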

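Table 2 Description of the data sets

| ID | Data set | Samples | Attributes | Decision classes |
|---|---|---|---|---|
| 1 | Breast Tissue | 106 | 9 | 6 |
| 2 | Connectionist Bench (Vowel Recognition - Deterding Data) | 990 | 14 | 11 |
| 3 | Dermatology | 366 | 34 | 6 |
| 4 | Ecoli | 336 | 7 | 8 |
| 5 | Forest | 523 | 27 | 4 |
| 6 | Ionosphere | 351 | 34 | 2 |
| 7 | Statlog (Vehicle Silhouettes) | 846 | 19 | 4 |
| 8 | Urban Land Cover | 675 | 147 | 9 |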

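Table 3 Comparison of classification accuracy and reduct length

| ID | KNN Alg. 1 | KNN Alg. 2 | SVM Alg. 1 | SVM Alg. 2 | CART Alg. 1 | CART Alg. 2 | Reduct length Alg. 1 | Reduct length Alg. 2 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.6087 | 0.6043 | 0.5315 | 0.5387 | 0.6147 | 0.6318 | 2.14 | 2.58 |
| 2 | 0.8652 | 0.9196 | 0.5290 | 0.5384 | 0.7230 | 0.7565 | 5.78 | 9.82 |
| 3 | 0.8944 | 0.9185 | 0.9002 | 0.9270 | 0.8956 | 0.9300 | 5.70 | 7.44 |
| 4 | 0.8154 | 0.8647 | 0.8068 | 0.8381 | 0.7721 | 0.8078 | 4.28 | 5.64 |
| 5 | 0.8152 | 0.8458 | 0.8223 | 0.8380 | 0.7816 | 0.8003 | 2.82 | 4.64 |
| 6 | 0.8958 | 0.8975 | 0.8581 | 0.8618 | 0.8538 | 0.8806 | 2.32 | 3.80 |
| 7 | 0.6351 | 0.6672 | 0.5316 | 0.5507 | 0.6117 | 0.6477 | 3.42 | 5.04 |
| 8 | 0.7874 | 0.8163 | 0.7453 | 0.7914 | 0.7671 | 0.7982 | 3.30 | 4.78 |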

Figure 1 Comparison of the total decision cost of the two algorithms

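Table 4 Comparison of the time consumption of the two algorithms

| ID | Algorithm 1 | Algorithm 2 |
|---|---|---|
| 1 | 0.0676 | 0.3095 |
| 2 | 3.6052 | 21.7105 |
| 3 | 2.0940 | 11.3334 |
| 4 | 0.2941 | 1.2293 |
| 5 | 1.5900 | 8.2857 |
| 6 | 0.9510 | 4.7678 |
| 7 | 1.6639 | 9.9960 |
| 8 | 10.4979 | 149.1929 |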