Journal of Nanjing University (Natural Science), 2019, Vol. 55, No. 4: 529-536. doi: 10.13232/j.cnki.jnju.2019.04.002


Neighborhood rough set model based on maximal consistent blocks

Yonglin Cheng 1, Deyu Li 1,2, Suge Wang 1,2

  1. School of Computer & Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Received: 2019-05-28  Online: 2019-07-30  Published: 2019-07-23
  • Corresponding author: Deyu Li, E-mail: lidy@sxu.edu.cn
  • Funding: National Natural Science Foundation of China (61672331, 61573231, 61432011, 61806116); Key Research and Development Program of Shanxi Province (201803D421024); Natural Science Foundation of Shanxi Province (201801D221175); Scientific and Technological Innovation Program of Higher Education Institutions in Shanxi (201802014)


Abstract:

For numerical data, the neighborhood rough set model is an effective tool for dealing with uncertain information. Existing neighborhood rough set models focus only on the consistent case in which all samples in a neighborhood belong to a single decision class, so they cannot exploit the information carried by boundary samples whose neighborhoods intersect several decision classes. To address this limitation, we combine the maximal consistent blocks of a tolerance relation with the neighborhood rough set model and take the largest consistent block within a sample's neighborhood as the minimal information granule. By redefining the upper and lower approximations, attribute importance and related concepts, we establish a new model, called the neighborhood rough set model based on maximal consistent blocks. Working at this finer information granularity, the new model turns former boundary samples into consistent samples and thereby enlarges the positive region. In addition, we construct the corresponding attribute reduction algorithm using a forward greedy strategy. Comparative experiments on seven public UCI data sets validate the effectiveness of the proposed model.
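For context, the neighborhood rough set machinery the abstract refers to can be written in its usual textbook form as below; the last line is only one plausible reading of how the maximal-consistent-block variant tightens the granule, inferred from the abstract rather than quoted from the paper.

```latex
% Standard neighborhood rough set definitions: distance \Delta_B on attribute
% subset B, radius \delta, universe U, decision D.
\delta_B(x) = \{\, y \in U \mid \Delta_B(x,y) \le \delta \,\}                    % neighborhood of x
\underline{N}_B(X) = \{\, x \in U \mid \delta_B(x) \subseteq X \,\}              % lower approximation
\overline{N}_B(X)  = \{\, x \in U \mid \delta_B(x) \cap X \neq \emptyset \,\}    % upper approximation
\mathrm{POS}_B(D) = \bigcup_{X \in U/D} \underline{N}_B(X), \qquad
\gamma_B(D) = \frac{|\mathrm{POS}_B(D)|}{|U|}                                    % positive region, dependency
% Assumed reading of the proposed model: replace \delta_B(x) by M_B(x), the
% largest maximal consistent block containing x inside \delta_B(x), giving
\underline{N}^{\mathrm{MCB}}_B(X) = \{\, x \in U \mid M_B(x) \subseteq X \,\}
```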

Key words: attribute reduction, boundary sample, neighborhood rough set, maximal consistent block
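The abstract mentions a forward greedy attribute reduction algorithm driven by attribute importance. The sketch below shows only that generic strategy; the dependency function gamma is left abstract (any of the models compared in Tables 3-6 could supply it), and the stopping rule, names and threshold are illustrative choices rather than the paper's pseudocode.

```python
from typing import Callable, FrozenSet, List

def forward_greedy_reduct(
    attributes: List[int],
    gamma: Callable[[FrozenSet[int]], float],
    eps: float = 1e-6,
) -> List[int]:
    """Generic forward greedy attribute reduction.

    gamma(B) is a dependency measure of the decision on attribute subset B
    (e.g. |POS_B(D)| / |U| in a neighborhood rough set model). Attributes are
    added one at a time while each addition raises the dependency by more
    than eps.
    """
    reduct: List[int] = []
    current = gamma(frozenset())
    remaining = list(attributes)
    while remaining:
        # Choose the attribute whose addition gives the largest dependency.
        best = max(remaining, key=lambda a: gamma(frozenset(reduct + [a])))
        best_gamma = gamma(frozenset(reduct + [best]))
        if best_gamma - current <= eps:  # no significant gain: stop
            break
        reduct.append(best)
        remaining.remove(best)
        current = best_gamma
    return reduct
```

With a neighborhood-based gamma plugged in, a call such as forward_greedy_reduct(list(range(n_attrs)), gamma) yields attribute subsets of the kind listed in Table 3.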

CLC number: TP391

Table 1  Tolerance relation matrix of X

         x1  x2  x3  x4  x5
  x1      1   1   1   0   1
  x2      1   1   0   0   0
  x3      1   0   1   0   1
  x4      0   0   0   1   0
  x5      1   0   1   0   1
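For a tolerance relation such as the one in Table 1, the maximal consistent blocks are the maximal sets of pairwise tolerant objects, i.e., the maximal cliques of the tolerance graph. The minimal sketch below (an illustration, not the paper's implementation; the networkx dependency and variable names are our own choices) recovers the blocks {x1, x2}, {x1, x3, x5} and {x4} from the matrix above.

```python
# Recover the maximal consistent blocks of Table 1 as the maximal cliques
# of the tolerance graph.
import networkx as nx

# Tolerance relation matrix over X = {x1, ..., x5} from Table 1 (1 = tolerant).
T = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 0, 1],
]

G = nx.Graph()
G.add_nodes_from(range(1, 6))
for i in range(5):
    for j in range(i + 1, 5):
        if T[i][j]:
            G.add_edge(i + 1, j + 1)

# Every maximal clique is one maximal consistent block.
blocks = [sorted(c) for c in nx.find_cliques(G)]
print(blocks)  # e.g. [[1, 2], [1, 3, 5], [4]] (order may vary)
```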

Figure 1  Sample distribution in Example 1

Table 2  Description of the data sets

  Data set   Samples   Attributes   Classes
  Wine         178        13           3
  Wdbc         569        30           2
  Glass        214        10           6
  Wpbc         198        32           2
  Ionos        351        33           2
  Sonar        208        60           2
  Cancer       683         9           2

Table 3  Optimal attribute subsets

  Wine
    NRS:     1,13,12,11,5,2,3,7,4
    VP-NRS:  1,12,13,2,5,9,3
    MAX-NRS: 1,7,3,13,4,10,5,9
    NMCBRS:  1,12,13,5,2,11,3
  Wdbc
    NRS:     1,28,22,12,25,10,19,2
    VP-NRS:  25,23,27,2,10,30,14,19,24,28,15,5,22,21,11,9,1,8,3,12,18,4,6,7,26,13
    MAX-NRS: 1,25,22,7,29,9,15,3,4,12,2
    NMCBRS:  1,28,22,26
  Glass
    NRS:     1,9,2,10,5,6,4,3,7,8
    VP-NRS:  1,3,5,6,10
    MAX-NRS: 1,7,4
    NMCBRS:  1,10
  Wpbc
    NRS:     1,29,3,2,32,12,3,10,6,26
    VP-NRS:  1,11,9,13,3,10,2
    MAX-NRS: 1,13,12,6,10,32,5,23,26,2
    NMCBRS:  1,20,12,26,6
  Ionos
    NRS:     1,26,8,2,31,6,5,22,3,32,4,15,7
    VP-NRS:  1,2,3,6,12,11,8,15,5,4,7,9,10,28,13
    MAX-NRS: 1,2,3,12,6,28,4,8,5,15,24
    NMCBRS:  2,28,29,20,21,1,19,4
  Sonar
    NRS:     1,54,45,11,27,24,37,17,29,13,12,39,8,25,26,22,18,2
    VP-NRS:  1,12,25,17,37,29,8,27,40,22,2
    MAX-NRS: 1,12,24,16,26,5,56,9,29,48,14,10,27,46,23,2
    NMCBRS:  1,23,22,36,19,12,13,45,2
  Cancer
    NRS:     6,1,2,8,4,5,3
    VP-NRS:  1
    MAX-NRS: 3,6,8,5,4
    NMCBRS:  6,1,2,4

Table 6  Classification accuracy (%) of the optimal attribute subsets under KNN (k=3)

  Data     RAW          NRS           VP-NRS       MAX-NRS       NMCBRS
  Wine     94.21±6.77   97.60±4.47    94.73±6.08   97.89±2.72    97.44±3.65
  Wdbc     97.23±1.81   96.02±2.00    96.71±2.76   96.01±2.18    95.26±2.40
  Glass    89.61±5.17   90.67±2.68    87.87±7.87   93.11±6.49    93.71±6.45
  Wpbc     70.14±8.79   76.44±11.80   76.89±8.58   75.42±11.91   81.82±7.93
  Ionos    83.52±9.53   86.01±6.26    85.01±4.49   85.18±5.19    89.18±3.97
  Sonar    80.23±9.65   81.40±9.18    80.14±8.43   81.32±6.71    81.51±9.06
  Cancer   96.62±3.94   96.38±6.38    89.00±4.75   95.78±1.51    96.68±2.02
  Average  87.36±6.52   89.21±6.11    87.19±6.13   89.24±5.24    90.85±5.21

Table 4  Size of the optimal attribute subsets

  Data     RAW     NRS    VP-NRS   MAX-NRS   NMCBRS
  Wine      13       9       7        8         7
  Wdbc      30       8      26       11         4
  Glass     10      10       5        3         2
  Wpbc      32      10       7       10         5
  Ionos     33      13      15       11         8
  Sonar     60      20      11       16         9
  Cancer     9       7       1        5         4
  Average   26.71   11      10.29     9.14      5.57

Table 5  Classification accuracy (%) of the optimal attribute subsets under SVM

  Data     RAW           NRS           VP-NRS        MAX-NRS       NMCBRS
  Wine     93.68±4.84    97.36±3.72    95.26±3.88    95.20±6.31    97.89±2.42
  Wdbc     94.62±4.53    95.15±1.77    95.15±2.26    94.63±2.09    95.43±2.08
  Glass    77.55±8.57    78.41±6.01    80.33±5.08    77.82±6.49    80.92±6.57
  Wpbc     76.00±12.28   75.94±9.80    75.94±9.84    75.94±9.84    76.42±7.59
  Ionos    84.98±12.34   87.45±5.00    87.45±5.46    87.72±5.48    87.57±4.67
  Sonar    71.32±10.98   72.82±10.08   73.68±9.78    70.62±12.77   75.96±10.20
  Cancer   96.34±3.77    96.50±2.42    85.69±4.63    95.93±1.83    96.71±2.40
  Average  84.92±8.18    86.23±5.54    84.78±5.84    85.40±6.40    86.61±5.45
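Tables 5 and 6 report the mean±std accuracy of the selected subsets under SVM and KNN (k=3). The page does not spell out the validation protocol, so the sketch below simply assumes 10-fold cross-validation and uses scikit-learn for illustration; load_wine is a stand-in for the UCI data, and subset is the 0-based form of the Wine NMCBRS subset from Table 3, so the printed numbers are not expected to reproduce the table exactly.

```python
# Hedged sketch of the evaluation suggested by Tables 5-6: score a reduced
# attribute subset with KNN (k=3) and SVM via cross-validation.
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)   # stand-in for a UCI data set
subset = [0, 11, 12, 4, 1, 10, 2]   # attributes 1,12,13,5,2,11,3 of Table 3, 0-based
X_red = X[:, subset]

for name, clf in [("KNN(k=3)", KNeighborsClassifier(n_neighbors=3)),
                  ("SVM", SVC())]:
    scores = cross_val_score(clf, X_red, y, cv=10)  # 10-fold CV is an assumption
    print(f"{name}: {100 * scores.mean():.2f} ± {100 * scores.std():.2f}")
```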