南京大学学报(自然科学版) ›› 2023, Vol. 59 ›› Issue (4): 680–689.doi: 10.13232/j.cnki.jnju.2023.04.014

• • 上一篇    下一篇

一种基于相容块划分的动态增量式属性约简方法

徐阳1, 王磊1,2(), 张义宗1, 王诚彪1   

  1. 1.南昌工程学院信息工程学院,南昌,330099
    2.江西省水信息协同感知与智能处理重点实验室,南昌工程学院信息工程学院,南昌,330099
  • 收稿日期:2023-01-09 出版日期:2023-07-31 发布日期:2023-08-18
  • 通讯作者: 王磊 E-mail:ezhoulei@163.com
  • 基金资助:
    国家自然科学基金(61562061);江西省教育厅科技项目(GJJ170995)

A dynamic incremental attribute reduction method based on consisitent block partitioning

Yang Xu1, Lei Wang1,2(), Yizong Zhang1, Chengbiao Wang1   

  1. 1.School of Information Engineering,Nanchang Institute of Engineering,Nanchang,330099,China
    2.Jiangxi Province Key Laboratory of Water Information Cooperative Sensing and Intelligent Processing,School of Information Engineering,Nanchang Institute of Engineering,Nanchang,330099,China
  • Received:2023-01-09 Online:2023-07-31 Published:2023-08-18
  • Contact: Lei Wang E-mail:ezhoulei@163.com

摘要:

属性约简是数据挖掘、机器学习等研究领域中的一个颇为重要的预处理步骤,其效率的高低会直接影响相关任务的性能.针对已有的非增量式属性约简方法在相容块粗糙集模型中对象集发生变化时无法高效更新属性约简的问题,提出一种以区分度为启发信息的增量式属性约简方法.首先,引入相容块的概念并运用相容块对论域进行划分,在此基础上给出不完备信息系统的区分度定义;然后,详细分析对象集发生变化条件下区分度的更新机理;进一步,以区分度为启发式信息构造增量式属性约简算法;最后,选取六个UCI数据集进行增量式约简的更新实验.实验结果表明,在不影响属性约简精度的前提下,该增量式方法的时间消耗比非增量式更新方法平均缩短50%,更加可行和高效.

关键词: 属性约简, 相容块, 划分, 区分度, 增量学习

Abstract:

Attribute reduction is essential for preprocessing in research fields such as data mining and machine learning since its efficiency has an immediate impact on the performances of the above?mentioned fields. In this paper,an incremental attribute reduction approach with discrimination as heuristic information is proposed against the problem of the non?incremental attribute reduction method failing to efficiently update the attribute reduction when the object set is changed in the consistent block rough set model. To begin with,the concept of compatible block is introduced to divide the domain of discourse using compatible blocks. Then,the differentiation degree of incomplete information system is defined before the analysis on the update mechanism of discrimination when the object set is changed. Moreover,an incremental attribute reduction algorithm is built with the discrimination as heuristic information. At last,an incremental reduction updating experiment is conducted on six UCI data sets selected. The time consumed by the incremental method is 50 % shorter than that by the non?incremental updating method on the premise of not affecting the accuracy of attribute reduction,according to the experimental results. To sum up,the proposed incremental reduction algorithm is more feasible and efficient than the non?incremental reduction algorithm.

Key words: attribute reduction, consistent block, partition, discrimination, incremental learning

中图分类号: 

  • TP391

表1

一个不完备信息系统"

Ua1a2a3a4
x11102
x22*02
x3**01
x41*01
x5**01
x6210*

图1

极大相容块在论域上的覆盖"

图2

相容块在论域上的划分"

表2

实验使用的数据集"

序号数据集对象数属性数类别数
1Lymphography148183
2Diabetes_data_upload520162
3Anneal798385
4Credit Approval690152
5Maternal Health Risk101463
6Obesity Dataset2111167

图3

两种算法在六个数据集上增加一组对象时消耗的时间对比"

图4

两种算法在六个数据集上删除一组对象时消耗的时间对比"

表3

两种属性约简方法的实验结果"

数据集增量算法非增量算法
Lymphography7,15,14,2,16,4,3,11,123,4,7,8,11,13,14,15,17,2
Diabetes_data_upload1,5,6,4,2,10,91,2,5,3,15,16,14,8
Anneal1,3,4,8,9,12,33,34,35,71,3,4,5,7,8,33,34,37
Credit Approval2,3,8,62,3,8,6
Maternal Health Risk2,3,4,9,11,13,14,15,1,7,82,3,4,12,13,14,15,16,9
Obesity Dataset1,3,4,6,5,21,3,4,6,5,2

表4

两种算法在六个数据集上的分类精度比较"

数据集增量算法非增量算法
Lymphography81.376%81.352%
Diabetes_data_upload92.233%92.383%
Anneal74.375%74.575%
Credit Approval84.453%84.453%
Maternal Health Risk67.843%67.843%
Obesity Dataset78.778%78.778%

表5

四种算法增加或者删除20%的数据集时的运行时间比较"

数据集文献[19]文献[20]文献[21]本文提出的方法
增加删除增加删除增加删除增加删除
Lymphography0.23 s0.54 s0.17 s0.67 s0.47 s1.03 s0.14 s0.61 s
Diabetes_data_upload45.43 s95.43 s79.52 s80.44 s52.19 s65.10 s35.64 s67.4 s
Anneal42.10 s275.43 s39.17 s160.43 s79.52 s105.90 s6.08 s153.64 s
Credit Approval0.35 s0.79 s1.43 s1.59 s2.11 s1.77 s0.37 s0.57 s
Maternal Health Risk103.42 s257.79 s39.57 s423.12 s57.23 s251.43 s23.16 s237.12 s
Obesity Dataset509.42 s1503.2 s645.2 s2709.4 s1002.5 s2347.6 s1116.32 s2001.55 s
1 Pawlak Z. Rough sets. International Journal of Computer & Information Sciences198211(5):341-356.
2 Pattaraintakorn P, Cercone N. Integrating rough set theory and medical applications. Applied Mathematics Letters200821(4):400-403.
3 Golan R H, Ziarko W. A methodology for stock market analysis utilizing rough set theory∥Proceedings of 1995 Conference on Computational Intelligence for Financial Engineering. New York,NY,USA:IEEE,1995:32-40.
4 Szul T, Tabor S, Pancerz K. Application of the BORUTA algorithm to input data selection for a model based on rough set theory (RST) to prediction energy consumption for building heating. Energies202114(10):2779.
5 Huang H, Meng F Z, Zhou S H,et al. Brain image segmentation based on FCM clustering algorithm and rough set. IEEE Access2019,7:12386-12396.
6 Mitra S, Banka H. Application of rough sets in pattern recognition∥Peters J F,Skowron A,Marek V W,et al. Transactions on rough sets VII. Springer Berlin Heidelberg,2007:151-169.
7 Kryszkiewicz M. Rough set approach to incomplete information systems. Information Sciences1998112(1-4):39-49.
8 Leung Y, Li D Y. Maximal consistent block technique for rule acquisition in incomplete information systems. Information Sciences2003(153):85-106.
9 杨习贝,黄佳玲,周君仪,等. 不完备系统中基于特征相容块的粗糙集. 山东大学学报(工学版)201242(5):1-6.
Yang X B, Huang J L, Zhou J Y,et al. Characteristic consistent blocks based rough set in incomplete system. Journal of Shandong University (Engineering Science)201242(5):1-6.
10 孙妍,米据生,冯涛,等. 变精度极大相容块粗糙集模型及其属性约简. 计算机科学与探索202014(5):892-900.
Sun Y, Mi J S, Feng T,et al. Maximum consistent block based variable precision rough set model and attribute reduction. Journal of Frontiers of Computer Science and Technology202014(5):892-900.
11 周石泉,蒙祖强. 基于数据相容填补的极大相容块构造算法. 计算机科学201239(9):192-197.
Zhou S Q, Meng Z Q. Algorithm of constructing maximal consistent block based on consistent data reinforcement. Computer Science201239(9):192-197.
12 王敬前,张小红. 基于极大相容块的不完备信息处理新方法及其应用. 南京大学学报(自然科学)202258(1):82-93.
Wang J Q, Zhang X H. A new method of incomplete information processing based on maximal consistent block and its application. Journal of Nanjing University (Natural Sciences)202258 (1):82-93.
13 胡宝清,温彪. 基于极大相容块的不完备模糊目标信息系统的近似约简. 江西师范大学学报(自然科学版)201539(1):15-19.
Hu B Q, Wen B. The approximate reduction in incomplete and fuzzy objective information systems based on maximal tolerance classes. Journal of Jiangxi Normal University (Natural Science)201539(1):15-19.
14 Wang L, Liu B, Cai X X,et al. A tolerance classes partition?based re?definition of the rough approximations for incomplete information system∥The International Conference on Image,Vision and Intelligent Systems. Springer Berlin Heidelberg,2022:1003-1012.
15 刘斌,王磊,王冲,等. 对象集变化时相容块的近似集增量更新方法. 山东大学学报(工学版)202353(2):109-117.
Liu B, Wang L, Wang C,et al. An incremental method for updating approximations of consistent blocks while the universe evolves over time. Journal of Shandong University (Engineering Science)202353(2):109-117.
16 Wang G Q. Valid incremental attribute reduction algorithm based on attribute generalization for an incomplete information system. Chinese Journal of Electronics201928(4):725-736.
17 Xie X J, Qin X L. A novel incremental attribute reduction approach for dynamic incomplete decision systems. International Journal of Approximate Reasoning2018(93):443-462.
18 纪霞,李龙澍. 基于属性分辨度的最大相容块规则提取算法. 控制与决策201328(12):1837-1842,1848,DOI:10.13195/j.kzyjc.2013.12.025 .
Ji X, Li L S. Algorithm for rules acquisition from maximal consistent blocks based on attribute discernibility. Control and Decision201328(12):1837-1842,1848,DOI:10.13195/j.kzyjc.2013.12.025 .
19 张扩,续欣莹,阎高伟,等. 信息观下批增量式属性约简算法. 山西大学学报(自然科学版)201639(3):357-370.
Zhang K, Xu X Y, Yan G W,et al. Batch of incremental attribute reduction algorithm under information view. Journal of Shanxi University (Natural Science Edition)201639(3):357-370.
20 Zhong Q Q, Wang L, Yang W,et al. Incremental attribute reduction based on knowledge granularity under incomplete data. Journal of Physics:Conference Series2021,2025:012042.
21 刘海涛,续欣莹,谢珺,等. 信息观下增量式属性约简方法研究. 小型微型计算机系统201637(2):375-380.
Liu H T, Xu X Y, Xie J,et al. Incremental attribute reduction algorithm under the information view. Journal of Chinese Computer Systems201637(2):375-380.
[1] 贺青青, 马建敏, 丁娜. 形式背景上基于矩阵信息熵的矩阵熵约简[J]. 南京大学学报(自然科学版), 2023, 59(1): 98-106.
[2] 吴天宇, 王士同. 核化的多视角特权协同随机矢量功能链接网络及其增量学习方法[J]. 南京大学学报(自然科学版), 2022, 58(2): 275-285.
[3] 李同军, 张晓雨, 吴伟志, 谭安辉. 形式模糊背景中单边模糊概念格的属性约简[J]. 南京大学学报(自然科学版), 2022, 58(1): 38-48.
[4] 王敬前, 张小红. 基于极大相容块的不完备信息处理新方法及其应用[J]. 南京大学学报(自然科学版), 2022, 58(1): 82-93.
[5] 刘鑫, 胡军, 张清华, 于洪. 多用户偏好下基于三支决策的动态属性约简[J]. 南京大学学报(自然科学版), 2022, 58(1): 9-18.
[6] 刘小伟, 景运革. 一种有效更新多源数据约简的增量算法[J]. 南京大学学报(自然科学版), 2021, 57(6): 1083-1091.
[7] 刘鑫,胡军,张清华. 属性组序下基于代价敏感的约简方法[J]. 南京大学学报(自然科学版), 2020, 56(4): 469-479.
[8] 袁燕,陈伯伦,朱国畅,花勇,于永涛. 基于社区划分的空气质量指数(AQI)预测算法[J]. 南京大学学报(自然科学版), 2020, 56(1): 142-150.
[9] 程永林, 李德玉, 王素格. 基于极大相容块的邻域粗糙集模型[J]. 南京大学学报(自然科学版), 2019, 55(4): 529-536.
[10] 张龙波, 李智远, 杨习贝, 王怡博. 决策代价约简求解中的交叉验证策略[J]. 南京大学学报(自然科学版), 2019, 55(4): 601-608.
[11] 孙 玫,张 森,聂培尧,聂秀山. 基于朴素贝叶斯的网络查询日志session划分方法研究[J]. 南京大学学报(自然科学版), 2018, 54(6): 1132-1140.
[12] 陶玉枝1,2,赵仕梅1,2,谭安辉1,2*. 一种基于决策表约简的集覆盖问题的近似解法[J]. 南京大学学报(自然科学版), 2018, 54(4): 821-.
[13] 李俊余1,2,王 霞1,2*,刘庆凤3. 属性定向概念格的协调近似表示空间[J]. 南京大学学报(自然科学版), 2017, 53(2): 333-.
[14] 汪小寒1,2,罗永龙1,2*,江叶峰1,赵传信1,2,吴文莉1,郭良敏1,2 . 基于KD树最优投影划分的k匿名算法[J]. 南京大学学报(自然科学版), 2016, 52(6): 1050-.
[15] 施玉杰1*,杨宏志2,徐久成3. α-先验概率优势关系下的粗糙集模型研究[J]. 南京大学学报(自然科学版), 2016, 52(5): 899-.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!