Journal of Nanjing University (Natural Science) ›› 2024, Vol. 60 ›› Issue (1): 106-117. doi: 10.13232/j.cnki.jnju.2024.01.011
Lihong Li1,2,3, Hongyao Dong1,2,3,6, Wenjie Liu4, Baolin Li1,2,3, Qi Dai5
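Abstract:
To address classification in incomplete hybrid information systems, this paper combines the neighborhood tolerance relation from granular computing with mutual information theory to define the concept of neighborhood tolerance mutual information and, drawing on ensemble learning, proposes a neighborhood tolerance mutual information selective ensemble classification algorithm for incomplete datasets. The algorithm first derives information granules from the missing attributes, partitions them into granular layers to build a granular space, and, on each granular layer, applies an ensemble algorithm with BP neural networks as base classifiers to construct new base classifiers. Next, for each information granule, it computes the neighborhood tolerance mutual information with respect to the class attribute from the granule's missing attributes to measure the granule's importance, and redefines each base classifier's weight from its prediction accuracy and this neighborhood tolerance mutual information. Finally, for each sample to be predicted, the base classifiers are combined by weighted ensemble to produce the classification result, and the algorithm is compared with traditional ensemble classification algorithms. On some incomplete hybrid datasets, the proposed ensemble classification algorithm effectively improves classification accuracy.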
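The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formula for neighborhood tolerance mutual information or the exact weighting rule, so the importance values below are placeholder numbers, and the product-then-normalize weighting in `combine_weights` is an assumption made for illustration only. All function names are hypothetical. The sketch shows only two ideas: grouping samples into granules by their missing-attribute pattern, and a weighted vote over base-classifier predictions.

```python
from collections import defaultdict


def granules_by_missing_pattern(rows):
    """Group row indices by the set of attribute positions that are missing (None)."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        pattern = frozenset(j for j, v in enumerate(row) if v is None)
        groups[pattern].append(i)
    return dict(groups)


def combine_weights(accuracies, importances):
    """Assumed rule (not from the paper): weight = accuracy * importance, normalized to sum to 1."""
    raw = [a * m for a, m in zip(accuracies, importances)]
    total = sum(raw)
    if total == 0:
        # Fall back to uniform weights when every raw weight is zero.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the largest total weight."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)


# Toy mixed-type data: numeric and categorical attributes, None marks a missing value.
rows = [
    [1.0, "a", None],
    [0.5, None, 3.0],
    [0.2, "b", None],
    [0.1, "c", 2.0],
]
granules = granules_by_missing_pattern(rows)

# Placeholder accuracies and importance scores for three base classifiers.
weights = combine_weights([0.9, 0.6, 0.8], [0.5, 0.2, 0.3])
label = weighted_vote(["yes", "no", "no"], weights)
```

In this toy run the first classifier outweighs the other two combined, so the weighted vote differs from a simple majority vote. In the setting the abstract describes, each granule would train its own BP neural network ensemble, and the importance scores would come from the neighborhood tolerance mutual information rather than fixed numbers.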