Journal of Nanjing University (Natural Science), 2019, Vol. 55, Issue (4): 564–572. doi: 10.13232/j.cnki.jnju.2019.04.006


  • Funding:
    Natural Science Foundation of Fujian Province (2018J01545)

Improved association classification algorithm based on multiple learning and correlation degree

Jiahui Li1,2, Zhongmei Zhou1,2

  1. School of Computer Science, Minnan Normal University, Zhangzhou, 363000, China
  2. Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou, 363000, China
  • Received: 2019-05-16 Online: 2019-07-30 Published: 2019-07-23
  • Contact: Zhongmei Zhou E-mail: zzm@zju.edu.cn

Abstract:

Associative classification algorithms built on the support-confidence framework struggle to generate a large number of high-quality rules. Moreover, on some data sets, especially imbalanced ones, the generated rules fail to cover some of the training instances, which lowers classification accuracy. To address these problems, we propose an Improved Algorithm based on Multiple learning and Correlation degree (IAMC). First, when extracting rules, IAMC performs associative classification learning on the training set multiple times, so as to extract as many high-quality rules as possible. Second, to improve the quality of the generated rules, IAMC adopts a new measure, the correlation degree, which takes into account both the confidence and the complement class support (CCS). Finally, after the associative classification rules have been extracted, IAMC applies a decision tree method to the training instances whose class cannot be determined by the existing rules or that are not covered by them, and adds the resulting rules to the rule set. Experimental results show that IAMC generates more high-quality rules and achieves higher classification accuracy than existing associative classification algorithms on multiple UCI data sets.

Key words: associative classification, multiple learning, correlation degree, classification accuracy
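
The abstract names confidence and complement class support (CCS) as the two ingredients of the correlation degree, but it does not state how IAMC combines them. The sketch below therefore computes only the ingredients for a single rule on a small, made-up imbalanced data set. The function name `rule_measures`, the toy data, and the CCS definition used here (the share of instances outside the target class that still match the rule's antecedent, as commonly given in the imbalanced-data associative classification literature) are assumptions for illustration and may differ from the paper's definitions.

```python
from typing import FrozenSet, List, Tuple

# One training instance: (set of "attribute=value" items, class label).
Instance = Tuple[FrozenSet[str], str]


def rule_measures(data: List[Instance],
                  antecedent: FrozenSet[str],
                  label: str) -> Tuple[float, float, float]:
    """Return (support, confidence, ccs) for the rule antecedent -> label.

    support    = P(antecedent and label)
    confidence = P(label | antecedent)
    ccs        = P(antecedent | not label)   # assumed CCS definition
    """
    n = len(data)
    # Class labels of all instances that match the antecedent.
    covered = [cls for items, cls in data if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    misses = len(covered) - hits
    # Number of instances that belong to any other class.
    n_other = sum(1 for _, cls in data if cls != label)

    support = hits / n if n else 0.0
    confidence = hits / len(covered) if covered else 0.0
    ccs = misses / n_other if n_other else 0.0
    return support, confidence, ccs


# Hypothetical imbalanced data: 2 "rare" instances, 6 "common" ones.
data: List[Instance] = [
    (frozenset({"a=1", "b=0"}), "rare"),
    (frozenset({"a=1", "b=1"}), "rare"),
    (frozenset({"a=1", "b=1"}), "common"),
    (frozenset({"a=0", "b=0"}), "common"),
    (frozenset({"a=0", "b=1"}), "common"),
    (frozenset({"a=0", "b=0"}), "common"),
    (frozenset({"a=0", "b=1"}), "common"),
    (frozenset({"a=0", "b=0"}), "common"),
]

support, confidence, ccs = rule_measures(data, frozenset({"a=1"}), "rare")
print(f"support={support:.3f} confidence={confidence:.3f} ccs={ccs:.3f}")
```

On this toy data the rule `a=1 -> rare` has a support of only 0.25, which a high minimum-support threshold could prune, while its low CCS shows that the antecedent rarely appears outside the rare class. That is the kind of signal a measure combining confidence with CCS can use on imbalanced data.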

CLC Number:

  • TP311.13