南京大学学报(自然科学版) ›› 2016, Vol. 52 ›› Issue (5): 890.
黄伟婷1*,赵 红2
Huang Weiting1*,Zhao Hong2
摘要: 最小代价属性选择是数据挖掘的重要问题之一,问题的优化目标是得到总代价最小的属性子集.在实际数据的获取过程中,测量误差是不可避免的.基于测量误差,目前已有一些相关的最小代价属性选择方法.但这些方法存在效率上的问题,特别是对大规模数据集.为解决这一关键问题,提出一种基于误差数据的最小代价属性选择分治算法.该算法将数据集按列拆分为若干个互不相交的子数据集,实现对各子数据集的求解,分而治之.对于不同规模的数据集,其子数据集的大小及总个数并非固定不变,而是根据各数据集的规模自适应设定的.该算法通过拆分数据集来降低问题规模,有效地提高了计算效率.对6个不同规模UCI数据集的实验分析表明该算法的有效性,与经典回溯算法相比,该算法的效果相当但效率至少提高了30%,更能适应实际问题的需要.
[1] Turney P D.Types of cost in inductive concept learning.In:Proceedings of the Workshop on CostSensitive Learning at the 17th International Conference on Machine Learning(ICML).Canada:National Research Council,2000:15-21. [2] Liu D,Li T R,Ruan D.Probabilistic model criteria with decisiontheoretic rough sets.Information Sciences,2011,181(17):3709-3722. [3] Min F,Zhu W.Minimal cost attribute reduction through backtracking.Database Theory and Application,BioScience and BioTechnology.Heidelberg,Berlin:Springer,2011:100-107. [4] Qian W B,Shu W H,Yang J,et al.Costsensitive feature selection on heterogeneous data.Advances in Knowledge Discovery and Data Mining.Springer International Publishing,2015:397-408. [5] Yang X B,Qi Y S,Song X N,et al.Test cost sensitive multigranulation rough set:Model and minimal cost selection.Information Sciences,2013,250:184-199. [6] BolónCanedo V,PortoDíaz I,SánchezMaroño N,et al.A framework for costbased feature selection.Pattern Recognition,2014,47(7):2481-2489. [7] Li X J,Zhao H,Zhu W.A cost sensitive decision tree algorithm with two adaptive mechanisms.KnowledgeBased Systems,2015,88:24-33. [8] Shu W H,Shen H.Multicriteria feature selection on costsensitive data with missing values.Pattern Recognition,2016,51:268-280. [9] Yuan D R,Huang X M,Duan Q L,et al.A strategy of constructing heterogeneous costsensitive decision tree.Advanced Materials Research,2013,756:3414-3418. [10] 张燕平,邹慧锦,赵 姝.基于CCA的代价敏感三支决策模型.南京大学学报(自然科学),2015,51(2):447-452.(Zhang Y P,Zou H J,Zhao S.Costsensitive threeway decisions model based on CCA.Journal of Nanjing University(Natural Sciences),2015,51(2):447-452.) [11] Zhou Z H,Chawla N V,Jin Y C,et al.Big data opportunities and challenges:Discussions from data analytics perspectives.Computational Intelligence Magazine,IEEE,2014,9(4):62-74. [12] 蒋盛益,谢照青,余 雯.基于代价敏感的朴素贝叶斯不平衡数据分类研究.计算机研究与发展,2011,48:387-390.(Jiang S Y,Xie Z Q,Yu W.Naive bayes classification algorithm based on cost sensitive for imbalanced data distribution.Journal of Computer Research and Development,2011,48:387-390.) [13] Wu W Z,Zhang W X.Neighborhood operator systems and approximations.Information Sciences,2002,144(1):201-217. [14] Zhao H,Min F,Zhu W.Testcostsensitive attribute reduction based on neighborhood rough set.In:2011 IEEE International Conference on Granular Computing(GrC).Kaohsiung:IEEE,2011:802-806. [15] Min F,Zhu W.Attribute reduction of data with error ranges and test costs.Information Sciences,2012,211:48-67. [16] Zhao H,Zhu W.Optimal costsensitive granularization based on rough sets for variable costs.KnowledgeBased Systems,2014,65:72-82. [17] Yao Y Y.A partition model of granular computing.Transactions on Rough Sets I.Heidelberg,Berlin:Springer,2004:232-253. |
No related articles found! |
|