Journal of Nanjing University (Natural Sciences) [南京大学学报(自然科学版)], 2018, Vol. 54, Issue (1): 116.
Zheng Lirong1 (郑丽容), Hong Zhiling2* (洪志令)
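Abstract: This paper proposes a clustering-based heuristic selective ensemble learning algorithm. Ensemble learning combines multiple weak classifiers to obtain better results than any single classifier, boosting the weak classifiers into one strong classifier. In theory, the more weak classifiers are combined, the better the resulting model; however, training time and model complexity also grow as weak classifiers are added. The proposed method uses clustering to remove similar weak classifiers, which both reduces model complexity and yields a candidate set of weak classifiers that differ substantially from one another. A heuristic selective ensemble algorithm then combines the candidates to improve classification performance. A parallel ensemble strategy is also adopted to make the search for the best classifier subset more efficient, which reduces training time. Experimental results show that the algorithm achieves some improvement over traditional methods on several metrics.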
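The two stages described in the abstract (cluster similar weak classifiers, then heuristically select a subset to combine) can be illustrated with a minimal sketch. The abstract does not specify the clustering method, the similarity measure, or the selection heuristic, so everything here is an assumption made for illustration: pairwise disagreement as the distance, a simple leader-style clustering with a hypothetical `threshold` parameter, majority voting over binary labels, and greedy forward selection that keeps the best-scoring prefix. The parallel strategy is omitted, and all function names are hypothetical.

```python
def disagreement(p, q):
    """Fraction of samples on which two classifiers' predictions differ."""
    return sum(a != b for a, b in zip(p, q)) / len(p)

def accuracy(pred, y):
    """Fraction of predictions that match the true labels."""
    return sum(a == b for a, b in zip(pred, y)) / len(y)

def majority_vote(preds):
    """Majority vote over binary (0/1) predictions; ties go to class 1."""
    n = len(preds)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*preds)]

def cluster_representatives(preds, y, threshold):
    """Assumed clustering step: a classifier joins the first cluster whose
    leader it disagrees with on at most `threshold` of the samples; the most
    accurate member of each cluster is kept as its representative."""
    clusters = []  # lists of classifier indices; the first index is the leader
    for i, p in enumerate(preds):
        for c in clusters:
            if disagreement(p, preds[c[0]]) <= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return [max(c, key=lambda i: accuracy(preds[i], y)) for c in clusters]

def greedy_select(preds, y, candidates):
    """Assumed heuristic: add, one at a time, the candidate that gives the
    highest vote accuracy, and return the best-scoring prefix seen."""
    chosen, remaining = [], list(candidates)
    best_subset, best_score = [], 0.0
    while remaining:
        def score_with(i):
            return accuracy(majority_vote([preds[j] for j in chosen + [i]]), y)
        pick = max(remaining, key=score_with)
        score = score_with(pick)
        chosen.append(pick)
        remaining.remove(pick)
        if score > best_score:
            best_subset, best_score = list(chosen), score
    return best_subset, best_score

# Toy validation data (made up for illustration): true labels and the
# predictions of five weak classifiers on eight samples.
y = [1, 0, 1, 0, 1, 0, 1, 0]
preds = [
    [1, 0, 1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 0, 1],  # duplicate of classifier 0
    [1, 0, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0, 1, 1],  # near-duplicate of classifier 3
]
reps = cluster_representatives(preds, y, threshold=0.2)
subset, score = greedy_select(preds, y, reps)
```

In this toy run the clustering step discards the duplicate and near-duplicate classifiers, and the selection step then searches only over the remaining, mutually different candidates. Scoring the selection on held-out validation predictions, rather than on training data, is a further assumption of the sketch.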