Journal of Nanjing University (Natural Science) ›› 2014, Vol. 50 ›› Issue (4): 457.
Zhou Guojing, Li Yun*
Abstract: Feature selection is a key problem in machine learning and data mining; it reduces data dimensionality and thereby improves the generalization ability of learning models. In recent years, the ensemble idea has been applied to feature selection to improve its performance, i.e., multiple base feature selectors are combined. To improve the ability of feature selection algorithms to handle large-scale data, this paper proposes an ensemble feature selection method based on the min-max strategy. It consists of three steps: first, the original data are partitioned, according to class information, into multiple relatively small and balanced data subsets; second, feature selection is performed on each subset, yielding multiple feature selection results; third, these results are combined under the min-max strategy to produce the final feature selection result. Experiments compare the effect of this ensemble strategy and three other ensemble strategies on classification accuracy; the results show that the min-max strategy achieves better performance in most cases, and that min-max ensemble feature selection can effectively handle large-scale data.
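The three steps above can be sketched in code. This is a minimal illustration, not the authors' implementation: it assumes a binary-class dataset, uses a simple Fisher score as the base feature selector (any base selector could be substituted), and combines per-subset scores with the min-max rule — minimum over the negative-subset partners of each positive subset, then maximum across positive subsets.

```python
import numpy as np

def fisher_score(X, y):
    """Two-class Fisher score per feature: (mu0 - mu1)^2 / (var0 + var1)."""
    classes = np.unique(y)
    mu = [X[y == c].mean(axis=0) for c in classes]
    var = [X[y == c].var(axis=0) for c in classes]
    return (mu[0] - mu[1]) ** 2 / (var[0] + var[1] + 1e-12)

def min_max_ensemble_selection(X, y, n_subsets=3, k=5, seed=0):
    """Step 1: split each class into n_subsets parts, so every
    positive/negative pair forms a small, balanced subset.
    Step 2: score features on each pair with the base selector.
    Step 3: combine scores via min (over negative partners)
    then max (over positive subsets); return the top-k features."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    rng.shuffle(pos_idx)
    rng.shuffle(neg_idx)
    pos_parts = np.array_split(pos_idx, n_subsets)
    neg_parts = np.array_split(neg_idx, n_subsets)

    # scores[i, j, f] = score of feature f on (pos subset i, neg subset j)
    scores = np.empty((n_subsets, n_subsets, X.shape[1]))
    for i, p in enumerate(pos_parts):
        for j, n in enumerate(neg_parts):
            idx = np.concatenate([p, n])
            scores[i, j] = fisher_score(X[idx], y[idx])

    combined = scores.min(axis=1).max(axis=0)  # min-max combination
    return np.argsort(combined)[::-1][:k]      # indices of top-k features
```

Because each base selector only ever sees one small balanced subset pair, the subsets can be scored independently (and in parallel), which is what makes the scheme attractive for large-scale data.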