Journal of Nanjing University (Natural Science), 2018, Vol. 54, Issue (6): 1206–1215. DOI: 10.13232/j.cnki.jnju.2018.06.016



GSO feature selection algorithm based on predictive operators

Chen Haijuan1, Feng Xiang1,2*, Yu Huiqun1

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; 2. Smart City Collaborative Innovation Center, Shanghai Jiao Tong University, Shanghai 200240, China
  • Accepted: 2018-08-10; Online: 2018-12-01; Published: 2018-12-01
  • Contact: Feng Xiang, xfeng@ecust.edu.cn
  • Supported by: the National Natural Science Foundation of China (61472139, 61462073) and the Special Fund for Informatization Development of the Shanghai Municipal Commission of Economy and Informatization (201602008)


Abstract: Nowadays, tens of thousands of feature variables can be collected in many fields, while the number of samples available for training is far smaller than the number of features. Using feature selection to reduce the dimensionality of the data and improve algorithm performance has therefore become a primary task. The three mainstream approaches to feature selection are filter, wrapper, and embedded methods, but applying evolutionary computation (EC) techniques to feature selection has recently attracted increasing attention, and experiments have shown that EC techniques can achieve better performance. In this paper, a GSO (Group Search Optimizer) feature selection algorithm based on predictive operators (PGSO) is proposed for the feature selection problem. First, a mutation operator based on roulette wheel selection is introduced into the GSO algorithm: one dimension of a particle is selected for mutation according to the mutation probability, and the mutation is retained if the fitness of the mutated particle improves, which maintains the diversity of the population and improves the search performance of the algorithm. Second, a prediction (forecasting) operator is added to the GSO algorithm: 5% of the particles in the population learn the producer's historical optimal positions so as to predict the position of the next producer, which greatly accelerates the particles' search for the optimum. Finally, the proposed algorithm is evaluated on six UCI datasets and compared with feature selection algorithms based on Particle Swarm Optimization (PSO), the basic GSO, and Competitive Selection Optimization (CSO). The results show that, on single-objective feature selection problems, PGSO achieves a lower error rate, converges faster, and is less likely to fall into local optima.
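The abstract describes the two operators only in prose, so the following is a minimal Python sketch of how a roulette-wheel mutation with greedy acceptance and a producer-prediction step could be organized. The fitness placeholder, the uniform roulette weights, the way a mutated dimension is re-sampled, and the linear extrapolation used as the forecast are all illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the two PGSO operators described in the abstract.
# Fitness, roulette weights, mutation sampling and the "prediction" rule
# below are illustrative assumptions, not the authors' formulas.
import numpy as np

rng = np.random.default_rng(0)


def fitness(position):
    """Stand-in objective (lower is better). In the paper this would be the
    classification error rate of the feature subset encoded by `position`;
    here we only count selected features so the sketch stays self-contained."""
    mask = position > 0.5                # threshold a real-coded member into a subset
    return mask.sum() / len(mask)


def roulette_select(weights):
    """Roulette-wheel selection: pick an index with probability proportional to its weight."""
    probs = np.asarray(weights, dtype=float)
    probs = probs / probs.sum()
    return rng.choice(len(probs), p=probs)


def mutate(position, mutation_prob=0.3):
    """Roulette-wheel mutation with greedy acceptance: with probability
    `mutation_prob`, pick one dimension by roulette wheel, re-sample it,
    and keep the change only if the fitness improves."""
    if rng.random() > mutation_prob:
        return position
    d = roulette_select(np.ones(len(position)))   # assumed uniform roulette weights
    candidate = position.copy()
    candidate[d] = rng.random()                   # assumed mutation: re-sample that dimension
    return candidate if fitness(candidate) < fitness(position) else position


def predict_producer(producer_history, swarm, fraction=0.05):
    """Prediction operator: `fraction` of the members learn the producer's
    historical best positions and move toward a forecast of the next
    producer position (a simple linear extrapolation is assumed)."""
    step = producer_history[-1] - producer_history[-2] if len(producer_history) > 1 else 0.0
    predicted = producer_history[-1] + step
    n_learners = max(1, int(fraction * len(swarm)))
    for i in rng.choice(len(swarm), size=n_learners, replace=False):
        swarm[i] = 0.5 * (swarm[i] + predicted)   # learners move toward the forecast
    return predicted


# Toy usage on a 10-dimensional search space with 20 members.
swarm = rng.random((20, 10))
history = [swarm[0].copy(), swarm[1].copy()]      # pretend producer best positions
swarm[2] = mutate(swarm[2])
print(predict_producer(history, swarm))
```

In a full implementation, the fitness would be the classification error rate obtained with the selected feature subset on a UCI dataset, and both operators would be applied inside the usual GSO producer/scrounger/ranger iteration loop.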

Key words: feature selection, PGSO, roulette wheel selection, mutation operator, forecasting operator

CLC number: TP18