Journal of Nanjing University (Natural Sciences), 2013, Vol. 49, Issue 5: 603–610.


An incremental support vector machine algorithm based on probability density estimation

Pan Shi-Chao1, Wang Wen-Jian1,2**, Guo Hu-Sheng1

  • Online: 2014-02-08  Published: 2014-02-08
  • About the authors: (1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)
  • Supported by:
    National Natural Science Foundation of China (60975035, 61273291); Shanxi Province Research Foundation for Returned Overseas Scholars (2012-008)

An incremental support vector machine approach based on probability density distribution

 Pan Shi-Chao1, Wang Wen-Jian1,2, Guo Hu-Sheng1

  • Online: 2014-02-08  Published: 2014-02-08
  • About the authors: (1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China)

Abstract: The incremental support vector machine (ISVM) model learns by adding one sample or a batch of samples at a time, decomposing a large-scale problem into a series of sub-problems and thereby improving the efficiency with which a support vector machine (SVM) handles large-scale data. In the traditional ISVM (TISVM) model, however, an inappropriate strategy for selecting the incremental samples may reduce both efficiency and generalization ability. To address the selection of incremental samples in ISVM, an ISVM algorithm based on probability density distribution, called PISVM, is proposed. It uses the probability density distribution to select for training the incremental samples that carry more important classification information (i.e., those likely to become support vectors), so that the classifier converges to the optimum as fast as possible. Experimental results on standard UCI data sets show that the PISVM model further improves learning efficiency while preserving generalization ability.

Abstract: The incremental support vector machine (ISVM) model adds one sample or a batch of samples in each learning cycle, so that a large-scale problem is decomposed into a series of sub-problems. ISVM can therefore improve the efficiency of the support vector machine (SVM) on large-scale data. With the traditional ISVM (TISVM), however, the convergence speed, efficiency, and eventual generalization ability may deteriorate if the incremental samples are selected improperly. To solve this problem, an ISVM approach based on probability density distribution (PISVM) is proposed: guided by the probability density distribution, it chooses as incremental training samples those that carry much important classification information. With this approach the classifier reaches the optimal hyperplane at the fastest speed. To verify the validity of the proposed approach, experiments are conducted with three methods: the PISVM approach, the TISVM method, and a minimum distance classifier. Experimental results on UCI data sets demonstrate that the proposed PISVM obtains high learning efficiency and good generalization performance simultaneously.
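The abstract states the selection principle only at a high level. As a minimal illustrative sketch of that idea, and not the authors' published algorithm, the Python code below assumes scikit-learn's SVC and KernelDensity, the UCI Breast Cancer Wisconsin data loaded via load_breast_cancer, and an assumed scoring rule (estimated density divided by distance to the current decision boundary): at each increment, the pool samples lying in dense regions near the boundary, i.e. those most likely to become support vectors, are added to the training set and the SVM is retrained.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def select_increment(clf, kde, X_pool, batch_size):
    # Rank pool samples: prefer points in dense regions that lie close to the
    # current decision boundary, i.e. likely future support vectors.
    margin = np.abs(clf.decision_function(X_pool))   # distance to the current boundary
    density = np.exp(kde.score_samples(X_pool))      # estimated probability density p(x)
    score = density / (margin + 1e-8)                # high density, small margin -> high score
    return np.argsort(-score)[:batch_size]


def incremental_svm(X, y, init_size=50, batch_size=50):
    # Grow the training set batch by batch and retrain the SVM after each increment.
    # Assumes the initial random batch contains samples from both classes.
    order = np.random.RandomState(0).permutation(len(X))
    train_idx, pool_idx = list(order[:init_size]), list(order[init_size:])

    kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(X)  # global density model
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")

    while pool_idx:
        clf.fit(X[train_idx], y[train_idx])
        picked = select_increment(clf, kde, X[pool_idx], batch_size)
        chosen = {pool_idx[i] for i in picked}
        train_idx.extend(chosen)
        pool_idx = [i for i in pool_idx if i not in chosen]
    clf.fit(X[train_idx], y[train_idx])
    return clf


if __name__ == "__main__":
    data = load_breast_cancer()                      # a UCI benchmark, as in the paper's experiments
    X = StandardScaler().fit_transform(data.data)
    X_tr, X_te, y_tr, y_te = train_test_split(X, data.target, test_size=0.3, random_state=0)
    model = incremental_svm(X_tr, y_tr)
    print("test accuracy:", model.score(X_te, y_te))

In a full ISVM implementation the retraining step would typically warm-start from the previously retained support vectors rather than refit from scratch; the plain refit here only keeps the sketch short.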

[1] IDC research report: The world's total data volume reached 1.8 ZB in 2011. http://storage.chinabyte.com/163/12110163.shtml, 2011-06.

[2] Vapnik V N. Statistical learning theory. New York: Wiley, 1998: 21~22.

[3] Chang T T, Liu H W, Zhou S S. Large scale classification with local diversity AdaBoost SVM algorithm. Journal of Systems Engineering and Electronics, 2009, 20(6): 1344~1350.

[4] Wang W J, Guo H S, Jia Y F, et al. Granular support vector machine based on mixed measure. Neurocomputing, 2013(101): 116~128.

[5] Syed N, Liu H, Sung K. Incremental learning with support vector machines. International Joint Conference on Artificial Intelligence. Sweden: Morgan Kaufmann Publishers, 1999: 352~356.

[6] Zhang L, Zhou W D, Jiao L C. Pre-extracting support vectors for support vector machine. Proceedings of ICSP 2000. IEEE, 2000: 1432~1435.

[7] Xiao R, Wang J C, Sun Z X, et al. An incremental SVM learning algorithm. Journal of Nanjing University (Natural Sciences), 2002, 38(2): 152~157. (萧 嵘, 王继成, 孙正兴等. 一种SVM增量学习算法. 南京大学学报(自然科学), 2002, 38(2): 152~157).

[8] Cauwenberghs G, Poggio T. Incremental and decremental support vector machine learning. Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press, 2001: 409~415.

[9] Ralaivola L, d'Alché-Buc F. Incremental support vector machine learning: A local approach. Proceedings of the International Conference on Artificial Neural Networks. Vienna, Austria, 2001: 322~330.

[10] Katagiri S, Abe S. Incremental training of support vector machine using hyperspheres. Pattern Recognition Letters, 2006(27): 1495~1507.

[11] Zhang J P, Zhao Y, Yang J. Incremental learning algorithm of support vector machine based on vector projection. Computer Science, 2008, 35(3): 164~166. (张健沛, 赵 莹, 杨 静. 基于向量投影的支持向量机增量算法. 计算机科学, 2008, 35(3): 164~166).

[12] Wen B, Shan G L, Duan X S. Research of incremental learning algorithm based on KKT conditions and hull vectors. Computer Science, 2013, 40(3): 255~258. (文 波, 单甘霖, 段修生. 基于KKT条件与壳向量的增量学习算法研究. 计算机科学, 2013, 40(3): 255~258).

[13] Silverman B W. Density estimation for statistics and data analysis. London: Chapman and Hall, 1986: 176.

[14] Adankon M M, Cheriet M. Help-training for semi-supervised support vector machines. Pattern Recognition, 2011, 44(9): 2220~2230.

[15] Wang M, Zhou X D, Xu H T, et al. Effective image auto-annotation via discriminative hyperplane tree-based generative model. Journal of Software, 2009, 20(9): 2450~2461. (王 梅, 周向东, 许红涛等. 基于可判别超平面树的生成模型图象标注方法. 软件学报, 2009, 20(9): 2450~2461).

[16] Cao L L, Chen S C. The weighted Laplacian classifier. Journal of Nanjing University (Natural Sciences), 2012, 48(4): 459~465. (曹连连, 陈松灿. 加权Laplacian分类器. 南京大学学报(自然科学), 2012, 48(4): 459~465).

[17] Shawkat A, Smith-Miles K A. A meta-learning approach to automatic kernel selection for support vector machines. Neural Networks, 2006, 24(1-3): 173~186.

[18] Yang X, Xiong H L, Yang X. Optimal Gaussian kernel parameter selection for SVM classifier. IEICE Transactions on Information and Systems, 2010, E93-D(12): 3352~3358.

[19] Wang W J, Xu Z B, Lu W Z, et al. Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing, 2003, 55(3-4): 643~663.

[20] Wang W J, Guo J L, Men C Q. An approach for kernel selection based on data distribution. LNAI 5009: Proceedings of Rough Sets and Knowledge Technology. Chengdu: Springer, 2008: 596~603.

[21] Wu T, He H G, He M K. Interpolation based Kernel function’s construction. Chinese Journal of Computers, 2003, 26(8): 990~996. (吴 涛, 贺汉根, 贺明科. 基于插值的核函数构造. 计算机学报, 2003, 26(8): 990~996).

[22] UCI Machine Learning Repository. http://archive.ics.uci.edu/ml, 2012-07.

[23] Sang N, Zhang R, Zhang T X. Incremental learning algorithm of a modified minimum-distance classifier. Pattern Recognition and Artificial Intelligence, 2007, 20(3): 358~364. (桑 农, 张 荣, 张天序. 一类改进的最小距离分类器的增量学习算法. 模式识别与人工智能, 2007, 20(3): 358~364).