Journal of Nanjing University (Natural Sciences), 2013, Vol. 49, Issue (2): 150–158.


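 A new v-NSVDD multiclass classification algorithm for unbalanced data*

 Liu Xiao-Ping1, Xu Gui-Yun1, Ren Shi-Jin2**, Yang Mao-Yun1,2

  • Publication date: 2015-11-03  Release date: 2015-11-03
  • About the authors: (1. School of Mechanical and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221116; 2. School of Computer Science, Jiangsu Normal University, Xuzhou, 221116)
  • Funding:
     National Natural Science Foundation of China (60974056)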

 A new unbalanced data v-NSVDD multiclass algorithm

 Liu Xiao-Ping1, Xu Gui-Yun1, Ren Shi-Jin2, Yang Mao-Yun1,2

  • Online:2015-11-03 Published:2015-11-03
  • About author: (1. School of Mechanical and Electrical Engineering, China University of Mining and Technology, Xuzhou,
    221116, China; 2. School of Computer Science and Technology, Jiangsu Normal University, Xuzhou, 221116, China)

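Abstract: The problems of existing multiclass support vector data description (SVDD) algorithms are analyzed, and a new v-NSVDD multiclass classification algorithm for unbalanced data is proposed. The method borrows ideas from v-SVM and from SVDD with negative samples, and, based on the principle of maximizing the margin between samples of different classes, it largely overcomes the influence of noise and outliers and improves the generalization performance of the classification model. Sample weighting is used to solve the problem of low prediction accuracy on samples from unbalanced classes, and a method for setting the sample weighting coefficients according to the number of samples in each class is derived theoretically. Because practical applications involve large amounts of complex, nonlinear classification data, the linear classification algorithm above is extended to nonlinear data classification by the kernel method. Existing multiclass classifiers cannot reject samples, and the kernel function parameters differ from classifier to classifier, so the computed distances between a data point and the centers of the individual hyperspheres do not agree with the actual distances, which harms the accuracy and reliability of the decision. To address these problems, a multiclass method combining relative distance with the K-NN rule is given, which improves the accuracy and reliability of the classification results. Simulation experiments on benchmark datasets show that the algorithm achieves low classification error and handles sample imbalance effectively.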
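The abstract says the sample weights are set from the number of samples in each class but does not give the formula. As a minimal sketch, assuming the common inverse-frequency rule in which every class receives the same total weight n/C (so a sample of a class with n_c members gets n/(C·n_c)), the weights could be computed as below. The function name `class_weights` is hypothetical and is not taken from the paper.

```python
from collections import Counter


def class_weights(labels):
    """Per-sample weights inversely proportional to class size.

    Assumption (not from the paper): each of the C classes receives the
    same total weight n / C, so a sample of class c gets n / (C * n_c).
    """
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    per_class = {c: n / (n_classes * n_c) for c, n_c in counts.items()}
    return [per_class[y] for y in labels]


labels = ["a"] * 8 + ["b"] * 2   # 8 majority samples, 2 minority samples
w = class_weights(labels)
print(w[0], w[-1])               # 0.625 2.5
```

Both classes end up with the same total weight (8 × 0.625 = 2 × 2.5 = 5). In a weighted SVDD-type dual such a weight would typically scale each sample's penalty, i.e. the upper bound on its Lagrange multiplier; whether v-NSVDD places the weight exactly there is not stated in the abstract.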

Abstract: Based on an analysis of the problems of state-of-the-art support vector data description (SVDD) methods for multiclass classification, a new v-NSVDD algorithm for multiclass classification of unbalanced data is proposed. Inspired by v-SVM, SVDD with negative samples (NSVDD), and the principle of maximizing the classification margin, the method not only effectively reduces the disturbances caused by noise and outliers but also corrects the problem in the optimization problem of Ref. [8], which considerably improves the generalization performance of the proposed algorithm. The training samples are weighted to deal with the unbalanced data classification problem, and the weight coefficients can be conveniently calculated in theory from the number of samples in each class. Taking into account the complexity and nonlinearity of the classification samples in many practical applications, we extend the proposed linear algorithm to nonlinear cases by means of the kernel trick. Because multiclass data require many nonlinear unbalanced-data v-NSVDD models and each model is developed separately, the corresponding kernel parameters differ greatly. As a result, the distances between a fixed sample and the hypersphere centers in different reproducing kernel Hilbert spaces (RKHSs) are inconsistent with the actual distances. Furthermore, most existing multiclass classification methods lack a rejection decision, which can impair the classification performance and the reliability of the decision. To deal with these problems, a relative distance measure is first proposed to make the distances in different RKHSs consistent, and a multiclass classification method is then developed by combining the relative distance with the K-NN classification rule. Benchmark test results show that the proposed method achieves lower classification errors and handles the unbalanced data problem.
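The abstract motivates a relative distance because each class's hypersphere lives in its own RKHS with its own kernel parameter, and it combines that distance with a K-NN rule that permits rejection, but it defines neither. The sketch below is illustrative only and rests on assumptions that are not from the paper: a Gaussian kernel per class; the standard SVDD kernel expansion ||φ(x) - a||² = K(x,x) - 2 Σ α_i K(x_i,x) + Σ Σ α_i α_j K(x_i,x_j); a relative distance r_c = ||φ_c(x) - a_c||² / R_c²; rejection when x lies outside every hypersphere; and a K-NN vote among training samples when x falls inside several. The functions `rbf`, `sq_dist_to_center` and `classify`, the model tuple layout, and the hand-set toy models are all hypothetical.

```python
import math
from collections import Counter


def rbf(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def sq_dist_to_center(x, sv, alpha, sigma):
    """Squared RKHS distance ||phi(x) - a||^2 with a = sum_i alpha_i phi(sv_i)."""
    k_xx = rbf(x, x, sigma)
    cross = sum(a_i * rbf(s_i, x, sigma) for a_i, s_i in zip(alpha, sv))
    quad = sum(a_i * a_j * rbf(s_i, s_j, sigma)
               for a_i, s_i in zip(alpha, sv)
               for a_j, s_j in zip(alpha, sv))
    return k_xx - 2.0 * cross + quad


def classify(x, models, train_x, train_y, k=3):
    """Hypothetical decision rule; models maps label -> (sv, alpha, sigma, R2)."""
    # Relative distance: squared distance scaled by the class's squared radius,
    # so 1.0 means "on the hypersphere boundary" in every RKHS.
    rel = {c: sq_dist_to_center(x, sv, alpha, sigma) / r2
           for c, (sv, alpha, sigma, r2) in models.items()}
    inside = [c for c, r in rel.items() if r <= 1.0]
    if not inside:
        return None  # outside every class description: reject
    if len(inside) == 1:
        return inside[0]
    # Overlapping descriptions: K-NN vote among samples of the competing classes.
    cand = sorted((math.dist(x, p), y)
                  for p, y in zip(train_x, train_y) if y in inside)
    votes = Counter(y for _, y in cand[:k])
    return votes.most_common(1)[0][0]


# Toy models (hand-set, not trained): one support vector each with alpha = 1
# and different kernel widths; R2 makes each sphere radius 1 in input space.
models = {
    "A": ([(0.0, 0.0)], [1.0], 1.0, 2 - 2 * math.exp(-1 / (2 * 1.0 ** 2))),
    "B": ([(1.5, 0.0)], [1.0], 2.0, 2 - 2 * math.exp(-1 / (2 * 2.0 ** 2))),
}
train_x = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (1.5, 0.0), (1.0, 0.0), (2.0, 0.0)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(classify((0.2, 0.0), models, train_x, train_y))    # A
print(classify((10.0, 10.0), models, train_x, train_y))  # None (rejected)
print(classify((0.8, 0.0), models, train_x, train_y))    # B (resolved by K-NN)
```

Dividing by R_c² puts every class on a common scale regardless of its kernel width, which is one way to make distances measured in different RKHSs comparable; the measure actually used in the paper may differ.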

[1] Huang G X, Chen H F, Zhou Z L, et al. Two-class support vector data description. Pattern Recognition, 2011, 44: 320–329.
[2] Zhu M L, Liu X D, Chen S F. Solving the problem of multiclass pattern recognition with sphere-structured support vector machines. Journal of Nanjing University (Natural Sciences), 2003, 39(2): 153–158. (in Chinese)
[3] Ben Khediri I, Weihs C, Limam M. Kernel k-means clustering based local support vector domain description fault detection of multimodal processes. Expert Systems with Applications, 2012, 39: 2166–2171.
[4] Sakla W, Chan A, Ji J, et al. An SVDD-based algorithm for target detection in hyperspectral imagery. IEEE Geoscience and Remote Sensing Letters, 2011, 8(2): 384–388.
[5] Tax D M J, Duin R P W. Support vector data description. Machine Learning, 2004, 54: 45–66.
[6] Guo S M, Chen L C, Tsai J S H. A boundary method for outlier detection based on support vector domain description. Pattern Recognition, 2009, 42: 77–83.
[7] Zhu X K, Yang D G. A multiclass support vector domain description for pattern recognition based on a measure of expansibility. Acta Electronica Sinica, 2009, 37(3): 464–469. (in Chinese)
[8] Mu T T, Nandi A K. Multiclass classification based on extended support vector data description. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(5): 1206–1212.
[9] Lee D, Lee J. Domain described support vector classifier for multi-classification problems. Pattern Recognition, 2007, 40: 41–51.
[10] Tao Q, Wu G, Wang J. A new maximum margin algorithm for one-class problems and its boosting implementation. Pattern Recognition, 2005, 38(10): 1071–1077.
[11] Wei X K, Lofberg J, Feng Y, et al. Enclosing machine learning for class description. Lecture Notes in Computer Science. Springer-Verlag, 2007, 4491: 424–433.
[12] Dolia A N, Harris C J, Shawe-Taylor J, et al. Kernel ellipsoidal trimming. Computational Statistics and Data Analysis, 2007, 52(1): 309–324.
[13] Feng A M, Xue H, Liu X J, et al. Enhanced one-class support vector machine. Journal of Computer Research and Development, 2008, 45(11): 1858–1864. (in Chinese)
[14] Lee K Y, Kim D W, Lee K H, et al. Density-induced support vector data description. IEEE Transactions on Neural Networks, 2007, 18(1): 284–289.
[15] Wang W L, Wang Z Y, Zheng J W, et al. Density-induced data description one-class classifier. Control and Decision, 2011, 26(11): 1665–1669. (in Chinese)
[16] Zhao F, Zhang J Y, Liu J. An optimization kernel algorithm for improving the performance of support vector domain description. Acta Automatica Sinica, 2008, 34(9): 1122–1127. (in Chinese)
[17] Miao Z M, Hu G Y, Ding L. Support vector data description implemented in class-imbalance learning. Journal of Applied Sciences: Electronics and Information Engineering, 2008, 26(1): 79–84. (in Chinese)
[18] Michie D, Spiegelhalter D J, Taylor C C. Machine Learning, Neural and Statistical Classification. http://www.ncc.up.pt/iacc/ML/statlog/data.html, 2011-03-12.
[19] Zheng E H, Xu H, Li P, et al. Mining knowledge from unbalanced data based on v-support vector machine. Journal of Zhejiang University (Engineering Science), 2006, 40(10): 1682–1887. (in Chinese)
