Journal of Nanjing University (Natural Sciences) ›› 2012, Vol. 48 ›› Issue (2): 221–227.


 A software module defect detection algorithm based on fuzzy support vector machine*

 Guo Li Na1**, Yang Yang2

  • Publication date: 2015-05-26  Release date: 2015-05-26
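  • About the authors: (1. School of Computer Science and Technology, Nanjing Normal University, Nanjing, 210046;
    2. Honor School, Nanjing Normal University, Nanjing, 210046)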
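  • Funding:
     National Natural Science Foundation of China (80873176), Key and Major Special Project of the Natural Science Foundation of Jiangsu Province (2011BK005), General Program of the Natural Science Foundation of Jiangsu Province (2011BK782)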

 A classification algorithm of defect prediction for software modules
based on fuzzy support vector machine

 Guo Li Na 1, Yang Yang 2
  

  • Online: 2015-05-26 Published: 2015-05-26
  • About author: (1. School of Computer Science and Technology, Nanjing Normal University, Nanjing, 210046, China
    2. Honor School, Nanjing Normal University, Nanjing, 210046, China)

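Summary: Classification of imbalanced data is an important problem in machine learning research and has wide applications, such as software module defect detection. Imbalanced-data classification based on support vector machines is one of the mainstream approaches and has received broad attention from researchers. Building on the existing fuzzy support vector machine method for imbalanced data classification and combining it with sampling techniques, this paper proposes an imbalanced-data classification algorithm based on the fuzzy support vector machine and an ensemble imbalanced-data classification algorithm based on the fuzzy support vector machine. Experimental results on two NASA software module defect metric datasets, CM1 and KC3, demonstrate the effectiveness of the newly proposed algorithms.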

Abstract: Classification of imbalanced data is a key issue in machine learning, because the data obtained in many real applications, such as defect prediction for software modules, are imbalanced. Classification methods based on support vector machines are among the effective approaches for imbalanced data, and many researchers focus on them. Because software module defect metric datasets exhibit class imbalance and noise, prediction models based on the standard support vector machine (SVM) cannot achieve satisfactory results. Therefore, in this paper we make a relatively in-depth study of support vector machines for predicting software module defects. Based on the previously proposed fuzzy support vector machine for imbalanced data classification (FSVM-CIL) and integrating sampling technology, we introduce two improved algorithms. One is FSVM-CIL-RUS, which combines the FSVM-CIL algorithm with random under-sampling: before building software module defect prediction models with FSVM-CIL, we balance the datasets using random under-sampling. The other is an ensemble algorithm called FSVM-CIL-RBBag, which combines the FSVM-CIL algorithm with the roughly balanced bagging algorithm: base classifiers are built with FSVM-CIL and then combined into an ensemble to improve prediction performance. We apply the two algorithms to two NASA software module datasets, CM1 and KC3. The experimental results show the effectiveness of the newly proposed algorithms and indicate that combining FSVM-CIL with sampling or ensemble techniques can improve the performance of the model.
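
The two sampling steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names are hypothetical, the negative binomial draw with p = 0.5 follows the usual description of roughly balanced bagging and is an assumption here, and simple majority voting stands in for whatever combination rule the paper uses. Training the FSVM-CIL base classifier itself is not shown.

```python
import random
from collections import Counter


def split_by_class(labels, minority=1):
    """Return (minority indices, majority indices) for a label list."""
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    return minority_idx, majority_idx


def random_under_sample(labels, minority=1, seed=0):
    """Keep every minority item plus an equally sized random subset
    (drawn without replacement) of the majority class."""
    rng = random.Random(seed)
    minority_idx, majority_idx = split_by_class(labels, minority)
    n_keep = min(len(minority_idx), len(majority_idx))
    return sorted(minority_idx + rng.sample(majority_idx, n_keep))


def negative_binomial(n_successes, p, rng):
    """Count failures seen before the n-th success of a Bernoulli(p) trial."""
    failures = successes = 0
    while successes < n_successes:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def roughly_balanced_bags(labels, n_bags, minority=1, seed=0):
    """Build bootstrap bags whose majority-class size varies around the
    minority-class size (assumption: negative binomial with p = 0.5)."""
    rng = random.Random(seed)
    minority_idx, majority_idx = split_by_class(labels, minority)
    bags = []
    for _ in range(n_bags):
        n_major = negative_binomial(len(minority_idx), 0.5, rng)
        bag = [rng.choice(minority_idx) for _ in minority_idx]
        bag += [rng.choice(majority_idx) for _ in range(n_major)]
        bags.append(bag)
    return bags


def majority_vote(predictions):
    """Combine the labels predicted by the base classifiers for one module
    (assumption: plain voting, not necessarily the paper's rule)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 5 defective modules (label 1) and 45 clean modules (label 0).
labels = [1] * 5 + [0] * 45
balanced = random_under_sample(labels)
bags = roughly_balanced_bags(labels, n_bags=3)
```

In this sketch, `balanced` would feed a single FSVM-CIL model (the FSVM-CIL-RUS idea), while each entry of `bags` would train one base classifier whose predictions are then combined (the FSVM-CIL-RBBag idea).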







