Journal of Nanjing University (Natural Sciences) ›› 2017, Vol. 53 ›› Issue (4): 815.
Fu Kang'an (付康安)1, Guo Husheng (郭虎升)1, Wang Wenjian (王文剑)1,2*
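Abstract: Because symbolic (categorical) attribute data lacks inherent geometric properties, existing classification algorithms designed for numerical attribute data cannot be applied to it directly. To improve classification performance on symbolic attribute data, a Support Vector Machine Classification Approach Based on Correlation Analysis (CA_SVM) is proposed. The correlation between attribute values and labels is analyzed to obtain an influence factor of each attribute value on the labels. Combined with the frequency with which each attribute value occurs within a class, this factor is used to convert all attribute values of the original symbolic data into numerical data without loss of information. The converted data both reflects the correlation between attribute values and labels and effectively represents the distance between values of the same attribute. Finally, a Support Vector Machine (SVM) performs the classification. Experimental results on standard UCI data sets show that the CA_SVM model can improve classification accuracy.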
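The conversion step described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so the definitions below are assumptions made only for illustration: the "influence factor" of a value v on class c is taken to be the conditional probability P(c | v), the within-class frequency is taken to be P(v | c), and each symbolic value is encoded as the vector of their products over all classes. The function names `fit_value_encoding` and `transform` and the toy data are hypothetical.

```python
from collections import Counter


def fit_value_encoding(rows, labels):
    """Learn a numeric vector for every symbolic value of every attribute.

    Assumed stand-in for the paper's method: the entry for class c is
    P(c | v) * P(v | c), estimated from training counts.
    """
    classes = sorted(set(labels))
    class_count = Counter(labels)
    n_attrs = len(rows[0])
    encoding = []
    for j in range(n_attrs):
        value_count = Counter(row[j] for row in rows)
        joint = Counter((row[j], y) for row, y in zip(rows, labels))
        table = {}
        for v, n_v in value_count.items():
            vec = []
            for c in classes:
                n_vc = joint[(v, c)]               # 0 if the pair never occurs
                influence = n_vc / n_v             # assumed factor: P(c | v)
                frequency = n_vc / class_count[c]  # within-class frequency: P(v | c)
                vec.append(influence * frequency)
            table[v] = vec
        encoding.append(table)
    return classes, encoding


def transform(rows, classes, encoding):
    """Replace each symbolic value by its learned vector and concatenate."""
    zero = [0.0] * len(classes)  # values unseen during fitting map to zeros
    out = []
    for row in rows:
        vec = []
        for j, v in enumerate(row):
            vec.extend(encoding[j].get(v, zero))
        out.append(vec)
    return out


# Toy data: the first attribute determines the label, the second does not.
rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "large")]
labels = ["a", "a", "b", "b"]

classes, encoding = fit_value_encoding(rows, labels)
X = transform(rows, classes, encoding)
print(X[0])  # [1.0, 0.0, 0.25, 0.25]
```

In this toy case the informative attribute yields well-separated vectors for "red" and "blue", while the uninformative one maps both of its values to the same point. The resulting numeric rows could then be passed to any SVM implementation, for example `sklearn.svm.SVC`, for the final classification step.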