Journal of Nanjing University (Natural Sciences) ›› 2016, Vol. 52 ›› Issue (2): 343.
李 坤1,2,3,刘 鹏2,3*,吕雅洁1,2,张国鹏2,3,黄宜华4
Li Kun1,2,3,Liu Peng2,3*,Lv Yajie1,2,Zhang Guopeng2,3,Huang Yihua4
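Abstract: This paper designs a parallel implementation of LIBSVM parameter selection on a Spark cluster. LIBSVM is a widely used SVM software package, applied to model building, sample training, and result prediction. When a data set is trained with LIBSVM, the choice of parameters strongly affects the training result, and the parameters C and g matter most. The LIBSVM package uses a grid search algorithm to find the best combination of C and g; although this algorithm is parallelized on a single machine, it still takes a long time once the data volume reaches a certain scale. Based on the Spark parallel computing framework, a parallel grid search algorithm for selecting the C and g parameters of LIBSVM is designed and implemented. Experimental results show that the proposed parallel coarse-grained grid search algorithm for C and g selection is nearly 7 times faster than the traditional algorithm, and that this speedup grows further as the cluster is scaled up. In addition, building on the coarse-grained grid search, the proposed fine-grained parallel grid search algorithm further improves the selected combination of C and g.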
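The coarse-then-fine search over (C, g) described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The scoring function `cv_score` is a hypothetical stand-in for LIBSVM cross-validation accuracy, the exponent ranges and step sizes are assumptions, and a local thread pool takes the place of the Spark cluster (on Spark, the grid points would presumably be distributed across workers instead).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def cv_score(log2c, log2g):
    """Hypothetical stand-in for cross-validation accuracy.

    A real run would train LIBSVM with C = 2**log2c and g = 2**log2g and
    return the k-fold accuracy; this toy surface peaks at (3.5, -6.5).
    """
    return -((log2c - 3.5) ** 2 + (log2g + 6.5) ** 2)


def frange(start, stop, step):
    """Inclusive list of floats from start to stop in increments of step."""
    n = int(round((stop - start) / step))
    return [start + i * step for i in range(n + 1)]


def grid_search(c_values, g_values, workers=4):
    """Score every (log2c, log2g) pair in parallel and return the best one."""
    grid = list(product(c_values, g_values))
    # Each grid point is independent, so the evaluations can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda p: cv_score(*p), grid))
    best = max(range(len(grid)), key=lambda i: scores[i])
    return grid[best], scores[best]


def coarse_to_fine(step_coarse=2.0, step_fine=0.25):
    """Run a coarse grid search, then a finer one around the coarse optimum."""
    # Coarse pass: wide exponent ranges with a large step (assumed values).
    coarse_c = frange(-5, 15, step_coarse)
    coarse_g = frange(-15, 3, step_coarse)
    (c0, g0), s0 = grid_search(coarse_c, coarse_g)
    # Fine pass: a denser grid spanning one coarse step around the best cell.
    fine_c = frange(c0 - step_coarse, c0 + step_coarse, step_fine)
    fine_g = frange(g0 - step_coarse, g0 + step_coarse, step_fine)
    (c1, g1), s1 = grid_search(fine_c, fine_g)
    return (c0, g0, s0), (c1, g1, s1)


if __name__ == "__main__":
    coarse, fine = coarse_to_fine()
    print("coarse best (log2 C, log2 g, score):", coarse)
    print("fine best   (log2 C, log2 g, score):", fine)
```

Because every grid point is scored independently, the work divides cleanly across workers, which is consistent with the abstract's observation that the speedup grows with cluster size.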