Journal of Nanjing University (Natural Sciences) ›› 2012, Vol. 48 ›› Issue (1): 63–69.


 Protein function prediction based on multi-source data fusion with the fuzzy integral*

 Zhao Yan, Lu Yi-Nan**, Quan Yong
  

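  • Publication date: 2015-05-16  Release date: 2015-05-16
  • About the authors: (College of Computer Science and Technology, Jilin University, Changchun, 130012)
  • Funding:
     Jilin Province Science and Technology Development Project (20090501)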

 Prediction of protein function via data fusion based on fuzzy measure

 Zhao Yan, Lu Yi-Nan, Quan Yong

  • Online: 2015-05-16  Published: 2015-05-16
  • About author: (College of Computer Science and Technology, Jilin University, Changchun, 130012, China)

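Abstract (translated from the Chinese):  In recent years, multi-source data fusion has become a hot topic in protein function prediction. This paper proposes a multi-source data fusion method based on the Choquet fuzzy integral and uses it to predict the functions of yeast proteins. Support vector machines serve as base classifiers that make predictions on each data source and output results in probabilistic form. A particle swarm algorithm determines the fuzzy densities, and the results from the individual data sources are fused with the Choquet fuzzy integral. Experiments show that protein function prediction with the Choquet fuzzy integral is clearly better than the traditional weighted average method, the support vector machine method, and the K-nearest-neighbors method.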

Abstract:  Predicting protein function is one of the main problems of the post-genomic era, and the availability of large amounts of biological data makes it feasible. In many cases, however, biological data obtained through biotechnology are highly noisy, and a single data source generally provides useful information for only a subset of the protein function classes. Data fusion, which uses diverse biological data to predict protein function, has therefore attracted wide interest in recent years. Compared with the common information fusion method of weighted averaging, a fuzzy measure can reflect not only the importance of different objects but also the interactions among them. In this paper, Choquet fuzzy integral fusion based on a fuzzy measure is used to integrate the probabilistic outputs of different classifiers, and the particle swarm algorithm is adopted to search for optimized values of the fuzzy densities, which are crucial for the fuzzy integral. Six data sets are used. The first five are collected from open databases or computed with software provided by those databases, and the last is the union of the first five. Probabilistic support vector machines are applied as base learners to predict the functions of examples from each data set. The Choquet fuzzy integral method is then applied to the base learners' probabilistic outputs on the first five data sets. It is compared with the weighted average method, the support vector machine method, and the K-nearest-neighbors method using tenfold cross-validation. The experimental results show that the Choquet fuzzy integral method performs much better than the other three methods and that data fusion methods combining multiple types of biological data can substantially improve the results.
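
The fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the fuzzy measure is a Sugeno λ-measure built from the fuzzy densities (the abstract does not name the measure). The function names `solve_lambda` and `choquet_integral`, the example scores, and the density values are all hypothetical. In the paper the densities are found by particle swarm optimization; here they are fixed by hand.

```python
from math import prod


def solve_lambda(densities, max_iter=200):
    """Return lambda > -1 solving prod(1 + lambda*g_i) = 1 + lambda.

    Assumption: a Sugeno lambda-measure; each density g_i lies in (0, 1).
    """
    if len(densities) < 2:
        raise ValueError("need at least two data sources")
    if not all(0.0 < g < 1.0 for g in densities):
        raise ValueError("each fuzzy density must lie strictly between 0 and 1")
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities are already additive

    def f(lam):
        return prod(1.0 + lam * g for g in densities) - (1.0 + lam)

    # 'neg' stays on the side where f <= 0, 'pos' on the side where f > 0.
    if total > 1.0:
        neg, pos = 0.0, -1.0  # root lies in (-1, 0)
    else:
        neg, pos = 0.0, 1.0   # root is positive; grow the bracket
        while f(pos) <= 0.0:
            pos *= 2.0
    for _ in range(max_iter):  # plain bisection
        mid = 0.5 * (neg + pos)
        if f(mid) > 0.0:
            pos = mid
        else:
            neg = mid
    return 0.5 * (neg + pos)


def choquet_integral(scores, densities):
    """Fuse per-source probabilities for one (protein, class) pair."""
    if len(scores) != len(densities):
        raise ValueError("scores and densities must have the same length")
    lam = solve_lambda(densities)
    # Visit sources from the most to the least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fused, measure = 0.0, 0.0
    for rank, i in enumerate(order):
        g = densities[i]
        # Measure of the set of sources visited so far.
        measure = g + measure + lam * g * measure
        if rank == len(order) - 1:
            measure = 1.0  # the full set has measure 1 by definition
            next_score = 0.0
        else:
            next_score = scores[order[rank + 1]]
        fused += (scores[i] - next_score) * measure
    return fused


if __name__ == "__main__":
    # Hypothetical SVM probabilities from three data sources for one class.
    svm_probs = [0.9, 0.6, 0.1]
    # Hypothetical fuzzy densities (the paper searches for these with PSO).
    fuzzy_densities = [0.3, 0.4, 0.2]
    print(choquet_integral(svm_probs, fuzzy_densities))
```

When the densities sum to 1, λ is 0 and the Choquet integral reduces to the weighted average, which is the baseline the abstract compares against. Other density values let the fused score depend on interactions among the sources.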

















