Journal of Nanjing University (Natural Science) ›› 2024, Vol. 60 ›› Issue (1): 1–11. doi: 10.13232/j.cnki.jnju.2024.01.001

Protein receptor function prediction based on multi-view matrix completion

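Weixiang Huang1, Ji Ding1, Xiaxu Liu1, Qin Yin2, Chuangchuang Lan1, Jiansheng Wu1

  1. School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023
  2. School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, 210023
  • Received: 2023-08-20 Publication date: 2024-01-30 Release date: 2024-01-29
  • Corresponding author: Jiansheng Wu E-mail: jansen@njupt.edu.cn
  • Funding:
    National Natural Science Foundation of China (61872198); Basic Research Program of the Jiangsu Provincial Department of Science and Technology (BK20201378)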

Predicting functions of protein receptors through multi-view matrix completion

Weixiang Huang1, Ji Ding1, Xiaxu Liu1, Qin Yin2, Chuangchuang Lan1, Jiansheng Wu1

  1. School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
  2. School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
  • Received: 2023-08-20 Online: 2024-01-30 Published: 2024-01-29
  • Contact: Jiansheng Wu E-mail: jansen@njupt.edu.cn

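Abstract:

Protein receptors are an important component of cellular signal transduction and the most important class of human drug targets. G protein-coupled receptors (GPCRs) make up the vast majority of them, and about 34% of the drugs currently on the market target GPCRs. Accurately annotating the biological functions of GPCR proteins is essential for understanding the physiological processes they take part in and for targeted drug discovery, and Gene Ontology (GO) is the most commonly used way to describe protein function. Both GPCR proteins and GO carry information from multiple views, and making effective use of this information can improve protein function prediction performance. A multi-view inductive matrix completion method, MVIMC (Multi-View Inductive Matrix Completion), is therefore proposed to predict the GO biological functions of GPCR proteins. MVIMC makes effective use of view information from both GPCR proteins and GO labels: the GPCR views contain text information and domain information, and the GO view contains text information. Experimental results show that MVIMC reaches prediction probabilities of 68% and 69% for molecular function and biological process, respectively, outperforming the best current matrix completion methods as well as the methods commonly used in the CAFA protein function prediction competition.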

Keywords: G protein-coupled receptors, Gene Ontology, matrix completion, multi-view learning

Abstract:

Protein receptors are an important component of cellular signal transduction and the most important drug targets in humans, with G protein-coupled receptors (GPCRs) accounting for the vast majority; about 34% of drugs on the market target GPCRs. Accurately annotating the biological functions of GPCR proteins is vital for understanding the physiological processes they are involved in and for targeted drug discovery, and Gene Ontology (GO) is the most commonly used way to describe protein function. Both GPCR proteins and GO carry information from multiple views, and using this information effectively can improve protein function prediction performance. This paper therefore proposes a multi-view inductive matrix completion method, MVIMC (Multi-View Inductive Matrix Completion), for predicting the GO functions of GPCR proteins. MVIMC uses view information from both GPCR proteins and GO labels: the GPCR views comprise textual and domain information, and the GO view comprises textual information. Experimental results show that MVIMC achieves prediction probabilities of 68% and 69% for molecular function and biological process, respectively, outperforming the best current matrix completion methods and the methods commonly used in the CAFA protein function prediction competition.

Key words: G protein-coupled receptors (GPCRs), Gene Ontology, inductive matrix completion, multi-view learning
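
The abstract names inductive matrix completion but does not give the MVIMC formulation, so the following is only a generic sketch of the underlying idea, not the authors' method. It assumes a protein-by-GO-term matrix `P` is approximated as `X @ W @ H.T @ Y.T`, where `X` holds protein features and `Y` holds GO-term features, and it assumes that views are combined by simple column concatenation. All names (`fit_imc`, `imc_predict`, `demo`), the toy data, and the hyperparameters are hypothetical.

```python
import numpy as np


def imc_predict(X, W, H, Y):
    """Score every (protein, GO term) pair as X W H^T Y^T."""
    return X @ W @ H.T @ Y.T


def fit_imc(P, mask, X, Y, rank=2, lam=1e-3, lr=0.02, n_iter=3000, seed=0):
    """Gradient descent on 0.5*||mask*(X W H^T Y^T - P)||_F^2
    + 0.5*lam*(||W||_F^2 + ||H||_F^2); only entries with mask == 1 count."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(n_iter):
        R = mask * (imc_predict(X, W, H, Y) - P)  # residual on observed entries
        grad_W = X.T @ R @ Y @ H + lam * W
        grad_H = Y.T @ R.T @ X @ W + lam * H
        W = W - lr * grad_W
        H = H - lr * grad_H
    return W, H


def demo(seed=0):
    """Toy run on synthetic data; returns factors and observed-entry errors."""
    rng = np.random.default_rng(seed)
    n_prot, n_go = 30, 20
    # Two hypothetical protein views (e.g. text and domain features),
    # combined by concatenation (an assumption, not the paper's scheme).
    text_view = rng.standard_normal((n_prot, 3))
    domain_view = rng.standard_normal((n_prot, 2))
    # QR orthonormalisation only keeps this toy problem well conditioned.
    X, _ = np.linalg.qr(np.hstack([text_view, domain_view]))
    Y, _ = np.linalg.qr(rng.standard_normal((n_go, 4)))
    # Synthetic low-rank ground truth and a 60% observation mask.
    P = imc_predict(X, rng.standard_normal((5, 2)), rng.standard_normal((4, 2)), Y)
    mask = (rng.random((n_prot, n_go)) < 0.6).astype(float)

    def observed_error(W, H):
        return float(np.sum((mask * (imc_predict(X, W, H, Y) - P)) ** 2))

    W0, H0 = fit_imc(P, mask, X, Y, n_iter=0)  # initial factors, same seed
    W, H = fit_imc(P, mask, X, Y)
    return W, H, observed_error(W0, H0), observed_error(W, H)
```

Because the model is expressed through feature matrices rather than per-row latent factors, `imc_predict` can score a protein that has no known annotations as long as its feature row is available, which is what makes the completion inductive. In practice `P` would be a binary annotation matrix rather than the real-valued toy matrix used here.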

CLC number:

  • TP391.4

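Fig. 1

Comparison of GO function prediction for GPCR proteins under different single views and combined views: (a) molecular function; (b) biological process. A. amino acid triplet information; B. amino acid correlation information; C. evolutionary information; D. GPCR text information; E. secondary structure correlation information; F. physicochemical properties; G. disordered residue information; H. signal peptide information; I. domain text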

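Fig. 2

Comparison of prediction probabilities of IMC combined-view methods (biological process). A. amino acid triplet information; B. amino acid correlation information; C. evolutionary information; D. GPCR text information; E. secondary structure correlation information; F. physicochemical properties; G. disordered residue information; H. signal peptide information; I. domain text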

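Fig. 3

Comparison of related error rates across views. A. amino acid triplet information; B. amino acid correlation information; C. evolutionary information; D. GPCR text information; E. secondary structure correlation information; F. physicochemical properties; G. disordered residue information; H. signal peptide information; I. domain text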

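Fig. 4

Comparison of related error rates across combined-view models. A. amino acid triplet information; B. amino acid correlation information; C. evolutionary information; D. GPCR text information; E. secondary structure correlation information; F. physicochemical properties; G. disordered residue information; H. signal peptide information; I. domain text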

Fig. 5

Comparison of prediction probabilities of different matrix completion algorithms

Fig. 6

Comparison of related error rates of different matrix completion algorithms

Fig. 7

Comparison of prediction probabilities of different multi-view methods

Fig. 8

Comparison of related error rates of different multi-view methods

Fig. 9

Performance comparison between the MVIMC algorithm and CAFA prediction platforms
