Journal of Nanjing University (Natural Science), 2024, Vol. 60, Issue (1): 111. doi: 10.13232/j.cnki.jnju.2024.01.001
黄玮翔1, 丁季1, 刘夏栩1, 殷勤2, 兰闯闯1, 吴建盛1
Weixiang Huang1, Ji Ding1, Xiaxu Liu1, Qin Yin2, Chuangchuang Lan1, Jiansheng Wu1
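Abstract:
Protein receptors are key components of cellular signal transduction and the most important class of human drug targets. G protein-coupled receptors (GPCRs) make up the vast majority of them, and roughly 34% of the drugs currently on the market target GPCRs. Accurately annotating the biological functions of GPCR proteins is essential for understanding the physiological processes they take part in and for targeted drug discovery. Gene Ontology (GO) is the most common way of describing protein function. Both GPCR proteins and GO terms carry information from multiple views, and exploiting that information effectively can improve protein function prediction. This paper therefore proposes MVIMC (Multi-View Inductive Matrix Completion), a multi-view inductive matrix completion method for predicting the GO biological functions of GPCR proteins. MVIMC makes use of view information for both GPCR proteins and GO labels: the GPCR views comprise text information and domain information, and the GO view comprises text information. Experimental results show that MVIMC reaches 68% and 69% on molecular function and biological process prediction, respectively, outperforming the best current matrix completion method as well as methods commonly used in the CAFA protein function prediction challenge.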
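To make the idea of inductive matrix completion concrete, here is a minimal sketch in Python with NumPy. It is not the MVIMC algorithm itself, whose details the abstract does not give. It assumes a generic formulation in which the protein-by-GO-term annotation matrix `P` is approximated by `X @ W @ H.T @ Y.T`, where `X` holds protein features and `Y` holds GO term features. Combining the text and domain views by simple concatenation, the toy random data, the function names `unit_rows` and `fit_imc`, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def unit_rows(A):
    """Scale each row to unit Euclidean length so the views are comparable."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def fit_imc(P, X, Y, rank=3, lam=0.01, lr=0.1, n_iter=1000, seed=0):
    """Plain gradient descent on the regularized mean squared error
    between X @ W @ H.T @ Y.T and P (a generic IMC objective, assumed here)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    losses = []
    for _ in range(n_iter):
        R = X @ W @ H.T @ Y.T - P  # residual, proteins x GO terms
        reg = 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2))
        losses.append(0.5 * np.mean(R ** 2) + reg)
        grad_W = X.T @ R @ Y @ H / P.size + lam * W
        grad_H = Y.T @ R.T @ X @ W / P.size + lam * H
        W, H = W - lr * grad_W, H - lr * grad_H
    return W, H, losses

# Toy data (hypothetical): 20 proteins, 6 GO terms, random features.
rng = np.random.default_rng(1)
n_proteins, n_terms = 20, 6
text_view = rng.standard_normal((n_proteins, 5))    # stand-in for text features
domain_view = rng.standard_normal((n_proteins, 3))  # stand-in for domain features
X = unit_rows(np.hstack([text_view, domain_view]))  # assumed fusion: concatenation
Y = unit_rows(rng.standard_normal((n_terms, 4)))    # stand-in for GO text features

# Synthetic binary annotations generated from a hidden bilinear model.
M_true = rng.standard_normal((X.shape[1], Y.shape[1]))
P = (X @ M_true @ Y.T > 0).astype(float)

W, H, losses = fit_imc(P, X, Y)

# Inductive step: score all GO terms for a protein not seen during training.
x_new = unit_rows(rng.standard_normal((1, X.shape[1])))
scores = x_new @ W @ H.T @ Y.T
```

The "inductive" part is the last two lines: because the model is expressed through features rather than per-protein latent factors, a protein with no known annotations can still be scored against every GO term from its features alone.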