Application of machine learning in the study of the hydrophobic interaction model of proteins

doi:10.13232/j.cnki.jnju.2023.06.002

Journal of Nanjing University (Natural Sciences), 2023, Vol. 59, Issue 6: 919–927.


Chenbo Feng, Weiqiang Ma, Run Cheng, Jun Wang

School of Physics, Nanjing University, Nanjing, 210093, China

Received: 2023-09-27 Online: 2023-11-30 Published: 2023-12-06
Corresponding author: Jun Wang, E-mail: wangj@nju.edu.cn
Funding: National Natural Science Foundation of China (11774157)

Abstract

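Hydrophobic interaction is a highly complex, nonlinear, many-body effective interaction that plays a dominant role in protein folding, and analysis of the solvent-accessible surface area (SASA) of proteins is an important means of characterizing it. Analytical and numerical methods for computing SASA struggle to balance computational cost against accuracy. To address this problem, we apply machine learning to the prediction of protein SASA. Compared with typical traditional methods, the resulting error is an order of magnitude smaller, and the computation is nearly two orders of magnitude faster than the analytical method. Extending the approach to SASA prediction from coarse-grained protein structures also gives good results. The method provides a new, efficient computational tool for research in protein physics.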

Key words: protein folding, hydrophobic interaction, solvent-accessible surface area (SASA), machine learning
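To make the cost/accuracy trade-off of numerical SASA methods concrete, here is a minimal sketch of one classical numerical approach, the Shrake-Rupley test-point method, applied to the Lee-Richards definition of SASA (the surface traced by the centre of a solvent probe rolling over the atoms). It is not the authors' method: the abstract does not describe their model or how their reference SASA values were computed. The function names, the 1.4 Å probe radius, the 960 test points and the golden-spiral point layout are illustrative assumptions.

```python
import math


def golden_spiral_points(n):
    """Return n unit vectors spread nearly uniformly over a sphere.

    z is sampled at evenly spaced values in (-1, 1); by Archimedes' hat-box
    theorem each point then represents an equal share of the sphere's area.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * k
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def shrake_rupley_sasa(coords, radii, probe=1.4, n_points=960):
    """Per-atom SASA in Å^2 by Shrake-Rupley test-point sampling.

    coords: list of (x, y, z) atom centres in Å
    radii:  list of van der Waals radii in Å
    probe:  solvent probe radius in Å (assumed water-like value)
    """
    unit = golden_spiral_points(n_points)
    # Lee-Richards surface: each atom sphere is inflated by the probe radius.
    expanded = [r + probe for r in radii]
    areas = []
    for i, (ci, ri) in enumerate(zip(coords, expanded)):
        # Only atoms whose inflated spheres intersect sphere i can bury its points.
        neighbours = [
            j for j in range(len(coords))
            if j != i and math.dist(ci, coords[j]) < ri + expanded[j]
        ]
        exposed = 0
        for ux, uy, uz in unit:
            p = (ci[0] + ri * ux, ci[1] + ri * uy, ci[2] + ri * uz)
            # A test point is accessible if no neighbour's inflated sphere contains it.
            if all(math.dist(p, coords[j]) >= expanded[j] for j in neighbours):
                exposed += 1
        # Accessible area = exposed fraction of the inflated sphere's total area.
        areas.append(4.0 * math.pi * ri * ri * exposed / n_points)
    return areas


if __name__ == "__main__":
    # Two hypothetical atoms (radius 1.6 Å) placed 3.0 Å apart along z.
    areas = shrake_rupley_sasa([(0.0, 0.0, 0.0), (0.0, 0.0, 3.0)], [1.6, 1.6])
    print([round(a, 2) for a in areas])  # [84.82, 84.82], i.e. 27*pi each
```

In the two-atom example each atom loses a quarter of its probe-inflated sphere to its neighbour, so the result matches the analytical spherical-cap value of 27π Å² for this geometry. In general, cost grows linearly with `n_points` per atom (and the neighbour search here is a naive O(N²) loop), while the sampling error shrinks as `n_points` grows; a learned SASA predictor, as proposed in the abstract, aims to avoid paying that per-structure cost.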

CLC number: Q615

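Cite this article: Chenbo Feng, Weiqiang Ma, Run Cheng, Jun Wang. Application of machine learning in the study of the hydrophobic interaction model of proteins[J]. Journal of Nanjing University (Natural Sciences), 2023, 59(6): 919–927.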

Figures and tables (12): Figures 1–11 and Table 1.
