Journal of Nanjing University (Natural Science) ›› 2020, Vol. 56 ›› Issue (5): 744-753. doi: 10.13232/j.cnki.jnju.2020.05.014
Yue Pan1, Jun Wang1,2,3, Wenfei Li1,2, Jian Zhang1,2, Wei Wang1,2
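Abstract:
Constructing a complete protein universe from protein sequence, structure, and related information is an important topic in bioinformatics, and such work matters for protein structure prediction, the analysis of protein evolutionary pathways, and protein structure design. Starting from the protein contact map, a simplified representation of protein structure, a convolutional neural network is trained for feature extraction, the minimal feature vector that can identify the fold type of a structural domain is selected, and a protein fold-type space is constructed. Spectral clustering and other methods are then used to analyze the high-dimensional distribution of the different protein fold types. The resulting minimal feature vector balances information completeness against redundancy and represents well the spatial relationships among all seven common protein classes. These results fill a gap in earlier protein-universe studies, which did not describe the spatial positions and mutual relationships of the less common classes, and they deepen the understanding of protein structural similarity.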
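The contact map that the abstract starts from can be illustrated with a minimal sketch. The abstract does not state the contact definition used in the paper, so the 8 Å cutoff on C-alpha distances, the function name `contact_map`, and the toy coordinates below are assumptions chosen only for illustration.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Return a symmetric 0/1 matrix marking residue pairs within `cutoff`.

    `coords` is a list of (x, y, z) positions, one per residue.
    The 8.0 angstrom default is an assumed, commonly used threshold,
    not a value taken from the paper. The diagonal is left at 0.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Hypothetical C-alpha coordinates (angstroms) for a four-residue toy chain.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
for row in toy_map:
    print(row)
```

A binary matrix of this kind can be treated as a single-channel image, which is what makes it a natural input for a convolutional network.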