Dataset Information

Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13.


ABSTRACT: We report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction. Two ensembling strategies have been proposed to integrate the matrix features through end-to-end training and stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that do not have homologous templates in the PDB, TripletRes and ResTriplet generated comparable results with an average accuracy of 0.640 and 0.646, respectively, for the top L/5 long-range predictions, where 71% and 74% of the cases have an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline is due to the sensitive MSA construction and the advanced strategies for coevolutionary feature ensembling. Domain splitting was also found to help enhance the contact prediction performance. Nevertheless, contact models for tail regions, which often involve a high number of alignment gaps, and for targets with few homologous sequences are still suboptimal. Development of new approaches where the model is specifically trained on these regions and targets might help address these problems.
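The headline numbers in the abstract are accuracies for the top L/5 long-range predictions, where L is the length of the domain. The sketch below shows one way such a score could be computed for a single target. It is an illustration, not the authors' evaluation code: the function name `top_long_range_precision`, the parameter names, and the sequence-separation cutoff of 24 residues (the usual CASP convention for "long-range") are assumptions not stated in this record, and the native contact map is taken as a given boolean matrix.

```python
import numpy as np


def top_long_range_precision(pred, native, frac=5, min_sep=24):
    """Fraction of the top L/frac long-range predictions that are true contacts.

    pred    : (L, L) array of predicted contact scores (higher = more confident)
    native  : (L, L) boolean array, True where the native structure has a contact
    frac    : evaluate the top L // frac pairs (5 gives "top L/5")
    min_sep : minimum sequence separation |i - j| counted as long-range
              (24 is assumed here, following the common CASP convention)
    """
    pred = np.asarray(pred, dtype=float)
    native = np.asarray(native, dtype=bool)
    if pred.shape != native.shape or pred.ndim != 2 or pred.shape[0] != pred.shape[1]:
        raise ValueError("pred and native must be square matrices of equal shape")

    length = pred.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep, each pair counted once.
    i, j = np.triu_indices(length, k=min_sep)
    if i.size == 0:
        raise ValueError("sequence too short to contain long-range pairs")

    # Number of predictions to evaluate, capped by the number of eligible pairs.
    k = min(max(1, length // frac), i.size)

    # Rank eligible pairs by descending score and keep the k best.
    order = np.argsort(-pred[i, j], kind="stable")[:k]
    return float(native[i[order], j[order]].mean())
```

Averaging this per-domain value over a set of targets would give a figure comparable in kind to the 0.640 and 0.646 reported above, and counting the domains whose value exceeds 0.5 would correspond to the 71% and 74% statistics, provided the same contact definition and cutoffs are used.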

SUBMITTER: Li Y 

PROVIDER: S-EPMC6851483 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13.

Li Y, Zhang C, Bell EW, Yu DJ, Zhang Y

Proteins, 2019 Aug 22; issue 12


Similar Datasets

S-EPMC6851476 | biostudies-literature
S-EPMC6800999 | biostudies-literature
S-EPMC3463120 | biostudies-literature
S-EPMC6851495 | biostudies-literature
S-EPMC8453599 | biostudies-literature
S-EPMC8792440 | biostudies-literature
S-EPMC8044223 | biostudies-literature
S-EPMC7320627 | biostudies-literature