Dataset Information


Gibbs sampling and helix-cap motifs.

ABSTRACT: Protein backbones have characteristic secondary structures, including alpha-helices and beta-sheets. Which structure a segment adopts is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are therefore valuable for both secondary-structure prediction and protein design. For the case of alpha-helix caps, we test whether the information content of the sequence-structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Gibbs sampling is then repeated, allowing frameshifts of ±1 amino acid residue, to find sequence motifs of higher total information content. All helix-cap motifs were found to generalize well, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs trained on all training examples yielded the best results. Frameshift motifs trained on a fraction of the training examples performed best in terms of true positives among the top predictions. However, motifs without frameshifts also performed well, despite having roughly one-third lower total information content.
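The abstract compares motifs by their "total information content". A common definition for a sequence profile is the relative entropy of each motif column against a background amino acid distribution, summed over columns: I_total = sum_i sum_a p(i,a) * log2(p(i,a) / q(a)). The sketch below computes that quantity from a set of aligned sequence windows. The function names, the uniform background q(a) = 1/20, and the additive pseudocount are illustrative assumptions, not the authors' exact formulation.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(windows, pseudocount=1.0):
    """Per-column amino acid frequencies with additive pseudocounts (hypothetical helper)."""
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    columns = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        total = len(windows) + pseudocount * len(AMINO_ACIDS)
        columns.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return columns


def information_content(windows, background=None, pseudocount=1.0):
    """Total information content (bits): relative entropy summed over motif columns.

    Assumes a uniform background unless one is supplied (an assumption,
    not necessarily the background used in the paper).
    """
    if background is None:
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    total = 0.0
    for column in column_frequencies(windows, pseudocount):
        # Terms with p == 0 contribute nothing (limit of p*log p as p -> 0).
        total += sum(p * math.log2(p / background[a]) for a, p in column.items() if p > 0)
    return total


if __name__ == "__main__":
    conserved = ["SPEE", "SPEA", "TPEE", "SPDE"]  # made-up example windows
    varied = ["SPEE", "GLKA", "TVDR", "NAQW"]
    print(information_content(conserved))
    print(information_content(varied))
```

A more conserved set of windows yields a higher total, which is the sense in which a frameshift that realigns cap examples could raise a motif's information content. The real method additionally resamples which windows belong to the motif (Gibbs sampling against a null motif), which this sketch does not attempt.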


PROVIDER: S-EPMC1234247 | BioStudies | 2005-01-01T00:00:00Z

REPOSITORIES: biostudies

Similar Datasets

2014-01-01 | S-EPMC4253828 | BioStudies
2013-01-01 | S-EPMC3711429 | BioStudies
2020-01-01 | S-EPMC7084103 | BioStudies
2011-01-01 | S-EPMC3131356 | BioStudies
2019-01-01 | S-EPMC6476729 | BioStudies
2019-01-01 | S-EPMC6368855 | BioStudies
2013-01-01 | S-EPMC3855712 | BioStudies
2014-01-01 | S-EPMC3936762 | BioStudies
2002-01-01 | S-EPMC137970 | BioStudies
1000-01-01 | S-EPMC2912546 | BioStudies