Dataset Information

Evolution of protein indels in plants, animals and fungi.


ABSTRACT:

Background

Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes.

Results

Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with a most frequent size class of 1 aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold.
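
The simple/complex distinction and the insertion excess reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a working definition in which an indel region with exactly two states across taxa is "simple" and one with more than two states is "complex". The function names, the taxon labels and the toy gap lengths are all hypothetical and invented for illustration.

```python
def classify_indel(gap_lengths):
    """Classify one indel region from per-taxon gap lengths (0 = no gap).

    Assumed working definition (not taken from the paper's methods):
    two distinct states -> "simple" (present/absent),
    more than two       -> "complex" (multistate),
    one state           -> "none" (no indel variation).
    """
    states = set(gap_lengths.values())
    if len(states) < 2:
        return "none"
    return "simple" if len(states) == 2 else "complex"


def insertion_excess(n_insertions, n_deletions):
    """Return the ratio of singleton insertions to singleton deletions."""
    if n_deletions == 0:
        raise ValueError("ratio is undefined when there are no deletions")
    return n_insertions / n_deletions


# Toy data, invented for illustration only.
region_a = {"plant": 0, "fungus": 3, "animal": 3}  # two states
region_b = {"plant": 0, "fungus": 3, "animal": 5}  # three states

print(classify_indel(region_a))
print(classify_indel(region_b))
print(insertion_excess(15, 2))
```

A ratio above 1 from insertion_excess corresponds to the excess of insertions over deletions described in the text. Deciding whether a singleton is an insertion or a deletion requires a reference phylogeny, which this sketch does not model.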

Conclusions

We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.

SUBMITTER: Ajawatanawong P 

PROVIDER: S-EPMC3706215 | BioStudies | 2013-01-01

REPOSITORIES: biostudies

Similar Datasets

2014-01-01 | S-EPMC3879449 | BioStudies
2019-01-01 | S-EPMC6543879 | BioStudies
2008-01-01 | S-EPMC2459192 | BioStudies
2013-01-01 | S-EPMC3806772 | BioStudies
2013-01-01 | S-EPMC3622295 | BioStudies
2013-01-01 | S-EPMC3638132 | BioStudies
2019-01-01 | S-EPMC6554054 | BioStudies
1000-01-01 | S-EPMC4245841 | BioStudies
1000-01-01 | S-EPMC4482057 | BioStudies
2013-01-01 | S-EPMC3642179 | BioStudies