Dataset Information

Whole-genome sequences from wild-type and laboratory-evolved strains define the alleleome and establish its hallmarks.


ABSTRACT: The genomic diversity across strains of a species forms the genetic basis for differences in their behavior. A large-scale assessment of sequence variation has been made possible by the growing availability of strain-specific whole-genome sequences (WGS) and with the advent of large-scale databases of laboratory-acquired mutations. We define the Escherichia coli "alleleome" through a genome-scale assessment of amino acid (AA) sequence diversity in open reading frames across 2,661 WGS from wild-type strains. We observe a highly conserved alleleome enriched in mutations unlikely to affect protein function. In contrast, 33,000 mutations acquired in laboratory evolution experiments result in more severe AA substitutions that are rarely achieved by natural selection. Large-scale assessment of the alleleome establishes a method for the quantification of bacterial allelic diversity, reveals opportunities for synthetic biology to explore novel sequence space, and offers insights into the constraints governing evolution.

SUBMITTER: Catoiu EA 

PROVIDER: S-EPMC10104531 | biostudies-literature | 2023 Apr

REPOSITORIES: biostudies-literature

Publications

Whole-genome sequences from wild-type and laboratory-evolved strains define the alleleome and establish its hallmarks.

Catoiu, Edward Alexander; Phaneuf, Patrick; Monk, Jonathan; Palsson, Bernhard O

Proceedings of the National Academy of Sciences of the United States of America, 2023-04-03, issue 15


Similar Datasets

| S-EPMC8116681 | biostudies-literature
| S-EPMC2829512 | biostudies-literature
| S-EPMC4591324 | biostudies-literature
2025-02-13 | GSE289029 | GEO
2015-02-05 | GSE48900 | GEO
| S-EPMC10044804 | biostudies-literature
| S-EPMC4172273 | biostudies-literature
| S-EPMC4132622 | biostudies-literature
| S-EPMC5502862 | biostudies-literature
| S-EPMC135346 | biostudies-literature