Dataset Information

Rapid quantification of sequence repeats to resolve the size, structure and contents of bacterial genomes.


ABSTRACT:

BACKGROUND: The numerous classes of repeats often impede the assembly of genome sequences from the short reads provided by new sequencing technologies. We demonstrate a simple and rapid means to ascertain the repeat structure and total size of a bacterial or archaeal genome without the need for assembly by directly analyzing the abundances of distinct k-mers among reads.

RESULTS: The sensitivity of this procedure to resolve variation within a bacterial species is demonstrated: genome sizes and repeat structure of five environmental strains of E. coli from short Illumina reads were estimated by this method, and total genome sizes corresponded well with those obtained for the same strains by pulsed-field gel electrophoresis. In addition, this approach was applied to read-sets for completed genomes and shown to be accurate over a wide range of microbial genome sizes.

CONCLUSIONS: Application of these procedures, based solely on k-mer abundances in short read data sets, allows aspects of genome structure to be resolved that are not apparent from conventional short read assemblies. This knowledge of the repetitive content of genomes provides insights into genome evolution and diversity.
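
As a rough illustration of what "analyzing the abundances of distinct k-mers among reads" can look like, the sketch below counts k-mers, builds an abundance spectrum, and divides the total number of k-mer occurrences by the modal abundance to approximate genome size. This record does not describe the authors' actual procedure, so this is a generic textbook-style approach and not their method. The function names, the `min_abundance` cutoff, and the choice to ignore reverse complements are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_spectrum(kmer_counts):
    """Map each abundance value to the number of distinct k-mers seen that often."""
    return Counter(kmer_counts.values())


def estimate_genome_size(spectrum, min_abundance=2):
    """Approximate genome size as (total k-mer occurrences) / (modal abundance).

    Assumption: k-mers rarer than `min_abundance` are treated as sequencing
    errors and dropped; the modal abundance is taken as the coverage of
    single-copy sequence. Returns (estimated_size, modal_abundance).
    """
    usable = {a: n for a, n in spectrum.items() if a >= min_abundance}
    if not usable:
        raise ValueError("no k-mers at or above min_abundance")
    # Most common abundance; ties are broken in favour of the lower abundance.
    peak = max(usable, key=lambda a: (usable[a], -a))
    total_occurrences = sum(a * n for a, n in usable.items())
    return total_occurrences / peak, peak
```

In such a spectrum, k-mers whose abundance is roughly a multiple of the modal value are candidates for repeated sequence, which is the kind of signal the abstract refers to as repeat structure. Real read sets would also need error handling, both strands, and a far more memory-efficient counter than a Python `Counter`.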

SUBMITTER: Williams D 

PROVIDER: S-EPMC3751351 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Rapid quantification of sequence repeats to resolve the size, structure and contents of bacterial genomes.

David Williams, William L. Trimble, Meghan Shilts, Folker Meyer, Howard Ochman

BMC Genomics, 2013-08-08


Similar Datasets

| S-EPMC9296519 | biostudies-literature
| S-EPMC1895974 | biostudies-literature
| S-EPMC8743544 | biostudies-literature
| S-EPMC7181556 | biostudies-literature
| S-EPMC6385970 | biostudies-literature
| S-EPMC10341722 | biostudies-literature
| S-EPMC3585866 | biostudies-literature
2011-01-15 | E-GEOD-19917 | biostudies-arrayexpress
| S-EPMC9297083 | biostudies-literature
| S-EPMC10302572 | biostudies-literature