
Dataset Information


SWALO: scaffolding with assembly likelihood optimization.


ABSTRACT: Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second-generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes more or a similar number of correct joins compared with other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.
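
The two ideas in the abstract, a maximum likelihood gap estimate and a join rule based on likelihood gain, can be illustrated with a minimal sketch. This is not Swalo's actual model or code. It assumes fragment lengths follow a Gaussian with known mean `mu` and standard deviation `sigma`, and it ignores effects a full generative model would account for. The function names, the grid bounds, and the `margin` threshold are all hypothetical.

```python
import math


def log_likelihood(gap, spans, mu, sigma):
    """Gaussian log-likelihood of a candidate gap size (an assumed toy model).

    spans: for each read pair bridging the two contigs, the number of bases
    the fragment covers inside the contigs, so fragment length = span + gap.
    """
    const = -0.5 * math.log(2 * math.pi * sigma ** 2)
    return sum(const - (s + gap - mu) ** 2 / (2 * sigma ** 2) for s in spans)


def estimate_gap(spans, mu, sigma, lo=-200, hi=2000):
    """Grid search over integer gaps for the highest log-likelihood.

    Negative gaps stand for overlapping contigs; the bounds are arbitrary.
    """
    return max(range(lo, hi + 1),
               key=lambda g: log_likelihood(g, spans, mu, sigma))


def should_join(gains, margin=10.0):
    """Decide whether to accept the best candidate join for a contig end.

    gains: log-likelihood increases of all candidate joins. The best join is
    accepted if it is the only candidate, or if it beats the runner-up by at
    least `margin` (a hypothetical threshold).
    """
    ranked = sorted(gains, reverse=True)
    if not ranked or ranked[0] <= 0:
        return False
    return len(ranked) == 1 or ranked[0] - ranked[1] >= margin
```

For example, with `mu=500` and `sigma=30`, three bridging pairs with spans of 300, 310 and 290 bases give an estimated gap of 200 bases under this toy model, because that gap brings the mean implied fragment length to 500.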

SUBMITTER: Rahman A 

PROVIDER: S-EPMC8599790 | biostudies-literature | 2021 Nov

REPOSITORIES: biostudies-literature


Publications

SWALO: scaffolding with assembly likelihood optimization.

Rahman A, Pachter L

Nucleic Acids Research, 2021-11-01, issue 20



Similar Datasets

| S-EPMC4168704 | biostudies-literature
| S-EPMC7788845 | biostudies-literature
| S-EPMC4053845 | biostudies-literature
| S-EPMC2909219 | biostudies-literature
| S-EPMC4897622 | biostudies-literature
| S-EPMC8519820 | biostudies-literature
| S-EPMC4545973 | biostudies-literature
| S-EPMC7320612 | biostudies-literature
| S-EPMC8557608 | biostudies-literature
| S-EPMC6813328 | biostudies-literature