Dataset Information

Targeted domain assembly for fast functional profiling of metagenomic datasets with S3A.


ABSTRACT:

Motivation

Making sense of the ever-increasing number of metagenomic sequences accumulating in our databases calls for approaches that rapidly 'explore' the content of multiple and/or large metagenomic datasets with respect to specific domain targets, avoiding full domain annotation and full assembly.

Results

S3A is a fast and accurate domain-targeted assembler designed for rapid functional profiling. It is based on a novel construction and a fast traversal of the Overlap-Layout-Consensus graph, designed to reconstruct coding regions from domain-annotated metagenomic sequence reads. S3A relies on high-quality domain annotation to assemble metagenomic sequences efficiently, and on a new confidence measure for the fast evaluation of overlapping reads. Its implementation is highly generic and can be applied to any type of annotation. On simulated data, S3A achieves a level of accuracy similar to that of classical metagenomic assembly tools while enabling faster, sensitive profiling of domains of interest. When studying a few dozen functional domains (a typical scenario), S3A is up to an order of magnitude faster than general-purpose metagenomic assemblers, thus enabling the analysis of a larger number of datasets in the same amount of time. S3A opens new avenues for the fast exploration of the rapidly growing number of metagenomic datasets, which are themselves increasing in size.
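To make the idea of a domain-targeted Overlap-Layout-Consensus graph more concrete, the following is a minimal sketch and not S3A's actual implementation. It assumes each read already carries a single domain label, links only reads that share a label, and scores an overlap by its exact suffix-prefix length as a stand-in for the confidence measure mentioned above, which the abstract does not specify. The function names, the `min_len` parameter, and the labels `domA` and `domB` are hypothetical.

```python
from collections import defaultdict


def overlap_len(a, b, min_len):
    """Longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def build_overlap_graph(reads, min_len=3):
    """reads: list of (sequence, domain_label) pairs.

    Returns {(i, j): overlap_length}. Edges are only created between reads
    that carry the same domain label (an assumption of this sketch).
    """
    by_domain = defaultdict(list)
    for idx, (_, domain) in enumerate(reads):
        by_domain[domain].append(idx)

    edges = {}
    for idxs in by_domain.values():
        for i in idxs:
            for j in idxs:
                if i != j:
                    k = overlap_len(reads[i][0], reads[j][0], min_len)
                    if k:
                        edges[(i, j)] = k
    return edges


def greedy_assemble(reads, edges):
    """Greedy layout: keep the longest overlaps first, one successor and one
    predecessor per read, then walk each chain from a read with no predecessor.
    Reads that only occur on a cycle are skipped by this toy traversal.
    """
    best_out, has_in = {}, set()
    for (i, j), k in sorted(edges.items(), key=lambda e: -e[1]):
        if i not in best_out and j not in has_in:
            best_out[i] = (j, k)
            has_in.add(j)

    contigs = []
    for start in range(len(reads)):
        if start in has_in:
            continue
        seq, domain = reads[start]
        cur, seen = start, {start}
        while cur in best_out and best_out[cur][0] not in seen:
            nxt, k = best_out[cur]
            seq += reads[nxt][0][k:]  # append only the non-overlapping tail
            seen.add(nxt)
            cur = nxt
        contigs.append((domain, seq))
    return contigs


# Toy input: three overlapping reads labelled domA, one unrelated read labelled domB.
reads = [("ATGGCT", "domA"), ("GCTAAC", "domA"), ("AACTTG", "domA"), ("GGGCCC", "domB")]
edges = build_overlap_graph(reads)
contigs = greedy_assemble(reads, edges)
```

Restricting overlap detection to reads that share an annotation is what keeps the quadratic comparison small in this sketch; a real assembler would also need to handle sequencing errors, reverse complements, and reads spanning several domains.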

Availability and implementation

S3A is available at http://www.lcqb.upmc.fr/S3A_ASSEMBLER/.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: David L 

PROVIDER: S-EPMC7332565 | biostudies-literature | 2020 Jul

REPOSITORIES: biostudies-literature

Publications

Targeted domain assembly for fast functional profiling of metagenomic datasets with S3A.

David L, Vicedomini R, Richard H, Carbone A

Bioinformatics (Oxford, England), 2020-07-01, issue 13


Similar Datasets

| S-EPMC3309356 | biostudies-literature
| S-EPMC6114274 | biostudies-literature
| S-EPMC3052304 | biostudies-literature
| S-EPMC4641738 | biostudies-literature
| S-EPMC4978931 | biostudies-literature
| S-EPMC9749362 | biostudies-literature
| S-EPMC4526283 | biostudies-literature
| S-EPMC6635389 | biostudies-literature
| S-EPMC11373326 | biostudies-literature
| S-EPMC3883706 | biostudies-literature