Dataset Information

Reproducible acquisition, management and meta-analysis of nucleotide sequence (meta)data using q2-fondue.


ABSTRACT:

Motivation

The volume of public nucleotide sequence data has blossomed over the past two decades and is ripe for re- and meta-analyses to enable novel discoveries. However, reproducible re-use and management of sequence datasets and associated metadata remain critical challenges. We created the open-source Python package q2-fondue to enable user-friendly acquisition, re-use and management of public sequence (meta)data while adhering to open data principles.

Results

q2-fondue allows fully provenance-tracked programmatic access to and management of data from the NCBI Sequence Read Archive (SRA). Unlike other packages allowing download of sequence data from the SRA, q2-fondue enables full data provenance tracking from data download to final visualization, integrates with the QIIME 2 ecosystem, prevents data loss upon space exhaustion and allows download of (meta)data given a publication library. To highlight its manifold capabilities, we present executable demonstrations using publicly available amplicon, whole genome and metagenome datasets.
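As a rough illustration of the kind of input that programmatic SRA access starts from, the sketch below screens a list of candidate run accession IDs and renders them as a one-column TSV. It does not call q2-fondue and does not reproduce its interface: the function names, the `id` column header and the choice to accept only run-level accessions (prefixes `SRR`, `ERR`, `DRR`) are assumptions made for this example.

```python
import csv
import io
import re

# Run-level accessions follow the INSDC prefix convention: SRR, ERR or DRR
# followed by digits. Other accession levels are ignored in this sketch.
RUN_ID = re.compile(r"^[SED]RR\d+$")


def split_accessions(candidates):
    """Separate well-formed run accessions from everything else.

    Hypothetical helper: input strings are trimmed and upper-cased first.
    """
    valid, rejected = [], []
    for raw in candidates:
        acc = raw.strip().upper()
        (valid if RUN_ID.match(acc) else rejected).append(acc)
    return valid, rejected


def to_tsv(accessions, header="id"):
    """Render a one-column TSV; the 'id' header is an assumption."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([header])
    for acc in accessions:
        writer.writerow([acc])
    return buf.getvalue()


# Example with made-up placeholder accessions.
valid, rejected = split_accessions(["SRR000001", " err000002 ", "not-an-id"])
tsv_text = to_tsv(valid)
```

Screening the IDs before any download keeps malformed entries out of the request and leaves a plain-text record of exactly which runs were asked for, which is in the spirit of the reproducibility goals described above.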

Availability and implementation

q2-fondue is available as an open-source BSD-3-licensed Python package at https://github.com/bokulich-lab/q2-fondue. Usage tutorials are available in the same repository. All Jupyter notebooks used in this article are available under https://github.com/bokulich-lab/q2-fondue-examples.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Ziemski M 

PROVIDER: S-EPMC9665871 | biostudies-literature | 2022 Nov

REPOSITORIES: biostudies-literature

Publications

Reproducible acquisition, management and meta-analysis of nucleotide sequence (meta)data using q2-fondue.

Michal Ziemski, Anja Adamov, Lina Kim, Lena Flörl, Nicholas A. Bokulich

Bioinformatics (Oxford, England), 2022-11-01, issue 22


Similar Datasets

| S-EPMC1274256 | biostudies-literature
| S-EPMC8601625 | biostudies-literature
| S-EPMC5053501 | biostudies-literature
| S-EPMC4493686 | biostudies-literature
| S-EPMC8848943 | biostudies-literature
| S-EPMC4143700 | biostudies-literature
| S-EPMC6139382 | biostudies-other
| S-EPMC5127711 | biostudies-literature
| S-EPMC6909466 | biostudies-literature
| S-EPMC7718033 | biostudies-literature