Dataset Information

RepairNatrix: a Snakemake workflow for processing DNA sequencing data for DNA storage.


ABSTRACT:

Motivation

In recent years, there has been rapid progress in the development of error-correcting and constrained codes for DNA storage systems. However, the steps that process raw sequencing data for DNA storage still offer considerable untapped potential. In particular, the constraints imposed by a code can serve as prior information to improve the processing of DNA sequencing data. Furthermore, a workflow tailored to DNA storage codes enables fair comparisons between different approaches and leads to reproducible results.

Results

We present RepairNatrix, a read-processing workflow for DNA storage. RepairNatrix supports preprocessing of raw sequencing data for DNA storage applications and can be used to flag and heuristically repair constraint-violating sequences to further increase the recoverability of encoded data in the presence of errors. Compared to a preprocessing strategy without repair functionality, RepairNatrix reduced the number of raw reads required for the successful, error-free decoding of the input files by a factor of 25-35 across different datasets.
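
To illustrate what flagging a constraint-violating sequence can look like, the following is a minimal Python sketch. It is not taken from RepairNatrix: the function names, the two constraints checked (maximum homopolymer run length and GC content window), and the threshold values are all assumptions chosen for illustration.

```python
import re


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of identical bases in seq (0 for empty input)."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for empty input)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def violated_constraints(seq, max_run=3, gc_min=0.4, gc_max=0.6):
    """Return the names of the (assumed) constraints that seq violates.

    The thresholds are hypothetical defaults, not RepairNatrix settings.
    """
    violations = []
    if max_homopolymer(seq) > max_run:
        violations.append("homopolymer")
    if not gc_min <= gc_content(seq) <= gc_max:
        violations.append("gc_content")
    return violations
```

A read for which `violated_constraints` returns a non-empty list cannot be a valid codeword under these assumed constraints, so it would be a candidate for flagging or for a heuristic repair step.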

Availability and implementation

RepairNatrix is available on GitHub: https://github.com/umr-ds/repairnatrix.

SUBMITTER: Schwarz PM 

PROVIDER: S-EPMC10941317 | biostudies-literature | 2023

REPOSITORIES: biostudies-literature

Publications

RepairNatrix: a Snakemake workflow for processing DNA sequencing data for DNA storage.

Peter Michael Schwarz, Marius Welzel, Dominik Heider, Bernd Freisleben

Bioinformatics Advances, 2023-08-26, 1



Similar Datasets

| S-EPMC7667751 | biostudies-literature
| S-EPMC9825751 | biostudies-literature
| S-EPMC7079470 | biostudies-literature
| S-EPMC10883564 | biostudies-literature
| S-EPMC7320628 | biostudies-literature
| S-EPMC10576169 | biostudies-literature
| S-EPMC10633908 | biostudies-literature
| S-EPMC4824868 | biostudies-literature
| S-EPMC8114187 | biostudies-literature
| S-EPMC8666039 | biostudies-literature