
Dataset Information


Streamlining remote nanopore data access with slow5curl.


ABSTRACT:

Background

As adoption of nanopore sequencing technology continues to advance, the need to maintain large volumes of raw current signal data for reanalysis with updated algorithms is a growing challenge. Here we introduce slow5curl, a software package designed to streamline nanopore data sharing, accessibility, and reanalysis.

Results

Slow5curl allows a user to fetch a specified read or group of reads from a raw nanopore dataset stored on a remote server, such as a public data repository, without downloading the entire file. Slow5curl uses an index to quickly locate specific reads within a large dataset in SLOW5/BLOW5 format, and it issues highly parallelized data access requests to maximize download speeds. Using all public nanopore data from the Human Pangenome Reference Consortium (>22 TB), we demonstrate how slow5curl can be used to quickly fetch and reanalyze raw signal reads corresponding to a set of target genes from each individual in a large cohort dataset (n = 91), minimizing the time, egress costs, and local storage requirements for their reanalysis.
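The general technique described above (use an index to find where each requested read lives in a remote file, then fetch only those byte ranges with many concurrent HTTP Range requests) can be sketched in Python as follows. This is not slow5curl's code or API. The index layout (read ID mapped to a byte offset and length), the helper names `range_header`, `http_fetcher`, and `fetch_reads`, and the worker count are all hypothetical. The sketch also assumes the server honors HTTP Range requests (status 206), and it returns raw record bytes without decoding them; real BLOW5 records are typically compressed and would need a SLOW5 library to parse.

```python
"""Hypothetical sketch: fetch selected records from a remote file using an
index plus parallel HTTP Range requests. Not slow5curl's implementation."""

import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical index: read ID -> (byte offset, record length in bytes).
Index = Dict[str, Tuple[int, int]]
# A fetcher takes (offset, length) and returns exactly those bytes.
Fetcher = Callable[[int, int], bytes]


def range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value; the end byte is inclusive."""
    return f"bytes={offset}-{offset + length - 1}"


def http_fetcher(url: str, timeout: float = 30.0) -> Fetcher:
    """Return a fetcher that issues one HTTP Range request per record."""
    def fetch(offset: int, length: int) -> bytes:
        req = urllib.request.Request(
            url, headers={"Range": range_header(offset, length)})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # 206 Partial Content means the server returned only the range.
            if resp.status != 206:
                raise IOError(f"range not honored (status {resp.status})")
            return resp.read()
    return fetch


def fetch_reads(read_ids: Iterable[str], index: Index, fetch: Fetcher,
                max_workers: int = 16) -> Dict[str, bytes]:
    """Look up each read in the index and fetch its bytes concurrently."""
    ids: List[str] = list(read_ids)
    missing = [r for r in ids if r not in index]
    if missing:
        raise KeyError(f"read IDs not in index: {missing}")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with ids.
        blobs = pool.map(lambda r: fetch(*index[r]), ids)
        return dict(zip(ids, blobs))


if __name__ == "__main__":
    # Offline demo: an in-memory "remote file" stands in for a server file.
    blob = b"HEADER" + b"read_A" + b"read_BB" + b"read_CCC"
    index = {"A": (6, 6), "B": (12, 7), "C": (19, 8)}
    print(fetch_reads(["C", "A"], index, lambda off, n: blob[off:off + n]))
    # Against a real server one would pass http_fetcher(url) instead.
```

Keeping the fetcher pluggable lets the same lookup-and-parallel-fetch logic run against an in-memory buffer for testing or against a real URL via `http_fetcher`. Only the bytes of the requested records cross the network, which is the property that keeps egress and local storage small when a few reads are needed from a very large file.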

Conclusions

We provide slow5curl as a free, open-source package that will reduce frictions in data sharing for the nanopore community: https://github.com/BonsonW/slow5curl.

SUBMITTER: Wong B 

PROVIDER: S-EPMC11010652 | biostudies-literature | 2024 Jan

REPOSITORIES: biostudies-literature


Publications

Streamlining remote nanopore data access with slow5curl.

Bonson Wong, James M. Ferguson, Jessica Y. Do, Hasindu Gamaarachchi, Ira W. Deveson

GigaScience (2024-01-01)



Similar Datasets

| S-EPMC4489243 | biostudies-literature
| PRJNA625583 | ENA
| PRJNA625584 | ENA
| S-EPMC4669966 | biostudies-literature
| S-EPMC8631065 | biostudies-literature
| S-EPMC11917030 | biostudies-literature
| S-EPMC10877752 | biostudies-literature
| S-EPMC11893148 | biostudies-literature
| S-EPMC8697502 | biostudies-literature
| S-EPMC11435353 | biostudies-literature