Dataset Information

Compression of FASTQ and SAM format sequencing data.


ABSTRACT: Storage and transmission of the data produced by modern DNA sequencing instruments have become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both reference-based compression (CRAM, Goby) and non-reference-based compression (DSRC, BAM), and against other recently published competition entries (Quip, SCALCE). The tools are shown to be the new Pareto frontier for FASTQ compression, offering state-of-the-art ratios at affordable CPU costs. All programs are freely available on SourceForge. Fastqz: https://sourceforge.net/projects/fastqz/, fqzcomp: https://sourceforge.net/projects/fqzcomp/, and samcomp: https://sourceforge.net/projects/samcomp/.

SUBMITTER: Bonfield JK 

PROVIDER: S-EPMC3606433 | biostudies-literature | 2013

REPOSITORIES: biostudies-literature

Publications

Compression of FASTQ and SAM format sequencing data.

James K. Bonfield, Matthew V. Mahoney

PLoS ONE, 2013-03-22, issue 3


Similar Datasets

| S-EPMC6547476 | biostudies-literature
| S-EPMC4459677 | biostudies-literature
| S-EPMC2847217 | biostudies-literature
| S-EPMC3868316 | biostudies-literature
| S-EPMC3832420 | biostudies-literature
| S-EPMC6969201 | biostudies-literature
| S-EPMC5946873 | biostudies-literature
| S-EPMC4547610 | biostudies-literature
| S-EPMC3027120 | biostudies-literature