Dataset Information

Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences.


ABSTRACT:

Summary

DNA sequence databases use compression such as gzip to reduce the required storage space and network transmission time. We describe Nucleotide Archival Format (NAF), a new file format for lossless reference-free compression of FASTA- and FASTQ-formatted nucleotide sequences. NAF's compression ratio is comparable to that of the best DNA compressors, while its decompression is dramatically faster. We compared our format with the DNA compressors DELIMINATE and MFCompress, and with the general-purpose compressors gzip, bzip2, xz, brotli and zstd.
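To make the comparison with general-purpose compressors concrete, the sketch below measures the compression ratio of gzip, bzip2 and xz on a small synthetic FASTA record and checks that each round trip is lossless. It is not an implementation of NAF and does not reproduce the paper's benchmark. It uses only Python's standard library, so brotli and zstd are left out. The helper names `make_fasta` and `compare` are hypothetical, and the input is a uniformly random sequence, which is an assumption that real genomes do not satisfy.

```python
import bz2
import gzip
import lzma
import random


def make_fasta(n_bases: int, seed: int = 0, width: int = 60) -> bytes:
    """Build a synthetic single-record FASTA file (hypothetical test data)."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(n_bases))
    # Wrap the sequence at a fixed line width, as FASTA files commonly do.
    lines = [seq[i:i + width] for i in range(0, n_bases, width)]
    return (">synthetic_record\n" + "\n".join(lines) + "\n").encode("ascii")


# Each entry pairs a one-shot compress function with its decompress function.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def compare(data: bytes) -> dict[str, float]:
    """Return the compression ratio (original size / compressed size) per codec."""
    ratios = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        # Lossless means the round trip reproduces the input byte for byte.
        if decompress(packed) != data:
            raise ValueError(f"{name} round trip was not lossless")
        ratios[name] = len(data) / len(packed)
    return ratios


if __name__ == "__main__":
    fasta = make_fasta(100_000)
    for name, ratio in compare(fasta).items():
        print(f"{name}: {ratio:.2f}x")
```

Because the synthetic sequence is random, the ratios mostly reflect the four-letter alphabet; real sequences with repeats would behave differently.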

Availability and implementation

The NAF compressor and decompressor, as well as the format specification, are available at https://github.com/KirillKryukov/naf. The format specification is in the public domain. The compressor and decompressor are open source under the zlib/libpng license, free for nearly any use.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Kryukov K 

PROVIDER: S-EPMC6761962 | biostudies-literature | 2019 Oct

REPOSITORIES: biostudies-literature

Publications

Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences.

Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi

Bioinformatics (Oxford, England), 2019-10-01, issue 19


Similar Datasets

| S-EPMC9259476 | biostudies-literature
| S-EPMC7265431 | biostudies-literature
| S-EPMC11696233 | biostudies-literature
| S-EPMC11258903 | biostudies-literature
| S-EPMC11735755 | biostudies-literature
| S-EPMC9902536 | biostudies-literature
| S-EPMC11226158 | biostudies-literature
| S-EPMC7079445 | biostudies-literature
| S-EPMC10680779 | biostudies-literature
| S-EPMC7165212 | biostudies-literature