
Dataset Information


CRAM 3.1: Advances in the CRAM File Format.


ABSTRACT: CRAM has established itself as a high-compression alternative to the BAM file format for DNA sequencing data. We describe updates that further improve its compression on data from modern sequencing instruments. With Illumina data, CRAM 3.1 files are 7 to 15% smaller than the equivalent CRAM 3.0 files and 50 to 70% smaller than the corresponding BAM files. Data from long-read technologies show more modest compression because they contain high-entropy signals. The CRAM 3.0 specification is freely available from https://samtools.github.io/hts-specs/CRAMv3.pdf. The CRAM 3.1 improvements are available in a separate open-source library, HTScodecs, from https://github.com/samtools/htscodecs, and have been incorporated into HTSlib. Supplementary data are available online.
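Why high-entropy signals limit compression can be illustrated with a small calculation. The sketch below computes the order-0 Shannon entropy (bits per symbol, ignoring context) of two hypothetical quality strings. This is the lower bound for a memoryless entropy coder. The strings, the function name `order0_entropy`, and the "binned" versus "unbinned" framing are assumptions made for illustration only. CRAM's own codecs use context models, so this is not how CRAM measures or achieves its compression ratios.

```python
import math
from collections import Counter


def order0_entropy(symbols: str) -> float:
    """Shannon entropy in bits per symbol, ignoring any context between symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical Phred+33 quality strings, for illustration only (not real data).
binned = "FFFF:FFF,FFFFF:FFFFFFFF,FFF:FFFFFFFF"    # only 3 distinct values
unbinned = "5;?@7<A:=8>B9C4D6E3F2G1H0I/J.K-L,M+N"  # 36 distinct values

for name, q in [("binned", binned), ("unbinned", unbinned)]:
    h = order0_entropy(q)
    # h * len(q) / 8 is the order-0 lower bound on the encoded size in bytes.
    print(f"{name}: {h:.2f} bits/symbol, >= {h * len(q) / 8:.1f} bytes for {len(q)} symbols")
```

The string with few distinct values needs well under 2 bits per symbol. The string whose 36 symbols are all different needs log2(36), about 5.17 bits per symbol, so no order-0 coder can shrink it much below that.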

SUBMITTER: Bonfield JK 

PROVIDER: S-EPMC8896640 | biostudies-literature | 2022 Jan

REPOSITORIES: biostudies-literature


Publications

CRAM 3.1: advances in the CRAM file format.

Bonfield James K

Bioinformatics (Oxford, England), 1 March 2022, issue 6



Similar Datasets

| S-EPMC4874736 | biostudies-literature
| S-EPMC2892967 | biostudies-literature
| S-EPMC9665857 | biostudies-literature
| S-EPMC9237710 | biostudies-literature
| S-EPMC2945790 | biostudies-literature
| S-EPMC2655813 | biostudies-literature
| S-EPMC11829953 | biostudies-literature
| S-EPMC7265431 | biostudies-literature
| S-EPMC10069377 | biostudies-literature
| S-EPMC8522443 | biostudies-literature