Dataset Information

LDmat: efficiently queryable compression of linkage disequilibrium matrices.


ABSTRACT:

Motivation

Linkage disequilibrium (LD) matrices derived from large populations are widely used in population genetics for fine-mapping, LD score regression, and linear mixed models in genome-wide association studies (GWAS). However, these matrices can become very large when they are derived from millions of individuals, so moving, sharing, and extracting granular information from them can be cumbersome.

Results

We sought to address the need for compressing and easily querying large LD matrices by developing LDmat. LDmat is a standalone tool that compresses large LD matrices into the HDF5 file format and queries the compressed matrices. It can extract submatrices corresponding to a sub-region of the genome, a list of selected loci, or loci within a minor allele frequency range. LDmat can also rebuild the original file formats from the compressed files.
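The three kinds of query described above can be illustrated with a minimal sketch. This is not LDmat's API or storage layout: the `Locus` class, the helper functions, and the toy values are all hypothetical, and plain Python lists stand in for an HDF5-backed matrix. It only shows what "extracting a submatrix" means for a region, a list of loci, and a minor allele frequency (MAF) range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical per-variant metadata; field names are illustrative only."""
    rsid: str
    position: int  # base-pair coordinate on a single chromosome
    maf: float     # minor allele frequency


def submatrix(ld, keep):
    """Return the rows and columns of the square matrix `ld` at indices `keep`."""
    return [[ld[i][j] for j in keep] for i in keep]


def by_region(loci, start, end):
    """Indices of loci whose position lies in the closed interval [start, end]."""
    return [i for i, locus in enumerate(loci) if start <= locus.position <= end]


def by_ids(loci, wanted):
    """Indices of loci whose identifier is in `wanted`, in matrix order."""
    wanted = set(wanted)
    return [i for i, locus in enumerate(loci) if locus.rsid in wanted]


def by_maf(loci, low, high):
    """Indices of loci whose MAF lies in the closed interval [low, high]."""
    return [i for i, locus in enumerate(loci) if low <= locus.maf <= high]


# Toy data: four loci and a symmetric 4x4 LD matrix (all values invented).
loci = [
    Locus("rs1", 100, 0.05),
    Locus("rs2", 250, 0.20),
    Locus("rs3", 400, 0.35),
    Locus("rs4", 900, 0.01),
]
ld = [
    [1.0, 0.8, 0.3, 0.0],
    [0.8, 1.0, 0.5, 0.1],
    [0.3, 0.5, 1.0, 0.2],
    [0.0, 0.1, 0.2, 1.0],
]

region = submatrix(ld, by_region(loci, 200, 500))       # keeps rs2 and rs3
selected = submatrix(ld, by_ids(loci, ["rs1", "rs4"]))  # keeps rs1 and rs4
common = submatrix(ld, by_maf(loci, 0.05, 0.5))         # keeps rs1, rs2, rs3
```

Each query reduces to choosing a set of row and column indices from the locus metadata and then slicing the matrix symmetrically. A real tool working on millions of loci would apply the same idea to chunked on-disk arrays rather than to in-memory lists.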

Availability and implementation

LDmat is implemented in Python and can be installed on Unix systems with the command 'pip install ldmat'. It is also available at https://github.com/G2Lab/ldmat and https://pypi.org/project/ldmat/.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Weiner RJ 

PROVIDER: S-EPMC9969815 | biostudies-literature | 2023 Feb

REPOSITORIES: biostudies-literature

Publications

LDmat: efficiently queryable compression of linkage disequilibrium matrices.

Weiner RJ, Lakhani C, Knowles DA, Gürsoy G

Bioinformatics (Oxford, England) 2023 Feb 1; 2


Similar Datasets

| S-EPMC3970903 | biostudies-literature
| S-EPMC8696101 | biostudies-literature
| S-EPMC8733019 | biostudies-literature
| S-EPMC1950958 | biostudies-literature
| S-EPMC8982034 | biostudies-literature
| S-EPMC1665459 | biostudies-literature
| S-EPMC3931163 | biostudies-literature
| S-EPMC1137007 | biostudies-literature
| S-EPMC4614761 | biostudies-literature
| S-EPMC9826361 | biostudies-literature