Dataset Information

LDmat: efficiently queryable compression of linkage disequilibrium matrices.

ABSTRACT:

Motivation

Linkage disequilibrium (LD) matrices derived from large populations are widely used in population genetics, for example in fine-mapping, LD score regression and linear mixed models for genome-wide association studies (GWAS). However, these matrices can become very large when they are derived from millions of individuals, so moving, sharing and extracting granular information from them can be cumbersome.

Results

We sought to address the need to compress and easily query large LD matrices by developing LDmat. LDmat is a standalone tool that compresses large LD matrices into the HDF5 file format and queries these compressed matrices. It can extract submatrices corresponding to a sub-region of the genome, to a list of selected loci, or to loci within a minor allele frequency (MAF) range. LDmat can also rebuild the original file formats from the compressed files.
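The abstract does not describe LDmat's internal layout or API, so the following is only a minimal, standard-library sketch of the general idea behind queryable compression. HDF5 can store a dataset in chunks that are compressed independently, which lets a reader decompress only the chunks a query touches. Here, zlib-compressed square blocks of the matrix stand in for HDF5 chunks. The class `ChunkedLDStore` and its methods `by_region`, `by_loci`, `by_maf` and `rebuild` are hypothetical names chosen for illustration, and variant positions are assumed to be sorted in ascending order.

```python
import struct
import zlib
from bisect import bisect_left, bisect_right


class ChunkedLDStore:
    """Hypothetical store: an n x n LD matrix split into square blocks,
    each compressed independently with zlib (a stand-in for HDF5 chunks).
    This is an illustration, not LDmat's actual implementation."""

    def __init__(self, matrix, positions, mafs, chunk=2):
        self.n = len(matrix)
        self.positions = list(positions)  # assumed sorted ascending
        self.mafs = list(mafs)            # minor allele frequency per variant
        self.chunk = chunk
        self.blocks = {}
        for bi in range(0, self.n, chunk):
            for bj in range(0, self.n, chunk):
                rows = [matrix[i][bj:bj + chunk] for i in range(bi, min(bi + chunk, self.n))]
                flat = [v for row in rows for v in row]
                # Pack as little-endian float64 so values round-trip exactly.
                raw = struct.pack(f"<{len(flat)}d", *flat)
                self.blocks[(bi, bj)] = (len(rows), len(rows[0]), zlib.compress(raw))

    def _block(self, bi, bj):
        # Decompress a single block back into a list of rows.
        nrows, ncols, payload = self.blocks[(bi, bj)]
        flat = struct.unpack(f"<{nrows * ncols}d", zlib.decompress(payload))
        return [list(flat[r * ncols:(r + 1) * ncols]) for r in range(nrows)]

    def submatrix(self, indices):
        """Return the LD submatrix for the given variant indices,
        decompressing only the blocks those indices fall into."""
        indices = list(indices)
        cache = {}
        out = []
        for i in indices:
            row = []
            for j in indices:
                key = (i - i % self.chunk, j - j % self.chunk)
                if key not in cache:
                    cache[key] = self._block(*key)
                row.append(cache[key][i % self.chunk][j % self.chunk])
            out.append(row)
        return out

    def by_region(self, start, end):
        # Variants whose position lies in [start, end].
        lo = bisect_left(self.positions, start)
        hi = bisect_right(self.positions, end)
        return self.submatrix(range(lo, hi))

    def by_loci(self, loci):
        # Variants at an explicit list of positions.
        index = {p: k for k, p in enumerate(self.positions)}
        return self.submatrix([index[p] for p in loci])

    def by_maf(self, lo, hi):
        # Variants whose MAF lies in [lo, hi].
        return self.submatrix([k for k, f in enumerate(self.mafs) if lo <= f <= hi])

    def rebuild(self):
        # Reconstruct the full, uncompressed matrix.
        return self.submatrix(range(self.n))


# Toy 4-variant LD matrix with made-up positions and MAFs.
ld = [
    [1.0, 0.5, 0.25, 0.0],
    [0.5, 1.0, 0.5, 0.125],
    [0.25, 0.5, 1.0, 0.75],
    [0.0, 0.125, 0.75, 1.0],
]
store = ChunkedLDStore(ld, positions=[100, 200, 300, 400], mafs=[0.01, 0.2, 0.35, 0.05])
print(store.by_region(150, 350))  # [[1.0, 0.5], [0.5, 1.0]]
print(store.by_loci([100, 400]))  # [[1.0, 0.0], [0.0, 1.0]]
print(store.by_maf(0.1, 0.4))     # [[1.0, 0.5], [0.5, 1.0]]
print(store.rebuild() == ld)      # True
```

Block-wise compression trades some compression ratio for random access: a small regional query decompresses only a few blocks instead of the whole matrix, which is the property that makes a compressed LD matrix practical to query.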

Availability and implementation

LDmat is implemented in Python and can be installed on Unix systems with the command 'pip install ldmat'. It can also be accessed through https://github.com/G2Lab/ldmat and https://pypi.org/project/ldmat/.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Weiner RJ

PROVIDER: S-EPMC9969815 | biostudies-literature | 2023 Feb

REPOSITORIES: biostudies-literature

Publications

LDmat: efficiently queryable compression of linkage disequilibrium matrices.

Rockwell J Weiner, Chirag Lakhani, David A Knowles, Gamze Gürsoy

Bioinformatics (Oxford, England), 1 February 2023, issue 2

PMID: 36794924

Similar Datasets

Confounding by linkage disequilibrium.
| S-EPMC3970903 | biostudies-literature

Optimal linkage disequilibrium splitting.
| S-EPMC8696101 | biostudies-literature

Linkage disequilibrium under polysomic inheritance.
| S-EPMC8733019 | biostudies-literature

Linkage disequilibrium in wild mice.
| S-EPMC1950958 | biostudies-literature

Linkage disequilibrium between rare mutations.
| S-EPMC8982034 | biostudies-literature

Volume measures for linkage disequilibrium.
| S-EPMC1665459 | biostudies-literature

Combined linkage disequilibrium and linkage mapping: Bayesian multilocus approach.
| S-EPMC3931163 | biostudies-literature

Linkage disequilibrium maps and association mapping.
| S-EPMC1137007 | biostudies-literature

Mitonuclear linkage disequilibrium in human populations.
| S-EPMC4614761 | biostudies-literature

Partial sex linkage and linkage disequilibrium on the guppy sex chromosome.
| S-EPMC9826361 | biostudies-literature