Dataset Information

High-sensitivity pattern discovery in large, paired multiomic datasets.

ABSTRACT:

Motivation

Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features is essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides both high statistical power and false discovery rate (FDR) control.

Results

Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with FDR correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets with known association structure, where HAllA outperformed all-against-all and other block-testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multiomics datasets, revealing new associations between gene expression and host immune activity, between the microbiome and the host transcriptome, and between metabolomic profiles and human health phenotypes.
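For context, the all-against-all baseline that HAllA is compared against tests every cross-dataset feature pair individually and then applies an FDR correction to the full set of p-values. The sketch below illustrates only that baseline, not HAllA's hierarchical block-wise procedure. It assumes Spearman correlation as the similarity measure, permutation p-values, and Benjamini-Hochberg (BH) as the FDR correction; the function names and the dict-of-lists input format are hypothetical choices for illustration. For the actual method, see the HAllA documentation at the URL given under Availability and implementation.

```python
import random


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def permutation_pvalue(x, y, n_perm, rng):
    """Two-sided permutation p-value for Spearman's rho (assumed test)."""
    observed = abs(spearman(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the sample pairing between x and y
        if abs(spearman(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bh_reject(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    return {order[r] for r in range(max_k)}


def all_against_all(X, Y, alpha=0.05, n_perm=999, seed=0):
    """Hypothetical helper: test every (X feature, Y feature) pair.

    X and Y map feature names to value lists measured on the same,
    identically ordered samples. Returns ((x_name, y_name), p) for each
    pair that survives BH correction at FDR level alpha.
    """
    rng = random.Random(seed)
    pairs = [(fx, fy) for fx in X for fy in Y]
    pvals = [permutation_pvalue(X[fx], Y[fy], n_perm, rng) for fx, fy in pairs]
    keep = bh_reject(pvals, alpha)
    return [(pairs[i], pvals[i]) for i in sorted(keep)]
```

Because the number of tests is the product of the two feature counts, the BH threshold becomes more stringent as either dataset grows. Testing groups of associated features as blocks, as HAllA does, is one way to reduce that multiple-testing burden.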

Availability and implementation

An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets and a user group.

Supplementary information

Supplementary data are available at Bioinformatics online.

SUBMITTER: Ghazi AR

PROVIDER: S-EPMC9235493 | biostudies-literature | 2022 Jun

REPOSITORIES: biostudies-literature

Publications

High-sensitivity pattern discovery in large, paired multiomic datasets.

Ghazi AR, Sucipto K, Rahnavard A, Franzosa EA, McIver LJ, Lloyd-Price J, Schwager E, Weingart G, Moon YS, Morgan XC, Waldron L, Huttenhower C

Bioinformatics (Oxford, England), 2022 Jun 1, Suppl 1

<h4>Motivation</h4>Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features are essential. In experiments featuring multiple high-dimensional datasets collected from the same set of samples, it is useful to identify groups of associated features between the datasets in a way that provides high statistical power and false discovery rate (FDR) control.<h4>Results</h4>Here, we present a novel hierarchica ...[more]

PMID: 35758795

Similar Datasets

OmicsDI is part of the ELIXIR infrastructure

Tweets

Similar Datasets

Pattern discovery and disentanglement on relational datasets.
| S-EPMC7952710 | biostudies-literature

Causal Discovery in High-dimensional, Multicollinear Datasets.
| S-EPMC9910507 | biostudies-literature

Genetic prediction of male pattern baldness based on large independent datasets.
| S-EPMC9995341 | biostudies-literature

Multiomic sequencing of paired primary and metastatic small bowel carcinoids.
| S-EPMC10632590 | biostudies-literature

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets.
| S-EPMC5473464 | biostudies-literature

Secure Discovery of Genetic Relatives across Large-Scale and Distributed Genomic Datasets.
| S-EPMC11257153 | biostudies-literature

High-throughput DNA methylation datasets for evaluating false discovery rate methodologies.
| S-EPMC3352593 | biostudies-literature

FlowCT for the analysis of large immunophenotypic datasets and biomarker discovery in cancer immunology
2021-09-26 | GSE166711 | GEO

Empirical Models of Shear-Wave Radiation Pattern Derived from Large Datasets of Ground-Shaking Observations.
| S-EPMC6353911 | biostudies-literature

Pattern Discovery from High-Order Drug-Drug Interaction Relations.
| S-EPMC8982853 | biostudies-literature