Transcriptomics

Dataset Information

Genomic language models improve cross-species gene expression prediction and accurately capture regulatory variant effects in Brachypodium mutant lines


ABSTRACT: Predicting gene expression from cis-regulatory DNA sequences is a central challenge in plant genomics. Here, we developed deep learning sequence-to-expression (S2E) models that predict gene expression across 17 plant species using high-dimensional representations from auxiliary foundational models (the genomic language model PlantCaduceus and the chromatin accessibility model a2z) instead of one-hot encoded sequences. We first evaluated the models on unseen gene families via cross-validation; across all species, their prediction accuracy exceeded that of PhytoExpr, a state-of-the-art (SOTA) S2E model trained on the same dataset (Pearson R=0.82 vs. R=0.74). We then validated variant effect predictions using an experimental dataset of 796 Brachypodium mutant lines specifically designed to test predictions at single-base resolution. Our models outperformed the SOTA models in predicting between-gene expression differences (regression coefficient β=0.78 vs. β=0.57). Remarkably, they also accurately predicted the effects of single-nucleotide mutations on within-gene expression, whereas the SOTA models showed only weak associations (regression coefficient β=0.38 vs. β=0.08). Our results demonstrate the value of context-aware DNA sequence embeddings for predicting regulatory variant effects in plants. They also reveal a persistent accuracy gap in S2E models when moving from between-gene to allelic variation, a challenge that future studies need to address.
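
The abstract reports two kinds of summary statistics: a Pearson R between predicted and observed expression, and regression coefficients (β) for between-gene and within-gene comparisons. The following is a minimal sketch of how such quantities could be computed, not the authors' analysis code. The function names, the toy numbers, the choice to regress observed on predicted values, and the use of per-gene mean-centering to isolate within-gene (allelic) variation are all assumptions made for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def regression_slope(predicted, observed):
    """Ordinary least-squares slope of observed on predicted.

    Assumption: the source does not state the regression direction or
    any covariates; this is the simplest single-predictor form.
    """
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    pc = p - p.mean()
    return float((pc @ (o - o.mean())) / (pc @ pc))


def within_gene_center(values, gene_ids):
    """Subtract each gene's mean so only within-gene variation remains.

    Assumption: mean-centering per gene is one common way to separate
    allelic from between-gene effects; the source may do this differently.
    """
    values = np.asarray(values, dtype=float)
    gene_ids = np.asarray(gene_ids)
    centered = values.copy()
    for gene in np.unique(gene_ids):
        mask = gene_ids == gene
        centered[mask] -= values[mask].mean()
    return centered


if __name__ == "__main__":
    # Toy, made-up values: two hypothetical genes with two alleles each.
    genes = ["geneA", "geneA", "geneB", "geneB"]
    predicted = [1.0, 2.0, 3.0, 4.0]
    observed = [2.0, 4.0, 6.0, 8.0]

    print("Pearson R:", pearson_r(predicted, observed))
    print("between-gene style slope:", regression_slope(predicted, observed))

    # Within-gene comparison: remove gene means from both vectors first.
    pred_w = within_gene_center(predicted, genes)
    obs_w = within_gene_center(observed, genes)
    print("within-gene style slope:", regression_slope(pred_w, obs_w))
```

Centering by gene removes the large expression differences between genes, which is why a model can score well on the pooled comparison yet show a much smaller slope once only allelic differences remain.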

ORGANISM(S): Brachypodium distachyon

PROVIDER: GSE324261 | GEO | 2026/03/09

REPOSITORIES: GEO

Similar Datasets

| 2653863 | ecrin-mdr-crc
2015-01-22 | E-GEOD-65218 | biostudies-arrayexpress
2015-01-22 | GSE65218 | GEO
2025-09-05 | GSE287695 | GEO
2025-09-05 | GSE289179 | GEO
2020-12-31 | GSE158699 | GEO
2014-11-04 | GSE59845 | GEO
2022-10-21 | GSE215868 | GEO
2004-11-24 | GSE1414 | GEO
2015-03-01 | GSE60200 | GEO