Dataset Information

Precious2GPT: the combination of multiomics pretrained transformer and conditional diffusion for artificial multi-omics multi-species multi-tissue sample generation.


ABSTRACT: Synthetic data generation in omics mimics real-world biological data, providing alternatives for training and evaluation of genomic analysis tools, controlling differential expression, and exploring data architecture. We previously developed Precious1GPT, a multimodal transformer trained on transcriptomic and methylation data, along with metadata, for predicting biological age and identifying dual-purpose therapeutic targets potentially implicated in aging and age-associated diseases. In this study, we introduce Precious2GPT, a multimodal architecture that integrates Conditional Diffusion (CDiffusion) and decoder-only Multi-omics Pretrained Transformer (MoPT) models trained on gene expression and DNA methylation data. Precious2GPT excels in synthetic data generation, outperforming Conditional Generative Adversarial Networks (CGANs), CDiffusion, and MoPT. We demonstrate that Precious2GPT is capable of generating representative synthetic data that captures tissue- and age-specific information from real transcriptomics and methylomics data. Notably, Precious2GPT surpasses other models in age prediction accuracy using the generated data, and it can generate data beyond 120 years of age. Furthermore, we showcase the potential of using this model in identifying gene signatures and potential therapeutic targets in a colorectal cancer case study.

SUBMITTER: Sidorenko D 

PROVIDER: S-EPMC11310469 | biostudies-literature | 2024 Aug

REPOSITORIES: biostudies-literature

Publications

Precious2GPT: the combination of multiomics pretrained transformer and conditional diffusion for artificial multi-omics multi-species multi-tissue sample generation.

Sidorenko Denis, Pushkov Stefan, Sakip Akhmed, Leung Geoffrey Ho Duen, Lok Sarah Wing Yan, Urban Anatoly, Zagirova Diana, Veviorskiy Alexander, Tihonova Nina, Kalashnikov Aleksandr, Kozlova Ekaterina, Naumov Vladimir, Pun Frank W, Aliper Alex, Ren Feng, Zhavoronkov Alex

npj Aging, 2024 Aug 8, issue 1


Similar Datasets

| S-EPMC10370089 | biostudies-literature
| S-EPMC11565080 | biostudies-literature
| S-EPMC10026941 | biostudies-literature
| S-EPMC10771517 | biostudies-literature
| S-EPMC10462017 | biostudies-literature
| S-EPMC11657395 | biostudies-literature
| S-EPMC10469107 | biostudies-literature
| S-EPMC11385924 | biostudies-literature
| S-EPMC9280463 | biostudies-literature
| S-EPMC10805303 | biostudies-literature