Dataset Information

A proteomics sample metadata representation for multiomics integration and big data analysis.

ABSTRACT: The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

SUBMITTER: Dai C

PROVIDER: S-EPMC8494749 | biostudies-literature | 2021 Oct

REPOSITORIES: biostudies-literature

Publications

A proteomics sample metadata representation for multiomics integration and big data analysis.

Chengxin Dai, Anja Füllgrabe, Julianus Pfeuffer, Elizaveta M Solovyeva, Jingwen Deng, Pablo Moreno, Selvakumar Kamatchinathan, Deepti Jaiswal Kundu, Nancy George, Silvie Fexova, Björn Grüning, Melanie Christine Föll, Johannes Griss, Marc Vaudel, Enrique Audain, Marie Locard-Paulet, Michael Turewicz, Martin Eisenacher, Julian Uszkoreit, Tim Van Den Bossche, Veit Schwämmle, Henry Webel, Stefan Schulze, David Bouyssié, Savita Jayaram, Vinay Kumar Duggineni, Patroklos Samaras, Mathias Wilhelm, Meena Choi, Mingxun Wang, Oliver Kohlbacher, Alvis Brazma, Irene Papatheodorou, Nuno Bandeira, Eric W Deutsch, Juan Antonio Vizcaíno, Mingze Bai, Timo Sachsenberg, Lev I Levitsky, Yasset Perez-Riverol

Nature Communications, 2021 Oct 6, issue 1

PMID: 34615866

Similar Datasets

SODAR: managing multiomics study data and metadata. | S-EPMC10373112 | biostudies-literature

Toward a Sample Metadata Standard in Public Proteomics Repositories. | S-EPMC7116434 | biostudies-literature

Big data to knowledge: common pitfalls in transcriptomics data analysis and representation. | S-EPMC6779380 | biostudies-literature

MetaGate: Interactive analysis of high-dimensional cytometry data with metadata integration. | S-EPMC11284499 | biostudies-literature

Compact graphical representation of phylogenetic data and metadata with GraPhlAn. | S-EPMC4476132 | biostudies-literature

Small sample sizes: A big data problem in high-dimensional data analysis. | S-EPMC8008424 | biostudies-literature

HIVseqDB: a portable resource for NGS and sample metadata integration for HIV-1 drug resistance analysis. | S-EPMC10834361 | biostudies-literature

Phenonaut: multiomics data integration for phenotypic space exploration. | S-EPMC10068743 | biostudies-literature

SUPREME: multiomics data integration using graph convolutional networks. | S-EPMC10481254 | biostudies-literature

iProX in 2021: connecting proteomics data sharing with big data. | S-EPMC8728291 | biostudies-literature