Dataset Information

Bias-invariant RNA-sequencing metadata annotation.


ABSTRACT: Recent technological advances have resulted in an unprecedented increase in publicly available biomedical data, yet the reuse of the data is often precluded by experimental bias and a lack of annotation depth and consistency. Missing annotations make it impossible for researchers to find datasets specific to their needs. Here, we investigate RNA-sequencing metadata prediction based on gene expression values. We present a deep-learning-based domain adaptation algorithm for the automatic annotation of RNA-sequencing metadata. We show, in multiple experiments, that our model is better at integrating heterogeneous training data than existing linear regression-based approaches, resulting in improved tissue type classification. By using a model architecture similar to Siamese networks, the algorithm can learn biases from datasets with few samples. Using our novel domain adaptation approach, we achieved metadata annotation accuracies up to 15.7% better than a previously published method. Using the best model, we provide a list of >10,000 novel tissue and sex label annotations for 8,495 unique SRA samples. Our approach has the potential to revive idle datasets through automated annotation, making them more searchable.

SUBMITTER: Wartmann H 

PROVIDER: S-EPMC8559615 | biostudies-literature | 2021 Sep

REPOSITORIES: biostudies-literature

Publications

Bias-invariant RNA-sequencing metadata annotation.

Hannes Wartmann, Sven Heins, Karin Kloiber, Stefan Bonn

GigaScience, 2021-09-01, 9



Similar Datasets

| S-EPMC8401820 | biostudies-literature
| S-EPMC10277029 | biostudies-literature
| S-EPMC4197826 | biostudies-literature
| S-EPMC3149584 | biostudies-literature
| S-EPMC4117970 | biostudies-literature
| S-EPMC11863512 | biostudies-literature
| S-EPMC4130647 | biostudies-literature
| S-EPMC7703774 | biostudies-literature
| S-EPMC5778030 | biostudies-literature
| S-EPMC5428526 | biostudies-literature