Dataset Information

A comprehensive catalog of predicted functional upstream open reading frames in humans.


ABSTRACT: Upstream open reading frames (uORFs) latent in mRNA transcripts are thought to modify translation of coding sequences by altering ribosome activity. Not all uORFs are thought to be active in such a process. To estimate the impact of uORFs on the regulation of translation in humans, we first circumscribed the universe of all possible uORFs based on coding gene sequence motifs and identified 1.3 million unique uORFs. To determine which of these are likely to be biologically relevant, we built a simple Bayesian classifier using 89 attributes of uORFs labeled as active in ribosome profiling experiments. This allowed us to extrapolate to a comprehensive catalog of likely functional uORFs. We validated our predictions using in vivo protein levels and ribosome occupancy from 46 individuals. This is a substantially larger catalog of functional uORFs than has previously been reported. Our ranked list of likely active uORFs allows researchers to test their hypotheses regarding the role of uORFs in health and disease. We demonstrate several examples of biological interest through the application of our catalog to somatic mutations in cancer and disease-associated germline variants in humans.
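The first step described in the abstract, enumerating candidate uORFs from sequence motifs, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_uorfs`, the restriction to a single `ATG` start codon, and the requirement that the in-frame stop codon lie inside the supplied 5' UTR string are all assumptions made for illustration. Candidates whose stop codon falls beyond the supplied sequence are ignored here.

```python
from typing import List, Tuple

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr5: str, start_codon: str = "ATG") -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for candidate uORFs.

    Hypothetical helper: a candidate is any occurrence of `start_codon`
    followed by an in-frame stop codon within `utr5`.
    """
    seq = utr5.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != start_codon:
            continue
        # Walk codon by codon from the start until the first in-frame stop.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                hits.append((i, j + 3))
                break
    return hits


if __name__ == "__main__":
    # Toy sequence, not taken from the dataset.
    for start, end in find_uorfs("CCATGAAATAGCC"):
        print(start, end)
```

Each returned pair indexes into the input string, so `utr5[start:end]` yields the candidate sequence including its start and stop codons. Scoring candidates, as the classifier in the abstract does, would be a separate step that this sketch does not attempt.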

SUBMITTER: McGillivray P 

PROVIDER: S-EPMC6283423 | biostudies-literature | 2018 Apr

REPOSITORIES: biostudies-literature

Publications

A comprehensive catalog of predicted functional upstream open reading frames in humans.

Patrick McGillivray, Russell Ault, Mayur Pawashe, Robert Kitchen, Suganthi Balasubramanian, Mark Gerstein

Nucleic Acids Research, 2018-04-01, issue 7


Similar Datasets

| S-EPMC2813248 | biostudies-literature
| S-EPMC4820681 | biostudies-literature
| S-EPMC2527020 | biostudies-literature
| S-EPMC2142145 | biostudies-other
2020-03-14 | GSE131650 | GEO
2018-04-30 | GSE105082 | GEO
2018-09-07 | GSE119615 | GEO
| S-EPMC3710870 | biostudies-other