Project description:Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the translational efficiency of the 5’ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5’ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on translation of Kozak sequence composition, upstream open reading frames (uORFs) and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the translational efficiency of both a held-out set of the random 5’ UTRs and native S. cerevisiae 5’ UTRs. The model was additionally used to computationally evolve highly translating 5’ UTRs. We confirmed experimentally that the great majority of the evolved sequences lead to higher translation rates than the starting sequences, demonstrating the predictive power of this model. The sequences were assayed in the context of a constant promoter and downstream coding sequence. Cell growth was used as a proxy for protein expression. Growth was measured by determining the ratio of sequence counts before and after growth selection.
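The count-ratio readout described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `log2_enrichment`, the normalization by total library counts, the pseudocount, and the log2 transform are all assumptions made for illustration, since the description states only that growth was measured as the ratio of sequence counts before and after selection.

```python
import math


def log2_enrichment(pre_counts, post_counts, pseudocount=1.0):
    """Return a log2 post/pre frequency ratio for each sequence.

    pre_counts and post_counts map a 5' UTR sequence to its read count
    before and after growth selection. Depth normalization and the
    pseudocount are illustrative assumptions, not part of the source.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    scores = {}
    for seq, pre in pre_counts.items():
        # Sequences missing after selection are treated as zero reads.
        post = post_counts.get(seq, 0)
        pre_freq = (pre + pseudocount) / pre_total
        post_freq = (post + pseudocount) / post_total
        scores[seq] = math.log2(post_freq / pre_freq)
    return scores


# Hypothetical counts for two library members.
pre = {"UTR_A": 100, "UTR_B": 100}
post = {"UTR_A": 300, "UTR_B": 100}
scores = log2_enrichment(pre, post)
```

A positive score means the sequence became more frequent during selection relative to the rest of the library, and a negative score means it was depleted.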
Project description:“Biological noise” is defined as functionally insignificant events that occur in living cells due to imperfect fidelity of biological processes. Distinguishing between biological function and biological noise is often difficult, and experiments to measure biological noise have not been performed. Here, we measure biological noise in yeast cells by analyzing chromatin structure and transcription of an 18 kb region of DNA whose sequence was randomly generated and hence is functionally irrelevant. Nucleosome occupancy on random-sequence DNA is comparable to that on yeast genomic DNA. However, nucleosome-depleted regions are much less frequent, and there are fewer well-positioned nucleosomes and shorter nucleosome arrays. Steady-state levels of RNAs expressed from random-sequence DNA are comparable to those of yeast mRNAs, although transcription and mRNA decay rates are at higher levels. Transcriptional initiation (5’ ends) from random-sequence DNA occurs at numerous sites at low levels, indicating very low intrinsic specificity of the Pol II machinery. In contrast, poly(A) profiles (relative levels and clustering of 3’ isoforms) of random-sequence RNAs are roughly comparable to those of endogenous yeast RNAs, which are restricted to 3’ untranslated regions. RNAs expressed from random-sequence DNA show higher cell-to-cell variability than RNAs expressed from yeast genomic DNA, suggesting that functional elements limit the variability among individual cells within a population. These observations indicate that transcriptional noise occurs at high levels in yeast, and they provide insight into how chromatin and transcription patterns arise from the evolved yeast genome.
Project description:This is a Random Forest-based machine learning model that distinguishes lncRNAs from coding mRNAs in plant transcriptomic data. The model assigns 1 to coding sequences and 2 to long non-coding sequences. The prediction uses a combination of open reading frame (ORF)-based, sequence-based and codon-bias features. To make predictions on Zea mays sequence data, users need to download the curated ONNX model and convert the sequences into a feature matrix as described in the PLIT paper (Deshpande et al. 2019).
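As a rough illustration of the workflow described above, the Python sketch below computes two simple per-sequence features and decodes the model's numeric output into class names. Only the label convention (1 for coding, 2 for long non-coding) comes from the description. The helper names and the two features shown (longest ORF length and GC content) are hypothetical stand-ins; the actual feature set is the one defined in the PLIT paper and is not reproduced here.

```python
# Label convention stated in the description: 1 = coding, 2 = lncRNA.
LABELS = {1: "coding", 2: "lncRNA"}

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length in nt of the longest ATG-to-stop ORF on the forward strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count the stop codon as part of the ORF.
                best = max(best, i + 3 - start)
                start = None
    return best


def gc_content(seq):
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def decode_labels(predictions):
    """Map numeric model outputs (1 or 2) to class names."""
    return [LABELS[int(p)] for p in predictions]


# Illustrative feature row for one hypothetical transcript.
example = "CCATGAAATAGGG"
feature_row = [longest_orf_length(example), gc_content(example)]
```

In practice, rows like `feature_row` would be stacked into the feature matrix expected by the downloaded ONNX model, with columns ordered as the paper specifies.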
Project description:Eukaryotic genes generate multiple mRNA transcript isoforms through alternative transcription, splicing, and polyadenylation. However, the relationship between human transcript diversity and protein production is complex as each isoform can be translated differently. We fractionated a polysome profile and reconstructed transcript isoforms from each fraction, which we term Transcript Isoforms in Polysomes sequencing (TrIP-seq). Analysis of these data revealed regulatory features that control ribosome occupancy and translational output of each transcript isoform. We extracted a panel of 5′ and 3′ untranslated regions that control protein production from an unrelated gene in cells over a 100-fold range. Select 5′ untranslated regions exert robust translational control between cell lines, while 3′ untranslated regions can confer cell-type-specific expression. These results expose the large dynamic range of transcript-isoform-specific translational control, identify isoform-specific sequences that control protein output in human cells, and demonstrate that transcript isoform diversity must be considered when relating RNA and protein levels.
Project description:Prp45 is a budding yeast NineTeen-Complex associated factor, which plays a role in pre-mRNA splicing. In addition to its documented involvement in the second step of splicing, here we show that it is also important in the early stages of co-transcriptional spliceosome assembly. To determine the overall splicing efficiency of prp45 mutant cells and to reveal how the efficiency depends on the sequences that define introns (whether they are consensus or non-consensus), we performed RNA-seq analysis of prp45(1-169) and corresponding wild-type cells (two biological replicates each). Total RNA was isolated by combining phenol-chloroform extraction with the MasterPure Yeast RNA Purification Kit (Epicentre). Ribodepletion, library preparation and sequencing were performed by BGI Genomics.