Project description:Structure probing coupled with high-throughput sequencing holds the potential to revolutionize our understanding of the role of RNA structure in regulation of gene expression. Despite major technological advances, intrinsic noise and high coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline which accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome-wide. We demonstrate on two yeast data sets that our method has greatly increased sensitivity, enabling the identification of modified regions on many more transcripts compared with existing pipelines. It also provides confident predictions at much lower coverage levels than previously reported. Our results show that statistical modeling greatly extends the scope and potential of transcriptome-wide structure probing experiments.
Project description:To accelerate previous RNA structure probing approaches, which focus on analyzing one RNA sequence at a time, we have developed FragSeq, a high-throughput RNA structure probing method that uses high-throughput RNA sequencing to identify single-stranded RNA (ssRNA) regions from fragments generated by nuclease P1, which is specific for single-stranded nucleic acids. In the accompanying study, we show that we can accurately and simultaneously map ssRNA regions in multiple non-coding RNAs with known structure in experiments probing the entire mouse nuclear transcriptome. We carried out probing in two cell types to assess reproducibility. We also identified and experimentally validated structured regions in ncRNAs never previously probed.
Project description:Structure probing experiments were performed on in vitro transcripts and E. coli and human cell cultures under natively extracted (cell-free) and in-cell conditions to benchmark the performance of the newly introduced PAIR-MaP correlated chemical probing strategy for detecting RNA duplexes. Multiple-hit dimethyl sulfate (DMS) probing was done using new buffer conditions that facilitate DMS modification of all four nucleotides.
Project description:To accelerate previous RNA structure probing approaches, which focus on analyzing one RNA sequence at a time, we have developed FragSeq, a high-throughput RNA structure probing method that uses high-throughput RNA sequencing to identify single-stranded RNA (ssRNA) regions from fragments generated by nuclease P1, which is specific for single-stranded nucleic acids. In the accompanying study, we show that we can accurately and simultaneously map ssRNA regions in multiple non-coding RNAs with known structure in experiments probing the entire mouse nuclear transcriptome. We carried out probing in two cell types to assess reproducibility. We also identified and experimentally validated structured regions in ncRNAs never previously probed. We examined mouse nuclear RNA from two cell types: undifferentiated embryonic stem cells (UNDIFF) and cells differentiated into neural precursors (D5NP). For each cell type, nuclear RNA was purified and deproteinized, denatured, and refolded in vitro, from which we prepared three barcoded samples: "nuclease" (RNA partially digested with P1 ssRNA-specific nuclease, yielding 5'-PO4/3'-OH end chemistry at each cleavage site), "control" (control for "nuclease" sample to identify endogenous 5'-PO4/3'-OH), and "PNK" (same as "control" but followed by a polynucleotide kinase treatment to convert 5'-OH/3'-cyclic-phosphate ends to clonable 5'-PO4/3'-OH ends). Resulting RNA fragments were cloned using the SOLiD Small RNA Expression Kit (SREK) protocol, which ligates linkers only to 5'-PO4/3'-OH containing RNA, enriching for clones of products resulting from P1 cleavage in "nuclease" sample and selecting against random degradation. Two cell types, three treatments each, thus resulted in six barcoded samples total (barcodes 01, 02, 04, 05, 07, 08). Four other barcoded samples were prepared for separate experiments not used in our study (barcodes 03, 06, 09, 10), so their preparation is not described here.
The total run of ten barcodes was done on the ABI SOLiD3 platform and a custom algorithm (FragSeq v0.0.1) was used to compute "cutting scores" (as described in our paper) that show ssRNA regions in hundreds of ncRNAs.
Project description:Ribosome assembly in eukaryotes involves the activity of hundreds of assembly factors that direct the hierarchical assembly of ribosomal proteins and numerous ribosomal RNA folding steps. However, detailed insights into the function of assembly factors and ribosomal RNA folding events are lacking. To address this, we have developed ChemModSeq, a method that combines structure probing, high throughput sequencing and statistical modeling, to quantitatively measure RNA structural rearrangements during the assembly of macromolecular complexes. By applying ChemModSeq to purified 40S assembly intermediates we obtained nucleotide-resolution maps of ribosomal RNA flexibility revealing structurally distinct assembly intermediates and mechanistic insights into assembly dynamics not readily observed in cryo-electron microscopy reconstructions. We show that RNA restructuring events coincide with the release of assembly factors and predict that completion of the head domain is required before the Rio1 kinase enters the assembly pathway. Collectively, our results suggest that 40S assembly factors regulate the timely incorporation of ribosomal proteins by delaying specific folding steps in the 3’ major domain of the 20S pre-ribosomal RNA. Three datasets of yeast ribosomal samples subjected to different chemical modifications; 1M7 dataset contains 8 different modified samples and 2 control samples; NAI dataset contains 3 different modified samples and 2 control samples; DMS dataset contains 1 modified sample and 1 control sample. Each sample consists of at least two replicates.
Project description:Ribosome assembly in eukaryotes involves the activity of hundreds of assembly factors that direct the hierarchical assembly of ribosomal proteins and numerous ribosomal RNA folding steps. However, detailed insights into the function of assembly factors and ribosomal RNA folding events are lacking. To address this, we have developed ChemModSeq, a method that combines structure probing, high throughput sequencing and statistical modeling, to quantitatively measure RNA structural rearrangements during the assembly of macromolecular complexes. By applying ChemModSeq to purified 40S assembly intermediates we obtained nucleotide-resolution maps of ribosomal RNA flexibility revealing structurally distinct assembly intermediates and mechanistic insights into assembly dynamics not readily observed in cryo-electron microscopy reconstructions. We show that RNA restructuring events coincide with the release of assembly factors and predict that completion of the head domain is required before the Rio1 kinase enters the assembly pathway. Collectively, our results suggest that 40S assembly factors regulate the timely incorporation of ribosomal proteins by delaying specific folding steps in the 3’ major domain of the 20S pre-ribosomal RNA.
Project description:Ribosome profiling data report on the distribution of translating ribosomes, at steady-state, with codon-level resolution. We present a robust method to extract codon translation rates and protein synthesis rates from these data, and identify causal features associated with elongation and translation efficiency in physiological conditions in yeast. We show that neither elongation rate nor translational efficiency is improved by experimental manipulation of the abundance or body sequence of the rare AGG tRNA. Deletion of three of the four copies of the heavily used ACA tRNA shows a modest efficiency decrease that could be explained by other rate-reducing signals at gene start. This suggests that correlation between codon bias and efficiency arises as selection for codons to utilize translation machinery efficiently in highly translated genes. We also show a correlation between efficiency and RNA structure calculated both computationally and from recent structure probing data, as well as the Kozak initiation motif, which may comprise a mechanism to regulate initiation. We test whether tRNA abundance affects elongation or translation efficiency by changing the tRNA levels through deletion or overexpression and measuring the ribosomal dwell time at each codon using a robust statistical method that accounts for flow conservation.
Project description:The long non-coding RNA (lncRNA) Xist is a master regulator of X-chromosome inactivation in mammalian cells. Models for how Xist and other lncRNAs function depend on thermodynamically stable secondary and higher-order structures that RNAs can form in the context of a cell. Probing accessible RNA bases can provide data to build models of RNA conformation that provide insight into RNA function, molecular evolution, and modularity. To study the structure of Xist in cells, we built upon recent advances in RNA secondary structure mapping and modeling to develop Targeted Structure-Seq, which combines chemical probing of RNA structure in cells with target-specific massively parallel sequencing. By enriching for signals from the RNA of interest, Targeted Structure-Seq achieves high coverage of the target RNA with relatively few sequencing reads, thus providing a targeted and scalable approach to analyze RNA conformation in cells. We use this approach to probe the full-length Xist lncRNA to develop new models for functional elements within Xist, including the repeat A element in the 5'-end of Xist. This analysis also identified new structural elements in Xist that are evolutionarily conserved, including a new element proximal to the C repeats that is important for Xist function. Examination of dimethyl sulfate reactivity of Xist lncRNA and 18S rRNA in cells using targeted reverse transcription to determine reactivity, and comparisons with untreated control samples.
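The comparison of treated and untreated samples mentioned above can be illustrated with a minimal sketch. It is not the Targeted Structure-Seq pipeline: it assumes that reactivity at each nucleotide is approximated by the reverse-transcription stop rate in the treated sample minus the stop rate in the untreated control, floored at zero. The function name `dms_reactivity` and its inputs are hypothetical.

```python
def dms_reactivity(treated_stops, treated_cov, control_stops, control_cov):
    """Background-subtracted RT-stop rate per nucleotide (illustrative only).

    Each argument is a list with one integer per nucleotide position.
    Positions with zero coverage in either sample are reported as None.
    """
    reactivity = []
    for ts, tc, cs, cc in zip(treated_stops, treated_cov,
                              control_stops, control_cov):
        if tc == 0 or cc == 0:
            # No reads cover this position, so no rate can be estimated.
            reactivity.append(None)
        else:
            # Assumed scoring rule: treated rate minus control rate, never negative.
            reactivity.append(max(0.0, ts / tc - cs / cc))
    return reactivity


# Hypothetical counts for four positions.
scores = dms_reactivity(
    treated_stops=[50, 10, 0, 5],
    treated_cov=[100, 100, 100, 0],
    control_stops=[25, 20, 0, 1],
    control_cov=[100, 100, 100, 100],
)
print(scores)
```

Subtracting the control rate removes stops that occur without chemical treatment, such as natural reverse-transcriptase pauses; a real analysis would also normalize across positions and replicates.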
Project description:High-density oligonucleotide tiling-microarrays are currently providing a powerful tool for genome-wide in vivo DNA footprinting assays, yielding unprecedented insights into tissue-specific protein-DNA interactions and chromatin structure. Despite the impressive advances, however, the technology still suffers from numerous complications caused by background noise and probe-specific effects. A few computational methods modeling sequence-related probe effects are now available for Affymetrix tiling arrays, but no counterpart is yet available for two-color arrays. A novel normalization method based on the GC content of probes is developed for two-color tiling-arrays. The proposed method, together with robust estimates of the model parameters, is shown to perform superbly on published data sets. Accompanying the normalization method, a robust algorithm for detecting peak regions is formulated and also shown to perform well compared to other approaches. The tools presented herein have been implemented for NimbleGen tiling arrays as a stand-alone Java program, which can also display various plots of statistical analysis for quality control of experiments. Upon changing the file format, the program also works on Agilent data. Keywords: ChIP-chip
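The idea of GC-content-based normalization with robust parameter estimates can be sketched as follows. This is an illustration under stated assumptions, not the published method or its Java implementation: probes are binned by their G+C count, and each log-ratio is centered by the bin median and scaled by the bin median absolute deviation (MAD). The function names `gc_count` and `gc_normalize` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def gc_normalize(probes):
    """Center and scale log-ratios within bins of equal GC count.

    probes: list of (probe_sequence, log_ratio) pairs.
    Returns normalized log-ratios in the input order.
    Assumption: the bin median and MAD serve as the robust location and scale.
    """
    bins = defaultdict(list)
    for seq, log_ratio in probes:
        bins[gc_count(seq)].append(log_ratio)

    center, scale = {}, {}
    for gc, values in bins.items():
        center[gc] = median(values)
        mad = median(abs(v - center[gc]) for v in values)
        # Fall back to unit scale when a bin has no spread.
        scale[gc] = mad if mad > 0 else 1.0

    return [(lr - center[gc_count(seq)]) / scale[gc_count(seq)]
            for seq, lr in probes]


# Hypothetical probes: three GC-rich and three AT-rich.
example = [("GGCC", 2.0), ("GGCC", 4.0), ("GCGC", 3.0),
           ("ATAT", 0.5), ("AATT", 0.5), ("TTAA", 1.5)]
normalized = gc_normalize(example)
print(normalized)
```

Using the median and MAD rather than the mean and standard deviation keeps genuinely enriched probes, which are outliers by design, from distorting the per-bin estimates.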