Project description:Optimization of gene expression levels is an essential part of the organism design process. Fine control of this process can be achieved by engineering transcription and translation control elements, including the ribosome binding site (RBS). Unfortunately, the design of specific genetic parts remains challenging because of the lack of reliable design methods. To address this problem, we have created a machine learning guided Design-Build-Test-Learn (DBTL) cycle for the experimental design of bacterial RBSs to demonstrate how small genetic parts can be reliably designed using relatively small, high-quality data sets. We used Gaussian Process Regression for the Learn phase of the cycle and the Upper Confidence Bound multiarmed bandit algorithm for the Design of genetic variants to be tested in vivo. We have integrated these machine learning algorithms with laboratory automation and high-throughput processes for reliable data generation. Notably, by Testing a total of 450 RBS variants in four DBTL cycles, we have experimentally validated RBSs with high translation initiation rates equaling or exceeding our benchmark RBS by up to 34%. Overall, our results show that machine learning is a powerful tool for designing RBSs, and they pave the way toward more complicated genetic devices.
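The Learn and Design steps named in this description (Gaussian Process Regression followed by Upper Confidence Bound selection) can be illustrated with a minimal sketch. It is not the authors' implementation: the one-hot encoding, the RBF kernel, the zero-mean prior (labels are assumed to be normalized), the `noise` and `beta` values, and all function names are assumptions made for illustration.

```python
import math

def one_hot(seq):
    """Encode a DNA string as a flat one-hot vector (assumed encoding)."""
    return [1.0 if base == b else 0.0 for base in seq for b in "ACGT"]

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(X_train, y_train, x_new, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X_train)]
         for i, a in enumerate(X_train)]
    k_star = [rbf_kernel(a, x_new) for a in X_train]
    alpha = solve(K, y_train)   # K^-1 y
    v = solve(K, k_star)        # K^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf_kernel(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))

def ucb_select(tested, candidates, batch_size=2, beta=2.0):
    """Rank untested sequences by mean + beta * std and return the top batch.

    `tested` maps sequence -> normalized measured label (hypothetical data).
    """
    X = [one_hot(s) for s in tested]
    y = list(tested.values())
    scored = []
    for s in candidates:
        mean, std = gp_posterior(X, y, one_hot(s))
        scored.append((mean + beta * std, s))
    scored.sort(reverse=True)
    return [s for _, s in scored[:batch_size]]
```

In a DBTL loop, the sequences returned by `ucb_select` would be built and measured, their labels added to `tested`, and the selection repeated. A larger `beta` favours unexplored sequences; a smaller one favours sequences predicted to perform well.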
Project description:Binding of transcription factors (TFs) to regulatory sequences is a pivotal step in the control of gene expression. Despite many advances in the characterization of sequence motifs recognized by TFs, our ability to quantitatively predict TF binding to different regulatory sequences is still limited. Here, we present a novel experimental assay termed BunDLE-seq that provides quantitative measurements of TF binding to thousands of fully designed sequences of 200 bp in length within a single experiment. Applying this binding assay to two yeast TFs, we demonstrate that sequences outside the core TF binding site profoundly affect TF binding. We show that TF-specific models based on the sequence or DNA shape of the regions flanking the core binding site are highly predictive of the measured differential TF binding. We further characterize the dependence of TF binding, accounting for measurements of single and co-occurring binding events, on the number and location of binding sites and on the TF concentration. Finally, by coupling our in vitro TF binding measurements, and another application of our method probing nucleosome formation, to in vivo expression measurements carried out with the same template sequences now serving as promoters, we offer insights into mechanisms that may determine the different expression outcomes observed. Our assay thus paves the way to a more comprehensive understanding of TF binding to regulatory sequences, and allows the characterization of TF binding determinants within and outside of core binding sites.
Project description:High-throughput sequencing has opened numerous possibilities for the identification of regulatory RNA-binding events. Cross-linking and immunoprecipitation of Argonaute protein members can pinpoint microRNA target sites within tens of bases, but leaves the identity of the microRNA unresolved. A flexible computational framework that integrates sequence with cross-linking features reliably identifies the microRNA family involved in each binding event, considerably outperforms sequence-only approaches, and quantifies the prevalence of noncanonical binding modes. The data comprise Ago2 (Argonaute 2) PAR-CLIP and RNA deep sequencing of Epstein-Barr virus B95.8-infected lymphoblastoid cell lines (LCLs).
Project description:High-throughput sequencing has opened numerous possibilities for the identification of regulatory RNA-binding events. Cross-linking and immunoprecipitation of Argonaute protein members can pinpoint microRNA target sites within tens of bases, but leaves the identity of the microRNA unresolved. A flexible computational framework that integrates sequence with cross-linking features reliably identifies the microRNA family involved in each binding event, considerably outperforms sequence-only approaches, and quantifies the prevalence of noncanonical binding modes.
Project description:Background: While methods for annotation of genes are increasingly reliable, the exact identification of translation initiation sites remains a challenging problem. Since the N-termini of proteins often contain regulatory and targeting information, developing a robust method for start site identification is crucial. Ribosome profiling reads show distinct patterns of read length distributions around translation initiation sites. These patterns are typically lost in standard ribosome profiling analysis pipelines, when reads from footprints are adjusted to determine the specific codon being translated. Results: Utilising these signatures in combination with nucleotide sequence information, we build a model capable of predicting translation initiation sites and demonstrate its high accuracy using N-terminal proteomics. Applying this to prokaryotic translatomes, we re-annotate translation initiation sites and provide evidence of N-terminal truncations and extensions of previously annotated coding sequences. These re-annotations are supported by the presence of structural and sequence-based features next to N-terminal peptide evidence. Finally, our model identifies 61 novel genes previously undiscovered in the Salmonella enterica genome. Conclusions: Signatures within ribosome profiling read length distributions can be used in combination with nucleotide sequence information to provide accurate genome-wide identification of translation initiation sites.
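The central idea of this description, keeping the read length distribution around a candidate start site instead of collapsing reads onto a single codon, can be sketched as a feature-extraction step. This is an illustrative sketch only, not the published model: the read representation (`(five_prime_position, length)` tuples on the forward strand, 0-based), the window and length ranges, and the function names are all assumptions.

```python
from collections import Counter

def read_length_profile(reads, site, window=(-20, 20), lengths=range(20, 41)):
    """Fraction of reads per (5'-end offset from site, read length) pair.

    `reads` is an iterable of (five_prime_position, length) tuples; read
    positions are deliberately NOT adjusted to a P-site, so the
    length-specific pattern around the site is preserved.
    """
    counts = Counter()
    for five_prime, length in reads:
        offset = five_prime - site
        if window[0] <= offset <= window[1] and length in lengths:
            counts[(offset, length)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}

def candidate_features(genome, reads, site, upstream=20):
    """Combine sequence context and read length profile for one candidate."""
    return {
        "start_codon": genome[site:site + 3],
        "upstream": genome[max(0, site - upstream):site],
        "profile": read_length_profile(reads, site),
    }
```

Feature dictionaries of this kind could then be passed to any classifier trained on sites with independent support, such as the N-terminal proteomics evidence mentioned in the description.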
Project description:Brassinosteroid (BR) induces rapid dephosphorylation and nuclear localization of BZR1 through a cascade of signaling events, thereby regulating BR-responsive gene expression and plant development. Chromatin immunoprecipitation microarray (ChIP-chip) experiments identified about 2200 high-confidence BZR1 binding sites, which include all previously known binding regions. The binding sites are distributed throughout the genome but are rare in the centromere regions, which is similar to the distribution of expressed genes. The binding sites were substantially enriched in the 5’ and 3’ intergenic regions compared to the transcribed regions of genes. Combining these binding sites with transcription profiling of BR-regulated genes yielded 953 BR-regulated BZR1 targets.
Project description:The majority of sequence-specific transcription factors bind genomic DNA only at a fraction of their potential binding sites, and the ‘rules’ for binding or not binding are only partially understood. Here, we studied the binding properties of the myeloid and B-cell specific transcription factor PU.1 in vivo and in vitro to unveil basic features of occupied vs. non-occupied consensus sites. In addition to published PU.1 ChIP-seq data, we mapped CTCF binding sites in monocytes and macrophages to determine chromatin domain boundaries and performed MCIp-seq in monocytes to reveal DNA methylation patterns across the genome. The data comprise ChIP-seq of CTCF in human monocytes and human monocyte-derived macrophages as well as MCIp-seq in human monocytes.
Project description:The majority of sequence-specific transcription factors bind genomic DNA only at a fraction of their potential binding sites, and the ‘rules’ for binding or not binding are only partially understood. Here, we studied the binding properties of the myeloid and B-cell specific transcription factor PU.1 in vivo and in vitro to unveil basic features of occupied vs. non-occupied consensus sites. In addition to published PU.1 ChIP-seq data, we mapped CTCF binding sites in monocytes and macrophages to determine chromatin domain boundaries and performed MCIp-seq in monocytes to reveal DNA methylation patterns across the genome.