Inferring transcriptional logic from multiple dynamic experiments.
ABSTRACT: The growing availability of dynamic gene expression data collected under multiple experimental conditions provides new information that makes attainable the key goal of identifying not only the transcriptional regulators of a gene but also the underlying logical structure. We propose a novel method for inferring transcriptional regulation using a simple, yet biologically interpretable, model to find the logic by which a set of candidate genes and their associated transcription factors (TFs) regulate the transcriptional process of a gene of interest. Our dynamic model links the mRNA transcription rate of the target gene to the activation states of the TFs, assuming that these interactions are consistent across multiple experiments and over time. A trans-dimensional Markov Chain Monte Carlo (MCMC) algorithm is used to efficiently sample the regulatory logic under different combinations of parents and rank the estimated models by their posterior probabilities. We demonstrate our methodology and compare it with other methods using simulation examples, and apply it to a study of transcriptional regulation of selected target genes of Arabidopsis thaliana from microarray time series data obtained under multiple biotic stresses. We show that our method is able to detect complex regulatory interactions that are consistent under multiple experimental conditions. Programs are written in MATLAB and Statistics Toolbox Release 2016b, The MathWorks, Inc., Natick, Massachusetts, United States and are available on GitHub https://github.com/giorgosminas/TRS. Supplementary data are available at Bioinformatics online.
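As a rough illustration of what "linking the mRNA transcription rate to the activation states of the TFs" can look like, the following Python sketch integrates a simple rate equation in which a Boolean function of two TF states switches transcription between a basal and an induced level. The equation form, the parameter names (`basal`, `induced`, `decay`), and the choice of an AND gate are assumptions made for illustration only; they are not taken from the paper and do not reproduce the authors' MATLAB implementation or their MCMC inference.

```python
def simulate_mrna(tf_states, logic, basal=0.1, induced=1.0, decay=0.5,
                  dt=0.1, m0=0.0):
    """Euler-integrate dm/dt = basal + induced * logic(states) - decay * m.

    tf_states: sequence of tuples, one tuple of 0/1 TF activation states
               per time step (hypothetical input format).
    logic:     function mapping the TF states to True/False.
    Returns the mRNA trajectory, including the initial value m0.
    """
    m = m0
    trajectory = [m]
    for states in tf_states:
        # Transcription rate depends only on the logical output of the TFs.
        rate = basal + induced * (1.0 if logic(*states) else 0.0)
        m += dt * (rate - decay * m)
        trajectory.append(m)
    return trajectory


def and_gate(a, b):
    """Illustrative regulatory logic: transcription needs both TFs active."""
    return bool(a and b)


if __name__ == "__main__":
    # Two hypothetical "experiments": both TFs active, or only one active.
    both_on = simulate_mrna([(1, 1)] * 2000, and_gate)
    one_on = simulate_mrna([(1, 0)] * 2000, and_gate)
    print(both_on[-1], one_on[-1])
```

Under these assumptions, the same logic function is applied in every experiment, so a candidate logic can be judged by how well it explains all time series at once.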
Project description:The impetus for this work was the need to analyse nucleotide diversity in a viral mix taken from honeybees. The paper has two findings. First, a method for correction of next generation sequencing error in the distribution of nucleotides at a site is developed. Second, a package of methods for assessment of nucleotide diversity is assembled. The error correction method is statistically based and works at the level of the nucleotide distribution rather than the level of individual nucleotides. The method relies on an error model and a sample of known viral genotypes that is used for model calibration. A compendium of existing and new diversity analysis tools is also presented, allowing hypotheses about diversity and mean diversity to be tested and associated confidence intervals to be calculated. The methods are illustrated using honeybee viral samples. Software in both Excel and Matlab and a guide are available at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/, the Warwick University Systems Biology Centre software download site.
Project description:Motivation: There are a number of algorithms to infer causal regulatory networks from time series (gene expression) data. Here we analyse the phenomenon of regulator interference, where regulators with similar dynamics mutually suppress both the probability of regulating a target and the associated link strength; for instance, interference between two identical strong regulators reduces link probabilities by ~50%. Results: We construct a robust method to define an interference-corrected causal network based on an analysis of the conditional link probabilities that recovers links lost through interference. On a large real network (Streptomyces coelicolor, phosphate depletion), we demonstrate that significant interference can occur between regulators with a correlation as low as 0.865, losing an estimated 34% of links by interference. However, levels of interference cannot be predicted from the correlation between regulators alone and are data specific. Validating against known networks, we show that high numbers of functional links are lost by regulator interference. Performance against other methods on DREAM4 data is excellent. Availability and implementation: The method is implemented in R and is publicly available as the NIACS package at http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software.
Project description:Protocols for preparing RNA sequencing (RNA-seq) libraries, most prominently "Smart-seq" variations, introduce global biases that can have a significant impact on the quantification of gene expression levels. This global bias can lead to drastic over- or under-representation of RNA in a non-linear, length-dependent fashion due to enzymatic reactions during cDNA production. It is currently not corrected by any RNA-seq software, most of which focuses on local bias in coverage along RNAs. This paper describes LiBiNorm, a simple command line program that mimics the popular htseq-count software and allows diagnostics, quantification, and global bias removal. LiBiNorm outputs gene expression data that have been normalized to correct for global bias introduced by the Smart-seq2 protocol. In addition, it produces data and several plots that allow insights into the experimental history underlying library preparation. The LiBiNorm package includes an R script that allows visualization of the main results. LiBiNorm is the first software application to correct for the global bias that is introduced by the Smart-seq2 protocol. It is freely downloadable at http://www2.warwick.ac.uk/fac/sci/lifesci/research/libinorm.
Project description:This study exploits time, the relatively unexplored fourth dimension of gene regulatory networks (GRNs), to learn the temporal transcriptional logic underlying dynamic nitrogen (N) signaling in plants. Our "just-in-time" analysis of time-series transcriptome data uncovered a temporal cascade of cis elements underlying dynamic N signaling. To infer transcription factor (TF)-target edges in a GRN, we applied a time-based machine learning method to 2,174 dynamic N-responsive genes. We experimentally determined a network precision cutoff, using TF-regulated genome-wide targets of three TF hubs (CRF4, SNZ, and CDF1), used to "prune" the network to 155 TFs and 608 targets. This network precision was reconfirmed using genome-wide TF-target regulation data for four additional TFs (TGA1, HHO5/6, and PHL1) not used in network pruning. These higher-confidence edges in the GRN were further filtered by independent TF-target binding data, used to calculate a TF "N-specificity" index. This refined GRN identifies the temporal relationship of known/validated regulators of N signaling (NLP7/8, TGA1/4, NAC4, HRS1, and LBD37/38/39) and 146 additional regulators. Six TFs (CRF4, SNZ, CDF1, HHO5/6, and PHL1) validated herein regulate a significant number of genes in the dynamic N response, targeting 54% of N-uptake/assimilation pathway genes. Phenotypically, inducible overexpression of CRF4 in planta regulates genes resulting in altered biomass, root development, and ¹⁵NO₃⁻ uptake, specifically under low-N conditions. This dynamic N-signaling GRN now provides the temporal "transcriptional logic" for 155 candidate TFs to improve nitrogen use efficiency with potential agricultural applications. Broadly, these time-based approaches can uncover the temporal transcriptional logic for any biological response system in biology, agriculture, or medicine.
Project description:The ability to perform molecular-level computation in mammalian cells has the potential to enable a new wave of sophisticated cell-based therapies and diagnostics. To this end, we developed a Boolean logic framework utilizing artificial Cys₂-His₂ zinc finger transcription factors (ZF-TFs) as computing elements. Artificial ZFs can be designed to specifically bind different DNA sequences and thus comprise a diverse set of components ideal for the construction of scalable networks. We generate ZF-TF activators and repressors and demonstrate a novel, general method to tune ZF-TF response by fusing ZF-TFs to leucine zipper homodimerization domains. We describe 15 transcriptional activators that display 2- to 463-fold induction and 15 transcriptional repressors that show 1.3- to 16-fold repression. Using these ZF-TFs, we compute OR, NOR, AND and NAND logic, employing hybrid promoters and split intein-mediated protein splicing to integrate signals. The split intein strategy is able to fully reconstitute the ZF-TFs, maintaining them as a uniform set of computing elements. Together, these components comprise a robust platform for building mammalian synthetic gene circuits capable of precisely modulating cellular behavior.
Project description:Motivation: The time evolution of molecular species involved in biochemical reaction networks often arises from complex stochastic processes involving many species and reaction events. Inference for such systems is profoundly challenged by the relative sparseness of experimental data, as measurements are often limited to a small subset of the participating species measured at discrete time points. For oscillatory dynamics resulting from negative translational and transcriptional feedback loops (TTFLs), the necessary model reduction can realistically be achieved by introducing probabilistic time-delays. Although this approach yields a simplified model, inference is challenging and subject to ongoing research. The linear noise approximation (LNA) has recently been proposed to address such systems in stochastic form and will be exploited here. Results: We develop a novel filtering approach for the LNA in stochastic systems with distributed delays, which allows the parameter values and unobserved states of a stochastic negative feedback model to be inferred from univariate time-series data. The performance of the methods is tested on simulated data. Results are obtained for real data when the model is fitted to imaging data on Cry1, a key gene involved in the mammalian central circadian clock, observed via a luciferase reporter construct in a mouse suprachiasmatic nucleus (SCN). Availability: Programs are written in MATLAB and Statistics Toolbox Release 2016b, The MathWorks, Inc., Natick, Massachusetts, United States. Sample code and Cry1 data are available on GitHub https://github.com/scalderazzo/FLNADD. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:C/EBPα plays an instructive role in the macrophage-neutrophil cell-fate decision and its expression is necessary for neutrophil development. How Cebpa itself is regulated in the myeloid lineage is not known. We decoded the cis-regulatory logic of Cebpa, and two other myeloid transcription factors, Egr1 and Egr2, using a combined experimental-computational approach. With a reporter design capable of detecting both distal enhancers and silencers, we analyzed 46 putative cis-regulatory modules (CRMs) in cells representing myeloid progenitors, and derived early macrophages or neutrophils. In addition to novel enhancers, this analysis revealed a surprisingly large number of silencers. We determined the regulatory roles of 15 potential transcriptional regulators by testing 32,768 alternative sequence-based transcriptional models against CRM activity data. This comprehensive analysis allowed us to infer the cis-regulatory logic for most of the CRMs. Silencer-mediated repression of Cebpa was found to be effected mainly by TFs expressed in non-myeloid lineages, highlighting a previously unappreciated contribution of long-distance silencing to hematopoietic lineage resolution. The repression of Cebpa by multiple factors expressed in alternative lineages suggests that hematopoietic genes are organized into densely interconnected repressive networks instead of hierarchies of mutually repressive pairs of pivotal TFs. More generally, our results demonstrate that de novo cis-regulatory dissection is feasible on a large scale with the aid of transcriptional modeling. Current address: Department of Biology, University of North Dakota, 10 Cornell Street, Stop 9019, Grand Forks, ND 58202-9019, USA.
Project description:Gene regulatory networks are based on simple building blocks such as promoters, transcription factors (TFs) and their binding sites on DNA. But how diverse are the functions that can be obtained by different arrangements of promoters and TF binding sites? In this work we constructed synthetic regulatory regions using promoter elements and binding sites of two noninteracting TFs, each sensing a single environmental input signal. We show that simply by combining these three kinds of elements, we can obtain 11 of the 16 Boolean logic gates that integrate two environmental signals in vivo. Further, we demonstrate how combination of logic gates can result in new logic functions. Our results suggest that simple elements of transcription regulation form a highly flexible toolbox that can generate diverse functions under natural selection.
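The figure of 16 Boolean logic gates for two input signals follows from counting truth tables: two binary inputs give four input combinations, and each combination can map to 0 or 1, giving 2^4 = 16 possible functions. A minimal Python sketch of that enumeration is given below; the helper names (`all_two_input_gates`, `truth_table`, `NAMED`) are illustrative and not taken from the study, and the sketch says nothing about which 11 gates were obtained experimentally.

```python
from itertools import product

# The four possible input combinations for two binary signals (A, B).
INPUTS = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)


def all_two_input_gates():
    """Return every truth table as a tuple of outputs, ordered like INPUTS."""
    return list(product([0, 1], repeat=len(INPUTS)))


def truth_table(func):
    """Evaluate a two-argument Boolean function on every input combination."""
    return tuple(int(bool(func(a, b))) for a, b in INPUTS)


# A few familiar gates expressed as truth tables.
NAMED = {
    "AND": truth_table(lambda a, b: a and b),
    "OR": truth_table(lambda a, b: a or b),
    "NAND": truth_table(lambda a, b: not (a and b)),
    "NOR": truth_table(lambda a, b: not (a or b)),
    "XOR": truth_table(lambda a, b: a != b),
}

if __name__ == "__main__":
    gates = all_two_input_gates()
    print(len(gates))
    for name, table in NAMED.items():
        print(name, table)
```

Each named gate appears exactly once among the enumerated truth tables, which makes the list a convenient checklist when classifying the response of a measured regulatory region.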
Project description:Synthetic biology has seen an explosive growth in the capability of engineering artificial gene circuits from transcription factors (TFs), particularly in bacteria. However, most artificial networks still employ the same core set of TFs (for example LacI, TetR and cI). The TFs mostly function via repression and it is difficult to integrate multiple inputs in promoter logic. Here we present to our knowledge the first set of dual activator-repressor switches for orthogonal logic gates, based on bacteriophage λ cI variants and multi-input promoter architectures. Our toolkit contains 12 TFs, flexibly operating as activators, repressors, dual activator-repressors or dual repressor-repressors, on up to 270 synthetic promoters. To engineer non-cross-reacting cI variants, we design a new M13 phagemid-based system for the directed evolution of biomolecules. Because cI is used in so many synthetic biology projects, the new set of variants will easily slot into the existing projects of other groups, greatly expanding current engineering capacities.
Project description:In animal systems, master regulatory transcription factors (TFs) mediate stem cell maintenance through direct transcriptional repression of differentiation-promoting TFs. Whether similar mechanisms operate in plants is not known. In plants, shoot apical meristems serve as reservoirs of stem cells that provide cells for all above-ground organs. WUSCHEL (WUS), a homeodomain TF produced in cells of the niche, migrates into adjacent cells where it specifies stem cells. Through high-resolution genomic analysis, we show that WUS represses a large number of genes that are expressed in differentiating cells, including a group of differentiation-promoting TFs involved in leaf development. We show that WUS directly binds to the regulatory regions of the differentiation-promoting TFs KANADI1, KANADI2, ASYMMETRICLEAVES2 and YABBY3 to repress their expression. Predictions from a computational model, supported by live imaging, reveal that WUS-mediated repression prevents premature differentiation of stem cell progenitors, being part of a minimal regulatory network for meristem maintenance. Our work shows that direct transcriptional repression of differentiation-promoting TFs is an evolutionarily conserved logic for stem cell regulation.