Deciphering transcriptional regulations coordinating the response to environmental changes.
ABSTRACT: Gene co-expression evidenced as a response to environmental changes has shown that transcriptional activity is coordinated, which pinpoints the role of transcriptional regulatory networks (TRNs). Nevertheless, the prediction of TRNs based on the affinity of transcription factors (TFs) with binding sites (BSs) generally produces an over-estimation of the observable TF/BS relations within the network and therefore many of the predicted relations are spurious. We present LOMBARDE, a bioinformatics method that extracts, from a TRN determined from a set of predicted TF/BS affinities, a subnetwork explaining a given set of observed co-expressions by choosing the TFs and BSs most likely to be involved in the co-regulation. LOMBARDE solves an optimization problem which selects confident paths within a given TRN that join a putative common regulator with two co-expressed genes via regulatory cascades. To evaluate the method, we used public data of Escherichia coli to produce a regulatory network that explained almost all observed co-expressions while using only 19% of the input TF/BS affinities but including about 66% of the independent experimentally validated regulations in the input data. When all known validated TF/BS affinities were integrated into the input data the precision of LOMBARDE increased significantly. The topological characteristics of the subnetwork that was obtained were similar to the characteristics described for known validated TRNs. LOMBARDE provides a useful modeling scheme for deciphering the regulatory mechanisms that underlie the phenotypic responses of an organism to environmental challenges. The method can become a reliable tool for further research on genome-scale transcriptional regulation studies.
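The path-selection idea in this abstract can be illustrated with a minimal sketch. It is not the LOMBARDE formulation itself: it assumes, purely for illustration, that each predicted TF/BS relation carries a non-negative cost (lower cost meaning higher confidence) and that, for one co-expressed gene pair, the preferred explanation is the common regulator whose two shortest paths to the genes have the smallest total cost. The function names, the toy graph and the costs are all hypothetical.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra from `source`; returns (distance, predecessor) dictionaries."""
    dist = {source: 0.0}
    pred = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred


def path_edges(pred, source, target):
    """Walk predecessors back from `target` to `source`, collecting edges."""
    edges = []
    node = target
    while node != source:
        edges.append((pred[node], node))
        node = pred[node]
    return edges


def explain_pair(graph, regulators, gene_a, gene_b):
    """Pick the regulator whose paths to both genes have minimal total cost.

    Returns (total_cost, regulator, set_of_edges), or None if no regulator
    reaches both genes.
    """
    best = None
    for r in regulators:
        dist, pred = shortest_paths(graph, r)
        if gene_a in dist and gene_b in dist:
            total = dist[gene_a] + dist[gene_b]
            if best is None or total < best[0]:
                edges = set(path_edges(pred, r, gene_a))
                edges |= set(path_edges(pred, r, gene_b))
                best = (total, r, edges)
    return best


# Hypothetical toy network: graph[u][v] is the assumed cost of relation u -> v.
toy_trn = {
    "tf1": {"tf2": 1.0, "g1": 5.0},
    "tf2": {"g1": 1.0, "g2": 1.0},
    "tf3": {"g1": 4.0, "g2": 4.0},
}
result = explain_pair(toy_trn, ["tf1", "tf2", "tf3"], "g1", "g2")
print(result)
```

Repeating this for every observed co-expressed pair and taking the union of the selected edges would give a subnetwork in the spirit of the abstract; how the actual method weighs paths and resolves ties is not specified here.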
Project description:Transcriptional regulatory networks (TRNs) have been studied intensely for >25 y. Yet, even for the Escherichia coli TRN (probably the best characterized TRN), several questions remain. Here, we address three questions: (i) How complete is our knowledge of the E. coli TRN; (ii) how well can we predict gene expression using this TRN; and (iii) how robust is our understanding of the TRN? First, we reconstructed a high-confidence TRN (hiTRN) consisting of 147 transcription factors (TFs) regulating 1,538 transcription units (TUs) encoding 1,764 genes. The 3,797 high-confidence regulatory interactions were collected from published, validated chromatin immunoprecipitation (ChIP) data and RegulonDB. For 21 different TF knockouts, up to 63% of the differentially expressed genes in the hiTRN were traced to the knocked-out TF through regulatory cascades. Second, we trained supervised machine learning algorithms to predict the expression of 1,364 TUs given TF activities using 441 samples. The algorithms accurately predicted condition-specific expression for 86% (1,174 of 1,364) of the TUs, while 193 TUs (14%) were predicted better than random TRNs. Third, we identified 10 regulatory modules whose definitions were robust against changes to the TRN or expression compendium. Using surrogate variable analysis, we also identified three unmodeled factors that systematically influenced gene expression. Our computational workflow comprehensively characterizes the predictive capabilities and systems-level functions of an organism's TRN from disparate data types.
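Tracing differentially expressed genes back to a knocked-out TF "through regulatory cascades" amounts to a reachability question on the directed TRN. The sketch below is one plausible reading, not the authors' pipeline: it assumes the TRN is given as a mapping from each regulator to its direct targets and reports the fraction of differentially expressed genes that lie downstream of the knocked-out TF. All names and the toy data are hypothetical.

```python
from collections import deque


def downstream(trn, tf):
    """Return every node reachable from `tf` by following regulator -> target edges."""
    seen = set()
    queue = deque([tf])
    while queue:
        node = queue.popleft()
        for target in trn.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def traced_fraction(trn, knocked_out_tf, de_genes):
    """Fraction of differentially expressed genes downstream of the knocked-out TF."""
    de_genes = set(de_genes)
    if not de_genes:
        return 0.0
    reachable = downstream(trn, knocked_out_tf)
    return len(de_genes & reachable) / len(de_genes)


# Hypothetical toy TRN: tfA regulates tfB and g1; tfB regulates g2; tfC regulates g3.
toy_trn = {"tfA": ["tfB", "g1"], "tfB": ["g2"], "tfC": ["g3"]}
fraction = traced_fraction(toy_trn, "tfA", ["g1", "g2", "g3", "g4"])
print(fraction)
```

In the toy case g1 is a direct target and g2 is reached via tfB, while g3 and g4 are not downstream of tfA.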
Project description:BACKGROUND: The filamentous fungus Fusarium graminearum causes devastating crop diseases and produces harmful mycotoxins worldwide. Understanding the complex F. graminearum transcriptional regulatory networks (TRNs) is vital for effective disease management. Reconstructing F. graminearum dynamic TRNs, an NP-hard (non-deterministic polynomial-time hard) problem, remains unsolved using commonly adopted reductionist or co-expression based approaches. Multi-omic data such as fungal genomic, transcriptomic and phenomic data are vital for unraveling phenotype-specific TRNs but have so far been largely isolated and untapped. RESULTS: Here for the first time, we harnessed these resources to infer global TRNs for F. graminearum using a Bayesian network-based algorithm called "Module Networks". The inferred TRNs contain 49 regulatory modules that show condition-specific gene regulation. Through a thorough validation based on prior biological knowledge including functional annotations and TF binding site enrichment, our network prediction displayed high accuracy and concordance with existing knowledge. One regulatory module was partially validated using network perturbations caused by Tri6 and Tri10 gene disruptions, as well as using Tri6 ChIP-seq data. We then developed a novel computational method to calculate the associations between modules and phenotypes, and identified major module groups regulating different phenotypes. As a result, we identified TRN subnetworks responsible for F. graminearum virulence, sexual reproduction and mycotoxin production, pinpointing phenotype-associated modules and key regulators. Finally, we found a clear compartmentalization of TRN modules in core and lineage-specific genomic regions in F. graminearum, reflecting the evolution of the TRNs in fungal speciation.
CONCLUSIONS: This system-level reconstruction of filamentous fungal TRNs provides novel insights into the intricate networks of gene regulation that underlie key processes in F. graminearum pathobiology and offers promise for the development of improved disease control strategies.
Project description:The adaptation of microorganisms to their environment is controlled by complex transcriptional regulatory networks (TRNs), which are still only partially understood even for model species. Genome scale annotation of regulatory features of genes and TRN reconstruction are challenging tasks of microbial genomics. We used the knowledge-driven comparative-genomics approach implemented in the RegPredict Web server to infer TRN in the model Gram-positive bacterium Bacillus subtilis and 10 related Bacillales species. For transcription factor (TF) regulons, we combined the available information from the DBTBS database and the literature with bioinformatics tools, allowing inference of TF binding sites (TFBSs), comparative analysis of the genomic context of predicted TFBSs, functional assignment of target genes, and effector prediction. For RNA regulons, we used known RNA regulatory motifs collected in the Rfam database to scan genomes and analyze the genomic context of new RNA sites. The inferred TRN in B. subtilis comprises regulons for 129 TFs and 24 regulatory RNA families. First, we analyzed 66 TF regulons with previously known TFBSs in B. subtilis and projected them to other Bacillales genomes, resulting in refinement of TFBS motifs and identification of novel regulon members. Second, we inferred motifs and described regulons for 28 experimentally studied TFs with previously unknown TFBSs. Third, we discovered novel motifs and reconstructed regulons for 36 previously uncharacterized TFs. The inferred collection of regulons is available in the RegPrecise database (http://regprecise.lbl.gov/) and can be used in genetic experiments, metabolic modeling, and evolutionary analysis.
Project description:Transcriptional regulatory network (TRN) reconstitution and deconstruction occur simultaneously during reprogramming; however, it remains unclear how the starting and targeting TRNs regulate the induction and suppression of peripheral genes. Here we analyzed the regulation using direct cell reprogramming from human dermal fibroblasts to monocytes as the platform. We simultaneously deconstructed fibroblastic TRN and reconstituted monocytic TRN; monocytic and fibroblastic gene expression were analyzed in comparison with that of fibroblastic TRN deconstruction only or monocytic TRN reconstitution only. Global gene expression analysis showed cross-regulation of TRNs. Detailed analysis revealed that knocking down fibroblastic TRN positively affected half of the upregulated monocytic genes, indicating that intrinsic fibroblastic TRN interfered with the expression of induced genes. In contrast, reconstitution of monocytic TRN showed neutral effects on the majority of fibroblastic gene downregulation. This study provides an explicit example that demonstrates how two networks together regulate gene expression during cell reprogramming processes and contributes to the elaborate exploration of TRNs.
Project description:Transcriptional regulatory networks (TRNs) provide insight into cellular behavior by describing interactions between transcription factors (TFs) and their gene targets. The Assay for Transposase Accessible Chromatin (ATAC)-seq, coupled with transcription-factor motif analysis, provides indirect evidence of chromatin binding for hundreds of TFs genome-wide. Here, we propose methods for TRN inference in a mammalian setting, using ATAC-seq data to influence gene expression modeling. We rigorously test our methods in the context of T Helper Cell Type 17 (Th17) differentiation, generating new ATAC-seq data to complement existing Th17 genomic resources (plentiful gene expression data, TF knock-outs and ChIP-seq experiments). In this resource-rich mammalian setting our extensive benchmarking provides quantitative, genome-scale evaluation of TRN inference combining ATAC-seq and RNA-seq data. We refine and extend our previous Th17 TRN, using our new TRN inference methods to integrate all Th17 data (gene expression, ATAC-seq, TF KO, ChIP-seq). We highlight new roles for individual TFs and groups of TFs (“TF-TF modules”) in Th17 gene regulation. Given the popularity of ATAC-seq (a widely adopted protocol with high resolution and low sample input requirements), we anticipate that application of our methods will improve TRN inference in new mammalian systems and be of particular use for rare, uncharacterized cell types. Overall design: Gene expression (RNA-seq) of naive and Th17- and Th0-polarized CD4 T Cells
Project description:Transcriptional regulatory networks (TRNs) provide insight into cellular behavior by describing interactions between transcription factors (TFs) and their gene targets. The Assay for Transposase Accessible Chromatin (ATAC)-seq, coupled with transcription-factor motif analysis, provides indirect evidence of chromatin binding for hundreds of TFs genome-wide. Here, we propose methods for TRN inference in a mammalian setting, using ATAC-seq data to influence gene expression modeling. We rigorously test our methods in the context of T Helper Cell Type 17 (Th17) differentiation, generating new ATAC-seq data to complement existing Th17 genomic resources (plentiful gene expression data, TF knock-outs and ChIP-seq experiments). In this resource-rich mammalian setting our extensive benchmarking provides quantitative, genome-scale evaluation of TRN inference combining ATAC-seq and RNA-seq data. We refine and extend our previous Th17 TRN, using our new TRN inference methods to integrate all Th17 data (gene expression, ATAC-seq, TF KO, ChIP-seq). We highlight new roles for individual TFs and groups of TFs (“TF-TF modules”) in Th17 gene regulation. Given the popularity of ATAC-seq (a widely adopted protocol with high resolution and low sample input requirements), we anticipate that application of our methods will improve TRN inference in new mammalian systems and be of particular use for rare, uncharacterized cell types. Overall design: Chromatin accessibility (ATAC-seq) of Th17-, Treg-, Th2-, Th0-polarized CD4 T Cells (from 2-48 h, including MAF and STAT3 KO)
Project description:There is a strong need for computational frameworks that integrate different biological processes and data-types to unravel cellular regulation. Current efforts to reconstruct transcriptional regulatory networks (TRNs) focus primarily on proximal data such as gene co-expression and transcription factor (TF) binding. While such approaches enable rapid reconstruction of TRNs, the overwhelming combinatorics of possible networks limits identification of mechanistic regulatory interactions. Utilizing growth phenotypes and systems-level constraints to inform regulatory network reconstruction is an unmet challenge. We present our approach Gene Expression and Metabolism Integrated for Network Inference (GEMINI) that links a compendium of candidate regulatory interactions with the metabolic network to predict their systems-level effect on growth phenotypes. We then compare predictions with experimental phenotype data to select phenotype-consistent regulatory interactions. GEMINI makes use of the observation that only a small fraction of regulatory network states are compatible with a viable metabolic network, and outputs a regulatory network that is simultaneously consistent with the input genome-scale metabolic network model, gene expression data, and TF knockout phenotypes. GEMINI preferentially recalls gold-standard interactions (p-value = 10^-172), significantly better than using gene expression alone. We applied GEMINI to create an integrated metabolic-regulatory network model for Saccharomyces cerevisiae involving 25,000 regulatory interactions controlling 1597 metabolic reactions. The model quantitatively predicts TF knockout phenotypes in new conditions (p-value = 10^-14) and revealed potential condition-specific regulatory mechanisms.
Our results suggest that a metabolic constraint-based approach can be successfully used to help reconstruct TRNs from high-throughput data, and highlight the potential of using a biochemically-detailed mechanistic framework to integrate and reconcile inconsistencies across different data-types. The algorithm and associated data are available at https://sourceforge.net/projects/gemini-data/
Project description:Over millions of years the structure and complexity of the transcriptional regulatory network (TRN) in bacteria have changed and reorganized, enabling them to adapt to almost every environmental niche on earth. In order to understand the plasticity of TRNs in bacteria, we studied the conservation of currently known TRNs of the two model organisms Escherichia coli K12 and Bacillus subtilis across complete genomes including Bacteria, Archaea and Eukarya at three different levels: individual components of the TRN, pairs of interactions and regulons. We found that transcription factors (TFs) evolve much faster than the target genes (TGs) across phyla. We show that global regulators are poorly conserved across the phylogenetic spectrum and hence TFs could be the major players responsible for the plasticity and evolvability of the TRNs. We also found that there is only a small fraction of significantly conserved transcriptional regulatory interactions among different phyla of bacteria and that there is no constraint on the elements of the interaction to co-evolve. Finally, our results suggest that the majority of the regulons in bacteria are rapidly lost, implying a high-order flexibility in the TRNs. We hypothesize that during the divergence of bacteria certain essential cellular processes like the synthesis of arginine, biotin and ribose, transport of amino acids and iron, availability of phosphate, replication process and the SOS response are well conserved in evolution. From our comparative analysis, it is possible to infer that transcriptional regulation is more flexible than the genetic component of the organisms and that its complexity and structure play an important role in the phenotypic adaptation.
Project description:BACKGROUND: Uncovering the complex transcriptional regulatory networks (TRNs) that underlie plant and animal development remains a challenge. However, a vast amount of data from public microarray experiments is available, which can be subject to inference algorithms in order to recover reliable TRN architectures. RESULTS: In this study we present a simple bioinformatics methodology that uses public, carefully curated microarray data and the mutual information algorithm ARACNe in order to obtain a database of transcriptional interactions. We used data from Arabidopsis thaliana root samples to show that the transcriptional regulatory networks derived from this database successfully recover previously identified root transcriptional modules and to propose new transcription factors for the SHORT ROOT/SCARECROW and PLETHORA pathways. We further show that these networks are a powerful tool to integrate and analyze high-throughput expression data, as exemplified by our analysis of a SHORT ROOT induction time-course microarray dataset, and are a reliable source for the prediction of novel root gene functions. In particular, we used our database to predict novel genes involved in root secondary cell-wall synthesis and identified the MADS-box TF XAL1/AGL12 as an unexpected participant in this process. CONCLUSIONS: This study demonstrates that network inference using carefully curated microarray data yields reliable TRN architectures. In contrast to previous efforts to obtain root TRNs, that have focused on particular functional modules or tissues, our root transcriptional interactions provide an overview of the transcriptional pathways present in Arabidopsis thaliana roots and will likely yield a plethora of novel hypotheses to be tested experimentally.
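ARACNe is known for pruning a mutual-information network with the data processing inequality (DPI): in every fully connected gene triplet, the edge with the lowest mutual information is treated as indirect and removed. The sketch below illustrates only that pruning step, under stated assumptions: mutual-information values are taken as already estimated, the tolerance parameter used in practice is omitted, and the function name, gene labels and values are hypothetical rather than taken from the study.

```python
from itertools import combinations


def dpi_prune(mi, threshold=0.0):
    """Apply the data processing inequality to a mutual-information network.

    `mi` maps frozenset({gene_a, gene_b}) to a precomputed MI value.
    Edges at or below `threshold` are dropped first; then, in each triangle,
    the weakest edge is marked for removal. Marks are collected against the
    thresholded network and removed together at the end.
    """
    edges = {pair: value for pair, value in mi.items() if value > threshold}
    genes = sorted({gene for pair in edges for gene in pair})
    to_remove = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in edges for edge in triangle):
            to_remove.add(min(triangle, key=lambda edge: edges[edge]))
    return {pair: value for pair, value in edges.items() if pair not in to_remove}


# Hypothetical MI values: tf-g2 is the weakest edge in the only triangle.
toy_mi = {
    frozenset(("tf", "g1")): 0.9,
    frozenset(("g1", "g2")): 0.8,
    frozenset(("tf", "g2")): 0.3,
}
pruned = dpi_prune(toy_mi)
print(sorted(tuple(sorted(pair)) for pair in pruned))
```

Here the tf-g2 association is interpreted as an indirect effect mediated by g1 and is dropped, leaving the two stronger edges.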
Project description:Analysis of the topology of transcriptional regulatory networks (TRNs) is an effective way to study the regulatory interactions between the transcription factors (TFs) and the target genes. TRNs are characterized by the abundance of motifs such as feed forward loops (FFLs), which contribute to their structural and functional properties. In this paper, we focus on the role of motifs (specifically, FFLs) in signal propagation in TRNs and the organization of the TRN topology with FFLs as building blocks. To this end, we classify nodes participating in FFLs (termed motif central nodes) into three distinct roles (namely, roles A, B and C), and contrast them with TRN nodes having high connectivity on the basis of their potential for information dissemination, using metrics such as network efficiency, path enumeration, epidemic models and standard graph centrality measures. We also present the notion of a three tier architecture and how it can help study the structural properties of TRNs based on the connectivity and clustering tendency of motif central nodes. Finally, we motivate the potential implications of the structural properties of motif centrality for the design of efficient information routing protocols in communication networks, as well as of their functional properties in global regulation and stress response for studying specific disease conditions and identifying drug targets.
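A feed forward loop is a three-node pattern in which X regulates Y, Y regulates Z, and X also regulates Z directly. The sketch below enumerates FFLs in a directed edge list and counts how often each node occupies each position. The abstract does not define roles A, B and C, so mapping A to the top regulator X, B to the intermediate Y and C to the target Z is an assumption made here for illustration; the function names and the toy edges are likewise hypothetical.

```python
from collections import Counter


def feed_forward_loops(edges):
    """Return all (x, y, z) with edges x -> y, y -> z and x -> z (self-loops ignored)."""
    successors = {}
    for source, target in edges:
        if source != target:
            successors.setdefault(source, set()).add(target)
    loops = []
    for x, x_targets in successors.items():
        for y in x_targets:
            for z in successors.get(y, ()):
                if z in x_targets:  # x also regulates z directly
                    loops.append((x, y, z))
    return loops


def role_counts(loops):
    """Count FFL participation per node and position (assumed roles A, B, C)."""
    counts = {"A": Counter(), "B": Counter(), "C": Counter()}
    for x, y, z in loops:
        counts["A"][x] += 1  # assumed role A: top regulator
        counts["B"][y] += 1  # assumed role B: intermediate regulator
        counts["C"][z] += 1  # assumed role C: regulated target
    return counts


# Hypothetical toy network containing a single FFL (x, y, z) plus one extra edge.
toy_edges = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
ffls = feed_forward_loops(toy_edges)
roles = role_counts(ffls)
print(ffls, dict(roles["A"]), dict(roles["B"]), dict(roles["C"]))
```

Per-node counts of this kind could then be compared with degree or other centrality measures, which is the sort of contrast the abstract describes.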