Genomewide Assessment of Mycobacterium tuberculosis Conditionally Essential Metabolic Pathways.
ABSTRACT: A better understanding of essential cellular functions in pathogenic bacteria is important for the development of more effective antimicrobial agents. We performed a comprehensive identification of essential genes in Mycobacterium tuberculosis, the major causative agent of tuberculosis, using a combination of transposon insertion sequencing (Tn-seq) and comparative genomic analysis. To identify conditionally essential genes by Tn-seq, we used media with different nutrient compositions. Although many conditional gene essentialities were affected by the presence of relevant nutrient sources, we also found that the essentiality of genes in a subset of metabolic pathways was unaffected by metabolite availability. Comparative genomic analysis revealed that not all essential genes identified by Tn-seq were fully conserved within the M. tuberculosis complex, including some existing antitubercular drug target genes. In addition, we utilized an available M. tuberculosis genome-scale metabolic model, iSM810, to predict M. tuberculosis gene essentiality in silico. Comparing the sets of essential genes experimentally identified by Tn-seq to those predicted in silico reveals the capabilities and limitations of gene essentiality predictions, highlighting the complexity of M. tuberculosis essential metabolic functions. This study provides a promising platform to study essential cellular functions in M. tuberculosis. IMPORTANCE Mycobacterium tuberculosis causes 10 million cases of tuberculosis (TB), resulting in over 1 million deaths each year. TB therapy is challenging because it requires a minimum of 6 months of treatment with multiple drugs. Protracted treatment times and the emergent spread of drug-resistant M. tuberculosis necessitate the identification of novel targets for drug discovery to curb this global health threat. Essential functions, defined as those indispensable for growth and/or survival, are potential targets for new antimicrobial drugs.
In this study, we aimed to define gene essentialities of M. tuberculosis on a genomewide scale to comprehensively identify potential targets for drug discovery. We utilized a combination of experimental (functional genomics) and in silico approaches (comparative genomics and flux balance analysis). Our functional genomics approach identified sets of genes whose essentiality was affected by nutrient availability. Comparative genomics revealed that not all essential genes were fully conserved within the M. tuberculosis complex. Comparing sets of essential genes identified by functional genomics to those predicted by flux balance analysis highlighted gaps in current knowledge regarding M. tuberculosis metabolic capabilities. Thus, our study identifies numerous potential antitubercular drug targets and provides a comprehensive picture of the complexity of M. tuberculosis essential cellular functions.
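The comparison between experimentally identified and in silico predicted essential genes can be illustrated with a small set-based tally. This is only a sketch under stated assumptions: the function name `compare_essential_sets` and the gene labels are hypothetical placeholders, and the essentiality calls are taken as given rather than derived from Tn-seq counts or from the iSM810 model.

```python
def compare_essential_sets(tnseq_essential, predicted_essential, model_genes):
    """Tally agreement between experimental and predicted essentiality calls.

    Only genes present in the metabolic model (model_genes) are compared,
    because a model cannot make a prediction for a gene it does not contain.
    All names here are illustrative assumptions, not part of the study.
    """
    universe = set(model_genes)
    tnseq = set(tnseq_essential) & universe
    predicted = set(predicted_essential) & universe
    return {
        "essential_in_both": sorted(tnseq & predicted),
        "tnseq_only": sorted(tnseq - predicted),          # model misses these
        "predicted_only": sorted(predicted - tnseq),      # model over-calls these
        "nonessential_in_both": sorted(universe - tnseq - predicted),
    }


# Hypothetical gene labels, for illustration only.
result = compare_essential_sets(
    tnseq_essential={"geneA", "geneB", "geneX"},
    predicted_essential={"geneB", "geneC"},
    model_genes={"geneA", "geneB", "geneC", "geneD"},
)
```

Genes that land in `tnseq_only` or `predicted_only` are the disagreements that would point to gaps in the model or to condition-specific effects; `geneX` is dropped here because it is not in the hypothetical model.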
Project description: Tn-Seq is a high-throughput technique for analyzing transposon mutant libraries to determine the conditional essentiality of a gene under an experimental condition. A special feature of Tn-Seq data is that multiple mutants in a gene provide independent evidence to prioritize that gene as being essential. Existing methods either do not account for this feature or rely on a high-density transposon library. Moreover, these methods are unable to accommodate complex designs. The method proposed here is specifically designed for the analysis of Tn-Seq data. It utilizes two steps to estimate the conditional essentiality for each gene in the genome. First, it collects evidence of conditional essentiality for each insertion by comparing read counts of that insertion between conditions. Second, it combines insertion-level evidence for the corresponding gene. It deals with data from both low- and high-density transposon libraries and accommodates complex designs. Moreover, it is very fast to implement. The performance of the proposed method was tested on simulated data and on experimental Tn-Seq data from a Serratia marcescens transposon mutant library used to identify genes that contribute to fitness in a murine model of infection. We describe a new, efficient method for identifying conditionally essential genes in Tn-Seq experiments with high detection sensitivity and specificity. It is implemented as the TnseqDiff function in the R package Tnseq and can be installed from the Comprehensive R Archive Network (CRAN).
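The two-step idea described above (insertion-level evidence first, gene-level combination second) can be sketched in a few lines of Python. This is not the TnseqDiff implementation and does not reproduce its statistics: the log2 fold change with a pseudocount and the plain mean used to combine insertions are simplifying assumptions, and all function names are hypothetical.

```python
import math
from statistics import mean


def insertion_log2fc(control_count, treatment_count, pseudocount=1.0):
    """Step 1: evidence for one insertion, as a log2 fold change between conditions."""
    return math.log2((treatment_count + pseudocount) / (control_count + pseudocount))


def gene_level_scores(insertions, pseudocount=1.0):
    """Step 2: combine insertion-level evidence per gene.

    insertions: iterable of (gene, control_count, treatment_count) tuples.
    Returns {gene: (mean log2 fold change, number of insertions)}.
    A plain mean stands in for a proper statistical combination (assumption).
    """
    per_gene = {}
    for gene, control, treatment in insertions:
        per_gene.setdefault(gene, []).append(
            insertion_log2fc(control, treatment, pseudocount)
        )
    return {gene: (mean(values), len(values)) for gene, values in per_gene.items()}


# Toy read counts (hypothetical): two insertions in g1, one in g2.
scores = gene_level_scores([("g1", 7, 0), ("g1", 15, 1), ("g2", 3, 3)])
```

Because each gene keeps its own list of insertion scores, the sketch works the same way for a gene with one insertion as for a gene with many, which mirrors the stated goal of handling both low- and high-density libraries.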
Project description: Rhodobacter sphaeroides is one of the best-studied alphaproteobacteria from biochemical, genetic, and genomic perspectives. To gain a better systems-level understanding of this organism, we generated a large transposon mutant library and used transposon sequencing (Tn-seq) to identify genes that are essential under several growth conditions. Using newly developed Tn-seq analysis software (TSAS), we identified 493 genes as essential for aerobic growth on a rich medium. We then used the mutant library to identify conditionally essential genes under two laboratory growth conditions, identifying 85 additional genes required for aerobic growth in a minimal medium and 31 additional genes required for photosynthetic growth. In all instances, our analyses confirmed essentiality for many known genes and identified genes not previously considered to be essential. We used the resulting Tn-seq data to refine and improve a genome-scale metabolic network model (GEM) for R. sphaeroides. Together, we demonstrate how genetic, genomic, and computational approaches can be combined to obtain a systems-level understanding of the genetic framework underlying metabolic diversity in bacterial species. IMPORTANCE Knowledge about the role of genes under a particular growth condition is required for a holistic understanding of a bacterial cell and has implications for health, agriculture, and biotechnology. We developed the Tn-seq analysis software (TSAS) package to provide a flexible and statistically rigorous workflow for the high-throughput analysis of insertion mutant libraries, advanced the knowledge of gene essentiality in R. sphaeroides, and illustrated how Tn-seq data can be used to more accurately identify genes that play important roles in metabolism and other processes that are essential for cellular survival.
Project description: High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g. Tn-seq/INseq, HITS and TraDIS, have been developed that accurately map the site of transposon insertions by mutant-specific amplification and sequence readout of DNA flanking the transposon insertion site, assigning a measure of essentiality based on the number of reads per insertion site flanking sequence or per gene. However, analysis of these large and complex datasets is hampered by the lack of an easy-to-use, automated tool for analyzing transposon insertion sequencing data. To fill this gap, we developed ESSENTIALS, an open source, web-based software tool for researchers in the genomics field utilizing transposon insertion sequencing analysis. It accurately predicts (conditionally) essential genes and offers the flexibility of using different sample normalization methods, genomic location bias correction, data preprocessing steps, appropriate statistical tests and various visualizations to examine the results, while requiring only a minimum of input and hands-on work from the researcher. We successfully applied ESSENTIALS to in-house and published Tn-seq, TraDIS and HITS datasets and we show that the various pre- and post-processing steps on the sequence reads and count data with ESSENTIALS considerably improve the sensitivity and specificity of predicted gene essentiality.
Project description: Knowledge of which genes are essential to the survival of an organism is critical to understanding the function of genes, and for the identification of potential drug targets for antimicrobial treatment. Previous statistical methods for assessing essentiality based on sequencing of transposon libraries have usually limited their assessment to strict 'essential' or 'non-essential' categories. However, this binary view of essentiality does not accurately represent the more nuanced ways the growth of an organism might be affected by the disruption of its genes. In addition, these methods often limit their analysis to open reading frames. We propose a novel method for analyzing sequence data from transposon mutant libraries using a Hidden Markov Model (HMM), along with formulas to adapt the parameters of the model to different datasets for robustness. This approach allows for the clustering of insertion sites into distinct regions of essentiality across the entire genome in a statistically rigorous manner, while also allowing for the detection of growth-defect and growth-advantage regions. We evaluate the performance of a 4-state HMM on a sequence dataset of M. tuberculosis transposon mutants. We also test the HMM on several synthetic datasets representing different levels of transposon insertion density and sequence coverage. We show that the HMM produces results that are highly correlated with previous assignments of essentiality for this organism. We also show that it detects growth-defect and growth-advantage genes previously shown to impair or enhance growth when disrupted. A 4-state HMM provides an improved way of analyzing Tn-seq data and assessing different levels of essentiality that enables not only the characterization of essential and non-essential genes, but also genes whose disruption leads to impairment (or enhancement) of growth.
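To make the idea of clustering insertion sites into regions with a 4-state HMM concrete, the following is a minimal Viterbi decoding sketch. It rests on several assumptions that are not stated in the abstract: read counts at each site are modeled with geometric emission distributions, transitions use a single self-transition probability `stay`, and the state labels (ES, GD, NE, GA) and all parameter values are illustrative rather than the authors' fitted parameters.

```python
import math

# Assumed state labels: essential, growth-defect, non-essential, growth-advantage.
STATES = ("ES", "GD", "NE", "GA")


def geometric_logpmf(count, p):
    """Log-probability of a read count under a geometric distribution on 0, 1, 2, ..."""
    return math.log(p) + count * math.log(1.0 - p)


def viterbi(counts, emit_p, stay=0.99):
    """Most likely state path for a sequence of per-site read counts.

    emit_p maps each state to its geometric parameter (hypothetical values).
    stay is the self-transition probability; the rest is split evenly.
    """
    n = len(STATES)
    switch = (1.0 - stay) / (n - 1)
    log_trans = [[math.log(stay if i == j else switch) for j in range(n)]
                 for i in range(n)]
    score = [math.log(1.0 / n) + geometric_logpmf(counts[0], emit_p[s])
             for s in STATES]
    backpointers = []
    for count in counts[1:]:
        new_score, pointers = [], []
        for j, state in enumerate(STATES):
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + geometric_logpmf(count, emit_p[state]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    current = max(range(n), key=lambda i: score[i])
    path = [current]
    for pointers in reversed(backpointers):
        current = pointers[current]
        path.append(current)
    path.reverse()
    return [STATES[i] for i in path]


# Hypothetical parameters: larger p means counts concentrated near zero.
emit_p = {"ES": 0.9, "GD": 0.1, "NE": 0.01, "GA": 0.002}
path = viterbi([0] * 10 + [100] * 10, emit_p)
```

The high self-transition probability is what produces contiguous regions instead of site-by-site calls: a single site with an unusual count is not enough to justify the cost of switching states.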
Project description: Protein-protein interactions (PPIs) mediate the transmission and regulation of oncogenic signals that are essential to cellular proliferation and survival, and thus represent potential targets for anti-cancer therapeutic discovery. Despite their significance, there is no method to experimentally disrupt and interrogate the essentiality of individual endogenous PPIs. The ability to computationally predict or infer PPI essentiality would help prioritize PPIs for drug discovery and help advance understanding of cancer biology. Here we introduce a computational method (MEDICI) to predict PPI essentiality by combining gene knockdown studies with network models of protein interaction pathways in an analytic framework. Our method uses network topology to model how gene silencing can disrupt PPIs, relating the unknown essentialities of individual PPIs to experimentally observed protein essentialities. This model is then deconvolved to recover the unknown essentialities of individual PPIs. We demonstrate the validity of our approach via prediction of sensitivities to compounds based on PPI essentiality and differences in essentiality based on genetic mutations. We further show that lung cancer patients have improved overall survival when specific PPIs are no longer present, suggesting that these PPIs may be potentially new targets for therapeutic development. Software is freely available at https://github.com/cooperlab/MEDICI. Datasets are available at https://ctd2.nci.nih.gov/dataPortal.
Project description: In many prokaryotes, but in only a limited number of eukaryotic species, the combination of transposon mutagenesis and high-throughput sequencing has greatly accelerated the identification of essential genes. Here we successfully applied this technique to the methylotrophic yeast Pichia pastoris and classified its conditionally essential/non-essential gene sets. First, we showed that two DNA transposons, TcBuster and Sleeping Beauty, had high transposition activities in P. pastoris. By merging their insertion libraries and performing Tn-seq, we identified a total of 202,858 unique insertions under glucose-supported growth conditions. We then developed a machine learning method to classify the 5,040 annotated genes into putatively essential, putatively non-essential, ambig1 and ambig2 groups, and validated the accuracy of this classification model. In addition, Tn-seq was performed under methanol-supported growth conditions, and methanol-specific essential genes were identified. The comparison of conditionally essential genes between glucose- and methanol-supported growth conditions helped to reveal potential novel targets involved in methanol metabolism and signaling. Our findings suggest that transposon mutagenesis and Tn-seq could be applied in the methylotrophic yeast Pichia pastoris to classify conditionally essential/non-essential gene sets. Our work also shows that determining gene essentiality under different culture conditions could help to screen for novel functional components specifically involved in methanol metabolism.
Project description: Recent evidence suggests that the genes an organism needs to survive in an environment drastically differ when alone or in a community. However, it is not known if there are universal functions that enable microbes to persist in a community and if there are functions specific to interactions between microbes native to the same (sympatric) or different (allopatric) environments. Here, we ask how the essential functions of the oral pathogen Aggregatibacter actinomycetemcomitans change during pairwise coinfection in a murine abscess with each of 15 microbes commonly found in the oral cavity and 10 microbes that are not. A. actinomycetemcomitans was more abundant when coinfected with allopatric than with sympatric microbes, and this increased fitness correlated with expanded metabolic capacity of the coinfecting microbes. Using transposon sequencing, we discovered that 33% of the A. actinomycetemcomitans genome is required for coinfection fitness. Fifty-nine "core" genes were required across all coinfections and included genes necessary for aerobic respiration. The core genes were also all required in monoinfection, indicating the essentiality of these genes cannot be alleviated by a coinfecting microbe. Furthermore, coinfection with some microbes, predominately sympatric species, induced the requirement for over 100 new community-dependent essential genes. In contrast, in other coinfections, predominately with nonoral species, A. actinomycetemcomitans required 50 fewer genes than in monoinfection, demonstrating that some allopatric microbes can drastically alleviate gene essentialities. These results expand our understanding of how diverse microbes alter growth and gene essentiality within polymicrobial infections.
Project description: Genomics offered the promise of transforming antibiotic discovery by revealing many new essential genes as good targets, but the results fell short of the promise. While numerous factors contributed to the disappointing yield, one factor was that essential genes for a bacterial species were often defined based on a single or limited number of strains grown under a single or limited number of in vitro laboratory conditions. In fact, the essentiality of a gene can depend on both the genetic background and growth condition. We thus developed a strategy for more rigorously defining the core essential genome of a bacterial species by studying many pathogen strains and growth conditions. We assessed how many strains must be examined to converge on a set of core essential genes for a species. We used transposon insertion sequencing (Tn-Seq) to define essential genes in nine strains of Pseudomonas aeruginosa on five different media and developed a statistical model, FiTnEss, to classify genes as essential versus nonessential across all strain-medium combinations. We defined a set of 321 core essential genes, representing 6.6% of the genome. We determined that analysis of four strains was typically sufficient in P. aeruginosa to converge on a set of core essential genes likely to be essential across the species across a wide range of conditions relevant to in vivo infection, and thus to represent attractive targets for novel drug discovery.
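The notion of converging on a core essential set as more strains are examined can be illustrated with set intersections. This sketch is not FiTnEss and performs no statistical classification: it assumes per-strain essentiality calls are already available, and the function names and toy gene labels are hypothetical.

```python
from itertools import combinations


def core_essential(essential_by_strain):
    """Genes called essential in every strain (intersection of the call sets)."""
    sets = [set(genes) for genes in essential_by_strain.values()]
    return set.intersection(*sets) if sets else set()


def mean_core_size_by_subset(essential_by_strain):
    """Average core size over all strain subsets of each size k.

    A curve that flattens as k grows suggests that adding further strains
    no longer shrinks the core set.
    """
    strains = sorted(essential_by_strain)
    sizes = {}
    for k in range(1, len(strains) + 1):
        subset_sizes = [
            len(core_essential({s: essential_by_strain[s] for s in combo}))
            for combo in combinations(strains, k)
        ]
        sizes[k] = sum(subset_sizes) / len(subset_sizes)
    return sizes


# Toy essentiality calls for three hypothetical strains.
calls = {
    "strainA": {"g1", "g2", "g3"},
    "strainB": {"g1", "g2", "g4"},
    "strainC": {"g1", "g2"},
}
core = core_essential(calls)
curve = mean_core_size_by_subset(calls)
```

In this toy example the average core size stops changing once two strains are combined, which is the kind of plateau the convergence analysis looks for. Enumerating all subsets is only practical for a small number of strains.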
Project description: Pseudomonas aeruginosa MPAO1 is the parental strain of the widely utilized transposon mutant collection for this important clinical pathogen. Here, we validate a model system to identify genes involved in biofilm growth and biofilm-associated antibiotic resistance. Our model employs a genomics-driven workflow to assemble the complete MPAO1 genome, identify unique and conserved genes by comparative genomics with the PAO1 reference strain and genes missed within existing assemblies by proteogenomics. Among over 200 unique MPAO1 genes, we identified six general essential genes that were overlooked when mapping public Tn-seq data sets against PAO1, including an antitoxin. Genomic data were integrated with phenotypic data from an experimental workflow using a user-friendly, soft lithography-based microfluidic flow chamber for biofilm growth and a screen with the Tn-mutant library in microtiter plates. The screen identified hitherto unknown genes involved in biofilm growth and antibiotic resistance. Experiments conducted with the flow chamber across three laboratories delivered reproducible data on P. aeruginosa biofilms and validated the function of both known genes and genes identified in the Tn-mutant screens. Differential protein abundance data from planktonic cells versus biofilm confirmed the upregulation of candidates known to affect biofilm formation, of structural and secreted proteins of type VI secretion systems, and provided proteogenomic evidence for some missed MPAO1 genes. This integrated, broadly applicable model promises to improve the mechanistic understanding of biofilm formation, antimicrobial tolerance, and resistance evolution in biofilms.
Project description: Tn-Seq is an experimental method for probing the functions of genes through construction of complex random transposon insertion libraries and quantification of each mutant's abundance using next-generation sequencing. An important emerging application of Tn-Seq is for identifying genetic interactions, which involves comparing Tn mutant libraries generated in different genetic backgrounds (e.g. wild-type strain versus knockout strain). Several analytical methods have been proposed for analyzing Tn-Seq data to identify genetic interactions, including estimating relative fitness ratios and fitting a generalized linear model. However, these have limitations which necessitate an improved approach. We present a hierarchical Bayesian method for identifying genetic interactions through quantifying the statistical significance of changes in enrichment. The analysis involves a four-way comparison of insertion counts across datasets to identify transposon mutants that differentially affect bacterial fitness depending on genetic background. Our approach was applied to Tn-Seq libraries made in isogenic strains of Mycobacterium tuberculosis lacking three different genes of unknown function previously shown to be necessary for optimal fitness during infection. By analyzing the libraries subjected to selection in mice, we were able to distinguish several distinct classes of genetic interactions for each target gene that shed light on their functions and roles during infection.
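The four-way comparison can be pictured as a difference of two enrichment changes: the change in a gene's insertion counts under selection in the knockout background minus the same change in the wild-type background. The sketch below computes only that point estimate and is not the hierarchical Bayesian method, so it says nothing about statistical significance. The function names, the pseudocount, and the toy counts are assumptions.

```python
import math
from statistics import mean


def log2fc(before_counts, after_counts, pseudocount=1.0):
    """Log2 fold change of mean insertion counts for one gene, after vs. before selection."""
    return math.log2(
        (mean(after_counts) + pseudocount) / (mean(before_counts) + pseudocount)
    )


def interaction_score(wt_before, wt_after, ko_before, ko_after, pseudocount=1.0):
    """Difference of enrichment changes between knockout and wild-type backgrounds.

    Negative values: disrupting the gene costs more fitness in the knockout
    background than in wild type. Positive values: it costs less.
    """
    return (log2fc(ko_before, ko_after, pseudocount)
            - log2fc(wt_before, wt_after, pseudocount))


# Toy counts (hypothetical): the gene is neutral in wild type,
# but its mutants are depleted in the knockout background.
score = interaction_score(
    wt_before=[7, 7], wt_after=[7, 7],
    ko_before=[7, 7], ko_after=[1, 1],
)
```

A score near zero means the gene behaves the same way in both backgrounds, even if it is strongly required in each; that is why all four datasets are needed rather than only the two post-selection libraries.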