Combining Genome-Scale Experimental and Computational Methods To Identify Essential Genes in Rhodobacter sphaeroides.
ABSTRACT: Rhodobacter sphaeroides is one of the best-studied alphaproteobacteria from biochemical, genetic, and genomic perspectives. To gain a better systems-level understanding of this organism, we generated a large transposon mutant library and used transposon sequencing (Tn-seq) to identify genes that are essential under several growth conditions. Using newly developed Tn-seq analysis software (TSAS), we identified 493 genes as essential for aerobic growth on a rich medium. We then used the mutant library to identify conditionally essential genes under two laboratory growth conditions, identifying 85 additional genes required for aerobic growth in a minimal medium and 31 additional genes required for photosynthetic growth. In all instances, our analyses confirmed essentiality for many known genes and identified genes not previously considered to be essential. We used the resulting Tn-seq data to refine and improve a genome-scale metabolic network model (GEM) for R. sphaeroides. Together, we demonstrate how genetic, genomic, and computational approaches can be combined to obtain a systems-level understanding of the genetic framework underlying metabolic diversity in bacterial species. IMPORTANCE Knowledge about the role of genes under a particular growth condition is required for a holistic understanding of a bacterial cell and has implications for health, agriculture, and biotechnology. We developed the Tn-seq analysis software (TSAS) package to provide a flexible and statistically rigorous workflow for the high-throughput analysis of insertion mutant libraries, advanced the knowledge of gene essentiality in R. sphaeroides, and illustrated how Tn-seq data can be used to more accurately identify genes that play important roles in metabolism and other processes that are essential for cellular survival.
Project description: Tn-Seq is a high-throughput technique for analysis of transposon mutant libraries to determine the conditional essentiality of a gene under an experimental condition. A special feature of Tn-Seq data is that multiple mutants in a gene provide independent evidence to prioritize that gene as being essential. Existing methods do not account for this feature or rely on a high-density transposon library. Moreover, these methods are unable to accommodate complex designs. The method proposed here is specifically designed for the analysis of Tn-Seq data. It uses two steps to estimate the conditional essentiality of each gene in the genome. First, it collects evidence of conditional essentiality for each insertion by comparing read counts of that insertion between conditions. Second, it combines the insertion-level evidence for the corresponding gene. It handles data from both low- and high-density transposon libraries and accommodates complex designs. Moreover, it is very fast to run. The performance of the proposed method was tested on simulated data and on experimental Tn-Seq data from a Serratia marcescens transposon mutant library used to identify genes that contribute to fitness in a murine model of infection. We describe a new, efficient method for identifying conditionally essential genes in Tn-Seq experiments with high detection sensitivity and specificity. It is implemented as the TnseqDiff function in the R package Tnseq and can be installed from the Comprehensive R Archive Network (CRAN).
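The two-step idea described above (insertion-level evidence first, then gene-level combination) can be illustrated with a minimal Python sketch. This is not the TnseqDiff algorithm: the log2 fold change as the insertion-level statistic, the median as the combining rule, the pseudocount, and all gene names and read counts are assumptions chosen only for illustration.

```python
import math
from statistics import median


def insertion_log2fc(control, treatment, pseudocount=1.0):
    """Step 1: insertion-level evidence as a log2 fold change of read counts.

    The pseudocount (an assumption) avoids division by zero and log(0).
    """
    return math.log2((treatment + pseudocount) / (control + pseudocount))


def gene_scores(insertions, pseudocount=1.0):
    """Step 2: combine insertion-level evidence per gene.

    `insertions` is an iterable of (gene, control_reads, treatment_reads).
    The median is used here as a simple, robust combining rule (an assumption).
    """
    per_gene = {}
    for gene, control, treatment in insertions:
        per_gene.setdefault(gene, []).append(
            insertion_log2fc(control, treatment, pseudocount)
        )
    return {gene: median(values) for gene, values in per_gene.items()}


# Hypothetical read counts: geneA mutants are depleted in the treatment
# condition, geneB mutants are not.
example = [
    ("geneA", 120, 3), ("geneA", 80, 1), ("geneA", 200, 10),
    ("geneB", 100, 110), ("geneB", 90, 85),
]
scores = gene_scores(example)
```

A strongly negative gene-level score (as for the hypothetical geneA) suggests that mutants in that gene are depleted under the treatment condition. A real analysis would also need normalization and a statistical test, both of which are omitted here.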
Project description: In many prokaryotes, but in only a limited number of eukaryotic species, the combination of transposon mutagenesis and high-throughput sequencing has greatly accelerated the identification of essential genes. Here we successfully applied this technique to the methylotrophic yeast Pichia pastoris and classified its conditionally essential and non-essential gene sets. First, we showed that two DNA transposons, TcBuster and Sleeping Beauty, had high transposition activities in P. pastoris. By merging their insertion libraries and performing Tn-seq, we identified a total of 202,858 unique insertions under the glucose-supported growth condition. We then developed a machine learning method to classify the 5,040 annotated genes into putatively essential, putatively non-essential, ambig1, and ambig2 groups, and validated the accuracy of this classification model. Tn-seq was also performed under the methanol-supported growth condition, and methanol-specific essential genes were identified. Comparing the conditionally essential genes between the glucose- and methanol-supported growth conditions helped to reveal potential novel targets involved in methanol metabolism and signaling. Our findings suggest that transposon mutagenesis and Tn-seq can be applied in the methylotrophic yeast Pichia pastoris to classify conditionally essential and non-essential gene sets. Our work also shows that determining gene essentiality under different culture conditions could help to screen for novel functional components specifically involved in methanol metabolism.
Project description: Knowledge of which genes are essential to the survival of an organism is critical to understanding the function of genes and to identifying potential drug targets for antimicrobial treatment. Previous statistical methods for assessing essentiality based on sequencing of transposon libraries have usually limited their assessment to strict 'essential' or 'non-essential' categories. However, this binary view of essentiality does not accurately represent the more nuanced ways in which the growth of an organism might be affected by the disruption of its genes. In addition, these methods often limit their analysis to open reading frames. We propose a novel method for analyzing sequence data from transposon mutant libraries using a Hidden Markov Model (HMM), along with formulas to adapt the parameters of the model to different datasets for robustness. This approach allows for the clustering of insertion sites into distinct regions of essentiality across the entire genome in a statistically rigorous manner, while also allowing for the detection of growth-defect and growth-advantage regions. We evaluate the performance of a 4-state HMM on a sequence dataset of M. tuberculosis transposon mutants. We also test the HMM on several synthetic datasets representing different levels of transposon insertion density and sequence coverage. We show that the HMM produces results that are highly correlated with previous assignments of essentiality for this organism. We also show that it detects growth-defect and growth-advantage genes previously shown to impair or enhance growth when disrupted. A 4-state HMM provides an improved way of analyzing Tn-seq data and assessing different levels of essentiality that enables not only the characterization of essential and non-essential genes, but also genes whose disruption leads to impairment (or enhancement) of growth.
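To make the idea of labeling insertion sites with a 4-state HMM concrete, the following Python sketch runs Viterbi decoding over per-site read counts. It is a toy illustration under stated assumptions, not the published model: the geometric emission distributions, the mean read counts per state, the uniform start distribution, and the "sticky" transition probability are all hypothetical values chosen for the example.

```python
import math

# Four states: essential, growth-defect, non-essential, growth-advantage.
STATES = ["ES", "GD", "NE", "GA"]
# Assumed mean read count per insertion site in each state (hypothetical).
MEAN_READS = {"ES": 0.01, "GD": 5.0, "NE": 50.0, "GA": 500.0}


def log_geometric(count, mean):
    """Log-probability of `count` under a geometric distribution on 0, 1, 2, ...

    The distribution is parameterized so that its expectation equals `mean`.
    """
    p = 1.0 / (1.0 + mean)
    return math.log(p) + count * math.log(1.0 - p)


def viterbi(counts, stay=0.99):
    """Return the most likely state path for a non-empty list of read counts."""
    switch = (1.0 - stay) / (len(STATES) - 1)
    log_trans = {
        (a, b): math.log(stay if a == b else switch)
        for a in STATES for b in STATES
    }
    # Initialization with a uniform start distribution.
    score = {
        s: math.log(1.0 / len(STATES)) + log_geometric(counts[0], MEAN_READS[s])
        for s in STATES
    }
    back = []
    for count in counts[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log_trans[(r, s)])
            pointers[s] = best
            score[s] = (prev[best] + log_trans[(best, s)]
                        + log_geometric(count, MEAN_READS[s]))
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical counts: a run of 20 empty sites flanked by well-covered sites.
counts = [60, 45, 70, 55, 40] + [0] * 20 + [50, 65, 48, 52, 58]
path = viterbi(counts)
```

Because transitions are sticky, an isolated empty site in a well-covered region would tend to stay in the non-essential state, whereas a long run of empty sites is grouped into one essential region. This is the clustering behavior the abstract describes, here reproduced only qualitatively.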
Project description: A better understanding of essential cellular functions in pathogenic bacteria is important for the development of more effective antimicrobial agents. We performed a comprehensive identification of essential genes in Mycobacterium tuberculosis, the major causative agent of tuberculosis, using a combination of transposon insertion sequencing (Tn-seq) and comparative genomic analysis. To identify conditionally essential genes by Tn-seq, we used media with different nutrient compositions. Although many conditional gene essentialities were affected by the presence of relevant nutrient sources, we also found that the essentiality of genes in a subset of metabolic pathways was unaffected by metabolite availability. Comparative genomic analysis revealed that not all essential genes identified by Tn-seq were fully conserved within the M. tuberculosis complex, including some existing antitubercular drug target genes. In addition, we utilized an available M. tuberculosis genome-scale metabolic model, iSM810, to predict M. tuberculosis gene essentiality in silico. Comparing the sets of essential genes experimentally identified by Tn-seq to those predicted in silico reveals the capabilities and limitations of gene essentiality predictions, highlighting the complexity of M. tuberculosis essential metabolic functions. This study provides a promising platform to study essential cellular functions in M. tuberculosis. IMPORTANCE Mycobacterium tuberculosis causes 10 million cases of tuberculosis (TB), resulting in over 1 million deaths each year. TB therapy is challenging because it requires a minimum of 6 months of treatment with multiple drugs. Protracted treatment times and the emergent spread of drug-resistant M. tuberculosis necessitate the identification of novel targets for drug discovery to curb this global health threat. Essential functions, defined as those indispensable for growth and/or survival, are potential targets for new antimicrobial drugs.
In this study, we aimed to define gene essentialities of M. tuberculosis on a genomewide scale to comprehensively identify potential targets for drug discovery. We utilized a combination of experimental (functional genomics) and in silico approaches (comparative genomics and flux balance analysis). Our functional genomics approach identified sets of genes whose essentiality was affected by nutrient availability. Comparative genomics revealed that not all essential genes were fully conserved within the M. tuberculosis complex. Comparing sets of essential genes identified by functional genomics to those predicted by flux balance analysis highlighted gaps in current knowledge regarding M. tuberculosis metabolic capabilities. Thus, our study identifies numerous potential antitubercular drug targets and provides a comprehensive picture of the complexity of M. tuberculosis essential cellular functions.
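The comparison between experimentally identified and in silico predicted essential genes can be sketched as a set comparison. The Python example below is only an illustration: the gene identifiers are hypothetical, the function name is made up for this sketch, and restricting the comparison to genes present in the metabolic model is an assumption about how such a comparison might reasonably be set up.

```python
def compare_essential_sets(experimental, predicted, model_genes):
    """Compare experimentally essential genes with model-predicted ones.

    Only genes present in the metabolic model are considered, because the
    model cannot make a prediction for genes it does not contain.
    """
    model_genes = set(model_genes)
    experimental = set(experimental) & model_genes
    predicted = set(predicted) & model_genes

    both = experimental & predicted             # essential in both
    model_only = predicted - experimental       # predicted but not observed
    tnseq_only = experimental - predicted       # observed but not predicted
    neither = model_genes - experimental - predicted

    sensitivity = len(both) / len(experimental) if experimental else float("nan")
    precision = len(both) / len(predicted) if predicted else float("nan")
    return {
        "both": both,
        "model_only": model_only,
        "tnseq_only": tnseq_only,
        "neither": neither,
        "sensitivity": sensitivity,
        "precision": precision,
    }


# Hypothetical gene identifiers; g9 is not part of the model and is ignored.
result = compare_essential_sets(
    experimental={"g1", "g2", "g3", "g9"},
    predicted={"g2", "g3", "g4"},
    model_genes={"g1", "g2", "g3", "g4", "g5", "g6"},
)
```

Genes in the `tnseq_only` and `model_only` groups are the disagreements between experiment and model, which is where a comparison of this kind would point to gaps in the model or in current knowledge.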
Project description: High-throughput analysis of genome-wide random transposon mutant libraries is a powerful tool for (conditional) essential gene discovery. Recently, several next-generation sequencing approaches, e.g., Tn-seq/INseq, HITS, and TraDIS, have been developed that accurately map the site of transposon insertions by mutant-specific amplification and sequence readout of the DNA flanking the transposon insertion site, assigning a measure of essentiality based on the number of reads per insertion-site flanking sequence or per gene. However, analysis of these large and complex datasets is hampered by the lack of an easy-to-use, automated tool for transposon insertion sequencing data. To fill this gap, we developed ESSENTIALS, an open-source, web-based software tool for researchers in the genomics field who use transposon insertion sequencing analysis. It accurately predicts (conditionally) essential genes and offers the flexibility of using different sample normalization methods, genomic location bias correction, data preprocessing steps, appropriate statistical tests, and various visualizations to examine the results, while requiring only a minimum of input and hands-on work from the researcher. We successfully applied ESSENTIALS to in-house and published Tn-seq, TraDIS, and HITS datasets, and we show that the various pre- and post-processing steps applied by ESSENTIALS to the sequence reads and count data considerably improve the sensitivity and specificity of predicted gene essentiality.
Project description: Rhodopseudomonas palustris is an alphaproteobacterium that has served as a model organism for studies of photophosphorylation, regulation of nitrogen fixation, production of hydrogen as a biofuel, and anaerobic degradation of aromatic compounds. This bacterium is able to transition between anaerobic photoautotrophic growth, anaerobic photoheterotrophic growth, and aerobic heterotrophic growth. As a starting point to explore the genetic basis for the metabolic versatility of R. palustris, we used transposon mutagenesis and Tn-seq to identify 552 genes as essential for viability in cells growing aerobically on semirich medium. Of these, 323 have essential gene homologs in the alphaproteobacterium Caulobacter crescentus, and 187 have essential gene homologs in Escherichia coli. There were 24 R. palustris genes that were essential for viability under aerobic growth conditions that have low sequence identity but are likely to be functionally homologous to essential E. coli genes. As expected, certain functional categories of essential genes were highly conserved among the three organisms, including translation, ribosome structure and biogenesis, secretion, and lipid metabolism. R. palustris cells divide by budding in which a sessile cell gives rise to a motile swarmer cell. Conserved cell cycle genes required for this developmental process were essential in both C. crescentus and R. palustris. Our results suggest that despite vast differences in lifestyles, members of the alphaproteobacteria have a common set of essential genes that is specific to this group and distinct from that of gammaproteobacteria like E. coli. Essential genes in bacteria and other organisms are those absolutely required for viability. Rhodopseudomonas palustris has served as a model organism for studies of anaerobic aromatic compound degradation, hydrogen gas production, nitrogen fixation, and photosynthesis.
We used the technique of Tn-seq to determine the essential genes of R. palustris grown under heterotrophic aerobic conditions. The transposon library generated in this study will be useful for future studies to identify R. palustris genes essential for viability under specialized growth conditions and also for survival under conditions of stress.
Project description: Bordetella pertussis is the causative agent of whooping cough, a serious respiratory illness affecting children and adults, associated with prolonged cough and potential mortality. Whooping cough has reemerged in recent years, emphasizing a need for increased knowledge of basic mechanisms of B. pertussis growth and pathogenicity. While previous studies have provided insight into in vitro gene essentiality of this organism, very little is known about in vivo gene essentiality, a critical gap in knowledge, since B. pertussis has no previously identified environmental reservoir and is isolated from human respiratory tract samples. We hypothesize that the metabolic capabilities of B. pertussis are especially tailored to the respiratory tract and that many of the genes involved in B. pertussis metabolism would be required to establish infection in vivo. In this study, we generated a diverse library of transposon mutants and then used it to probe gene essentiality in vivo in a murine model of infection. Using the CON-ARTIST pipeline, 117 genes were identified as conditionally essential at 1 day postinfection, and 169 genes were identified as conditionally essential at 3 days postinfection. Most of the identified genes were associated with metabolism, and we utilized two existing genome-scale metabolic network reconstructions to probe the effects of individual essential genes on biomass synthesis. This analysis suggested a critical role for glucose metabolism and lipooligosaccharide biosynthesis in vivo. This is the first genome-wide evaluation of in vivo gene essentiality in B. pertussis and provides tools for future exploration. IMPORTANCE Our study describes the first in vivo transposon sequencing (Tn-seq) analysis of B. pertussis and identifies genes predicted to be essential for in vivo growth in a murine model of intranasal infection, generating key resources for future investigations into B. pertussis pathogenesis and vaccine design.
Project description: Burkholderia cenocepacia K56-2 belongs to the Burkholderia cepacia complex, a group of Gram-negative opportunistic pathogens that have large and dynamic genomes. In this work, we identified the essential genome of B. cenocepacia K56-2 using high-density transposon mutagenesis and insertion site sequencing (Tn-seq circle). We constructed a library of one million transposon mutants and identified the transposon insertions at an average of one insertion per 27 bp. The probability of gene essentiality was determined by comparing the insertion density per gene with the variance of neutral datasets generated by Monte Carlo simulations. Five hundred and eight genes were not significantly disrupted, suggesting that these genes are essential for survival in rich, undefined medium. Comparison of the B. cenocepacia K56-2 essential genome with that of the closely related B. cenocepacia J2315 revealed partial overlap, suggesting that some essential genes are strain-specific. Furthermore, 158 essential genes were conserved in B. cenocepacia and two species belonging to the Burkholderia pseudomallei complex, B. pseudomallei K96243 and Burkholderia thailandensis E264. Porins, including OpcC, a lysophospholipid transporter, LplT, and a protein involved in the modification of lipid A with aminoarabinose were found to be essential in Burkholderia genomes but not in other bacterial essential genomes identified so far. Our results highlight the existence of cell envelope processes that are uniquely essential in species of the genus Burkholderia for which the essential genomes have been identified by Tn-seq.
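The general idea of comparing observed per-gene insertion counts against neutral datasets generated by Monte Carlo simulation can be sketched in Python as follows. This is a simplified stand-in, not the procedure used in the study: it assumes insertions land uniformly at random across the genome, uses an empirical one-sided p-value rather than a variance-based comparison, and all gene coordinates, counts, and the function name are hypothetical.

```python
import random


def essentiality_pvalues(genes, observed, genome_length, n_insertions,
                         n_sim=200, seed=0):
    """Empirical p-value that a gene has so few insertions by chance alone.

    genes:    {name: (start, end)} with half-open coordinates [start, end)
    observed: {name: observed insertion count}
    Neutral datasets place `n_insertions` sites uniformly at random
    (an assumption; real transposons show sequence and position biases).
    """
    rng = random.Random(seed)
    at_or_below = {name: 0 for name in genes}
    for _ in range(n_sim):
        sites = [rng.randrange(genome_length) for _ in range(n_insertions)]
        for name, (start, end) in genes.items():
            hits = sum(start <= site < end for site in sites)
            if hits <= observed[name]:
                at_or_below[name] += 1
    # Add-one correction so that a p-value of exactly zero is never reported.
    return {name: (at_or_below[name] + 1) / (n_sim + 1) for name in genes}


# Hypothetical 10 kb genome with about one insertion per 27 bp.
genes = {"geneA": (1000, 2000), "geneB": (5000, 6000)}
observed = {"geneA": 0, "geneB": 40}
pvalues = essentiality_pvalues(genes, observed,
                               genome_length=10_000, n_insertions=370)
```

In this toy setting, a 1 kb gene is expected to receive about 37 insertions, so the hypothetical geneA with no insertions gets a very small p-value, while geneB with 40 insertions does not stand out from the neutral datasets.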
Project description: The study of the minimum set of genes required to sustain life is a fundamental question in biological research. Recent studies on bacterial essential genes suggested that between 350 and 700 genes are essential to support autonomous bacterial cell growth. Essential genes are of interest as potential new antimicrobial drug targets; hence, our aim was to identify the essential genome of the cystic fibrosis (CF) isolate Burkholderia cenocepacia H111. Using a transposon sequencing (Tn-Seq) approach, we identified essential genes required for growth in rich medium under aerobic and microoxic conditions as well as in a defined minimal medium with citrate as a sole carbon source. Our analysis suggests that 398 genes are required for autonomous growth in rich medium, a number that represents only around 5% of the predicted genes of this bacterium. Five hundred twenty-six genes were required to support growth in minimal medium, and 434 genes were essential under microoxic conditions (0.5% O2). A comparison of these data sets identified 339 genes that represent the minimal set of essential genes required for growth under all conditions tested and can be considered the core essential genome of B. cenocepacia H111. The majority of essential genes were found to be located on chromosome 1, and few such genes were located on chromosome 2, where most of them were clustered in one region. This gene cluster is fully conserved in all Burkholderia species but is present on chromosome 1 in members of the closely related genus Ralstonia, suggesting that the transfer of these essential genes to chromosome 2 in a common ancestor contributed toward the separation of the two genera. IMPORTANCE Transposon sequencing (Tn-Seq) is a powerful method used to identify genes that are essential for autonomous growth under various conditions.
In this study, we have identified a set of "core essential genes" that are required for growth under multiple conditions, and these genes represent potential antimicrobial targets. We also identified genes specifically required for growth under low-oxygen and nutrient-limited environments. We generated conditional mutants to verify the results of our Tn-Seq analysis and demonstrate that one of the identified genes was not essential per se but was an artifact of the construction of the mutant library. We also present verified examples of genes that were not truly essential but, when inactivated, showed a growth defect. These examples have identified so-far-underestimated shortcomings of this powerful method.
Project description: Transposon sequencing is commonly applied to identify the minimal set of genes required for cellular life, a major challenge in fields such as evolutionary and synthetic biology. However, the scientific community has no standards for the processing, treatment, curation, and analysis of this kind of data. In addition, we lack knowledge about artifactual signals and about the requirements a dataset has to satisfy to allow accurate prediction. Here, we have developed FASTQINS, a pipeline for the detection of transposon insertions, and ANUBIS, a library of functions to evaluate and correct deviating factors, both previously known and uncharacterized until now. ANUBIS implements previously defined essentiality estimate models in addition to new approaches with advantages such as not requiring a training set of genes to predict general essentiality. To highlight the applicability of these tools, and to provide a set of recommendations on how to analyze transposon sequencing data, we performed a comprehensive study on artifact correction and essentiality estimation at a 1.5-bp resolution in the genome-reduced bacterium Mycoplasma pneumoniae. We envision that FASTQINS and ANUBIS will aid in the analysis of Tn-seq procedures and lead to the development of accurate genome essentiality estimates to guide applications such as designing live vaccines or growth optimization.