Drug target prediction and prioritization: using orthology to predict essentiality in parasite genomes.
ABSTRACT: BACKGROUND: New drug targets are urgently needed for parasites of socio-economic importance. Genes that are essential for parasite survival are highly desirable targets, but information on these genes is lacking, as gene knockouts or knockdowns are difficult to perform in many species of parasites. We examined the applicability of large-scale essentiality information from four model eukaryotes, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Saccharomyces cerevisiae, to discover essential genes in each of their genomes. Parasite genes that lack orthologues in their host are desirable as selective targets, so we also examined prediction of essential genes within this subset. RESULTS: Cross-species analyses showed that the evolutionary conservation of genes and the presence of essential orthologues are each strong predictors of essentiality in eukaryotes. Absence of paralogues was also found to be a general predictor of increased relative essentiality. By combining several orthology and essentiality criteria one can select gene sets with up to a five-fold enrichment in essential genes compared with a random selection. We show how quantitative application of such criteria can be used to predict a ranked list of potential drug targets from Ancylostoma caninum and Haemonchus contortus, two blood-feeding strongylid nematodes for which there are presently limited sequence data but no functional genomic tools. CONCLUSIONS: The present study demonstrates the utility of using orthology information from multiple, diverse eukaryotes to predict essential genes. The data also emphasize the challenge of identifying essential genes among those in a parasite that are absent from its host.
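The enrichment described above can be illustrated with a minimal sketch. It assumes a toy gene table and one hypothetical way of combining the criteria (essential orthologue present, conserved in at least two other species, no paralogue); the field names, the threshold and the data are illustrative assumptions, not the study's actual criteria or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    essential: bool                  # known phenotype in the model organism
    has_essential_orthologue: bool   # essential orthologue in another model species
    n_species_with_orthologue: int   # crude proxy for evolutionary conservation
    has_paralogue: bool


def select(genes, min_species=2):
    """Keep genes meeting all three (hypothetical) criteria."""
    return [
        g for g in genes
        if g.has_essential_orthologue
        and g.n_species_with_orthologue >= min_species
        and not g.has_paralogue
    ]


def fold_enrichment(genes, selected):
    """Essential fraction in the selected set divided by the background fraction."""
    if not genes or not selected:
        raise ValueError("both gene lists must be non-empty")
    background = sum(g.essential for g in genes) / len(genes)
    if background == 0:
        raise ValueError("background contains no essential genes")
    subset = sum(g.essential for g in selected) / len(selected)
    return subset / background


# Toy data, invented for illustration only.
GENES = [
    Gene("g1", True, True, 3, False),
    Gene("g2", True, True, 2, False),
    Gene("g3", False, True, 3, False),
    Gene("g4", False, False, 1, True),
    Gene("g5", False, False, 0, True),
    Gene("g6", False, True, 1, True),
    Gene("g7", False, False, 2, False),
    Gene("g8", False, False, 0, False),
]

selected = select(GENES)
enrichment = fold_enrichment(GENES, selected)
print([g.name for g in selected], round(enrichment, 2))
```

In this toy table a quarter of all genes are essential, whereas two of the three selected genes are, so the selected set is enriched relative to a random draw. For a parasite without essentiality data, only the selection step could be applied; the enrichment can be measured only in organisms with known phenotypes.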
Project description:BACKGROUND: Proving that specific genes are essential for the intracellular viability of Leishmania parasites within macrophages remains a challenge for the identification of suitable targets for drug development. This is especially evident in the absence of a robust inducible expression system or functioning RNAi machinery that works in all Leishmania species. Currently, if a target gene of interest in extracellular parasites can only be deleted from its genomic locus in the presence of ectopic expression from a wild type copy, it is assumed that this gene will also be essential for viability in disease-promoting intracellular parasites. However, functional essentiality must be proven independently in both life-cycle stages for robust validation of the gene of interest as a putative target for chemical intervention. METHODS: Here, we have used plasmid shuffle methods in vivo to provide supportive genetic evidence that N-myristoyltransferase (NMT) is essential for Leishmania viability throughout the parasite life-cycle. Following confirmation of NMT essentiality in vector-transmitted promastigotes, a range of mutant parasites were used to infect mice prior to negative selection pressure to test the hypothesis that NMT is also essential for parasite viability in an established infection. RESULTS: Ectopically-expressed NMT was only dispensable under negative selection in the presence of another copy. Total parasite burdens in animals subjected to negative selection were comparable to control groups only if an additional NMT copy, not affected by the negative selection, was expressed. CONCLUSIONS: NMT is an essential gene in all parasite life-cycle stages, confirming its role as a genetically-validated target for drug development.
Project description:The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs make up only a small fraction of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, the prediction of essential genes among multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine-learning approaches.
Project description:Essential gene prediction helps to find the minimal set of genes indispensable for the survival of an organism. Machine learning (ML) algorithms have been useful for the prediction of gene essentiality. However, currently available ML pipelines perform poorly for organisms with limited experimental data. The objective is the development of a new ML pipeline to help in the annotation of essential genes of less explored disease-causing organisms for which minimal experimental data is available. The proposed strategy combines an unsupervised feature selection technique, dimension reduction using the Kamada-Kawai algorithm, and a semi-supervised ML algorithm employing a Laplacian Support Vector Machine (LapSVM) for prediction of essential and non-essential genes from genome-scale metabolic networks using a very limited labeled dataset. A novel scoring technique, the Semi-Supervised Model Selection Score, equivalent to area under the ROC curve (auROC), has been proposed for the selection of the best model when supervised performance metrics are difficult to calculate due to lack of data. The unsupervised feature selection followed by dimension reduction revealed a distinct circular pattern in the clustering of essential and non-essential genes. LapSVM then created a curve that dissected this circle for the classification and prediction of essential genes with high accuracy (auROC > 0.85) even with 1% labeled data for model training. After successful validation of this ML pipeline on both Eukaryotes and Prokaryotes, where it shows high accuracy even when the labeled dataset is very limited, this strategy is used for the prediction of essential genes of organisms with inadequate experimentally known data, such as Leishmania sp. Using a graph-based semi-supervised machine learning scheme, a novel integrative approach has been proposed for essential gene prediction that shows universality in application to both Prokaryotes and Eukaryotes with limited labeled data.
The essential genes predicted using the pipeline provide an important lead for the prediction of gene essentiality and identification of novel therapeutic targets for antibiotic and vaccine development against disease-causing parasites.
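For readers unfamiliar with the auROC metric quoted above, the following minimal sketch computes it from labels and classifier scores using the pairwise-ranking definition. It shows the ordinary supervised auROC only; it is not the Semi-Supervised Model Selection Score, whose definition is not given here. The labels and scores are invented for illustration.

```python
def auroc(labels, scores):
    """Probability that a random essential gene outscores a random non-essential one.

    labels: 1 for essential, 0 for non-essential; ties count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one gene of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three essential and three non-essential genes.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(auroc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is the scale on which the reported auROC > 0.85 should be read.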
Project description:In order to combat the on-going malaria epidemic, discovery of new drug targets remains vital. Proteins that are essential to survival and specific to malaria parasites are key candidates. To survive within host cells, the parasites need to acquire nutrients and dispose of waste products across multiple membranes. Additionally, like all eukaryotes, they must redistribute ions and organic molecules between their various internal membrane bound compartments. Membrane transport proteins mediate all of these processes and are considered important mediators of drug resistance as well as drug targets in their own right. Recently, using advanced experimental genetic approaches and streamlined life cycle profiling, we generated a large collection of Plasmodium berghei gene deletion mutants and assigned essential gene functions, highlighting potential targets for prophylactic, therapeutic, and transmission-blocking anti-malarial drugs. Here, we present a comprehensive orthology assignment of all Plasmodium falciparum putative membrane transport proteins and provide a detailed overview of the associated essential gene functions obtained through experimental genetics studies in human and murine model parasites. Furthermore, we discuss the phylogeny of selected potential drug targets identified in our functional screen. We extensively discuss the results in the context of the functional assignments obtained using gene targeting available to date.
Project description:Plasmodium erythrocyte invasion genes play a key role in malaria parasite transmission, host-specificity and immuno-evasion. However, the evolution of the genes responsible remains understudied. Investigating these genes in avian malaria parasites, where diversity is particularly high, offers new insights into the processes that confer malaria pathogenesis. These parasites can pose a significant threat to birds and since birds play crucial ecological roles they serve as important models for disease dynamics. Comprehensive knowledge of the genetic factors involved in avian malaria parasite invasion is lacking and has been hampered by difficulties in obtaining nuclear data from avian malaria parasites. Thus the first Illumina-based de novo transcriptome sequencing and analysis of the chicken parasite Plasmodium gallinaceum was performed to assess the evolution of essential Plasmodium genes. White leghorn chickens were inoculated intravenously with erythrocytes containing P. gallinaceum. cDNA libraries were prepared from RNA extracts collected from infected chick blood and sequencing was run on the HiSeq2000 platform. Orthologues identified by transcriptome sequencing were characterized using phylogenetic, ab initio protein modelling and comparative and population-based methods. Analysis of the transcriptome identified several orthologues required for intra-erythrocytic survival and erythrocyte invasion, including the rhoptry neck protein 2 (RON2) and the apical membrane antigen-1 (AMA-1). Ama-1 of avian malaria parasites exhibits high levels of genetic diversity and evolves under positive diversifying selection, ostensibly due to protective host immune responses. Erythrocyte invasion by Plasmodium parasites requires AMA-1 and RON2 interactions. AMA-1 and RON2 of P. gallinaceum are evolutionarily and structurally conserved, suggesting that these proteins may play essential roles for avian malaria parasites to invade host erythrocytes.
In addition, host-driven selection presumably results in the high levels of genetic variation found in ama-1 of avian Plasmodium species. These findings have implications for investigating avian malaria epidemiology and population dynamics. Moreover, this work highlights the P. gallinaceum transcriptome as an important public resource for investigating the diversity and evolution of essential Plasmodium genes.
Project description:Human African trypanosomiasis (HAT) is an important public health threat in sub-Saharan Africa. Current drugs are unsatisfactory, and new drugs are being sought. Few validated enzyme targets are available to support drug discovery efforts, so our goal was to obtain essentiality data on genes with proven utility as drug targets. Aminoacyl-tRNA synthetases (aaRSs) are known drug targets for bacterial and fungal pathogens and are required for protein synthesis. Here we survey the essentiality of eight Trypanosoma brucei aaRSs by RNA interference (RNAi) gene expression knockdown, covering an enzyme from each major aaRS class: valyl-tRNA synthetase (ValRS) (class Ia), tryptophanyl-tRNA synthetase (TrpRS-1) (class Ib), arginyl-tRNA synthetase (ArgRS) (class Ic), glutamyl-tRNA synthetase (GluRS) (class Ic), threonyl-tRNA synthetase (ThrRS) (class IIa), asparaginyl-tRNA synthetase (AsnRS) (class IIb), and phenylalanyl-tRNA synthetase (α and β) (PheRS) (class IIc). Knockdown of mRNA encoding these enzymes in T. brucei mammalian stage parasites showed that all were essential for parasite growth and survival in vitro. The reduced expression resulted in growth, morphological, cell cycle, and DNA content abnormalities. ThrRS was characterized in greater detail, showing that the purified recombinant enzyme displayed ThrRS activity and that the protein localized to both the cytosol and mitochondrion. Borrelidin, a known inhibitor of ThrRS, was an inhibitor of T. brucei ThrRS and showed antitrypanosomal activity. The data show that aaRSs are essential for T. brucei survival and are likely to be excellent targets for drug discovery efforts.
Project description:Trypanosomes are protistan parasites that diverged early in evolution from most eukaryotes. Their streamlined genomes are packed with arrays of tandemly linked genes that are transcribed polycistronically by RNA polymerase (pol) II. Individual mRNAs are processed from pre-mRNA by spliced leader (SL) trans splicing and polyadenylation. While there is no strong evidence that general transcription factors are needed for transcription initiation at these gene arrays, an RNA pol II transcription pre-initiation complex (PIC) is formed on promoters of SLRNA genes, which encode the small nuclear SL RNA, the SL donor in trans splicing. The factors that form the PIC are extremely divergent orthologues of the small nuclear RNA-activating complex, TBP, TFIIA, TFIIB, TFIIH, TFIIE and Mediator. Here, we functionally characterized a heterodimeric complex of unannotated, nuclear proteins that interacts with RNA pol II and is essential for PIC formation, SL RNA synthesis in vivo, SLRNA transcription in vitro, and parasite viability. These functional attributes suggest that the factor represents TFIIF, although the amino acid sequences are too divergent to draw this conclusion firmly. This work strongly indicates that early-diverged trypanosomes have orthologues of each and every general transcription factor, requiring them for the synthesis of SL RNA.
Project description:Identification of essential genes is critical to understanding the physiology of a species, proposing novel drug targets and uncovering minimal gene sets required for life. Although essential gene sets of several organisms have been determined using large-scale mutagenesis techniques, systematic studies addressing their conservation, genomic context and functions remain scant. Here we integrate 17 essential gene sets from genome-wide in vitro screenings and three gene collections required for growth in vivo, encompassing 15 Bacteria and one Archaea. We refine and generalize important theories proposed using Escherichia coli. Essential genes are typically monogenic and more conserved than nonessential genes. Genes required in vivo are less conserved than those essential in vitro, suggesting that more divergent strategies are deployed when the organism is stressed by the host immune system and unstable nutrient availability. We identified essential analogous pathways that would probably be missed by orthology-based essentiality prediction strategies. For example, Streptococcus sanguinis carries horizontally transferred isoprenoid biosynthesis genes that are widespread in Archaea. Genes specifically essential in Mycobacterium tuberculosis and Burkholderia pseudomallei are reported as potential drug targets. Moreover, essential genes are not only preferentially located in operons, but also occupy the first position therein, supporting the influence of their regulatory regions in driving transcription of whole operons. Finally, these important genomic features are shared between Bacteria and at least one Archaea, suggesting that high order properties of gene essentiality and genome architecture were probably present in the last universal common ancestor or evolved independently in the prokaryotic domains.
Project description:Geptop has performed effectively in the identification of prokaryotic essential genes since its first release in 2013. It estimates gene essentiality for prokaryotes based on orthology and phylogeny. Genome-scale essentiality data are now available for more prokaryotic species, and the information has been collected into public essential gene repositories such as DEG and OGEE. A faster and more accurate toolkit is needed to keep pace with the growing volume of prokaryotic genome data. We updated Geptop by adding more validated essentiality data to the reference set (from 19 to 37 species) and by introducing multi-process technology to accelerate computation. Compared with Geptop 1.0 and other gene essentiality prediction models, Geptop 2.0 can generate more stable predictions and finish the computation in a shorter time. The software is available both as an online server and as a downloadable standalone application. We hope that the improved Geptop 2.0 will facilitate research on gene essentiality and the development of novel antibacterial drugs. The gene essentiality prediction tool is available at http://cefg.uestc.cn/geptop.
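One way to picture an estimate "based on orthology and phylogeny" is a weighted vote over reference species, with closer relatives counting more. The sketch below is a hypothetical illustration of that idea only: the inverse-distance weighting, the function name and the toy values are assumptions, and this is not Geptop's published scoring formula.

```python
def essentiality_score(essential_hits, distances):
    """Weighted fraction of reference species holding an essential orthologue.

    essential_hits: species -> True if the query gene has an essential
                    orthologue in that reference species.
    distances:      species -> positive phylogenetic distance to the query
                    genome (weight is 1/distance; an assumed scheme).
    """
    if not distances:
        raise ValueError("at least one reference species is required")
    if any(d <= 0 for d in distances.values()):
        raise ValueError("distances must be positive")
    weights = {sp: 1.0 / d for sp, d in distances.items()}
    total = sum(weights.values())
    supported = sum(w for sp, w in weights.items() if essential_hits.get(sp, False))
    return supported / total


# Toy inputs: three hypothetical reference species.
toy_distances = {"ref_a": 0.5, "ref_b": 1.0, "ref_c": 2.0}
toy_hits = {"ref_a": True, "ref_b": False, "ref_c": True}
print(round(essentiality_score(toy_hits, toy_distances), 3))
```

The score lies between 0 and 1, so a cutoff on it would turn the ranking into an essential or non-essential call. Any such cutoff would have to be calibrated on genomes with known essentiality data.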
Project description:BACKGROUND: Wolbachia (wBm) is an obligate endosymbiotic bacterium of Brugia malayi, a parasitic filarial nematode of humans and one of the causative agents of lymphatic filariasis. There is a pressing need for new drugs against filarial parasites, such as B. malayi. As wBm is required for B. malayi development and fertility, targeting wBm is a promising approach. However, the lifecycle of neither B. malayi nor wBm can be maintained in vitro. To facilitate selection of potential drug targets we computationally ranked the wBm genome based on confidence that a particular gene is essential for the survival of the bacterium. RESULTS: wBm protein sequences were aligned using BLAST to the Database of Essential Genes (DEG) version 5.2, a collection of 5,260 experimentally identified essential genes in 15 bacterial strains. A confidence score, the Multiple Hit Score (MHS), was developed to predict each wBm gene's essentiality based on the top alignments to essential genes in each bacterial strain. This method was validated using a jackknife methodology to test the ability to recover known essential genes in a control genome. A second estimation of essentiality, the Gene Conservation Score (GCS), was calculated on the basis of phyletic conservation of genes across Wolbachia's parent order Rickettsiales. Clusters of orthologous genes were predicted within the 27 currently available complete genomes. Druggability of wBm proteins was predicted by alignment to a database of protein targets of known compounds. CONCLUSION: Ranking wBm genes by either MHS or GCS predicts and prioritizes potentially essential genes. Comparison of the MHS to GCS produces quadrants representing four types of predictions: those with high confidence of essentiality by both methods (245 genes), those highly conserved across Rickettsiales (299 genes), those similar to distant essential genes (8 genes), and those with low confidence of essentiality (253 genes). 
These data facilitate selection of wBm genes for entry into drug design pipelines.
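The four quadrants obtained by comparing MHS with GCS can be sketched as a simple two-threshold classification. The cutoff values, score scales and gene scores below are hypothetical placeholders; the abstract does not state the thresholds used, and how MHS and GCS themselves are calculated is not reproduced here.

```python
from collections import Counter


def quadrant(mhs, gcs, mhs_cut, gcs_cut):
    """Assign a gene to one of the four prediction types from its two scores."""
    high_mhs = mhs >= mhs_cut
    high_gcs = gcs >= gcs_cut
    if high_mhs and high_gcs:
        return "high confidence by both methods"
    if high_gcs:
        return "highly conserved across Rickettsiales"
    if high_mhs:
        return "similar to distant essential genes"
    return "low confidence of essentiality"


def tally(scores, mhs_cut, gcs_cut):
    """Count genes per quadrant; scores maps gene -> (MHS, GCS)."""
    return Counter(quadrant(m, g, mhs_cut, gcs_cut) for m, g in scores.values())


# Toy scores on an assumed 0-1 scale, with assumed cutoffs of 0.5.
TOY_SCORES = {
    "gene_a": (0.90, 0.80),
    "gene_b": (0.10, 0.90),
    "gene_c": (0.70, 0.20),
    "gene_d": (0.05, 0.10),
    "gene_e": (0.20, 0.95),
}
counts = tally(TOY_SCORES, mhs_cut=0.5, gcs_cut=0.5)
for label, n in counts.items():
    print(label, n)
```

Genes in the first quadrant are the natural starting point for a drug design pipeline, while the two single-method quadrants show where the two scores disagree.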