Genome annotation improvements from cross-phyla proteogenomics and time-of-day differences in malaria mosquito proteins using untargeted quantitative proteomics.
ABSTRACT: The malaria mosquito, Anopheles stephensi, and other mosquitoes modulate their biology to match the time of day. In the present work, we used a non-hypothesis-driven approach (untargeted proteomics) to identify proteins in mosquito tissue, and then quantified the relative abundance of the identified proteins from An. stephensi bodies. Using these quantified protein levels, we then analyzed the data for proteins that were only detectable at certain times of day, highlighting the need to consider time of day in experimental design. Further, we extended our time-of-day analysis to look for proteins that cycle in a rhythmic 24-hour ("circadian") manner, identifying 31 rhythmic proteins. Finally, to maximize the utility of our data, we performed a proteogenomic analysis to improve the genome annotation of An. stephensi. We compared peptides that were detected using mass spectrometry but are 'missing' from the An. stephensi predicted proteome to reference proteomes from 38 other species, primarily vectors of human disease. We found 239 such peptide matches, showing that genome annotation can be improved using proteogenomic analysis with taxonomically diverse reference proteomes. Examination of 'missing' peptides revealed reading frame errors, errors in gene-calling, overlapping gene models, and suspected gaps in the genome assembly.
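The cross-species comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the dictionary-based proteome layout, and the exact-substring matching rule are all assumptions made for illustration. Isoleucine and leucine are treated as equivalent because they have identical masses.

```python
def find_missing_peptide_matches(peptides, target_proteome, reference_proteomes):
    """Map each peptide absent from the target proteome to its reference hits.

    peptides            -- iterable of peptide sequences (hypothetical MS identifications)
    target_proteome     -- dict of protein_id -> sequence for the species of interest
    reference_proteomes -- dict of species -> {protein_id -> sequence}
    """
    def normalize(seq):
        # I and L are isobaric, so collapse them before comparing sequences.
        return seq.upper().replace("I", "L")

    target_seqs = [normalize(s) for s in target_proteome.values()]
    matches = {}
    for pep in peptides:
        p = normalize(pep)
        if any(p in s for s in target_seqs):
            continue  # already explained by the target proteome, so not 'missing'
        hits = [
            (species, protein_id)
            for species, proteome in reference_proteomes.items()
            for protein_id, seq in proteome.items()
            if p in normalize(seq)
        ]
        if hits:
            matches[pep] = hits
    return matches


# Toy data, invented purely for illustration.
target = {"AS1": "MKTAYIAKQR"}
references = {
    "Aedes aegypti": {"AA1": "MSSLPEPTIDEKGG"},
    "Culex quinquefasciatus": {"CQ1": "MKKK"},
}
result = find_missing_peptide_matches(
    ["TAYIAK", "LPEPTLDEK", "WWWW"], target, references
)
```

A peptide that matches only a reference proteome points to a region of the target genome worth re-examining; a real analysis would also need to control false matches, which this sketch ignores.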
Project description:Fat bodies from Anopheles stephensi female mosquitoes were dissected and processed for proteomic analysis. Both SDS-PAGE and basic Reverse Phase Liquid Chromatography-based fractionation strategies were used to achieve broad coverage of protein identification. The fractionated peptides were then analyzed on a high-resolution mass spectrometer. Searching the raw data against the protein database of An. stephensi resulted in the identification of 4535 proteins, which is, to our knowledge, the largest fat body proteome catalog reported so far for any mosquito vector species. Bioinformatics analysis of these fat body proteins suggested the enrichment of biological processes including carbon and lipid metabolism, amino acid metabolism, signal peptide processing and oxidation-reduction. In addition, using proteogenomic approaches, 43 novel proteins were identified that were not listed in the existing gene annotations of An. stephensi. The data used in the analysis are related to the article 'Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes' (Prasad et al., 2017).
Project description:BACKGROUND: New strategies for high-throughput sequencing are constantly appearing, leading to a great increase in the number of completely sequenced genomes. Unfortunately, computational genome annotation is out of step with this progress. Thus, the accurate annotation of these genomes has become a bottleneck of knowledge acquisition. RESULTS: We exploited a proteogenomic approach to improve conventional genome annotation by integrating proteomic data with genomic information. Using Shigella flexneri 2a as a model, we identified a total of 823 proteins, including 187 hypothetical proteins. Among them, three annotated ORFs were extended upstream through comprehensive analysis against an in-house N-terminal extension database. Two genes, which could not be translated to their full length because of stop codon 'mutations' induced by genome sequencing errors, were revised and annotated as fully functional genes. Above all, seven new ORFs were discovered that were not predicted in S. flexneri 2a str. 301 by any other annotation approach. The transcripts of four novel ORFs were confirmed by RT-PCR assay. Additionally, most of these novel ORFs were overlapping genes, some even nested within the coding region of other known genes. CONCLUSIONS: Our findings demonstrate that current Shigella genome annotation methods are not perfect and need to be improved. Apart from the validation of predicted genes at the protein level, the additional features of proteogenomic tools include revision of annotation errors and discovery of novel ORFs. The complementary dataset could provide more targets for those interested in Shigella to perform functional studies.
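Finding ORFs that no gene predictor called usually means locating peptides in a translation of all six reading frames. The sketch below shows that idea only and is not the method used in the study. The function names are hypothetical, it uses the standard codon table, and it assumes an unambiguous uppercase or lowercase A/C/G/T sequence.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_hits(genome, peptide):
    """List (strand, frame_offset, protein_position) for each peptide occurrence."""
    genome = genome.upper()
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            protein = translate(seq[offset:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, offset, pos))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy sequence: the ORF ATG AAA GGT TGG TAA starts at offset 2 on the forward strand.
hits = six_frame_hits("CCATGAAAGGTTGGTAACC", "KGW")
```

A hit in a frame or region without an annotated gene is a candidate novel ORF or N-terminal extension; confirming it still requires independent evidence such as the RT-PCR assays mentioned above.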
Project description:Single molecule real-time (SMRT) sequencing has recently been used to obtain full-length cDNA sequences that improve genome annotation and reveal RNA isoforms. Here, we used one such method called isoform sequencing from Pacific Biosciences (PacBio) to sequence a cDNA library from the Asian malaria mosquito Anopheles stephensi. More than 600 000 full-length cDNAs, referred to as reads of insert, were identified. Owing to the inherently high error rate of PacBio sequencing, we tested different approaches for error correction. We found that error correction using Illumina RNA sequencing (RNA-seq) generated more data than using the default SMRT pipeline. The full-length error-corrected PacBio reads greatly improved the gene annotation of Anopheles stephensi: 4867 gene models were updated and 1785 alternatively spliced isoforms were added to the annotation. In addition, six trans-splicing events, where exons from different primary transcripts were joined together, were identified in An. stephensi. All six trans-splicing events appear to be conserved in Culicidae, as they are also found in Anopheles gambiae and Aedes aegypti. The proteins encoded by trans-splicing events are also highly conserved and the orthologues of these proteins are cis-spliced in outgroup species, indicating that trans-splicing may arise as a mechanism to rescue genes that broke up during evolution.
Project description:Plasmodium falciparum sporozoites that develop and mature inside an Anopheles mosquito initiate a malaria infection in humans. Here we report the first proteomic comparison of different parasite stages from the mosquito: early and late oocysts containing midgut sporozoites, and the mature, infectious salivary gland sporozoites. Despite the morphological similarity between midgut and salivary gland sporozoites, their proteomes are markedly different, in agreement with their increase in hepatocyte infectivity. The different sporozoite proteomes contain a large number of stage-specific proteins whose annotation suggests an involvement in sporozoite maturation, motility, infection of the human host and associated metabolic adjustments. Analyses of proteins identified in the P. falciparum sporozoite proteomes by orthologous gene disruption in the rodent malaria parasite, P. berghei, revealed three previously uncharacterized Plasmodium proteins that appear to be essential for sporozoite development at distinct points of maturation in the mosquito. This study sheds light on the development and maturation of the malaria parasite in an Anopheles mosquito and also identifies proteins that may be essential for sporozoite infectivity to humans.
Project description:The aim of the present study was to identify the proteomic differences among human lenses in different physiopathological states and to screen for susceptibility genes/proteins via proteogenomic characterization. The total proteomes identified across the regenerative lens with secondary cataract (RLSC), congenital cataract (CC) and age-related cataract (ARC) groups were compared to those of normal lenses using isobaric tagging for relative and absolute protein quantification (iTRAQ). The up-regulated proteins between the groups were subjected to biological analysis. Whole exome sequencing (WES) was performed to detect genetic variations. The most complete human lens proteome to date, which consisted of 1251 proteins, including 55.2% previously unreported proteins, was identified across the experimental groups. Bioinformatics functional annotation revealed the common involvement of cellular metabolic processes, immune responses and protein folding disturbances among the groups. RLSC-over-expressed proteins were characteristically enriched in the intracellular immunological signal transduction pathways. The CC groups featured biological processes relating to gene expression and vascular endothelial growth factor (VEGF) signaling transduction, whereas the molecular functions corresponding to external stress were specific to the ARC groups. Combined with WES, the proteogenomic characterization narrowed the list to 16 candidate causal molecules. These findings revealed common final pathways with diverse upstream regulation of cataractogenesis in different physiopathological states. This proteogenomic characterization shows translational potential for detecting susceptibility genes/proteins in precision medicine.
Project description:The genome sequence of the rhesus macaque is a draft version with many errors and lacks Y chromosome annotation. In the present dataset, we reanalyzed the previously published macaque testis proteome. We searched for refined protein sequences, potential Y chromosome proteins and transcript-predicted proteins in addition to the latest Ensembl protein sequences of macaque. A total of 74,433 peptides corresponding to 9247 protein groups were identified, and the data are supplied in this paper. The updated version of the macaque testis proteome provides peptide-level evidence for predicted genes and transcripts. It can be used for further in-depth proteogenomic annotation of the macaque genome and is useful for studying the mechanisms of macaque spermatogenesis.
Project description:Anopheles stephensi is the key vector of malaria throughout the Indian subcontinent and Middle East and an emerging model for molecular and genetic studies of mosquito-parasite interactions. The type form of the species is responsible for the majority of urban malaria transmission across its range. Here, we report the genome sequence and annotation of the Indian strain of the type form of An. stephensi. The 221 Mb genome assembly represents more than 92% of the entire genome and was produced using a combination of 454, Illumina, and PacBio sequencing. Physical mapping assigned 62% of the genome onto chromosomes, enabling chromosome-based analysis. Comparisons between An. stephensi and An. gambiae reveal that the rate of gene order reshuffling on the X chromosome was three times higher than that on the autosomes. An. stephensi has more heterochromatin in pericentric regions but less repetitive DNA in chromosome arms than An. gambiae. We also identify a number of Y-chromosome contigs and BACs. Interspersed repeats constitute 7.1% of the assembled genome while LTR retrotransposons alone comprise more than 49% of the Y contigs. RNA-seq analyses provide new insights into mosquito innate immunity, development, and sexual dimorphism. The genome analysis described in this manuscript provides a resource and platform for fundamental and translational research into a major urban malaria vector. Chromosome-based investigations provide unique perspectives on Anopheles chromosome evolution. RNA-seq analysis and studies of immunity genes offer new insights into mosquito biology and mosquito-parasite interactions.
Project description:In Aedes and Anopheles mosquitoes, ribosomal protein RPS6 has an unusual C-terminal extension that resembles histone H1 proteins. To explore homology between a mosquito H1 histone and the RPS6 tail, we took advantage of the Anopheles gambiae genome database to clone a histone H1 gene from an Anopheles stephensi mosquito cell line. We designed specific primers based on RPS6 and histone H1 alignments to recover an Anopheles stephensi histone H1 corresponding to a conceptual An. gambiae protein, with 92% identity. Southern blots suggested that the Anopheles stephensi histone H1 gene has multiple variants, as is also the case for histone H1 proteins in Chironomid flies. Histone H1 proteins from Anopheles stephensi and Anopheles gambiae mosquitoes share 92% identity with each other, but only 50% identity with a Drosophila homolog. In a phylogenetic analysis, Anopheles, Chironomus and Drosophila histone H1 proteins cluster separately from the histone H1-like, C-terminal tails on RPS6 in Aedes and Anopheles mosquitoes. These observations suggest that the resemblance between histone H1 and the C-terminal extensions on mosquito RPS6 has been maintained by convergent evolution.
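Identity figures such as the 92% and 50% quoted above depend on how identity is defined over an alignment. The sketch below uses one common convention, assumed here rather than taken from the study: identical residues divided by the number of aligned columns in which neither sequence has a gap. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment: 5 identical residues over 6 ungapped columns.
identity = percent_identity("MKT-AYI", "MKTGAFI")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so reported percentages are only comparable when the denominator is stated.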
Project description:BACKGROUND: microRNAs (miRNAs) are non-coding RNAs that are now recognized as a major class of gene-regulating molecules widely distributed in metazoans and plants. miRNAs have been found to play important roles in apoptosis, cancer, development, differentiation, inflammation, longevity, and viral infection. There are a few reports describing miRNAs in the African malaria mosquito, Anopheles gambiae, on the basis of similarity to known miRNAs from other species. An. stephensi is the most important malaria vector in Asia and it is becoming a model Anopheline species for physiological and genetics studies. RESULTS: We report the cloning and characterization of 27 distinct miRNAs from 17-day old An. stephensi female mosquitoes. Seventeen of the 27 miRNAs matched previously predicted An. gambiae miRNAs, offering the first experimental verification of miRNAs from mosquito species. Ten of the 27 are miRNAs previously unknown to mosquitoes, four of which did not match any known miRNAs in any organism. Twenty-five of the 27 Anopheles miRNAs had conserved sequences in the genome of a divergent relative, the yellow fever mosquito Aedes aegypti. Two clusters of miRNAs were found within introns of orthologous genes in An. gambiae, Ae. aegypti, and Drosophila melanogaster. Mature miRNAs were detected in An. stephensi for all of the nine selected miRNAs, including the four novel miRNAs (miR-x1 to miR-x4), either by northern blot or by Ribonuclease Protection Assay. Expression profile analysis of eight of these miRNAs revealed distinct expression patterns from early embryo to adult stages in An. stephensi. In both An. stephensi and Ae. aegypti, the expression of miR-x2 was restricted to adult females and predominantly in the ovaries. A significant reduction of miR-x2 level was observed 72 hrs after a blood meal. Thus miR-x2 is likely involved in female reproduction and its function may be conserved among divergent mosquitoes.
A mosquito homolog of miR-14, a regulator of longevity and apoptosis in D. melanogaster, represented 25% of all sequenced miRNA clones from 17-day old An. stephensi female mosquitoes. An. stephensi miR-14 displayed a relatively strong signal from late embryonic to adult stages. miR-14 expression is consistent during the adult lifespan regardless of age, sex, and blood feeding status. Thus miR-14 is likely important across all mosquito life stages. CONCLUSION: This study provides experimental evidence for 23 conserved and four new microRNAs in An. stephensi mosquitoes. Comparisons between miRNA gene clusters in Anopheles and Aedes mosquitoes, and in D. melanogaster suggest the loss or significant change of two miRNA genes in Ae. aegypti. Expression profile analysis of eight miRNAs, including the four new miRNAs, revealed distinct patterns from early embryo to adult stages in An. stephensi. Further analysis showed that miR-x2 is likely involved in female reproduction and its function may be conserved among divergent mosquitoes. Consistent expression of miR-14 suggests that it is likely important across all mosquito life stages from embryos to aged adults. Understanding the functions of mosquito miRNAs will undoubtedly contribute to a better understanding of mosquito biology including longevity, reproduction, and mosquito-pathogen interactions, which are important to disease transmission.
Project description:Mosquitoes, with their ability to transmit several pathogens of human disease, pose a serious threat to healthcare worldwide. Although much has been done to prevent disease transmission by mosquitoes, the rising rate of resistance in mosquitoes towards conventionally used control strategies necessitates the development of novel strategies to counter disease transmission. The mosquito brain plays a key role in host-seeking, finding mates and selection of oviposition sites. However, not much is known about the underlying physiological processes in the mosquito brain. The data presented in this study describe the proteins that have been identified in the brain tissue of adult female Anopheles stephensi and their associated processes. Interpretation of the data can be related to the previously published article "Integrating transcriptomics and proteomics data for accurate assembly and annotation of genomes".