Extreme mutation bias and high AT content in Plasmodium falciparum.
ABSTRACT: For reasons that remain unknown, the Plasmodium falciparum genome has an exceptionally high AT content compared to other Plasmodium species and eukaryotes in general - nearly 80% in coding regions and approaching 90% in non-coding regions. Here, we examine how this phenomenon relates to genome-wide patterns of de novo mutation. Mutation accumulation experiments were performed by sequential cloning of six P. falciparum isolates growing in human erythrocytes in vitro for 4 years, with 279 clones sampled for whole genome sequencing at different time points. Genome sequence analysis of these samples revealed a significant excess of G:C to A:T transitions compared to other types of nucleotide substitution, which would naturally cause AT content to equilibrate close to the level seen across the P. falciparum reference genome (80.6% AT). These data also uncover an extremely high rate of small indel mutation relative to other species, primarily associated with repetitive AT-rich sequences, in addition to larger-scale structural rearrangements focused in antigen-coding var genes. In conclusion, high AT content in P. falciparum is driven by a systematic mutational bias and ultimately leads to an unusual level of microstructural plasticity, raising the question of whether this contributes to adaptive evolution.
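The link between a G:C to A:T mutational excess and equilibrium AT content can be illustrated with a simple two-state model. The sketch below is not the authors' analysis. It assumes one per-site rate for GC to AT changes (u) and one for AT to GC changes (v), with no selection, in which case the AT fraction settles at u / (u + v). The function names and all rate values are hypothetical.

```python
def equilibrium_at_fraction(gc_to_at_rate: float, at_to_gc_rate: float) -> float:
    """AT fraction at mutational equilibrium in a two-state (AT vs GC) model.

    Assumption: rates are per-site and constant, and selection is ignored.
    """
    total = gc_to_at_rate + at_to_gc_rate
    if total <= 0:
        raise ValueError("at least one rate must be positive")
    return gc_to_at_rate / total


def simulate_at_fraction(start_at: float, gc_to_at_rate: float,
                         at_to_gc_rate: float, generations: int) -> float:
    """Iterate the two-state model to show convergence to the equilibrium."""
    at = start_at
    for _ in range(generations):
        # GC sites (1 - at) gain AT at rate u; AT sites lose AT at rate v.
        at += gc_to_at_rate * (1.0 - at) - at_to_gc_rate * at
    return at


if __name__ == "__main__":
    # Hypothetical rates: GC->AT four times as frequent as AT->GC.
    print(equilibrium_at_fraction(4.0, 1.0))
    print(simulate_at_fraction(0.5, 4e-3, 1e-3, 5000))
```

With a hypothetical fourfold excess of GC to AT over AT to GC changes, the model equilibrates at 80% AT regardless of the starting composition, which is the kind of relationship the abstract describes.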
Project description: Studies on DNA from pathogenic organisms within clinical samples are often complicated by the presence of large amounts of host (e.g., human) DNA. Isolation of pathogen DNA from these samples would improve the efficiency of next-generation sequencing (NGS) and pathogen identification. Here we describe a solution-based hybridisation method for isolation of pathogen DNA from a mixed population. This straightforward and inexpensive technique uses probes made from whole-genome DNA and off-the-shelf reagents. In this study, Escherichia coli DNA was successfully enriched from a mixture of E. coli and human DNA. After enrichment, genome coverage following NGS was significantly higher and the evenness of coverage and GC content were unaffected. This technique was also applied to samples containing a mixture of human and Plasmodium falciparum DNA. The P. falciparum genome is particularly difficult to sequence due to its high AT content (80.6%) and repetitive nature. Post enrichment, a bias in the recovered DNA was observed, with a poorer representation of the AT-rich non-coding regions. This uneven coverage was also observed in pre-enrichment samples, but to a lesser degree. Despite the coverage bias in enriched samples, SNP (single-nucleotide polymorphism) calling in coding regions was unaffected and the majority of samples had over 90% of their coding region covered at 5× depth. This technique shows significant promise as an effective method to enrich pathogen DNA from samples with heavy human contamination, particularly when applied to GC-neutral genomes.
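The "covered at 5× depth" figure is a breadth-of-coverage statistic: the fraction of positions whose read depth meets a threshold. A minimal sketch follows, assuming per-base depths for the coding region are already available as a list of integers. The function name and the example depths are hypothetical, and this is not the pipeline used in the study.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 5) -> float:
    """Fraction of positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


if __name__ == "__main__":
    # Hypothetical per-base depths for a short coding stretch.
    example_depths = [0, 5, 10, 4, 7, 12, 3, 9]
    print(fraction_covered(example_depths, min_depth=5))
```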
Project description: The genomic architecture of organisms, including nucleotide composition, can be highly variable, even among closely related species. To better understand the causes leading to structural variation in genomes, information on distinct and diverse genomic features is needed. Malaria parasites are known for encompassing a wide range of genomic GC-content and it has long been thought that Plasmodium falciparum, the virulent malaria parasite of humans, has the most AT-biased eukaryotic genome. Here, I perform comparative genomic analyses of the most AT-rich eukaryotes sequenced to date, and show that the avian malaria parasites Plasmodium gallinaceum, P. ashfordi, and P. relictum have the most extreme coding sequences in terms of AT-bias. Their mean GC-content is 21.21, 21.22 and 21.60%, respectively, which is considerably lower than the transcriptome of P. falciparum (23.79%) and other eukaryotes. This information enables a better understanding of genome evolution and raises the question of how certain organisms are able to prosper despite severe compositional constraints.
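For reference, GC-content values such as those quoted above are obtained by counting G and C bases as a share of all unambiguous bases. The sketch below assumes the "mean GC-content" is the unweighted mean over individual coding sequences; the abstract does not state this, and a length-weighted (pooled) value would differ. Function names and the toy sequences are hypothetical.

```python
from typing import Iterable


def gc_content_percent(sequence: str) -> float:
    """GC content (%) of one sequence, ignoring ambiguous bases such as N."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)


def mean_gc_content(sequences: Iterable[str]) -> float:
    """Unweighted mean of per-sequence GC content (%); an assumed definition."""
    values = [gc_content_percent(seq) for seq in sequences]
    if not values:
        raise ValueError("no sequences given")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy AT-rich sequences, not real Plasmodium data.
    toy_cds = ["ATGAAATTTAATAAA", "ATGTTATATGCAAAT"]
    print(mean_gc_content(toy_cds))
```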
Project description: Genome variation studies in Plasmodium falciparum have focused on SNPs and, more recently, large-scale copy number polymorphisms and ectopic rearrangements. Here, we examine another source of variation: variable number tandem repeats (VNTRs). Interspersed low complexity features, including the well-studied P. falciparum microsatellite sequences, are commonly classified as VNTRs; however, this study is focused on longer coding VNTR polymorphisms, a small class of copy number variations. Selection against frameshift mutation is a main constraint on tandem repeats (TRs) in coding regions, while limited propagation of TRs longer than 975 nt total length is a minor restriction in coding regions. Comparative analysis of three P. falciparum genomes reveals that more than 9% of all P. falciparum ORFs harbor VNTRs, much more than has been reported for any other species. Moreover, genotyping of VNTR loci in a drug-selected line, progeny of a genetic cross, and 334 field isolates demonstrates broad variability in these sequences. Functional enrichment analysis of ORFs harboring VNTRs identifies stress and DNA damage responses along with chromatin modification activities, suggesting an influence on genome mutability and functional variation. Analysis of the repeat units and their flanking regions in both P. falciparum and Plasmodium reichenowi sequences implicates a replication slippage mechanism in the generation of TRs from an initially unrepeated sequence. VNTRs can contribute to rapid adaptation by localized sequence duplication. They can also confound SNP-typing microarrays or mapping of short-sequence reads and therefore must be accounted for in such analyses.
Project description: A 13.6 kb contig of chromosome 5 of Plasmodium berghei, a rodent malaria parasite, has been sequenced and analysed for its coding potential. Assembly and comparison of this genomic locus with the orthologous locus on chromosome 10 of the human malaria parasite Plasmodium falciparum revealed an unexpectedly high level of conservation of the gene organisation and complexity, only partially predicted by current gene-finder algorithms. Adjacent putative genes, transcribed from complementary strands, overlap in their untranslated regions, introns and exons, resulting in a tight clustering of both regulatory and coding sequences, which is unprecedented for genome organisation of Plasmodium. In total, six putative genes were identified, three of which are transcribed in gametocytes, the precursor cells of gametes. At least in the case of two multiple exon genes, alternative splicing and alternative transcription initiation sites contribute to a flexible use of the dense information content of this locus. The data of the small sample presented here indicate the value of a comparative approach for Plasmodium to elucidate structure, organisation and gene content of complex genomic loci and emphasise the need to integrate biological data of all Plasmodium species into the P. falciparum genome database and associated projects such as PlasmoDB to further improve their annotation.
Project description: BACKGROUND: Plasmodium parasites undergo several major developmental transitions during their complex lifecycle, which are enabled by precisely ordered gene expression programs. Transcriptomes from the 48-h blood stages of the major human malaria parasite Plasmodium falciparum have been described using cDNA microarrays and RNA-seq, but these assays have not always performed well within non-coding regions, where the AT-content is often 90-95%. RESULTS: We developed a directional, amplification-free RNA-seq protocol (DAFT-seq) to reduce bias against AT-rich cDNA, which we have applied to three strains of P. falciparum (3D7, HB3 and IT). While strain-specific differences were detected, overall there is strong conservation between the transcriptional profiles. For the 3D7 reference strain, transcription was detected from 89% of the genome, with over 78% of the genome transcribed into mRNAs. We also find that transcription from bidirectional promoters frequently results in non-coding, antisense transcripts. These datasets allowed us to refine the 5' and 3' untranslated regions (UTRs), which can be variable, long (>1000 nt), and often overlap those of adjacent transcripts. CONCLUSIONS: The approaches applied in this study allow a refined description of the transcriptional landscape of P. falciparum and demonstrate that very little of the densely packed P. falciparum genome is inactive or redundant. By capturing the 5' and 3' ends of mRNAs, we reveal both constant and dynamic use of transcriptional start sites across the intraerythrocytic developmental cycle that will be useful in guiding the definition of regulatory regions for use in future experimental gene expression studies.
Project description: The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [~80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [~90-99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission.
Project description: Plasmodium parasites are valuable models to understand how nucleotide composition affects mutation, diversification, and adaptation. No other observed eukaryotes have undergone such large changes in genomic Guanine-Cytosine (GC) content as seen in the genus Plasmodium (∼30% within 35-40 Myr). Although mutational biases are known to influence GC content in the human-infective Plasmodium vivax and Plasmodium falciparum, no study has addressed how different gene functional classes contribute to genus-wide compositional changes, or whether Plasmodium GC content variation is driven by natural selection. Here, we tested the hypothesis that certain gene processes and functions drive variation in global GC content between Plasmodium species. We performed a large-scale comparative genomic analysis using the genomes and predicted genes of 17 Plasmodium species encompassing a wide genomic GC content range. Genic GC content was sorted and divided into ten equally sized quantiles that were then assessed for functional enrichment classes. Consistent with the idea that selection on gene classes may drive genomic GC content, trans-membrane proteins were enriched within extreme GC content quantiles (Q1 and Q10). Specifically, variant surface antigens, which primarily interact with vertebrate immune systems, showed skewed GC content distributions compared with other trans-membrane proteins. Although a definitive causation linking GC content, expression, and positive selection within variant surface antigens from Plasmodium vivax, Plasmodium berghei, and Plasmodium falciparum could not be established, we found that regardless of genomic nucleotide composition, genic GC content and expression were positively correlated during trophozoite stages. Overall, these data suggest that, alongside mutational biases, functional protein classes drive Plasmodium GC content change.
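The quantile step described above (sorting genic GC content and splitting it into ten equally sized bins, Q1 to Q10) can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: genes are ranked by GC content with ties broken by gene name, and when the gene count is not divisible by ten the bins differ in size by at most one. The function name and gene identifiers are hypothetical.

```python
from typing import Dict


def assign_gc_deciles(gc_by_gene: Dict[str, float]) -> Dict[str, int]:
    """Map each gene to a decile label 1..10 by ascending GC content."""
    # Sort by GC content; break ties on gene name so the result is stable.
    ranked = sorted(gc_by_gene, key=lambda gene: (gc_by_gene[gene], gene))
    n_genes = len(ranked)
    # Integer arithmetic spreads ranks 0..n-1 evenly over labels 1..10.
    return {gene: (rank * 10) // n_genes + 1 for rank, gene in enumerate(ranked)}


if __name__ == "__main__":
    # Hypothetical genes with made-up GC percentages.
    toy = {f"gene{i:02d}": 15.0 + i for i in range(20)}
    deciles = assign_gc_deciles(toy)
    extremes = sorted(g for g, q in deciles.items() if q in (1, 10))
    print(extremes)
```

Genes labelled 1 or 10 correspond to the extreme quantiles (Q1 and Q10) that would then be tested for functional enrichment.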
Project description: Plasmodium parasites, the causal agents of malaria, result in more than 1 million deaths annually. Plasmodium are unicellular eukaryotes with small ~23 Mb genomes encoding ~5200 protein-coding genes. The protein-coding genes comprise about half of these genomes. Although evolutionary processes have a significant impact on malaria control, the selective pressures within Plasmodium genomes are poorly understood, particularly in the non-protein-coding portion of the genome. We use evolutionary methods to describe selective processes in both the coding and non-coding regions of these genomes. Based on genome alignments of seven Plasmodium species, we show that protein-coding, intergenic and intronic regions are all subject to purifying selection and we identify 670 conserved non-genic elements. We then use genome-wide polymorphism data from P. falciparum to describe short-term selective processes in this species and identify some candidate genes for balancing (diversifying) selection. Our analyses suggest that there are many functional elements in the non-genic regions of these genomes and that adaptive evolution has occurred more frequently in the protein-coding regions of the genome.
Project description: BACKGROUND: Drug resistance within the major malaria parasites Plasmodium vivax and Plasmodium falciparum threatens malaria control and elimination in Southeast Asia. The first-line treatment for P. vivax malaria is chloroquine together with primaquine, and the first-line treatment for P. falciparum malaria is artemisinin in combination with a partner drug. Plasmodium vivax and P. falciparum parasites resistant to their respective first-line therapies are now found within Southeast Asia. The resistance perimeters may include high transmission regions of Southern Thailand, which are underrepresented in surveillance efforts. METHODS: This study investigated blood samples from malaria centres in Southern Thailand. Genetic loci associated with drug resistance were amplified and sequenced. The drug resistance-associated genes Pvmdr1, Pvcrt-o, Pvdhfr, and Pvdhps were characterized for 145 cases of P. vivax malaria, as well as the artemisinin resistance-associated Pfkelch13 gene from 91 cases of P. falciparum malaria. RESULTS: Plasmodium vivax samples from Southern Thai provinces showed numerous chloroquine and antifolate resistance-associated mutations, including SNP and Pvcrt-o K10-insertion combinations suggestive of chloroquine resistant P. vivax phenotypes. A high proportion of the C580Y coding mutation (conferring artemisinin resistance) was detected in P. falciparum samples originating from Ranong and Yala (where the mutation was previously unreported). CONCLUSIONS: The results demonstrate a risk of chloroquine and antifolate resistant P. vivax phenotypes in Southern Thailand, and artemisinin resistant P. falciparum observed as far south as the Thai-Malaysian border region. Ongoing surveillance of antimalarial drug resistance markers is called for in Southern Thailand to inform case management.
Project description: Because the Plasmodium falciparum genome is AT-rich, the presence of GC-rich regions suggests functional significance. Evolution imposes selection pressure to retain functionally important coding and regulatory elements. Hence, searching for evolutionarily conserved, GC-rich intergenic regions in an AT-rich genome will help in discovering new coding regions and regulatory elements. We have used elevated GC content in intergenic regions, coupled with sequence conservation against P. reichenowi, which is evolutionarily closely related to P. falciparum, to identify potential sequences of functional importance. Interestingly, ~30% of the GC-rich, conserved sequences were associated with antigenic proteins encoded by var and rifin genes. The majority of sequences identified in the 5' UTR of var genes are represented by short expressed sequence tags (ESTs) in cDNA libraries, signifying that they are transcribed in the parasite. Additionally, 19 sequences were located in the 3' UTR of rifins, and 4 of these also have overlapping ESTs. Further analysis showed that several sequences associated with var genes have the capacity to encode small peptides. A previous report has shown that upstream peptides can regulate the expression of var genes; hence, we propose that these conserved GC-rich sequences may play roles in regulation of gene expression.