Amplification methods bias metagenomic libraries of uncultured single-stranded and double-stranded DNA viruses.
ABSTRACT: Investigation of viruses in the environment often requires the amplification of viral DNA before sequencing of viral metagenomes. In this study, two of the most widely used amplification methods, the linker amplified shotgun library (LASL) and multiple displacement amplification (MDA) methods, were applied to a sample from the seawater surface. Viral DNA was extracted from viruses concentrated by tangential flow filtration and amplified by these two methods. 454 pyrosequencing was used to read the metagenomic sequences from the different libraries. The resulting taxonomic classifications of the viruses, their functional assignments, and assembly patterns differed substantially depending on the amplification method. Only double-stranded DNA viruses were retrieved from the LASL library, whereas most sequences in the MDA library were from single-stranded DNA viruses, with double-stranded DNA viral sequences in the minority. Thus, the two amplification methods reveal different aspects of viral diversity.
Project description:Viruses are known to be the most numerous biological entities in soil; however, little is known about their diversity in this environment. In order to explore the genetic diversity of soil viruses, we isolated viruses by centrifugation and sequential filtration before performing a metagenomic investigation. We adopted multiple-displacement amplification (MDA), an isothermal whole-genome amplification method with phi29 polymerase and random hexamers, to amplify viral DNA and construct clone libraries for metagenome sequencing. By the MDA method, the diversity of both single-stranded DNA (ssDNA) viruses and double-stranded DNA viruses could be investigated at the same time. On the contrary, by eliminating the denaturing step in the MDA reaction, only ssDNA viral diversity could be explored selectively. Irrespective of the denaturing step, more than 60% of the soil metagenome sequences did not show significant hits (E-value criterion, 0.001) with previously reported viral sequences. Those hits that were considered to be significant were also distantly related to known ssDNA viruses (average amino acid similarity, approximately 34%). Phylogenetic analysis showed that replication-related proteins (which were the most frequently detected proteins) related to those of ssDNA viruses obtained from the metagenomic sequences were diverse and novel. Putative circular genome components of ssDNA viruses that are unrelated to known viruses were assembled from the metagenomic sequences. In conclusion, ssDNA viral diversity in soil is more complex than previously thought. Soil is therefore a rich pool of previously unknown ssDNA viruses.
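The E-value criterion mentioned above can be illustrated with a minimal sketch. It assumes the similarity search was exported in BLAST tabular format (outfmt 6, where the query ID is column 1 and the E-value is column 11) and that a hit counts as significant when its E-value is at or below 0.001; the abstract does not state the tool or whether the cutoff is inclusive. The read IDs and hit lines are hypothetical.

```python
def significant_queries(blast_lines, max_evalue=1e-3):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Assumes BLAST tabular output (outfmt 6): tab-separated, query ID in
    column 1 and E-value in column 11.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def fraction_without_hits(all_query_ids, blast_lines, max_evalue=1e-3):
    """Fraction of query sequences that have no significant hit."""
    ids = set(all_query_ids)
    if not ids:
        raise ValueError("no query IDs supplied")
    return len(ids - significant_queries(blast_lines, max_evalue)) / len(ids)


# Hypothetical example: five reads, three of which produced any hit at all.
reads = ["r1", "r2", "r3", "r4", "r5"]
blast = [
    "r1\tvirusA\t45.0\t120\t60\t2\t1\t120\t5\t124\t1e-20\t95.1",
    "r2\tvirusB\t30.0\t40\t28\t0\t1\t40\t10\t49\t0.5\t22.3",
    "r3\tvirusC\t34.0\t90\t55\t3\t1\t90\t1\t90\t1e-4\t41.0",
]
unassigned = fraction_without_hits(reads, blast)
```

Here the hit for `r2` fails the cutoff, so three of five reads count as having no significant hit.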
Project description:BACKGROUND:Viruses strongly influence microbial population dynamics and ecosystem functions. However, our ability to quantitatively evaluate those viral impacts is limited to the few cultivated viruses and double-stranded DNA (dsDNA) viral genomes captured in quantitative viral metagenomes (viromes). This leaves the ecology of non-dsDNA viruses nearly unknown, including single-stranded DNA (ssDNA) viruses that have been frequently observed in viromes, but not quantified due to amplification biases in sequencing library preparations (Multiple Displacement Amplification, Linker Amplification or Tagmentation). METHODS:Here we designed mock viral communities including both ssDNA and dsDNA viruses to evaluate the capability of a sequencing library preparation approach including an Adaptase step prior to Linker Amplification for quantitative amplification of both dsDNA and ssDNA templates. We then surveyed aquatic samples to provide first estimates of the abundance of ssDNA viruses. RESULTS:Mock community experiments confirmed the biased nature of existing library preparation methods for ssDNA templates (either largely enriched or selected against) and showed that the protocol using Adaptase plus Linker Amplification yielded viromes that were ±1.8-fold quantitative for ssDNA and dsDNA viruses. Application of this protocol to community virus DNA from three freshwater and three marine samples revealed that ssDNA viruses as a whole represent only a minor fraction (<5%) of DNA virus communities, though individual ssDNA genomes, both eukaryote-infecting Circular Rep-Encoding Single-Stranded DNA (CRESS-DNA) viruses and bacteriophages from the Microviridae family, can be among the most abundant viral genomes in a sample. DISCUSSION:Together these findings provide empirical data for a new virome library preparation protocol, and a first estimate of ssDNA virus abundance in aquatic systems.
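One way to read the "±1.8-fold quantitative" result is that, for each member of a mock community, the ratio of observed to expected relative abundance stays between 1/1.8 and 1.8. The sketch below works under that assumed interpretation; the genome names and counts are hypothetical and are not taken from the study.

```python
import math


def fold_bias(expected, observed):
    """Observed/expected relative-abundance ratio for each mock genome."""
    exp_total = sum(expected.values())
    obs_total = sum(observed.values())
    return {
        name: (observed.get(name, 0) / obs_total) / (expected[name] / exp_total)
        for name in expected
    }


def within_fold(ratios, max_fold=1.8):
    """True if every ratio lies within [1/max_fold, max_fold]."""
    limit = math.log2(max_fold)
    for ratio in ratios.values():
        if ratio <= 0:
            return False  # a genome that was not recovered at all
        if abs(math.log2(ratio)) > limit:
            return False
    return True


# Hypothetical mock community: equal input of two ssDNA and two dsDNA genomes.
expected = {"ssDNA_A": 100, "ssDNA_B": 100, "dsDNA_C": 100, "dsDNA_D": 100}
observed = {"ssDNA_A": 1300, "ssDNA_B": 800, "dsDNA_C": 1000, "dsDNA_D": 900}
ratios = fold_bias(expected, observed)
quantitative = within_fold(ratios)
```

Working on a log2 scale treats a two-fold enrichment and a two-fold depletion symmetrically, which matches the abstract's description of ssDNA templates being either enriched or selected against.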
Project description:Metagenomics can be used to determine the diversity of complex, often unculturable, viral communities with various nucleic acid compositions. Here, we report the use of hydroxyapatite chromatography to efficiently fractionate double-stranded DNA (dsDNA), single-stranded DNA (ssDNA), dsRNA, and ssRNA genomes from known bacteriophages. Linker-amplified shotgun libraries were constructed to generate sequencing reads from each hydroxyapatite fraction. Greater than 90% of the reads displayed significant similarity to the expected genomes at the nucleotide level. These methods were applied to marine viruses collected from the Chesapeake Bay and the Dry Tortugas National Park. Isolated nucleic acids were fractionated using hydroxyapatite chromatography followed by linker-amplified shotgun library construction and sequencing. Taxonomic analysis demonstrated that the majority of environmental sequences, regardless of their source nucleic acid, were most similar to dsDNA viruses, reflecting the bias of viral metagenomic sequence databases.
Project description:The present study illustrates the genetic diversity of four uncultured viral communities from the surface waters of Cochin Estuary (CE), India. Viral diversity inferred by Illumina HiSeq paired-end sequencing of a linker-amplified shotgun library (LASL) revealed different double-stranded DNA (dsDNA) viral communities. The water samples were collected from four stations, PR1, PR2, PR3, and PR4, during the pre-monsoon (PRM) season. Analysis of virus families indicated that Myoviridae was the most common family in the CE, followed by Siphoviridae and Podoviridae. There were significant (p < 0.05) spatial variations in the relative abundance of dominant families in response to the salinity regimes. The relative abundances of Myoviridae and Podoviridae were high in the euryhaline region, and that of Siphoviridae in the mesohaline region of the estuary. The predominant phages in the CE were those infecting Synechococcus. The viral proteins were found to be involved in major functional activities such as ATP binding, DNA binding, and DNA replication. The study highlights the genetic diversity of dsDNA viral communities and their functional protein predictions from a highly productive estuarine system. Further, the metavirome data generated in this study will enhance the repertoire of publicly available datasets and advance our understanding of estuarine viral ecology.
Project description:BACKGROUND:Viruses are key players regulating microbial ecosystems. Exploration of viral assemblages is now possible thanks to the development of metagenomics, the most powerful tool available for studying viral ecology and discovering new viruses. Unfortunately, several sources of bias lead to the misrepresentation of certain viruses within metagenomics workflows, hindering the shift from merely descriptive studies towards quantitative comparisons of communities. Therefore, benchmark studies on virus enrichment and random amplification protocols are required to better understand the sources of bias. RESULTS:We assessed the bias introduced by viral enrichment on mock assemblages composed of seven DNA viruses, and the bias from random amplification methods on human saliva DNA viromes, using qPCR and deep sequencing, respectively. While iodixanol cushions and 0.45 µm filtration preserved the original composition of nuclease-protected viral genomes, low-force centrifugation and 0.22 µm filtration removed large viruses. Comparison of unamplified and randomly amplified saliva viromes revealed that multiple displacement amplification (MDA) induced stochastic bias from picograms of DNA template. However, the type of bias shifted to systematic using 1 ng, with only a marginal influence by amplification time. Systematic bias consisted of over-amplification of small circular genomes, and under-amplification of those with extreme GC content, a negative bias that was shared with the PCR-based sequence-independent, single-primer amplification (SISPA) method. MDA based on random priming provided by a DNA primase activity slightly outperformed those based on random hexamers and SISPA, which may reflect differences in ability to handle sequences with extreme GC content. SISPA viromes showed uneven coverage profiles, with high coverage peaks in regions with low linguistic sequence complexity.
Despite misrepresentation of certain viruses after random amplification, ordination plots based on dissimilarities among contig profiles showed perfect overlapping of related amplified and unamplified saliva viromes and strong separation from unrelated saliva viromes. This result suggests that random amplification bias has a minor impact on beta diversity studies. CONCLUSIONS:Benchmark analyses of mock and natural communities of viruses improve understanding and mitigate bias in metagenomics surveys. Bias induced by random amplification methods has only a minor impact on beta diversity studies of human saliva viromes.
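The comparison of contig profiles described above rests on a pairwise dissimilarity between samples. The abstract does not name the metric, so the sketch below assumes Bray-Curtis dissimilarity on relative abundances, a common choice for beta diversity; the sample names and contig counts are hypothetical.

```python
def relative(profile):
    """Convert raw contig counts to relative abundances."""
    total = sum(profile.values())
    if total == 0:
        raise ValueError("profile has no counts")
    return {contig: count / total for contig, count in profile.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two contig abundance profiles.

    0 means identical profiles; 1 means no shared contigs.
    """
    contigs = set(a) | set(b)
    numerator = sum(abs(a.get(c, 0) - b.get(c, 0)) for c in contigs)
    denominator = sum(a.get(c, 0) + b.get(c, 0) for c in contigs)
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


# Hypothetical counts: one saliva virome before and after amplification,
# plus a virome from an unrelated sample.
unamplified = relative({"contig1": 50, "contig2": 30, "contig3": 20})
amplified = relative({"contig1": 40, "contig2": 40, "contig3": 20})
unrelated = relative({"contig4": 70, "contig5": 30})

d_related = bray_curtis(unamplified, amplified)
d_unrelated = bray_curtis(unamplified, unrelated)
```

In this toy case the amplified sample is distorted but still close to its unamplified counterpart, while the unrelated sample is maximally distant. That is the pattern an ordination plot would show as overlapping versus separated points.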
Project description:In this study, we investigated the abundance and diversity of single-stranded DNA (ssDNA) viruses in fecal samples from five healthy individuals through a combination of serial filtration and CsCl gradient ultracentrifugation. Virus abundance ranged from 10? to 10? per gram of feces, and virus-to-bacterium ratios were much lower (less than 0.1) than those observed in aquatic environments (5 to 10). Viral DNA was extracted and randomly amplified using phi29 polymerase and analyzed through high-throughput 454 pyrosequencing. Among 400,133 sequences, an average of 86.2% viromes were previously uncharacterized in public databases. Among previously known viruses, double-stranded DNA podophages (52 to 74%), siphophages (11 to 30%), myophages (1 to 4%), and ssDNA microphages (3 to 9%) were major constituents of human fecal viromes. A phylogenetic analysis of 24 large contigs of microphages based on conserved capsid protein sequences revealed five distinct newly discovered evolutionary microphage groups that were distantly related to previously known microphages. Moreover, putative capsid protein sequences of five contigs were closely related to prophage-like sequences in the genomes of three Bacteroides and three Prevotella strains, suggesting that Bacteroides and Prevotella are the sources of infecting microphages in their hosts.
Project description:The relationship between parasitoid wasps and polydnaviruses constitutes one of the few known mutualisms between viruses and eukaryotes. Viral particles are injected with the wasp eggs into parasitized larvae, and the viral genes thus introduced are used to manipulate lepidopteran host physiology. The genome packaged in the particles is composed of 35 double-stranded DNA (dsDNA) circles produced in wasp ovaries by amplification of viral sequences from proviral segments integrated in tandem arrays in the wasp genome. These segments and their flanking regions within the genome of the wasp Cotesia congregata were recently isolated, allowing extensive mapping of amplified sequences. The bracovirus DNAs packaged in the particles were found to be amplified within more than 12 replication units. Strikingly, the nudiviral cluster, the genes of which encode particle structural components, was also amplified, although not encapsidated. Amplification of bracoviral sequences was shown to involve successive head-to-head and tail-to-tail concatemers, which was not expected given the nudiviral origin of bracoviruses.
Project description:Metagenomics is a powerful tool for characterizing viral composition within environmental samples, but sample and molecular processing steps can bias the estimation of viral community structure. The objective of this study is to understand the inherent variability introduced when conducting viral metagenomic analyses of wastewater and to provide a bioinformatic strategy to accurately analyze sequences for viral community analyses. A standard approach combining ultrafiltration, membrane filtration, DNase treatment, and multiple displacement amplification (MDA) produced DNA preparations without any bacterially derived genes. Recoveries in the wastewater matrix ranged from 60 to 100%. A bias towards small single-stranded DNA (ssDNA; polyomavirus) virus types over larger double-stranded DNA (dsDNA; adenovirus) viruses was also observed, with the total estimated recovery of small circular viruses as much as 173-fold higher. Notably, ssDNA abundance decreased with sample dilution, while large dsDNA genomes (e.g., Caudovirales) initially increased in abundance with dilution before gradually decreasing with further dilution in wastewater samples. The present study revealed the inherent biases associated with different components of viral metagenomic methods applied to wastewater. Overall, these results provide a well-characterized approach for effectively conducting viral metagenomics analysis of wastewater and reveal that dilution can effectively mitigate MDA bias.
Project description:With the emergence of Next Generation Sequencing, major advances were made with regard to identifying viruses in natural environments. However, bioinformatic research on viruses is still limited because of the low amounts of viral DNA that can be obtained for analysis. To overcome this limitation, DNA is often amplified with multiple displacement amplification (MDA), which may cause an unavoidable bias. Here, we describe a case study in which the virome of a bioreactor is sequenced using Ion Torrent technology. DNA-spiking of samples is compared with MDA-amplified samples. DNA for spiking was obtained by amplifying a bacterial 16S rRNA gene. After sequencing, the 16S rRNA gene reads were removed by mapping to the Silva database. Three samples were tested: a whole genome from Enterobacteria P1 Phage and two viral metagenomes from an infected bioreactor. For one sample, the new DNA-spiking protocol was compared with the MDA technique. When MDA was applied, the overall GC content of the reads showed a bias towards lower GC%, indicating a change in composition of the DNA sample. Assemblies using all available reads from both MDA and the DNA-spiked samples resulted in six viral genomes. All six genomes could be almost completely retrieved (97.9%-100%) when mapping the reads from the DNA-spiked sample to those six genomes. In contrast, 6.3%-77.7% of three viral genomes was covered by reads obtained using the MDA amplification method and only three were nearly fully covered (97.4%-100%). This case study shows that DNA-spiking could be a simple and inexpensive alternative with very low bias for sequencing of metagenomes for which low amounts of DNA are available.
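The two measurements reported above, overall GC content of the reads and the percentage of each genome covered by mapped reads, can be sketched as follows. This is an illustration only: it assumes reads are plain sequence strings and that alignments are available as 0-based, half-open (start, end) intervals on a genome, and it is not the pipeline used in the study. All sequences and intervals are hypothetical.

```python
def gc_fraction(reads):
    """Overall GC fraction across a collection of read sequences."""
    gc = 0
    total = 0
    for seq in reads:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc / total if total else 0.0


def breadth_of_coverage(genome_length, alignments):
    """Fraction of genome positions covered by at least one alignment.

    alignments: iterable of (start, end) intervals, 0-based and half-open.
    Overlapping intervals are counted once.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    furthest_end = 0  # rightmost position already counted
    for start, end in sorted(alignments):
        start = max(start, furthest_end, 0)
        end = min(end, genome_length)
        if end > start:
            covered += end - start
            furthest_end = end
    return covered / genome_length


# Hypothetical reads and alignments on a 1,000 bp genome.
reads = ["GGCC", "ATAT"]
gc = gc_fraction(reads)
breadth = breadth_of_coverage(1000, [(0, 50), (40, 80), (200, 300)])
```

Comparing `gc_fraction` between an amplified and an unamplified library gives the kind of GC shift the abstract reports, and `breadth_of_coverage` per genome corresponds to the reported coverage percentages.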
Project description:Horizontal gene transfer commonly occurs from cells to viruses but rarely occurs from viruses to their host cells, with the exception of retroviruses and some DNA viruses. However, extensive sequence similarity searches in public genome databases for various organisms showed that the capsid protein and RNA-dependent RNA polymerase genes from totiviruses and partitiviruses have widespread homologs in the nuclear genomes of eukaryotic organisms, including plants, arthropods, fungi, nematodes, and protozoa. PCR amplification and sequencing as well as comparative evidence of junction coverage between virus and host sequences support the conclusion that these viral homologs are real and occur in eukaryotic genomes. Sequence comparison and phylogenetic analysis suggest that these genes were likely transferred horizontally from viruses to eukaryotic genomes. Furthermore, we present evidence showing that some of the transferred genes are conserved and expressed in eukaryotic organisms and suggesting that these viral genes are also functional in the recipient genomes. Our findings imply that horizontal transfer of double-stranded RNA viral genes is widespread among eukaryotes and may give rise to functionally important new genes, thus entailing that RNA viruses may play significant roles in the evolution of eukaryotes.