Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis.
ABSTRACT: BACKGROUND: Analysis of metagenomic sequences has become the principal approach for the study of the diversity of viruses. Many recent, extensive metagenomic studies on several classes of viruses have dramatically expanded the visible part of the virosphere, showing that previously undetected viruses, or those that have been considered rare, actually are important components of the global virome. RESULTS: We investigated the provenance of viruses related to tail-less bacteriophages of the family Tectiviridae by searching genomic and metagenomic sequence databases for distant homologs of the tectivirus-like Double Jelly-Roll major capsid proteins (DJR MCP). These searches resulted in the identification of numerous genomes of virus-like elements that are similar in size to tectiviruses (10-15 kilobases) and have diverse gene compositions. By comparison of the gene repertoires, the DJR MCP-encoding genomes were classified into 6 distinct groups that can be predicted to differ in reproduction strategies and host ranges. Only the DJR MCP gene that is present by design is shared by all these genomes, and most also encode a predicted DNA-packaging ATPase; the rest of the genes are present only in subgroups of this unexpectedly diverse collection of DJR MCP-encoding genomes. Only a minority encode a DNA polymerase, which is a hallmark of the family Tectiviridae and the putative family "Autolykiviridae". Notably, one of the identified putative DJR MCP viruses encodes a homolog of Cas1 endonuclease, the integrase involved in CRISPR-Cas adaptation and integration of transposon-like elements called casposons. This is the first detected occurrence of Cas1 in a virus. Many of the identified elements are individual contigs flanked by inverted or direct repeats and appear to represent complete, extrachromosomal viral genomes, whereas others are flanked by bacterial genes and thus can be considered as proviruses.
These contigs come from metagenomes of widely different environments, some dominated by archaea and others by bacteria, suggesting that collectively, the DJR MCP-encoding elements have a broad host range among prokaryotes. CONCLUSIONS: The findings reported here greatly expand the known host range of (putative) viruses of bacteria and archaea that encode a DJR MCP. They also demonstrate the extreme diversity of genome architectures in these viruses that encode no universal proteins other than the capsid protein that was used as the marker for their identification. From a supposedly minor group of bacterial and archaeal viruses, these viruses are emerging as a substantial component of the prokaryotic virome.
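The idea of classifying genomes by comparing their gene repertoires can be illustrated with a minimal sketch. It is not the procedure used in the study: it assumes each genome is reduced to a set of gene-family labels, measures overlap with the Jaccard index, and joins genomes by single linkage above a cutoff. The contig names, the `lysin` and `integrase` labels, and the 0.6 cutoff are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared gene families divided by all gene families in either genome."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_repertoire(repertoires: dict, threshold: float) -> list:
    """Single-linkage grouping: genomes joined by any pair at or above threshold."""
    parent = {name: name for name in repertoires}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(repertoires, 2):
        if jaccard(repertoires[a], repertoires[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in repertoires:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical toy repertoires; every genome carries the DJR MCP marker.
toy_repertoires = {
    "contig_A": {"DJR_MCP", "ATPase", "DNA_polymerase"},
    "contig_B": {"DJR_MCP", "ATPase", "DNA_polymerase", "lysin"},
    "contig_C": {"DJR_MCP", "ATPase", "Cas1"},
    "contig_D": {"DJR_MCP", "ATPase", "Cas1", "integrase"},
}
toy_groups = group_by_repertoire(toy_repertoires, threshold=0.6)
```

Because the marker gene is shared by every genome by design, the overlap is never zero, so any such grouping depends on the cutoff chosen.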
Project description:Mobile genetic elements such as DNA transposons are a feature of most genomes. The existence of novel DNA transposons can be inferred when whole genome sequencing reveals the presence of hallmarks of mobile elements such as terminal inverted repeats (TIRs) flanked by target site duplications (TSDs). A recent report describes a new superfamily of DNA transposons in the genomes of a few bacteria and archaea that possess TIRs and TSDs, and encode several conserved genes including a cas1 endonuclease gene, previously associated only with CRISPR-Cas adaptive immune systems. The data strongly suggest that these elements, designated 'casposons', are bona fide DNA transposons whose Cas1 nucleases act as transposases and are possibly still active.
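The two hallmarks named above, TIRs and flanking TSDs, can be checked for a candidate element with a short sketch. It assumes the element boundaries are already known and requires exact matches, whereas real repeats are often imperfect. The function names, parameters and coordinates are illustrative only.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def check_hallmarks(seq: str, start: int, end: int, tir_len: int, tsd_len: int):
    """Report (has_tir, has_tsd) for the candidate element seq[start:end].

    has_tir: the first tir_len bases equal the reverse complement of the last
             tir_len bases (a perfect terminal inverted repeat).
    has_tsd: the tsd_len bases immediately left of the element equal the
             tsd_len bases immediately right of it (a perfect direct repeat).
    """
    seq = seq.upper()
    element = seq[start:end]
    if tir_len <= 0 or tsd_len <= 0 or len(element) < 2 * tir_len:
        raise ValueError("repeat lengths do not fit the candidate element")

    has_tir = element[:tir_len] == reverse_complement(element[-tir_len:])

    # Both flanks must lie fully inside the sequence to be compared.
    if start - tsd_len < 0 or end + tsd_len > len(seq):
        has_tsd = False
    else:
        has_tsd = seq[start - tsd_len:start] == seq[end:end + tsd_len]
    return has_tir, has_tsd
```

A practical search would also scan for unknown boundaries and tolerate mismatches; this sketch covers only the verification step.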
Project description:Many archaea and bacteria have an adaptive immune system known as CRISPR which allows them to recognize and destroy foreign nucleic acid that they have previously encountered. Two CRISPR-associated proteins, Cas1 and Cas2, are required for the acquisition step of adaptation, in which fragments of foreign DNA are incorporated into the host CRISPR locus. Cas1 genes have also been found scattered in several archaeal and bacterial genomes, unassociated with CRISPR loci or other cas genes. Rather, they are flanked by nearly identical inverted repeats and enclosed within direct repeats, suggesting that these genetic regions might be mobile elements ('casposons'). To investigate this possibility, we have characterized the in vitro activities of the putative Cas1 transposase ('casposase') from Aciduliprofundum boonei. The purified Cas1 casposase can integrate both short oligonucleotides with inverted repeat sequences and a 2.8 kb excised mini-casposon into target DNA. Casposon integration occurs without target specificity and generates 14-15 basepair target site duplications, consistent with those found in casposon host genomes. Thus, Cas1 casposases carry out biochemical reactions similar to those of the CRISPR Cas1-Cas2 complex but with opposite substrate specificities: casposases integrate specific sequences into random target sites, whereas CRISPR Cas1-Cas2 integrates essentially random sequences into a specific site in the CRISPR locus.
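How integration gives rise to a target site duplication can be shown with a simplified string model. It assumes the commonly described route of staggered cuts in the target followed by gap repair, so that the bases between the cuts end up on both sides of the insert. It does not model the casposase reaction itself, and the function name and parameters are hypothetical.

```python
def integrate_with_tsd(target: str, element: str, site: int, tsd_len: int) -> str:
    """Insert element into target, duplicating target[site:site + tsd_len].

    The tsd_len bases starting at site stand for the stretch between two
    staggered cuts. After insertion and gap repair, one copy of that stretch
    lies on each side of the element as a direct repeat.
    """
    if tsd_len <= 0 or site < 0 or site + tsd_len > len(target):
        raise ValueError("duplicated stretch must lie within the target")
    duplicated = target[site:site + tsd_len]
    return (
        target[:site]
        + duplicated
        + element
        + duplicated
        + target[site + tsd_len:]
    )
```

In this model the product is longer than the target by the element length plus `tsd_len`, and the same element can be inserted at any site, in line with the lack of target specificity described above.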
Project description:Virophages have the unique property of parasitizing giant viruses within unicellular hosts. Little is understood about how they form infectious virions in this tripartite interplay. We provide mechanistic insights into assembly and maturation of mavirus, a marine virophage, by combining structural and stability studies on capsomers, virus-like particles (VLPs), and native virions. We found that the mavirus protease processes the double jelly-roll (DJR) major capsid protein (MCP) at multiple C-terminal sites and that these sites are conserved among virophages. Mavirus MCP assembled in Escherichia coli in the absence and presence of penton protein, forming VLPs with defined size and shape. While quantifying VLPs in E. coli lysates, we found that full-length rather than processed MCP is the competent state for capsid assembly. Full-length MCP was thermally more labile than truncated MCP, and crystal structures of both states indicate that full-length MCP has an expanded DJR core. Thus, we propose that the MCP C-terminal domain serves as a scaffolding domain by adding strain on MCP to confer assembly competence. Mavirus protease processed MCP more efficiently after capsid assembly, which provides a regulation mechanism for timing capsid maturation. By analogy to Sputnik and adenovirus, we propose that MCP processing renders mavirus particles infection competent by loosening interactions between genome and capsid shell and destabilizing pentons for genome release into host cells. The high structural similarity of mavirus and Sputnik capsid proteins together with conservation of protease and MCP processing suggest that assembly and maturation mechanisms described here are universal for virophages.
Project description:The diverse viruses in the phylum Nucleocytoviricota (also known as NCLDVs, Nucleo-cytoplasmic Large DNA Viruses) typically possess large icosahedral virions. However, in several families of Nucleocytoviricota, the icosahedral capsid was replaced by irregular particle shapes, most notably, the amphora-like virions of pandoraviruses and pithoviruses, the largest known virus particles in the entire virosphere. Pandoraviruses appear to be the most highly derived viruses in this phylum because their evolution involved not only the change in the virion shape, but also the actual loss of the gene encoding double jelly-roll major capsid protein (DJR MCP), the main building block of icosahedral capsids in this virus assemblage. Instead, pandoravirus virions are built of unrelated abundant proteins. Here we show that the second most abundant virion protein of pandoraviruses, major virion protein 2 (MVP2), evolved from an inactivated derivative of a bacterial glycoside hydrolase of the GH16 family. The ancestral form of MVP2 was apparently acquired early in the evolution of the Nucleocytoviricota, to become a minor virion protein. After a duplication in the common ancestor of pandoraviruses and molliviruses, one of the paralogs displaced DJR MCP in pandoraviruses, conceivably opening the way for a major increase in the size of the virion and the genome. Exaptation of a carbohydrate-binding protein for the function of the MVP is a general trend in virus evolution and might underlie the transformation of the virion shape in other groups of the Nucleocytoviricota as well.
Project description:Diverse transposable elements are abundant in genomes of cellular organisms from all three domains of life. Although transposons are often regarded as junk DNA, a growing body of evidence indicates that they are behind some of the major evolutionary innovations. With the growth in the number and diversity of sequenced genomes, previously unnoticed mobile elements continue to be discovered. We describe a new superfamily of archaeal and bacterial mobile elements which we denote casposons because they encode Cas1 endonuclease, a key enzyme of the CRISPR-Cas adaptive immunity systems of archaea and bacteria. The casposons share several features with self-synthesizing eukaryotic DNA transposons of the Polinton/Maverick class, including terminal inverted repeats and genes for B family DNA polymerases. However, unlike any other known mobile elements, the casposons are predicted to rely on Cas1 for integration and excision, via a mechanism similar to the integration of new spacers into CRISPR loci. We identify three distinct families of casposons that differ in their gene repertoires and evolutionary provenance of the DNA polymerases. Deep branching of the casposon-encoded endonuclease in the Cas1 phylogeny suggests that casposons played a pivotal role in the emergence of CRISPR-Cas immunity. The casposons are a novel superfamily of mobile elements, the first family of putative self-synthesizing transposons discovered in prokaryotes. The likely contribution of casposons to the evolution of CRISPR-Cas parallels the involvement of the RAG1 transposase in vertebrate immunoglobulin gene rearrangement, suggesting that recruitment of endonucleases from mobile elements as ready-made tools for genome manipulation is a general route of evolution of adaptive immunity.
Project description:The principal function of archaeal and bacterial CRISPR-Cas systems is antivirus adaptive immunity. However, recent genome analyses identified a variety of derived CRISPR-Cas variants, at least some of which appear to perform different functions. Here, we describe a unique repertoire of CRISPR-Cas-related systems that we discovered by searching archaeal metagenome-assembled genomes of the Asgard superphylum. Several of these variants contain extremely diverged homologs of Cas1, the integrase involved in CRISPR adaptation as well as casposon transposition. Strikingly, the diversity of Cas1 in Asgard archaea alone is greater than that detected so far among the rest of archaea and bacteria. The Asgard CRISPR-Cas derivatives also encode distinct forms of Cas4, Cas5, and Cas7 proteins, and/or additional nucleases. Some of these systems are predicted to perform defense functions, but possibly not programmable ones, whereas others are likely to represent previously unknown mobile genetic elements.
Project description:Streptomyces phages WheeHeim and Forthebois are two novel members of the Tectiviridae family. These phages were isolated on cultures of the plant pathogen Streptomyces scabiei, known for its worldwide economic impact on potato crops. Transmission electron microscopy showed viral particles with double-layered icosahedral capsids, and frequent instances of protruding nanotubes harboring a collar-like structure. Mass-spectrometry confirmed the presence of lipids in the virion, and serial purification of colonies from turbid plaques and immunity testing revealed that both phages are temperate. Streptomyces phages WheeHeim and Forthebois have linear dsDNA chromosomes (18,266 bp and 18,251 bp long, respectively) with the characteristic two-segment architecture of the Tectiviridae. Both genomes encode homologs of the canonical tectiviral proteins (major capsid protein, packaging ATPase and DNA polymerase), as well as PRD1-type virion-associated transglycosylase and membrane DNA delivery proteins. Comparative genomics and phylogenetic analyses firmly establish that these two phages, together with Rhodococcus phage Toil, form a new genus within the Tectiviridae, which we have tentatively named Deltatectivirus. The identification of a cohesive clade of Actinobacteria-infecting tectiviruses with conserved genome structure but with scant sequence similarity to members of other tectiviral genera confirms that the Tectiviridae are an ancient lineage infecting a broad range of bacterial hosts.
Project description:RNA viruses in aquatic environments remain poorly studied. Here, we analysed the RNA virome from approximately 10 l of water from Yangshan Deep-Water Harbour near the Yangtze River estuary in China and identified more than 4,500 distinct RNA viruses, doubling the previously known set of viruses. Phylogenomic analysis identified several major lineages, roughly, at the taxonomic ranks of class, order and family. The 719-member-strong Yangshan virus assemblage is the sister clade to the expansive class Alsuviricetes and consists of viruses with simple genomes that typically encode only RNA-dependent RNA polymerase (RdRP), capping enzyme and capsid protein. Several clades within the Yangshan assemblage independently evolved domain permutation in the RdRP. Another previously unknown clade shares ancestry with Potyviridae, the largest known plant virus family. The 'Aquatic picorna-like viruses/Marnaviridae' clade was greatly expanded, with more than 800 added viruses. Several RdRP-linked protein domains not previously detected in any RNA viruses were identified, such as the small ubiquitin-like modifier (SUMO) domain, phospholipase A2 and PrsW-family protease domain. Multiple viruses utilize alternative genetic codes implying protist (especially ciliate) hosts. The results reveal a vast RNA virome that includes many previously unknown groups. However, phylogenetic analysis of the RdRPs supports the previously established five-branch structure of the RNA virus evolutionary tree, with no additional phyla.
Project description:The Cas4 family endonuclease is a component of the adaptation module in many variants of CRISPR-Cas adaptive immunity systems. Unlike most of the other Cas proteins, Cas4 is often encoded outside CRISPR-cas loci (solo-Cas4) and is also found in mobile genetic elements (MGE-Cas4). As part of our ongoing investigation of CRISPR-Cas evolution, we explored the phylogenomics of the Cas4 family. About 90% of the archaeal genomes encode Cas4 compared to only about 20% of the bacterial genomes. Many archaea encode both the CRISPR-associated form (CAS-Cas4) and solo-Cas4, whereas in bacteria, this combination is extremely rare. The solo-cas4 genes are over-represented in environmental bacteria and archaea with small genomes that typically lack CRISPR-Cas, suggesting that Cas4 could perform uncharacterized defense or repair functions in these microbes. Phylogenomic analysis indicates that the CRISPR-associated cas4 genes are often transferred horizontally but almost exclusively as part of the adaptation module. The evolutionary integrity of the adaptation module contrasts sharply with the rampant shuffling of CRISPR-cas modules whereby a given variant of the adaptation module can combine with virtually any effector module. The solo-cas4 genes evolve primarily via vertical inheritance and are subject only to occasional horizontal transfer. The selection pressure on cas4 genes does not substantially differ between CAS-Cas4 and solo-cas4, and is close to the genomic median. Thus, cas4 genes, like cas1 and cas2, evolve similarly to 'regular' microbial genes involved in various cellular functions, showing no evidence of direct involvement in virus-host arms races. A notable feature of the Cas4 family evolution is the frequent recruitment of cas4 genes by various mobile genetic elements (MGE), particularly archaeal viruses.
The functions of Cas4 in these elements are unknown and potentially might involve anti-defense roles. Unlike most of the other Cas proteins, Cas4 family members are as often encoded by stand-alone genes as they are incorporated in CRISPR-Cas systems. In addition, cas4 genes were repeatedly recruited by MGE, perhaps for anti-defense functions. Experimental characterization of the solo and MGE-encoded Cas4 nucleases is expected to reveal currently uncharacterized defense and anti-defense systems and their interactions with CRISPR-Cas systems.
Project description:Free-range cattle are common in Northeast China; they have close contact with farmers and may carry viruses that threaten both cattle and farmers. Using viral metagenomics, we analyzed the virome in plasma samples collected from 80 cattle from the forested region of Northeast China. The virome of cattle plasma is composed of viruses belonging to families including Parvoviridae, Papillomaviridae, and Picobirnaviridae, as well as divergent viral genomes showing sequence similarity to circular Rep-encoding single-stranded (CRESS) DNA viruses. Five such CRESS-DNA genomes were fully characterized, with Rep sequences related to those of circoviruses and gemycircularviruses. Three bovine parvoviruses belonging to two different genera were also characterized. These results reveal the virome in plasma samples of cattle from the forested region of Northeast China and further characterize the diversity of viruses in cattle plasma.