Project description:Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism's inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomic samples, as many of these samples contain organisms from poorly classified phyla that cannot be easily identified using traditional homology methods such as NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes spanning the tree of life, substantially expanding upon previous work. A comprehensive analysis of word lengths from mononucleotides through nonanucleotides suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes relevant to high-throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures being the current standard for oligonucleotide word usage analyses) to taxonomically bin metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between the closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that applying longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve the taxonomic assignment and assembly of metagenomic data.
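To illustrate what an oligonucleotide signature is, the following minimal Python sketch counts every overlapping word of length k in a fragment and normalizes the counts to frequencies over all 4^k possible words, so the same function yields a tetranucleotide signature (k = 4) or a heptanucleotide signature (k = 7). The helper names, the Euclidean distance, and the nearest-reference assignment are illustrative assumptions, not the classifier used in the study; the sketch also does not merge reverse-complement words.

```python
import math
from collections import Counter
from itertools import product


def kmer_signature(seq: str, k: int) -> list:
    """Normalized frequencies of all 4**k DNA words of length k, in lexicographic order.

    Overlapping windows containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[w] / total if total else 0.0 for w in words]


def euclidean(a: list, b: list) -> float:
    """Euclidean distance between two signature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_reference(fragment: str, references: dict, k: int) -> str:
    """Name of the reference genome whose k-mer signature is closest to the fragment's."""
    sig = kmer_signature(fragment, k)
    return min(
        references,
        key=lambda name: euclidean(sig, kmer_signature(references[name], k)),
    )
```

For example, nearest_reference(fragment, references, k=7) returns the name of the reference sequence whose heptanucleotide profile is closest to the fragment. The signature vector grows from 256 entries at k = 4 to 16,384 at k = 7, which is one source of the trade-off between accuracy and computational time.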
Project description:Background: Genome assembly of viruses with high mutation rates, such as Norovirus and other RNA viruses, or from metagenome samples, poses a challenge for the scientific community due to the coexistence of several viral quasispecies and strains. Furthermore, there is no standard method for obtaining whole-genome sequences from unrelated patients. After polyA RNA isolation and sequencing from eight patients with acute gastroenteritis, we evaluated two de Bruijn graph assemblers (SPAdes and MEGAHIT), combined with four common pre-assembly strategies, and compared those yielding whole-genome Norovirus contigs. Results: Reference-genome-guided strategies based on the host and the target virus genomes did not present any advantage over the assembly of unfiltered data in the case of SPAdes; in the case of MEGAHIT, only host genome filtering improved the results. MEGAHIT performed better than SPAdes in most samples, reaching complete genome sequences in most of them with all the strategies employed. Read binning with CD-HIT improved assembly when paired with different analysis strategies, most notably with SPAdes. Conclusions: Not all metagenome assemblies are equal, and the choice of workflow depends on the species studied and the steps preceding analysis. Different approaches may be needed even for samples treated identically, owing to high intra-host variability. We tested and compared different workflows for the accurate assembly of Norovirus genomes and established their assembly capacities for this purpose.
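For readers unfamiliar with de Bruijn graph assembly, the minimal Python sketch below shows the core idea behind assemblers such as SPAdes and MEGAHIT: reads are split into overlapping k-mers, each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and contigs are read off as paths through the graph. It is a toy under stated assumptions (error-free reads, a single k, one strand, no bubble or tip removal), not how either tool is implemented, and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: every k-mer in a read adds an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix; edge weights count k-mer occurrences."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def walk_unambiguous(graph, start):
    """Greedily extend a contig from `start` while the current node has exactly
    one successor; stop at branches, dead ends, or when a node would repeat."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, {})) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig
```

Nodes with several successors, such as those created when co-occurring strains or quasispecies differ at a single position, stop the walk; this is one way in which strain mixtures fragment an assembly.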
Project description:Crude oil-polluted sites are a global threat, raising the demand for remediation worldwide. Here, we investigated a crude oil metagenome from a former borehole in Wietze, Germany, and reconstructed 42 metagenome-assembled genomes, many of which contained genes involved in crude oil degradation with a high potential for bioremediation purposes.
Project description:The rhizosphere microbiome plays an essential role in enhancing plant growth, which calls for a better understanding of its metabolic capabilities. Here, we investigated rhizospheric and bulk soils of maize plants in Mafikeng, South Africa, and reconstructed metagenome-assembled genomes containing plant growth-promoting genes.
Project description:Background: The dispensable genome of a species, consisting of the sequences present only in a subset of individuals, is believed to play important roles in phenotypic variation and genome evolution. However, constructing the dispensable genome is currently costly and labor-intensive, so its influence in genetic and functional genomic studies has not been fully explored. Results: We construct the dispensable genome of rice through a metagenome-like de novo assembly strategy based on low-coverage (1-3×) sequencing data of 1483 cultivated rice (Oryza sativa L.) accessions. Thousands of protein-coding genes are successfully assembled, including most of the known agronomically important genes absent from the Nipponbare rice reference genome. We develop an integration approach based on alignment and linkage disequilibrium, which assigns genomic positions relative to the reference genome for more than 78.2% of the dispensable sequences. We carry out association mapping studies for rice grain width and 840 metabolic traits using 0.46 million polymorphisms between the dispensable sequences of different rice accessions. About 23.5% of metabolic traits have more significant association signals with polymorphisms from dispensable sequences than with SNPs from the reference genome, and 41.6% of trait-associated SNPs have concordant genomic locations with associated dispensable sequences. Conclusions: Our results suggest the feasibility of building a species' dispensable genome using low-coverage population sequencing data. The constructed sequences will be helpful for understanding the rice dispensable genome and are complementary to the reference genome for identifying candidate genes associated with phenotypic variation.
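The linkage-disequilibrium side of the placement step can be illustrated with a minimal Python sketch under simplifying assumptions: each accession is treated as homozygous, so a dispensable sequence is coded as absent (0) or present (1) and each reference SNP as 0/1; there is no missing data; and a sequence is placed at the reference SNP with the highest r² if it reaches a hypothetical threshold min_r2. The function names and threshold are illustrative and do not reproduce the authors' integration approach, which also uses alignment.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two 0/1 vectors (one entry per accession).

    For homozygous (effectively haploid) accessions this equals the usual LD r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # monomorphic marker: r^2 is undefined, treat as no LD signal
    return cov * cov / (vx * vy)


def place_by_ld(presence, snps, min_r2=0.5):
    """Position of the reference SNP in strongest LD with a presence/absence
    pattern, or None if no SNP reaches the hypothetical threshold min_r2.

    presence: list of 0/1 (dispensable sequence absent/present per accession)
    snps:     dict mapping reference position -> list of 0/1 genotypes
    """
    best_pos, best_r2 = None, 0.0
    for pos, genotypes in snps.items():
        r2 = r_squared(presence, genotypes)
        if r2 > best_r2:
            best_pos, best_r2 = pos, r2
    return best_pos if best_r2 >= min_r2 else None
```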
Project description:Background: Microbes and their viruses are hidden engines driving Earth's ecosystems, from the oceans and soils to humans and bioreactors. Though gene marker approaches can now be complemented by genome-resolved studies of inter- (macrodiversity) and intra- (microdiversity) population variation, analytical tools to do so remain scattered or underdeveloped. Results: Here, we introduce MetaPop, an open-source bioinformatic pipeline that provides a single interface to analyze and visualize microbial and viral community metagenomes at both the macro- and microdiversity levels. Macrodiversity estimates include population abundances and α- and β-diversity. Microdiversity calculations include identification of single nucleotide polymorphisms (SNPs), novel codon-constrained linkage of SNPs, nucleotide diversity (π and θ), and selective pressures (pN/pS and Tajima's D) within populations, as well as fixation indices (FST) between populations. MetaPop also identifies genes with distinct codon usage. Following rigorous validation, we applied MetaPop to the gut viromes of autistic children who underwent fecal microbiota transfers (FMT) and of their neurotypical peers. The macrodiversity results confirmed our prior findings for viral populations (microbial shotgun metagenomes were not available) that diversity did not significantly differ between autistic and neurotypical children. However, by also quantifying microdiversity, MetaPop revealed lower average viral nucleotide diversity (π) in autistic children. The percentage of genomes detected under positive selection was also lower among autistic children, suggesting that higher viral π in neurotypical children may be beneficial because it allows populations to better "bet hedge" in changing environments. Further, comparisons of microdiversity pre- and post-FMT in autistic children revealed that the FMT delivery method (oral versus rectal) may influence viral activity and engraftment of microdiverse viral populations, with children who received their FMT rectally having higher microdiversity post-FMT. Overall, these results show that analyses at the macro level alone can miss important biological differences. Conclusions: These findings suggest that standardized population and genetic variation analyses will be invaluable for maximizing biological inference, and MetaPop provides a convenient tool package to explore the dual impact of macro- and microdiversity across microbial communities.
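As a worked example of one of the microdiversity statistics named in the MetaPop description, the Python sketch below computes nucleotide diversity π from per-site read counts of A, C, G and T. It uses a common estimator, the probability that two reads drawn without replacement at a site carry different bases, 1 - sum(x_i(x_i - 1)) / (C(C - 1)), where x_i is the count of base i and C the coverage, averaged over the genome length. This is an illustrative sketch, not MetaPop's code: coverage and quality filters are omitted, and positions not supplied are assumed monomorphic.

```python
def site_pi(counts):
    """Per-site nucleotide diversity from read counts of (A, C, G, T) at one position:
    the probability that two reads drawn without replacement carry different bases."""
    coverage = sum(counts)
    if coverage < 2:
        raise ValueError("at least two reads must cover the site")
    same = sum(x * (x - 1) for x in counts)  # ordered pairs of reads sharing a base
    return 1 - same / (coverage * (coverage - 1))


def genome_pi(site_counts, genome_length):
    """Mean per-site pi over a genome of `genome_length` positions; positions not
    present in `site_counts` are assumed monomorphic and contribute pi = 0."""
    return sum(site_pi(c) for c in site_counts) / genome_length
```

A site covered by 10 reads split 5/5 between two bases gives 1 - 40/90 = 5/9, slightly above the naive 1 - (0.5² + 0.5²) = 0.5 because drawing without replacement corrects for finite coverage.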
Project description:A recently published article in BMC Genomics by Fuentes-Trillo et al. compares approaches for assembling several noroviral samples using different tools and preprocessing strategies. However, the study used outdated versions of tools, as well as tools that were not designed for viral assembly. To improve the suboptimal assemblies, the authors proposed various sophisticated preprocessing strategies that appear to make only minor contributions to the results. We reproduced the analysis using state-of-the-art tools designed for viral assembly and demonstrate that tools from the SPAdes toolkit (rnaviralSPAdes and coronaSPAdes) allow each sample from the original study to be assembled into a single contig without any additional preprocessing.
Project description:A plethora of hot springs are found at the Los Azufres volcanic complex in Mexico, and studies are needed to determine their microbial genomic diversity. Here, we report a metagenome of hot spring sediments and a metagenome-assembled genome of "Candidatus Aramenus sulfurataquae." This study reveals novel genomic sequences of Sulfolobales archaea.
Project description:De novo assembly of next-generation metagenomic reads is widely used to provide taxonomic and functional information on the genomes in a microbial community. As strains are functionally specific, recovering strain-resolved genomes is important but remains a challenge. Unitigs and assembly graphs are intermediate products generated while assembling reads into contigs, and they provide higher-resolution information about how sequences are connected. In this study, we propose a new approach, UGMAGrefiner (a unitig-level assembly graph-based metagenome-assembled genome refiner), which uses the connection and coverage information from unitig-level assembly graphs to recruit unbinned unitigs to MAGs, adjust binning results, and infer unitigs shared by multiple MAGs. On two simulated datasets (Simdata and CAMI data) and one real dataset (GD02), it outperforms two state-of-the-art assembly graph-based binning refinement tools in improving MAG quality, consistently increasing genome completeness. UGMAGrefiner can identify genome-specific clusters for genomes with below 99% average nucleotide identity in homologous sequences. For MAGs that mix genome clusters with 99% similarity, it could distinguish 8 of 9 genomes in Simdata and 8 of 12 genomes in CAMI data. In the GD02 data, it identified 16 new unitig clusters representing genome-specific regions of mixed genomes and 4 unitig clusters representing new genomes, out of a total of 135 MAGs, for further functional analysis. UGMAGrefiner provides an efficient way to obtain more complete MAGs and study genome-specific functions. It will be useful for improving taxonomic and functional information on genomes after de novo assembly.
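The general idea of using assembly-graph connectivity and coverage to recruit unbinned unitigs can be sketched as simple label propagation. In this hypothetical Python example (not UGMAGrefiner's algorithm), an unbinned unitig joins a MAG only when all of its already-binned neighbors belong to that MAG and its coverage lies within a relative tolerance tol of that MAG's mean coverage; rounds repeat until no further unitigs are recruited. The function name, tol, and max_rounds are illustrative assumptions.

```python
def recruit_unitigs(edges, coverage, bins, tol=0.3, max_rounds=10):
    """Hypothetical label propagation of MAG labels over a unitig assembly graph.

    edges:    dict unitig -> set of neighboring unitigs (undirected graph)
    coverage: dict unitig -> mean read depth
    bins:     dict unitig -> MAG id for unitigs already binned (left unchanged)
    Returns a new dict with the original and the recruited assignments.
    """
    assigned = dict(bins)
    for _ in range(max_rounds):
        # Mean coverage of each MAG under the current assignments.
        sums, sizes = {}, {}
        for unitig, mag in assigned.items():
            sums[mag] = sums.get(mag, 0.0) + coverage[unitig]
            sizes[mag] = sizes.get(mag, 0) + 1
        mag_cov = {mag: sums[mag] / sizes[mag] for mag in sums}

        recruited = {}
        for unitig, neighbors in edges.items():
            if unitig in assigned:
                continue
            labels = {assigned[v] for v in neighbors if v in assigned}
            if len(labels) != 1:
                continue  # no binned neighbor, or neighbors in conflicting MAGs
            (mag,) = labels
            if abs(coverage[unitig] - mag_cov[mag]) <= tol * mag_cov[mag]:
                recruited[unitig] = mag
        if not recruited:
            break
        assigned.update(recruited)  # synchronous update: used from the next round on
    return assigned
```

Unitigs whose neighbors fall in different MAGs are simply left unassigned here, whereas UGMAGrefiner additionally infers unitigs shared by multiple MAGs.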
Project description:Microtubules are essential cytoskeletal tracks for cargo transportation in axons and also serve as the primary structural scaffold of neurons. Structural assembly, stability, and dynamics of axonal microtubules are of great interest for understanding neuronal functions and pathologies. However, microtubules are so densely packed in axons that their separations are well below the diffraction limit of light, which precludes using optical microscopy for live-cell studies. Here, we present a single-molecule imaging method capable of resolving individual microtubules in live axons. In our method, unlabeled microtubules are revealed by following individual axonal cargos that travel along them. We resolved more than six microtubules in a 1 μm diameter axon by real-time tracking of endosomes containing quantum dots. Our live-cell study also provided direct evidence that endosomes switch between microtubules while traveling along axons, which has been proposed to be the primary means for axonal cargos to effectively navigate through the crowded axoplasmic environment.