MetaMicrobesOnline: phylogenomic analysis of microbial communities.
ABSTRACT: The metaMicrobesOnline database (freely available at http://meta.MicrobesOnline.org) offers phylogenetic analysis of genes from microbial genomes and metagenomes. Gene trees are constructed for canonical gene families such as COG and Pfam. Such gene trees allow for rapid homologue analysis and subfamily comparison of genes from multiple metagenomes and comparisons with genes from microbial isolates. Additionally, the genome browser permits genome context comparisons, which may be used to determine the closest sequenced genome or suggest functionally associated genes. Lastly, the domain browser permits rapid comparison of protein domain organization within genes of interest from metagenomes and complete microbial genomes.
Project description:With an increasing number of vertebrate genomes being sequenced in draft or finished form, unique opportunities for decoding the language of DNA sequence through comparative genome alignments have arisen. However, novel tools and strategies are required to accommodate this large volume of genomic information and to facilitate the transfer of predictions generated by comparative sequence alignment to researchers focused on experimental annotation of genome function. Here, we present the ECR Browser, a tool that provides easy and dynamic access to whole genome alignments of human, mouse, rat and fish sequences. This web-based tool (http://ecrbrowser.dcode.org) provides the starting point for discovery of novel genes, identification of distant gene regulatory elements and prediction of transcription factor binding sites. The genome alignment portal of the ECR Browser also permits fast and automated alignments of any user-submitted sequence to the genome of choice. The interconnection of the ECR Browser with other DNA sequence analysis tools creates a unique portal for studying and exploring vertebrate genomes.
Project description:Antibiotic resistance in pathogens is extensively studied, and yet little is known about how antibiotic resistance genes of typical gut bacteria influence microbiome dynamics. Here, we leveraged genomes from metagenomes to investigate how genes of the premature infant gut resistome correspond to the ability of bacteria to survive under certain environmental and clinical conditions. We found that formula feeding impacts the resistome. Random forest models corroborated by statistical tests revealed that the gut resistome of formula-fed infants is enriched in class D beta-lactamase genes. Interestingly, Clostridium difficile strains harboring this gene are at higher abundance in formula-fed infants than C. difficile strains lacking this gene. Organisms with genes for major facilitator superfamily drug efflux pumps have higher replication rates under all conditions, even in the absence of antibiotic therapy. Using a machine learning approach, we identified genes that are predictive of an organism's direction of change in relative abundance after administration of vancomycin and cephalosporin antibiotics. The most accurate results were obtained by reducing annotated genomic data to five principal components classified by boosted decision trees. Among the genes involved in predicting whether an organism increased in relative abundance after treatment are those that encode subclass B2 beta-lactamases and transcriptional regulators of vancomycin resistance. This demonstrates that machine learning applied to genome-resolved metagenomics data can identify key genes for survival after antibiotics treatment and predict how organisms in the gut microbiome will respond to antibiotic administration. IMPORTANCE The process of reconstructing genomes from environmental sequence data (genome-resolved metagenomics) allows unique insight into microbial systems. We apply this technique to investigate how the antibiotic resistance genes of bacteria affect their ability to flourish in the gut under various conditions. Our analysis reveals that strain-level selection in formula-fed infants drives enrichment of beta-lactamase genes in the gut resistome. Using genomes from metagenomes, we built a machine learning model to predict how organisms in the gut microbial community respond to perturbation by antibiotics. This may eventually have clinical applications.
Project description:Long-read Oxford Nanopore sequencing has democratized microbial genome sequencing and enables the recovery of highly contiguous microbial genomes from isolates or metagenomes. However, to obtain near-finished genomes it has been necessary to include short-read polishing to correct insertions and deletions derived from homopolymer regions. Here, we show that Oxford Nanopore R10.4 can be used to generate near-finished microbial genomes from isolates or metagenomes without short-read or reference polishing.
Project description:Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and environmental conditions.
Project description:Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which were then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their targeting genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ?0.25× coverage, and 10% of selected GSMs in a database should be detected for confident positive callings. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our result agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing.
Project description:BACKGROUND:Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. FINDINGS:Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. CONCLUSIONS:The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Project description:<h4>Background</h4>Members of the Roseobacter clade which play a key role in the biogeochemical cycles of the ocean are diverse and abundant, comprising 10-25% of the bacterioplankton in most marine surface waters. The rapid accumulation of whole-genome sequence data for the Roseobacter clade allows us to obtain a clearer picture of its evolution.<h4>Methodology/principal findings</h4>In this study about 1,200 likely orthologous protein families were identified from 17 Roseobacter bacteria genomes. Functional annotations for these genes are provided by iProClass. Phylogenetic trees were constructed for each gene using maximum likelihood (ML) and neighbor joining (NJ). Putative organismal phylogenetic trees were built with phylogenomic methods. These trees were compared and analyzed using principal coordinates analysis (PCoA), approximately unbiased (AU) and Shimodaira-Hasegawa (SH) tests. A core set of 694 genes with vertical descent signal that are resistant to horizontal gene transfer (HGT) is used to reconstruct a robust organismal phylogeny. In addition, we also discovered the most likely 109 HGT genes. The core set contains genes that encode ribosomal apparatus, ABC transporters and chaperones often found in the environmental metagenomic and metatranscriptomic data. These genes in the core set are spread out uniformly among the various functional classes and biological processes.<h4>Conclusions/significance</h4>Here we report a new multigene-derived phylogenetic tree of the Roseobacter clade. Of particular interest is the HGT of eleven genes involved in vitamin B12 synthesis as well as key enzynmes for dimethylsulfoniopropionate (DMSP) degradation. These aquired genes are essential for the growth of Roseobacters and their eukaryotic partners.
Project description:BACKGROUND:Isoprene is the most abundantly produced biogenic volatile organic compound (BVOC) on Earth, with annual global emissions almost equal to those of methane. Despite its importance in atmospheric chemistry and climate, little is known about the biological degradation of isoprene in the environment. The largest source of isoprene is terrestrial plants, and oil palms, the cultivation of which is expanding rapidly, are among the highest isoprene-producing trees. RESULTS:DNA stable isotope probing (DNA-SIP) to study the microbial isoprene-degrading community associated with oil palm trees revealed novel genera of isoprene-utilising bacteria including Novosphingobium, Pelomonas, Rhodoblastus, Sphingomonas and Zoogloea in both oil palm soils and on leaves. Amplicon sequencing of isoA genes, which encode the ?-subunit of the isoprene monooxygenase (IsoMO), a key enzyme in isoprene metabolism, confirmed that oil palm trees harbour a novel diversity of isoA sequences. In addition, metagenome-assembled genomes (MAGs) were reconstructed from oil palm soil and leaf metagenomes and putative isoprene degradation genes were identified. Analysis of unenriched metagenomes showed that isoA-containing bacteria are more abundant in soils than in the oil palm phyllosphere. CONCLUSION:This study greatly expands the known diversity of bacteria that can metabolise isoprene and contributes to a better understanding of the biological degradation of this important but neglected climate-active gas. Video abstract.
Project description:AlkB and CYP153 are important alkane hydroxylases responsible for aerobic alkane degradation in bioremediation of oil-polluted environments and microbial enhanced oil recovery. Since their distribution in nature is not clear, we made the investigation among thus-far sequenced 3,979 microbial genomes and 137 metagenomes from terrestrial, freshwater, and marine environments. Hundreds of diverse alkB and CYP153 genes including many novel ones were found in bacterial genomes, whereas none were found in archaeal genomes. Moreover, these genes were detected with different distributional patterns in the terrestrial, freshwater, and marine metagenomes. Hints for horizontal gene transfer, gene duplication, and gene fusion were found, which together are likely responsible for diversifying the alkB and CYP153 genes adapt to the ubiquitous distribution of different alkanes in nature. In addition, different distributions of these genes between bacterial genomes and metagenomes suggested the potentially important roles of unknown or less common alkane degraders in nature.
Project description:Public health authorities whole-genome sequence thousands of isolates each month for microbial diagnostics and surveillance of pathogenic bacteria. The computational methods have not kept up with the deluge of data and the need for real-time results. We have therefore created a bioinformatics pipeline for rapid subtyping and continuous phylogenomic analysis of bacterial samples, suited for large-scale surveillance. The data is divided into sets by mapping to reference genomes, then consensus sequences are generated. Nucleotide based genetic distance is calculated between the sequences in each set, and isolates are clustered together at 10 single-nucleotide polymorphisms. Phylogenetic trees are inferred from the non-redundant sequences and the clustered isolates are added back. The method is accurate at grouping outbreak strains together, while discriminating them from non-outbreak strains. The pipeline is applied in Evergreen Online, which processes publicly available sequencing data from foodborne bacterial pathogens on a daily basis, updating phylogenetic trees as needed.