Project description:Metagenome data from soil samples were collected at a depth of 0 to 10 cm from 2 avocado orchards in Channybearup, Western Australia, in 2024. Amplicon sequence variant (ASV) tables were constructed using the DADA2 pipeline with default parameters.
Project description:Whole-genome sequencing on PacBio of laboratory mouse strains. See http://www.sanger.ac.uk/resources/mouse/genomes/ for more details. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
Project description:Recovering high-quality metagenome-assembled genomes (MAGs) from complex microbial ecosystems remains challenging. Recently, high-throughput chromosome conformation capture (Hi-C) has been applied to simultaneously study multiple genomes in natural microbial communities. We develop HiCBin, a novel open-source pipeline, to resolve high-quality MAGs utilizing Hi-C contact maps. HiCBin employs the HiCzin normalization method and the Leiden clustering algorithm and, for the first time, incorporates spurious contact detection into a binning pipeline. HiCBin is validated on one synthetic and two real metagenomic samples and is shown to outperform the existing Hi-C-based binning methods. HiCBin is available at https://github.com/dyxstat/HiCBin.
Project description:Background: Many binning approaches have been developed for untangling metagenome-assembled genomes (MAGs), and they are evaluated by two main strategies. Comparison to known genomes prevails over the alternative strategy based on single-copy genes. However, there is still no dataset with all known genomes for a real (not simulated) bacterial consortium. Results: Here, we continue investigating the real bacterial consortium F1RT, which we previously enriched and sequenced, because its low complexity makes it highly likely that all MAGs can be recovered. The improved F1RT metagenome, reassembled here with metaSPAdes, utilizes about 98.62% of reads, and a series of analyses of the remaining reads suggests that F1RT is very unlikely to contain other low-abundance organisms, demonstrating that almost all MAGs are successfully assembled. Then, 4 isolates are obtained and individually sequenced. Based on the 4 isolate genomes and the entire metagenome, an elaborate in-house pipeline is developed to construct all F1RT MAGs. A series of assessments demonstrates the high reliability of this reconstruction. Next, our findings show that this dataset harbors several properties that are challenging for binning and is thus suitable for comparing currently available advanced binning tools or benchmarking novel binners. Using this dataset, 8 advanced binning algorithms are assessed, giving useful insights for developing novel approaches. In addition, compared with our previous study, two novel MAGs termed FC8 and FC9 are discovered here, and 7 MAGs are solidly recovered for species without any available genomes. Conclusion: To our knowledge, this is the first dataset constructed with almost all known MAGs for a non-simulated consortium. We hope that this dataset will be used routinely to complement mock datasets for evaluating binning methods and to further facilitate binning and metagenomic studies in the future.
Project description:Comparing metagenomic samples is a critical step in understanding the relationships among microbial communities. Recently, next-generation sequencing (NGS) technologies have produced a massive amount of short-read data for microbial communities from different environments. The assembly of these short reads can, however, be time-consuming and challenging. In addition, alignment-based methods for metagenome comparison are limited by incomplete genome and/or pathway databases. In contrast, alignment-free methods for metagenome comparison do not depend on the completeness of genome or pathway databases. Still, the existing alignment-free methods, d2S and d2*, which model k-tuple patterns using only one Markov chain for each sample, neglect the heterogeneity within metagenomic data wherein potentially thousands of types of microorganisms are sequenced. To address this imperfection in d2S and d2*, we organized NGS sequences into different reads bins and constructed several corresponding Markov models. Next, we modified the definition of our previous alignment-free methods, d2S and d2*, to make them more compatible with a scheme of analysis which uses the proposed reads bins. We then used two simulated and three real metagenomic datasets to test the effect of the k-tuple size and Markov orders of background sequences on the performance of these de novo alignment-free methods. For dependable comparison of metagenomic samples, our newly developed alignment-free methods with reads binning outperformed alignment-free methods without reads binning in detecting the relationship among microbial communities, including whether they form groups or change according to some environmental gradients.
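As a rough illustration of the single-model baseline that this description sets out to improve, the sketch below computes a d2S-style dissimilarity between two sequences from centered k-tuple counts. It assumes an order-0 Markov (independent bases) background estimated from each sequence and the commonly cited form of the d2S statistic; the function names are hypothetical, and the reads binning and multiple Markov models proposed in the description are not implemented here.

```python
from collections import Counter
from itertools import product
from math import sqrt


def centered_counts(seq, k):
    """Return k-tuple counts minus their expectation under an order-0 background."""
    n_tuples = len(seq) - k + 1
    # Assumption: background base frequencies are estimated from the sequence itself.
    base_freq = {b: seq.count(b) / len(seq) for b in "ACGT"}
    counts = Counter(seq[i:i + k] for i in range(n_tuples))
    centered = {}
    for tup in product("ACGT", repeat=k):
        word = "".join(tup)
        prob = 1.0
        for base in word:
            prob *= base_freq[base]
        centered[word] = counts.get(word, 0) - n_tuples * prob
    return centered


def d2s_dissimilarity(seq_x, seq_y, k):
    """d2S-style dissimilarity in [0, 1]; 0 means identical centered k-tuple profiles."""
    x = centered_counts(seq_x, k)
    y = centered_counts(seq_y, k)
    num = sx = sy = 0.0
    for word in x:
        norm = sqrt(x[word] ** 2 + y[word] ** 2)
        if norm == 0.0:
            continue  # this k-tuple carries no signal in either sequence
        num += x[word] * y[word] / norm
        sx += x[word] ** 2 / norm
        sy += y[word] ** 2 / norm
    if sx == 0.0 or sy == 0.0:
        return 0.5  # no usable signal; treated here as uncorrelated (a choice of this sketch)
    return 0.5 * (1.0 - num / (sqrt(sx) * sqrt(sy)))
```

In practice the counts would be pooled over all reads in a sample rather than taken from a single string, and a higher-order Markov background could replace the independent-base model used here.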
Project description:To investigate which cellular functions may be perturbed along the branches of a synthetic evolutionary tree obtained by incremental deletions of large genomic regions, we subjected six Bacillus subtilis strains to transcriptome profiling. These six strains are: MS (~3.98 Mbp), which is already a genome-reduced derivative of B. subtilis 168 (~4.22 Mbp) and the root of our evolutionary tree; MGP254 (~2.73 Mbp), the most genome-reduced strain; MGP234 (~2.81 Mbp), another terminal leaf in our tree; MGP181 (~2.87 Mbp) and MGP192 (~2.85 Mbp), two intermediate strains in the ancestor lineage common to MGP254 and MGP234; and finally MGP229 (2.82 Mbp), an intermediate strain between MGP192 and MGP254 (i.e. an ancestor of MGP254 but not MGP234). The vast majority of genes conserved in the six strains displayed no differential expression, showing the robustness of the cell transcriptional network against massive genome reduction. Among deregulated genes, more than half could be explained by loss of known functions and aberrant transcription at deletion boundaries. An unexpected common feature in genome-reduced strains was the upregulation of genes involved in cell responses to oxidative stresses.
Project description:Motivation: Metagenomic contig binning is an important computational problem in metagenomic research, which aims to cluster contigs from the same genome into the same group. Unlike classical clustering problems, contig binning can utilize known relationships among some of the contigs or the taxonomic identity of some contigs. However, the current state-of-the-art contig binning methods do not make full use of additional biological information beyond the coverage and sequence composition of the contigs. Results: We developed a novel contig binning method, Semi-supervised Spectral Normalized Cut for Binning (SolidBin), based on semi-supervised spectral clustering. Using sequence feature similarity and/or additional biological information, such as the reliable taxonomy assignments of some contigs, SolidBin constructs two types of prior information: must-link and cannot-link constraints. Must-link constraints mean that the pair of contigs should be clustered into the same group, while cannot-link constraints mean that the pair of contigs should be clustered into different groups. These constraints are then integrated into a classical spectral clustering approach, normalized cut, for improved contig binning. The performance of SolidBin is compared with five state-of-the-art genome binners, CONCOCT, COCACOLA, MaxBin, MetaBAT and BMC3C, on five next-generation sequencing benchmark datasets including simulated multi- and single-sample datasets and real multi-sample datasets. The experimental results show that SolidBin achieves the best performance in terms of F-score, Adjusted Rand Index and Normalized Mutual Information, especially on the real datasets and the single-sample dataset. Availability and implementation: https://github.com/sufforest/SolidBin. Supplementary information: Supplementary data are available at Bioinformatics online.
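To make the must-link and cannot-link idea concrete, the sketch below shows one simple way such constraints can be written into a pairwise contig similarity matrix, and how the normalized cut objective then scores a two-way partition. This is an assumption-laden illustration, not SolidBin's actual integration scheme: overwriting similarities with 1.0 and 0.0 is a generic constrained-clustering heuristic, and the function names are hypothetical.

```python
def apply_constraints(similarity, must_link, cannot_link):
    """Return a copy of a symmetric similarity matrix with pairwise constraints applied.

    Assumption of this sketch: must-link pairs get the maximum similarity (1.0)
    and cannot-link pairs get zero similarity.
    """
    constrained = [row[:] for row in similarity]
    for i, j in must_link:
        constrained[i][j] = constrained[j][i] = 1.0
    for i, j in cannot_link:
        constrained[i][j] = constrained[j][i] = 0.0
    return constrained


def normalized_cut(weights, group_a):
    """Normalized cut value of the partition (group_a, all remaining nodes).

    Both groups must have nonzero total edge weight, or this divides by zero.
    """
    n = len(weights)
    a = set(group_a)
    b = set(range(n)) - a
    cut = sum(weights[i][j] for i in a for j in b)
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


# Four hypothetical contigs that composition alone cannot separate.
uniform = [[0.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
constrained = apply_constraints(
    uniform,
    must_link=[(0, 1), (2, 3)],
    cannot_link=[(0, 2), (1, 3)],
)
respecting = normalized_cut(constrained, {0, 1})  # keeps must-link pairs together
violating = normalized_cut(constrained, {0, 2})   # splits both must-link pairs
```

With the constraints applied, the partition that keeps the must-link pairs together has the lower normalized cut, which is the behaviour a spectral solver run on the constrained matrix would exploit.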
Project description:Background: In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. Results: The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparably in the ≥ 8 kb benchmark tests. Conclusion: In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of the methods tested. Most importantly, the proposed method does not require knowledge from known genomes and uses only very few labels (one per species is sufficient in most cases), which are extracted from the metagenome itself. These advantages make it a very attractive binning method. S-GSOM outperformed the binning methods that depend on already-sequenced genomes, and compares well to the current most advanced binning method, PhyloPythia.
Project description:The genomes of three newly isolated Dehalococcoides strains (11a, 11a5 and MB) were compared against known genomes in the Dehalococcoides genus via a microarray targeting four sequenced Dehalococcoides strains (195, CBDB1, BAV1, and VS). All three strains exhibit different dechlorination patterns, with strain 11a dechlorinating TCE to ethene, 11a5 dechlorinating TCE to VC and MB dechlorinating PCE only to isomers of DCE. Hybridization of their respective genomic DNA to the microarrays showed that the genomes of strains 11a and 11a5 show great similarity to each other and to strains CBDB1 and BAV1 of the Pinellas subgroup, while strain MB shows strong genome similarity to members of the Cornell subgroup. All genes within the respective subgroups that were not detected by microarray are within the respective high plasticity regions or integrated elements of the sequenced strains. A large number of reductive dehalogenase (RDase)-encoding genes are present within each genome, and the presence of the vcrA and tceA genes in strains 11a and 11a5 respectively, and the absence of any of the four functionally characterized chlorinated ethene RDases (pceA, tceA, vcrA, bvcA) within strain MB appear to dictate chlorinated ethene usage regardless of the respective core genome phylogeny of the three strains. Considering the current data set together with previous comparative genomics results from application of the Dehalococcoides genus microarray to two other un-sequenced strains, the observed incongruence between the core genome phylogeny and chlorinated ethene usage of Dehalococcoides strains is likely driven by horizontal gene transfer of functional RDases. The other genomic features that are repeatedly observed in the microarray analyses of all five un-sequenced Dehalococcoides strains, as well as the environmental implications of this work, are presented in this study. The genomic DNA (gDNA) of each culture was analyzed in triplicate.
gDNA from the two newly isolated Dehalococcoides strains 11a and 11a5 was analyzed.