Project description:Accompanying benchmarking sample for "TaxIt: An iterative computational pipeline for untargeted strain-level identification using MS/MS spectra from pathogenic single-organism samples": Untargeted accurate strain-level classification of a priori unidentified organisms using tandem mass spectrometry is a challenging task. Reference databases often lack taxonomic depth, limiting peptide assignments to the species level. However, the extension with detailed strain information increases runtime and decreases statistical power. In addition, larger databases contain a higher number of similar proteomes. We present TaxIt, an iterative workflow to address the increasing search space required for MS/MS-based strain-level classification of samples with unknown taxonomic origin. TaxIt first applies reference sequence data for initial identification of species candidates, followed by automated acquisition of relevant strain sequences for low level classification. Furthermore, proteome similarities resulting in ambiguous taxonomic assignments are addressed with an abundance weighting strategy to increase the confidence in candidate taxa. For benchmarking the performance of our method, we apply our iterative workflow on several samples of bacterial and viral origin. In comparison to non-iterative approaches using unique peptides or advanced abundance correction, TaxIt identifies microbial strains correctly in all examples presented (with one tie), thereby demonstrating the potential for untargeted and deeper taxonomic classification. TaxIt makes extensive use of public, unrestricted and continuously growing sequence resources such as the NCBI databases and is available under open-source BSD license at https://gitlab.com/rki_bioinformatics/TaxIt.
Project description:Although state-of-the-art algorithms for m6A detection and quantification via nanopore direct RNA sequencing have been continuously developed, little is known about their capacities and limitations, so a comprehensive assessment is urgently needed. Therefore, we performed comprehensive benchmarking of 10 computational tools relying on current-based and base-calling “errors” strategies for m6A detection by nanopore sequencing.
Project description:Metaproteomics, the study of the collective proteome within a microbial ecosystem, has substantially grown over the past few years. This growth comes from the increased awareness that it can powerfully supplement metagenomics and metatranscriptomics analyses. Although metaproteomics is more challenging than single-species proteomics, its added value has already been demonstrated in various biosystems, such as gut microbiomes or biogas plants. Because of the many challenges, a variety of metaproteomics workflows have been developed, yet it remains unclear what the impact of the choice of workflow is on the obtained results. Therefore, we set out to compare several well-established workflows in the first community-driven, multi-lab comparison in metaproteomics: the critical assessment of metaproteome investigation (CAMPI) study. In this benchmarking study, we evaluated the influence of different workflows on sample preparation, mass spectrometry acquisition, and bioinformatic analysis on two samples: a simplified, lab-assembled human intestinal sample and a complex human fecal sample. We find that the same overall biological meaning can be inferred from the metaproteome data, regardless of the chosen workflow. Indeed, taxonomic and functional annotations were very similar across all sample-specific data sets. Moreover, this outcome was consistent regardless of whether protein groups or peptides, or differences at the spectrum or peptide level were used to infer these annotations. Where differences were observed, those originated primarily from different wet-lab methods rather than from different bioinformatic pipelines. The CAMPI study thus provides a solid foundation for benchmarking metaproteomics workflows, and will therefore be a key reference for future method improvement.
2022-02-16 | PXD023217 | Pride
Project description:Benchmarking sequencing methods and tools that facilitate the study of alternative polyadenylation
Project description:Understanding microbial community diversity is thought to be crucial for improving process functioning and stabilities of wastewater treatment systems. However, current studies largely focus on taxonomic groups based on 16S rRNA, which are not necessarily linked to functioning, or a few selected functional genes. Here we launched a study to profile the overall functional genes of microbial communities in three full-scale wastewater treatment systems. Triplicate activated sludge samples from each system were analyzed using a high-throughput metagenomics tool named GeoChip 4.2, resulting in the detection of 38,507 to 40,647 functional genes. A high similarity of 75.5% to 79.7% shared genes was noted among the nine samples. Moreover, correlation analyses showed that the abundances of a wide array of functional genes were associated with system performances. For example, the abundances of overall nitrogen cycling genes had a strong correlation to total nitrogen (TN) removal rates (r = 0.7647, P < 0.01). The abundances of overall carbon cycling genes were moderately correlated with COD removal rates (r = 0.6515, P < 0.01). Lastly, we found that influent chemical oxygen demand (COD inf) and total phosphorus concentrations (TP inf), and dissolved oxygen (DO) concentrations were key environmental factors shaping the overall functional genes. Together, the results revealed vast functional gene diversity and some links between the functional gene compositions and microbe-mediated processes.
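The correlation analysis mentioned in the wastewater description (functional gene abundances against removal rates) can be illustrated with a minimal sketch of a Pearson correlation coefficient. The description does not state which software or exact procedure was used, so the function name `pearson_r` and the toy values below are assumptions made purely for illustration; they are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up placeholder values: gene abundance signal per sample and a
    # removal rate per sample. These are NOT measurements from the study.
    gene_abundance = [10.0, 12.5, 11.0, 14.0, 13.5]
    removal_rate = [0.61, 0.70, 0.66, 0.78, 0.74]
    print(f"r = {pearson_r(gene_abundance, removal_rate):.4f}")
```

In practice a significance test would accompany the coefficient, as the reported P values suggest; that step is omitted here to keep the sketch dependency-free.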