Project description: Untargeted metabolomics studies are unbiased, but identifying the same feature across studies is complicated by environmental variation, batch effects, and instrument variability. Ideally, several studies that assay the same set of metabolic features would be used to select recurring features to pursue for identification. Here, we developed an anchored experimental design. This generalizable approach enabled us to integrate three genetic studies consisting of 14 test strains of Caenorhabditis elegans prior to the compound identification process. An anchor strain, PD1074, was included in every sample collection, yielding a large set of biological replicates of a genetically identical strain that anchored each study. This enabled us to estimate treatment effects within each batch and apply straightforward meta-analytic approaches to combine treatment effects across batches, without estimating batch effects or resorting to complex normalization strategies. We collected 104 test samples for three genetic studies across six batches to produce five analytical datasets from two complementary technologies commonly used in untargeted metabolomics. Here, we use the model system C. elegans to demonstrate that an augmented design, combined with experimental blocks and other metabolomic QC approaches, can be used to anchor studies and enable comparisons of stable spectral features across time without the need for compound identification. This approach is generalizable to systems where the same genotype can be assayed in multiple environments and provides biologically relevant features for downstream compound identification efforts. All methods are included in the newest release of the publicly available SECIMTools, based on the open-source Galaxy platform.
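The anchor-strain idea above, estimating treatment effects within each batch relative to the anchor and then combining them by fixed-effect meta-analysis, can be sketched as follows. This is a minimal illustration: the batch layout, intensities, and variable names are invented and do not come from the study.

```python
import math

# Hypothetical log intensities of one spectral feature; both groups are
# measured in the same batch, so batch-level shifts cancel in the contrast.
batches = {
    "batch1": {"anchor": [10.1, 10.3, 9.9], "test": [11.0, 11.2, 10.8]},
    "batch2": {"anchor": [12.0, 12.2, 11.8], "test": [13.1, 12.9, 13.0]},
}

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def batch_effect(anchor, test):
    """Within-batch treatment effect: difference from the anchor strain,
    with its squared standard error."""
    est = mean(test) - mean(anchor)
    se2 = var(test) / len(test) + var(anchor) / len(anchor)
    return est, se2

# Fixed-effect (inverse-variance) meta-analysis across batches: no batch
# effects are estimated, only the per-batch contrasts are combined.
effects = [batch_effect(b["anchor"], b["test"]) for b in batches.values()]
weights = [1.0 / se2 for _, se2 in effects]
combined = sum(w * est for (est, _), w in zip(effects, weights)) / sum(weights)
combined_se = math.sqrt(1.0 / sum(weights))
```

Because the anchor is genetically identical in every collection, each per-batch contrast is already free of the batch shift, which is what makes the simple inverse-variance combination valid.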
Project description: Batch effects in omics data are notoriously common technical variations unrelated to study objectives; they may result in misleading outcomes if uncorrected, or hinder biomedical discovery if over-corrected. Assessing and mitigating batch effects is crucial for ensuring the reliability and reproducibility of omics data and for minimizing the impact of technical variations on biological interpretation. In this review, we highlight the profound negative impact of batch effects and the urgent need to address this challenging problem in large-scale omics studies. We summarize potential sources of batch effects, current progress in evaluating and correcting them, and consortium efforts aiming to tackle them.
Project description: Introduction: Liquid chromatography-mass spectrometry (LC-MS) is a commonly used technique in untargeted metabolomics owing to its broad metabolite coverage, high sensitivity, and simple sample preparation. However, data generated from multiple batches are affected by measurement errors inherent to alterations in signal intensity and drift in mass accuracy and retention times between samples, both within and between batches. These measurement errors reduce repeatability and reproducibility and may thus decrease the power to detect biological responses and obscure interpretation. Objective: Our aim was to develop procedures that address and correct for within- and between-batch variability when processing multiple-batch untargeted LC-MS metabolomics data, to increase data quality. Methods: Algorithms were developed for: (i) alignment and merging of features that are systematically misaligned between batches, by aggregating feature presence/missingness at the batch level and combining similar features orthogonally present between batches; and (ii) within-batch drift correction using a cluster-based approach that allows multiple drift patterns within a batch. Furthermore, a heuristic criterion was developed for the feature-wise choice between reference-based and population-based between-batch normalisation. Results: In authentic data, between-batch alignment resulted in picking 15% more features and deconvoluting 15% of features previously erroneously aligned. Within-batch correction decreased the median quality control feature coefficient of variation from 20.5% to 15.1%. The algorithms are open source and available as an R package ('batchCorr'). Conclusions: The developed procedures provide unbiased measures of improved data quality, with implications for improved data analysis. Although developed for LC-MS-based metabolomics, these methods are generic and can be applied to other data suffering from similar limitations.
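A feature-wise choice between reference-based and population-based between-batch normalisation can be illustrated with a minimal sketch. The threshold, data, and function names here are assumptions for illustration only, not the actual batchCorr heuristic or API.

```python
import statistics

# Toy per-batch intensities for one feature; "qc" stands for pooled
# reference samples injected in each batch. All numbers are invented.
batch_data = {
    "A": {"qc": [100.0, 102.0, 98.0], "samples": [90.0, 110.0, 105.0]},
    "B": {"qc": [150.0, 149.0, 151.0], "samples": [140.0, 160.0, 155.0]},
}

CV_LIMIT = 0.30  # hypothetical cutoff, not the package default

def cv(xs):
    """Coefficient of variation: stdev relative to the mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

def normalise_batch(batch):
    """Trust the reference (QC) mean when the feature is stable in QCs;
    otherwise fall back to the population (sample) median."""
    if cv(batch["qc"]) <= CV_LIMIT:
        scale = statistics.mean(batch["qc"])         # reference-based
    else:
        scale = statistics.median(batch["samples"])  # population-based
    return [x / scale for x in batch["samples"]]

normalised = {name: normalise_batch(b) for name, b in batch_data.items()}
```

The point of the feature-wise decision is that QC-based scaling is only reliable for features the QC samples measure reproducibly; unstable features are safer to normalise against the study population itself.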
Project description: Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques for controlling the quality of batch-effect adjustment. We provide an R package, "proBatch", containing the functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets, each encompassing hundreds of samples and multiple experimental designs. In conclusion, we provide guidelines and tools that make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
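One common batch-correction step in such protocols, per-batch median centering of log intensities, can be sketched as follows. The data, labels, and function name are hypothetical and do not reproduce proBatch's API.

```python
import statistics

# Toy log2-intensity matrix: feature -> one value per sample, plus a
# batch label per sample. Values are invented for illustration.
intensities = {
    "prot1": [20.0, 20.5, 22.0, 22.4],
    "prot2": [15.0, 15.2, 17.1, 16.9],
}
batch_labels = ["b1", "b1", "b2", "b2"]

def median_center_by_batch(values, labels):
    """Subtract each batch's median so all batches share a common center,
    leaving within-batch contrasts intact."""
    out = list(values)
    for b in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == b]
        med = statistics.median(values[i] for i in idx)
        for i in idx:
            out[i] = values[i] - med
    return out

corrected = {f: median_center_by_batch(v, batch_labels)
             for f, v in intensities.items()}
```

Median centering is robust to the missing values and intensity drift mentioned above only up to a point; the protocol's later steps exist precisely because a single shift per batch is often not enough.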
Project description: Background: Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration, but their respective advantages and limitations have not been adequately assessed in terms of omics types, performance metrics, and application scenarios. Results: As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms using performance metrics of clinical relevance: the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples into their own donors. The ratio-based method, i.e., scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than the others, especially when batch effects are completely confounded with the biological factors of study interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies. Conclusions: Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
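The ratio-based scaling described above reduces to dividing each study sample's feature value by the value measured for the reference material in the same batch. A minimal sketch, with invented numbers and a deliberate 2x multiplicative shift between batches:

```python
# Toy values of one feature; the reference material profiled in every
# batch provides the denominator. All numbers are made up.
study = {
    "batch1": {"ref": 200.0, "samples": [180.0, 220.0]},
    "batch2": {"ref": 400.0, "samples": [360.0, 440.0]},
}

def ratio_scale(batch):
    """Express each study sample relative to the concurrently profiled
    reference material, cancelling batch-level multiplicative effects."""
    return [x / batch["ref"] for x in batch["samples"]]

scaled = {name: ratio_scale(b) for name, b in study.items()}
# Both batches now yield the same ratios despite the 2x batch shift.
```

Because the reference is measured inside every batch, this works even when batch and biology are completely confounded, which is where correction methods that must first estimate the batch effect break down.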
Project description: We have developed PRIDE, a primer design program that automatically designs primers in single contigs or whole sequencing projects to extend already known sequence and to double-strand single-stranded regions. The program is fully integrated into the Staden package (GAP4) and accessible through a graphical user interface. PRIDE uses a fuzzy-logic-based system to calculate primer qualities. Its computational performance is enhanced by using suffix trees to store the huge amount of data produced. A test set of 110 sequencing primers and 11 PCR primer pairs was designed on genomic templates, cDNAs, and sequences containing repetitive elements to analyze PRIDE's success rate. The high performance of PRIDE, combined with its minimal requirement for user interaction and its fast algorithm, makes this program useful for large-scale primer design, especially in large sequencing projects.
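Fuzzy scoring of primer qualities can be illustrated with a toy example: each criterion gets a membership score between 0 and 1, and scores are combined with a fuzzy AND (minimum). The membership shapes, ideal values, and the Wallace-rule melting temperature below are illustrative assumptions, not PRIDE's actual rules.

```python
def triangular(x, low, ideal, high):
    """Triangular fuzzy membership: 1.0 at the ideal value,
    falling linearly to 0 at the interval ends."""
    if x <= low or x >= high:
        return 0.0
    if x <= ideal:
        return (x - low) / (ideal - low)
    return (high - x) / (high - ideal)

def primer_quality(primer):
    """Hypothetical quality score combining GC content and melting
    temperature; a real designer would add many more criteria."""
    gc_count = sum(primer.count(b) for b in "GC")
    at_count = sum(primer.count(b) for b in "AT")
    gc_fraction = gc_count / len(primer)
    tm = 4 * gc_count + 2 * at_count  # Wallace rule, short primers only
    gc_score = triangular(gc_fraction, 0.3, 0.5, 0.7)
    tm_score = triangular(tm, 50, 60, 72)
    return min(gc_score, tm_score)  # fuzzy AND: worst criterion dominates

q = primer_quality("ATGCGTACGTTAGCCTAGCA")
```

Taking the minimum (rather than, say, the mean) means one bad property cannot be averaged away by otherwise good ones, which matches how a single flaw can ruin a primer in practice.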
Project description: Retinal artery occlusion (RAO) is a devastating condition with no effective treatment. The management of RAO could potentially be improved through an in-depth understanding of the molecular alterations in the condition. This study combined advanced proteomic techniques with an experimental model to uncover the large-scale retinal protein profile of RAO. In 13 pigs, RAO was induced with an argon laser and confirmed by fluorescein angiography. Left eyes, serving as controls, received sham laser treatment without inducing occlusion. Retinal samples were collected after one, three, or six days and analyzed with liquid chromatography-tandem mass spectrometry. In RAO, 36 proteins were differentially regulated on day one, 86 on day three, and 557 on day six. Upregulated proteins included clusterin, vitronectin, and vimentin, with several proteins increasing over time to a maximum on day six, including clusterin, vimentin, osteopontin, annexin-A, and signal transducer and activator of transcription 3 (STAT3). On day six, RAO resulted in the upregulation of proteins involved in the cellular response to stress, hemostasis, the innate immune response, and cytokine signaling. Downregulated proteins were involved in transmission across chemical synapses and visual phototransduction. This study identified the upregulation of multiple inflammatory proteins in RAO and the downregulation of proteins involved in visual pathways.
Project description: Multiple reaction monitoring (MRM) is a powerful and popular technique for metabolite quantification in targeted metabolomics. Accurate and consistent quantitation of metabolites from MRM data is essential for subsequent analyses. Here, we developed MRMQuant, an automated, easy-to-use tool for targeted metabolomic quantitation from high-throughput liquid chromatography-tandem mass spectrometry MRM data that requires minimal human intervention. The tool offers many user-friendly functions for inspecting and correcting quantitation results as required. MRMQuant ensures accurate quantitation through the following features: (1) dynamic signal smoothing, (2) automatic deconvolution of coeluted peaks, (3) absolute quantitation via standard curves and/or internal standards, (4) visual inspection and correction, (5) corrections applicable to multiple samples, and (6) batch-effect correction.
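Absolute quantitation via a standard curve amounts to fitting peak area against known concentrations and inverting the fit for unknown samples. A minimal sketch; the calibration points and sample area below are invented and this is not MRMQuant's implementation:

```python
# Hypothetical calibration series: peak areas measured for standards of
# known concentration (e.g. in uM). All numbers are invented.
known_conc = [1.0, 2.0, 5.0, 10.0]
peak_areas = [1050.0, 2020.0, 5080.0, 9950.0]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for the standard curve."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = fit_line(known_conc, peak_areas)

def quantify(area):
    """Invert the standard curve to obtain an absolute concentration."""
    return (area - intercept) / slope

conc = quantify(4000.0)  # unknown sample's peak area
```

With an internal standard, the same fit would be done on the ratio of analyte area to internal-standard area, which additionally cancels injection-to-injection variability.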