Project description:The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can be used as either a command line tool or a Python package to retrieve the files and metadata associated with a project when provided its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published data set with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows, and our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at https://github.com/wfondrie/ppx.
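As an illustration of the programmatic access described above, the following minimal sketch retrieves the files of one project that match a pattern. It assumes the `ppx` calls `find_project`, `remote_files`, and `download` behave as recalled from the package documentation, so check them against the installed version. The helper names `select_files` and `fetch_project_files`, the default `*.mzML` pattern, and the example identifier are hypothetical and do not come from the abstract.

```python
from fnmatch import fnmatch


def select_files(remote_files, pattern="*.mzML"):
    """Keep only the remote file names that match a shell-style pattern."""
    return [name for name in remote_files if fnmatch(name, pattern)]


def fetch_project_files(identifier, pattern="*.mzML"):
    """Download the matching files of a ProteomeXchange project.

    Requires network access and the third-party ppx package.
    The ppx calls below are assumptions based on its documented API.
    """
    import ppx  # imported lazily so select_files works without ppx installed

    project = ppx.find_project(identifier)
    wanted = select_files(project.remote_files(), pattern)
    return project.download(wanted)


if __name__ == "__main__":
    # Offline demonstration of the filtering step only.
    example = ["run1.mzML", "run1.raw", "README.txt"]
    print(select_files(example))
```

Calling `fetch_project_files("PXD000001")` would then be the networked step. The identifier is only an example of the ProteomeXchange accession format.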
Project description:Endometrial cancer is the most common gynaecological malignancy in developed countries. Over 382,000 new cases were diagnosed worldwide in 2018, and its incidence and mortality are constantly rising due to longer life expectancy and lifestyle factors including obesity. Two major improvements are needed in the management of patients with endometrial cancer: non-invasive or minimally invasive tools for diagnostics and for prognostics, both of which are currently missing. Diagnostic tools are needed to manage the increasing number of women at risk of developing the disease. Prognostic tools are necessary to stratify patients preoperatively according to their risk of recurrence, to advise on and plan the most appropriate treatment, and to avoid over- or under-treatment. Biomarkers derived from proteomics and metabolomics, especially when derived from non-invasively or minimally invasively collected body fluids, can serve to develop such prognostic and diagnostic tools, and the purpose of the present review is to explore the current research on this topic. We first briefly describe the technologies and the computational pipelines for data analysis, and then provide a systematic review of all published studies using proteomics and/or metabolomics for diagnostic and prognostic biomarker discovery in endometrial cancer. Finally, conclusions and recommendations for future studies are given.
Project description:Genes are pleiotropic, and understanding their function better requires a comprehensive characterization of their mutants. Here, we generated multi-level data combining phenomic, proteomic and metabolomic acquisitions from plasma and liver tissues of two C57BL/6N mouse models lacking the Lat (linker for activation of T cells) and the Mx2 (MX dynamin-like GTPase 2) genes, respectively. Our dataset consists of 9 assays (1 preclinical, 2 proteomics and 6 metabolomics) generated with a fully non-targeted and standardized approach. The data and processing code are publicly available in the ProMetIS R package to ensure accessibility, interoperability, and reusability. The dataset thus provides unique molecular information about the physiological role of the Lat and Mx2 genes. Furthermore, the protocols described herein can be easily extended to a larger number of individuals and tissues. Finally, this resource will be of great interest for developing new bioinformatic and biostatistical methods for multi-omics data integration.
Project description:The improvement of long-term transplant organ and patient survival remains a critical challenge following kidney transplantation. Proteomics and biochemical profiling (metabolomics) may allow for the detection of early changes in cell signal transduction regulation and biochemistry with high sensitivity and specificity. Hence, these analytical strategies hold the promise to detect and monitor disease processes and drug effects before histopathological and pathophysiological changes occur. In addition, they will identify enriched populations and enable individualized drug therapy. However, proteomics and metabolomics have not yet lived up to such high expectations. Renal transplant patients are highly complex, making it difficult to establish cause-effect relationships between surrogate markers and disease processes. Appropriate study design, adequate sample handling, storage and processing, quality and reproducibility of bioanalytical multi-analyte assays, data analysis and interpretation, mechanistic verification, and clinical qualification (=establishment of sensitivity and specificity in adequately powered prospective clinical trials) are important factors for the success of molecular marker discovery and development in renal transplantation. However, a newly developed and appropriately qualified molecular marker can only be successful if it is realistic that it can be implemented in a clinical setting. The development of combinatorial markers with supporting software tools is an attractive goal.
Project description:High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and therefore typically disregarded in subsequent statistical analyses. We introduce RepExplore, a web service dedicated to exploiting the information captured in the technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing and interactive ranking tables, whisker plots, heat maps, and principal component analysis visualizations to interpret omics data and derived statistics. Availability and implementation: Freely available at http://www.repexplore.tk. Contact: enrico.glaab@uni.lu. Supplementary information: Supplementary data are available at Bioinformatics online.
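The point about information lost in averaging can be made concrete with a small sketch. It is not RepExplore's statistical method, which the abstract does not specify. It only shows, under the assumption that replicates are stored per feature as plain lists, that two features can share the same replicate mean while differing in replicate variance. The function name `summarize_replicates` and the feature names are hypothetical.

```python
from statistics import mean, variance


def summarize_replicates(replicates):
    """Return (mean, sample variance) of the technical replicates per feature.

    `replicates` maps a feature name to a list of at least two measurements.
    """
    return {
        feature: (mean(values), variance(values))
        for feature, values in replicates.items()
    }


if __name__ == "__main__":
    data = {
        "feature_a": [1.0, 1.0, 1.0],  # replicates agree exactly
        "feature_b": [0.0, 1.0, 2.0],  # same mean, noisier replicates
    }
    for name, (avg, var) in summarize_replicates(data).items():
        print(name, avg, var)
```

An analysis that keeps only the averages would treat both features identically, whereas keeping the variance lets the noisier feature be down-weighted.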
Project description:Background: Metabolomics measurements are noisy, often characterized by a small sample size and missing entries. While data-driven methods have shown promise in terms of analyzing metabolomics data, e.g., revealing biomarkers of various phenotypes, metabolomics data analysis can significantly benefit from incorporating prior information about metabolic mechanisms. This paper introduces a novel data analysis approach to incorporate mechanistic models in metabolomics data analysis. Methods: We arranged time-resolved metabolomics measurements of plasma samples collected during a meal challenge test from the COPSAC2000 cohort as a third-order tensor: subjects by metabolites by time samples. Simulated challenge test data generated using a human whole-body metabolic model were also arranged as a third-order tensor: virtual subjects by metabolites by time samples. Real and simulated data sets were coupled in the metabolites mode and jointly analyzed using coupled tensor factorizations to reveal the underlying patterns. Results: Our experiments demonstrated that the joint analysis of simulated and real data had better performance in terms of pattern discovery, achieving higher correlations with a BMI (body mass index)-related phenotype compared to the analysis of only real data in males, while in females, the performance was comparable. We also demonstrated the advantages of such a joint analysis approach in the presence of incomplete measurements and its limitations in the presence of wrong prior information. Conclusions: The joint analysis of real measurements and simulated data (generated using a mechanistic model) through coupled tensor factorizations guides real data analysis with prior information encapsulated in mechanistic models and reveals interpretable patterns.
Project description:High-need, high-cost (HNHC) patients, usually defined as those who account for the top 5% of annual healthcare costs, account for as much as half of total healthcare costs. Accurately predicting future HNHC patients and designing targeted interventions for them has the potential to effectively control rapidly growing healthcare expenditures. To achieve this goal, we used a nationally representative random sample of the working-age population who underwent a screening program in Japan in 2013-2016, and developed five machine-learning-based prediction models for HNHC patients in the subsequent year. Predictors include demographics, blood pressure, laboratory tests (e.g., HbA1c, LDL-C, and AST), survey responses (e.g., smoking status, medications, and past medical history), and annual healthcare cost in the prior year. Our prediction models for HNHC patients, which combine clinical data from the national screening program with claims data, achieved a c-statistic of 0.84 (95% CI, 0.83-0.86) and outperformed traditional prediction models relying only on claims data.
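The c-statistic reported above is the probability that a randomly chosen HNHC patient receives a higher predicted risk than a randomly chosen non-HNHC patient. The following sketch computes it by direct pairwise comparison, counting ties as half. It assumes binary labels (1 for HNHC, 0 otherwise) and uses made-up scores; it is not the authors' evaluation code, and the function name `c_statistic` is hypothetical.

```python
def c_statistic(labels, scores):
    """Concordance statistic (equivalent to ROC AUC) by pairwise comparison."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0  # positive ranked above negative
            elif p == n:
                concordant += 0.5  # ties count as half
    return concordant / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(c_statistic(labels, scores))  # 3 of 4 pairs concordant -> 0.75
```

The pairwise loop is quadratic in the number of patients, so a rank-based implementation would be preferable for a cohort of realistic size.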
Project description:Glioblastoma (GB) is a primary malignancy of the central nervous system that is classified by the WHO as a grade IV astrocytoma. Despite decades of research, several aspects of the biology of GB are still unclear. Its pathogenesis and resistance mechanisms are poorly understood, and methods to optimize patient diagnosis and prognosis remain a bottleneck owing to the heterogeneity of the malignancy. The field of omics has recently gained traction, as it can aid in understanding the dynamic spatiotemporal regulatory network of enzymes and metabolites that allows cancer cells to adjust to their surroundings to promote tumor development. In combination with other omics techniques, proteomic and metabolomic investigations, which are a potent means for examining a variety of metabolic enzymes as well as intermediate metabolites, might offer crucial information in this area. Therefore, this review stresses the major contribution these tools have made in GB clinical and preclinical research and highlights the impact of the integrative "omics" approach in reducing some of the therapeutic challenges associated with GB research and treatment. This review can thus promote the use of these powerful tools in research by serving as a hub that summarizes studies employing metabolomics and proteomics in GB diagnosis, treatment, and prognosis.
Project description:The Genetic Association Information Network (GAIN) Data Access Committee was established in June 2007 to provide prompt and fair access to data from six genome-wide association studies through the database of Genotypes and Phenotypes (dbGaP). Of 945 project requests received through 2011, 749 (79%) have been approved; median receipt-to-approval time decreased from 14 days in 2007 to 8 days in 2011. Over half (54%) of the proposed research uses were for GAIN-specific phenotypes; other uses were for method development (26%) and adding controls to other studies (17%). Eight data-management incidents, defined as compromises of any of the data-use conditions, occurred among nine approved users; most were procedural violations, and none violated participant confidentiality. Over 5 years of experience with GAIN data access has demonstrated substantial use of GAIN data by investigators from academic, nonprofit, and for-profit institutions with relatively few and contained policy violations. The availability of GAIN data has allowed for advances in both the understanding of the genetic underpinnings of mental-health disorders, diabetes, and psoriasis and the development and refinement of statistical methods for identifying genetic and environmental factors related to complex common diseases.