Project description: In the rapidly moving proteomics field, a diverse patchwork of algorithms for data normalization and differential expression analysis is used by the community. First, we generated an all-inclusive mass spectrometry downstream analysis pipeline (MS-DAP) that integrates many algorithms for normalization and statistical analyses and produces standardized quality reporting with extensive data visualizations. Second, systematic evaluation of normalization and statistical algorithms on various benchmarking datasets, including additional data generated in this study, suggests best practices for data analysis. Commonly used approaches for differential testing based on moderated t-statistics are consistently outperformed by more recent statistical models, all integrated in MS-DAP, and we encourage their adoption. Third, we introduced a novel normalization algorithm that rescues deficiencies observed in commonly used normalization methods. Finally, we used the MS-DAP platform to re-analyze a recently published large-scale proteomics dataset of CSF from AD patients. The re-analysis showed increased sensitivity, yielding additional significant target proteins that improved overlap with results reported in related studies and that include a large set of new potential AD biomarkers in addition to those previously reported.
Project description: Technological advances in mass spectrometry allow us to collect more comprehensive, higher-quality data at increasing speed. With the rapidly increasing amount of data generated, the need for streamlined analyses becomes more apparent. Proteomic data are often affected by systemic bias from unknown sources, and failing to adequately normalize the data can lead to erroneous conclusions. To allow researchers to easily evaluate and compare different normalization methods via a user-friendly interface, we have developed “proteiNorm”. The current implementation of proteiNorm accommodates preliminary filtering at the peptide and sample levels, followed by an evaluation of several popular normalization methods and visualization of missing values. The user then selects an adequate normalization method and one of several imputation methods, which are used for the subsequent comparison of different differential abundance/expression methods and the estimation of statistical power. The application of proteiNorm and the interpretation of its results are demonstrated on a Tandem Mass Tag mass spectrometry example data set, in which the proteome of three different breast cancer cell lines was profiled with and without hydroxyurea treatment. With proteiNorm, we provide a user-friendly tool to identify an adequate normalization method and to select an appropriate method for a differential abundance/expression analysis.
Project description: Motivation: Detection of changes in DNA-protein interactions from ChIP-seq data is a crucial step in unraveling the regulatory networks behind biological processes. The simplest variation of this problem is the differential peak calling problem: finding genomic regions where the ChIP-seq signal for a protein's interaction with DNA changes between two cellular conditions. The great majority of peak calling methods can only analyse one ChIP-seq signal at a time and are unable to perform differential peak calling. Recently, a few approaches have been proposed that combine these peak callers with statistical tests for detecting differential digital expression. However, these methods fail to detect detailed changes in protein-DNA interactions. Results: We propose ODIN, an HMM-based approach to detect and analyse differential peaks in pairs of ChIP-seq data. ODIN performs genomic signal processing, peak calling and p-value calculation in an integrated framework. We also propose an evaluation methodology to compare ODIN with competing methods. The evaluation method is based on the association of differential peaks with expression changes in the same cellular conditions. Our empirical study, based on several ChIP-seq experiments from transcription factors, histone modifications and simulated data, shows that ODIN outperforms the competing methods considered in most scenarios. H3K4me1 and PU.1 occupancy in MPP, CDP, cDC and pDC.
Project description: Motivation: Detection of changes in DNA-protein interactions from ChIP-seq data is a crucial step in unraveling the regulatory networks behind biological processes. The simplest variation of this problem is the differential peak calling problem: finding genomic regions where the ChIP-seq signal for a protein's interaction with DNA changes between two cellular conditions. The great majority of peak calling methods can only analyse one ChIP-seq signal at a time and are unable to perform differential peak calling. Recently, a few approaches have been proposed that combine these peak callers with statistical tests for detecting differential digital expression. However, these methods fail to detect detailed changes in protein-DNA interactions. Results: We propose ODIN, an HMM-based approach to detect and analyse differential peaks in pairs of ChIP-seq data. ODIN performs genomic signal processing, peak calling and p-value calculation in an integrated framework. We also propose an evaluation methodology to compare ODIN with competing methods. The evaluation method is based on the association of differential peaks with expression changes in the same cellular conditions. Our empirical study, based on several ChIP-seq experiments from transcription factors, histone modifications and simulated data, shows that ODIN outperforms the competing methods considered in most scenarios.