Project description: This benchmark data set is composed of two groups of fractionated samples. The first group contained a mixture of E. coli and human digest; the second group contained a similar mixture, except that the amount of E. coli was twice that in the first group.
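The known spike-in design implies an expected fold change per species, which is what makes the data set usable as ground truth. The following is a minimal sketch of that arithmetic, not part of the original project. It assumes the human digest amount is the same in both groups (the description says only that the mixtures are "similar"), and the relative amounts and function name are hypothetical.

```python
import math


def expected_log2_ratio(amount_group1, amount_group2):
    """Expected log2 fold change (group 2 vs. group 1) for one species."""
    return math.log2(amount_group2 / amount_group1)


# Hypothetical relative amounts: only the 2x E. coli ratio comes from the
# project description; the constant human amount is an assumption.
expected = {
    "human": expected_log2_ratio(1.0, 1.0),
    "ecoli": expected_log2_ratio(1.0, 2.0),
}


def ratio_error(observed_log2_ratio, species):
    """Deviation of an observed log2 ratio from the designed value."""
    return observed_log2_ratio - expected[species]
```

Under these assumptions, an ideal pipeline would report a log2 ratio near 1 for E. coli proteins and near 0 for human proteins, so `ratio_error` gives a per-protein measure of quantification bias.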
Project description: Single-cell RNA sequencing (scRNA-seq) technology has undergone rapid development in recent years and brings new challenges in data processing and analysis. This has led to an explosion of tailored analysis methods for scRNA-seq to address various biological questions. However, the current lack of gold-standard benchmarking datasets makes it difficult for researchers to systematically evaluate the performance of the many methods available. Here, we designed and generated a cross-platform benchmark dataset that has built-in ground truth in various forms and varying levels of biological noise. We used this dataset to compare different protocols and data analysis methods. We found that data quality differs between protocols and that ERCC spike-ins behave independently of endogenous RNA. We found significant differences in the results from the methods compared, and we associated the results with data characteristics to identify methods that perform well in different situations. Our dataset and analysis provide a valuable resource for algorithm selection in different biological settings.
Project description: A comprehensive label-free quantification (LFQ) benchmark dataset for validating data analysis pipelines on modern-day acquisition strategies in proteomics, acquired on SCIEX TripleTOF 5600 and 6600+, Orbitrap QE-HFX, Waters Synapt G2-Si and Synapt XS, and Bruker timsTOF Pro instruments.
Project description: Bulk deconvolution with single-cell/nucleus RNA-seq data is critical for understanding heterogeneity in complex biological samples, yet the technological discrepancy across sequencing platforms limits deconvolution accuracy. To address this, we introduce an experimental design that matches inter-platform biological signals, thereby revealing the technological discrepancy, and then develop a deconvolution framework called DeMixSC using the better-matched (i.e., benchmark) data. Built upon a novel weighted nonnegative least-squares framework, DeMixSC identifies and adjusts genes with high technological discrepancy and aligns the benchmark data with large patient cohorts of the matched tissue type for large-scale deconvolution. Our results using a benchmark dataset of healthy retinas suggest much-improved deconvolution accuracy. Further analysis of a cohort of 453 patients with age-related macular degeneration supports the broad applicability of DeMixSC. Our findings reveal the impact of technological discrepancy on deconvolution performance and underscore the importance of a well-matched dataset to resolve this challenge. The developed DeMixSC framework is generally applicable for deconvolving large cohorts of disease tissues, potentially including cancer.
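To illustrate the general idea of weighted nonnegative least squares for deconvolution, here is a minimal sketch in pure Python. It is not the DeMixSC implementation: the projected-gradient solver, the toy reference matrix, and the choice of giving a discrepant gene a weight of zero are all assumptions made for illustration. Bulk expression `y` is modelled as reference profiles `X` (genes by cell types) times nonnegative proportions `p`, and each gene's squared residual is scaled by a weight `w`.

```python
def weighted_nnls(X, y, w, n_iter=2000):
    """Minimise sum_g w[g] * (y[g] - X[g] . p)^2 subject to p >= 0.

    Uses projected gradient descent (an illustrative solver choice).
    """
    n_genes, n_types = len(X), len(X[0])
    # Upper bound on the gradient's Lipschitz constant: 2 * trace(X^T W X).
    lip = 2.0 * sum(w[g] * sum(v * v for v in X[g]) for g in range(n_genes))
    step = 1.0 / lip
    p = [0.0] * n_types
    for _ in range(n_iter):
        resid = [sum(X[g][k] * p[k] for k in range(n_types)) - y[g]
                 for g in range(n_genes)]
        grad = [2.0 * sum(w[g] * X[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the nonnegative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    return p


# Toy data (hypothetical): 3 genes, 2 cell types, true proportions 0.3 / 0.7.
X = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]]
# Gene 3 carries an artificial platform shift of +3 on top of its true value 5.
y = [3.7, 7.3, 8.0]

p_unweighted = weighted_nnls(X, y, w=[1.0, 1.0, 1.0])
p_weighted = weighted_nnls(X, y, w=[1.0, 1.0, 0.0])  # discrepant gene ignored
```

In this toy case the down-weighted fit recovers the true proportions, while the unweighted fit is pulled away from them by the shifted gene. How weights are actually derived from benchmark data is specific to the published method and is not reproduced here.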
Project description: Data analysis is a critical part of quantitative proteomics studies and of interpreting their biological results. Numerous computational tools for protein quantification, imputation, and differential expression (DE) analysis have been developed in the past decade. However, selecting the optimal tools remains an unsolved problem. Moreover, owing to the rapid development of RNA-Seq technology, a vast number of DE analysis methods have been created, and whether these RNA-Seq-oriented tools can be applied to proteomics data still needs to be addressed. To benchmark these analysis methods, a proteomics dataset composed of proteins derived from human, yeast, and Drosophila mixed at different ratios was generated. Based on this dataset, DE analysis tools (both array-based and RNA-Seq-based), imputation algorithms, and protein quantification methods were compared and benchmarked. This study provides useful information for analyzing quantitative proteomics datasets. All the methods used in this study were integrated into Perseus, which is available at https://www.maxquant.org/perseus.