Project description: Droplet-based single-cell transcriptome sequencing (scRNA-seq) technology can measure gene expression in tens of thousands of single cells simultaneously. More recently, coupled with Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq), the droplet-based system has allowed immunophenotyping of single cells based on cell surface expression of specific proteins together with simultaneous transcriptome profiling in the same cell. In this study, we developed BREM-SC, a novel Bayesian Random Effects Mixture model that jointly clusters paired single-cell transcriptomic and proteomic data, which will help researchers study the transcriptome and surface proteins together at the single-cell level and make new biological discoveries.
Project description: Abstract: The recently developed droplet-based single-cell transcriptome sequencing (scRNA-seq) technology makes it feasible to perform population-scale scRNA-seq studies, in which the transcriptome is measured for tens of thousands of single cells from multiple individuals. Despite advances in clustering methods, few are tailored to population-scale scRNA-seq studies. Here, we develop a BAyesian Mixture Model for Single Cell sequencing (BAMM-SC) method to cluster scRNA-seq data from multiple individuals simultaneously. BAMM-SC takes raw count data as input and accounts for data heterogeneity and batch effects among multiple individuals in a unified Bayesian hierarchical model framework. Results from applying BAMM-SC to in-house experimental scRNA-seq datasets of blood and lung cells from humans or mice demonstrate that BAMM-SC outperformed existing clustering methods, with considerably improved clustering accuracy, particularly in the presence of heterogeneity among individuals. Data purpose: To evaluate the performance of BAMM-SC for clustering droplet-based scRNA-seq data in population-based studies, we performed single-cell RNA-seq on peripheral blood mononuclear cells (PBMC) isolated from whole blood obtained from 4 healthy donors, and on lung cells isolated from Streptococcus pneumoniae (SP)-infected and naïve mice.
Project description: Histone modifications are a key epigenetic mechanism to activate or repress the expression of genes. Data sets of matched microarray expression data and histone modification data measured by ChIP-seq exist, but methods for integrative analysis of both data types are still rare. Here, we present a novel bioinformatic approach to detect genes that are differentially expressed between two conditions putatively caused by alterations in histone modification. We introduce a correlation measure for integrative analysis of ChIP-seq and gene expression data and demonstrate that a proper normalization of the ChIP-seq data is crucial. We suggest applying Bayesian mixture models of different distributions to further study the distribution of the correlation measure. The implicit classification of the mixture models is used to detect genes with differences between two conditions in both gene expression and histone modification. The method is applied to different data sets and its superiority to a naive separate analysis of both data types is demonstrated. This GEO series contains the expression data of the Cebpa example data set.
Project description: Histone modifications are a key epigenetic mechanism to activate or repress the expression of genes. Data sets of matched microarray expression data and histone modification data measured by ChIP-seq exist, but methods for integrative analysis of both data types are still rare. Here, we present a novel bioinformatic approach to detect genes that are differentially expressed between two conditions putatively caused by alterations in histone modification. We introduce a correlation measure for integrative analysis of ChIP-seq and gene expression data and demonstrate that a proper normalization of the ChIP-seq data is crucial. We suggest applying Bayesian mixture models of different distributions to further study the distribution of the correlation measure. The implicit classification of the mixture models is used to detect genes with differences between two conditions in both gene expression and histone modification. The method is applied to different data sets and its superiority to a naive separate analysis of both data types is demonstrated. This GEO series contains the expression data of the Cebpa example data set. This data set was derived from sorted Cebpa(fl/fl) and Cebpa(fl/fl);Mx1Cre murine hematopoietic LSKCD150- 18 post pIpC injections (conditional deletion of Cebpa). Specimens from three Cebpa(fl/fl) and three Cebpa(fl/fl);Mx1Cre mice were hybridized separately on six Affymetrix Mouse Gene 1.0 ST arrays. The associated histone modification ChIP-seq data are provided in series GSE43007.