DC3 is a method for deconvolution and coupled clustering from bulk and single-cell genomics data.
ABSTRACT: Characterizing and interpreting heterogeneous mixtures at the cellular level is a critical problem in genomics. Single-cell assays offer an opportunity to resolve cellular-level heterogeneity: for example, scRNA-seq enables single-cell expression profiling, and scATAC-seq identifies active regulatory elements. Furthermore, while scHi-C can measure the chromatin contacts (i.e., loops) between active regulatory elements and their target genes in single cells, bulk HiChIP can measure such contacts at higher resolution. In this work, we introduce DC3 (De-Convolution and Coupled-Clustering) as a method for the joint analysis of various bulk and single-cell data such as HiChIP, RNA-seq and ATAC-seq from the same heterogeneous cell population. DC3 can simultaneously identify distinct subpopulations, assign single cells to the subpopulations (i.e., clustering) and de-convolve the bulk data into subpopulation-specific data. The subpopulation-specific profiles of gene expression, chromatin accessibility and enhancer-promoter contact obtained by DC3 provide a comprehensive characterization of the gene regulatory system in each subpopulation.
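The mixture assumption behind bulk de-convolution can be illustrated with a toy example. The sketch below is not DC3's model or algorithm, which the abstract does not specify. It assumes only that a bulk profile is a weighted average of two subpopulation profiles and recovers the weight by least squares. The function name `estimate_mixing_weight` and all numeric values are hypothetical.

```python
def estimate_mixing_weight(bulk, profile_a, profile_b):
    """Least-squares w such that bulk ~= w * profile_a + (1 - w) * profile_b."""
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("profiles are identical; the weight is not identifiable")
    num = sum((x - b) * d for x, b, d in zip(bulk, profile_b, diff))
    # Clip so the result is a valid proportion even with noisy input.
    return min(1.0, max(0.0, num / denom))


# Hypothetical subpopulation-specific profiles over three features.
profile_a = [10.0, 0.0, 5.0]
profile_b = [0.0, 8.0, 5.0]

# Simulated bulk measurement: 25% subpopulation A, 75% subpopulation B.
bulk = [0.25 * a + 0.75 * b for a, b in zip(profile_a, profile_b)]

weight = estimate_mixing_weight(bulk, profile_a, profile_b)
print(f"estimated proportion of subpopulation A: {weight:.2f}")
```

In a joint setting such as the one described, the subpopulation profiles would themselves be unknown and estimated together with the cluster assignments; here they are fixed to keep the example minimal.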
Project description:HiChIP/PLAC-seq is increasingly becoming popular for profiling 3D chromatin contacts among regulatory elements and for annotating functions of genetic variants. Here we describe FitHiChIP, a computational method for loop calling from HiChIP/PLAC-seq data, which jointly models the non-uniform coverage and genomic distance scaling of contact counts to compute statistical significance estimates. We also develop a technique to filter putative bystander loops that can be explained by stronger adjacent loops. Compared to existing methods, FitHiChIP performs better in recovering contacts reported by Hi-C, promoter capture Hi-C and ChIA-PET experiments and in capturing previously validated promoter-enhancer interactions. FitHiChIP loop calls are reproducible among replicates and are consistent across different experimental settings. Our work also provides a framework for differential HiChIP analysis with an option to utilize ChIP-seq data for further characterizing differential loops. Even though designed for HiChIP, FitHiChIP is also applicable to other conformation capture assays.
Project description:Advances in single-cell RNA-sequencing techniques reveal the existence of distinct cell subpopulations. Identification of transcription factors (TFs) that define the identity of these subpopulations poses a challenge. Here, we postulate that identity depends on background subpopulations, and is determined by a synergistic core combination of TFs mainly uniquely expressed in each subpopulation, but also TFs more broadly expressed across background subpopulations. Building on this view, we develop a new computational method for determining such synergistic identity cores of subpopulations within a given cell population. Our method utilizes an information-theoretic measure for quantifying transcriptional synergy, and implements a novel algorithm for searching for optimal synergistic cores. It requires only single-cell RNA-seq data as input, and does not rely on any prior knowledge of candidate genes or gene regulatory networks. Hence, it can be directly applied to any cellular systems, including those containing novel subpopulations. The method is capable of recapitulating known experimentally validated identity TFs in eight published single-cell RNA-seq datasets. Furthermore, some of these identity TFs are known to trigger cell conversions between subpopulations. Thus, this methodology can help design strategies for cell conversion within a cell population, guiding experimentalists in the field of stem cell research and regenerative medicine.
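As a rough illustration of what an information-theoretic synergy score can look like, the sketch below uses one common definition: the information two variables carry jointly about a label, minus the information each carries alone. The abstract does not state which measure the authors use, so this definition, the function names, and the binarized toy data are all assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def synergy(x1, x2, y):
    """I(x1, x2; y) - I(x1; y) - I(x2; y): one common synergy definition."""
    pair = list(zip(x1, x2))
    return (
        mutual_information(pair, y)
        - mutual_information(x1, y)
        - mutual_information(x2, y)
    )


# Hypothetical binarized expression of two TFs across four cells, and a
# subpopulation label that depends on the combination (XOR) of the two.
tf1 = [0, 0, 1, 1]
tf2 = [0, 1, 0, 1]
label = [0, 1, 1, 0]

xor_synergy = synergy(tf1, tf2, label)
print(f"synergy: {xor_synergy:.1f} bit")
```

In this toy case neither TF is informative about the label on its own, yet the pair determines it completely, which is the kind of combinatorial signal a synergy measure is meant to capture.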
Project description:Ependymoma (EPN) is a brain tumor commonly presenting in childhood that remains fatal in most children. Intra-tumoral cellular heterogeneity in bulk-tumor samples significantly confounds our understanding of EPN biology, impeding development of effective therapy. We, therefore, use single-cell RNA sequencing, histology, and deconvolution to catalog cellular heterogeneity of the major childhood EPN subgroups. Analysis of PFA subgroup EPN reveals evidence of an undifferentiated progenitor subpopulation that either differentiates into subpopulations with ependymal cell characteristics or transitions into a mesenchymal subpopulation. Histological analysis reveals that progenitor and mesenchymal subpopulations co-localize in peri-necrotic zones. In conflict with current classification paradigms, relative PFA subpopulation proportions are shown to determine bulk-tumor-assigned subgroups. We provide an interactive online resource that facilitates exploration of the EPN single-cell dataset. This atlas of EPN cellular heterogeneity increases understanding of EPN biology.
Project description:Despite its popularity, characterization of subpopulations with transcript abundance is subject to a significant amount of noise. We propose to use effective and expressed nucleotide variations (eeSNVs) from scRNA-seq as alternative features for tumor subpopulation identification. We develop a linear modeling framework, SSrGE, to link eeSNVs associated with gene expression. In all the datasets tested, eeSNVs achieve better accuracies than gene expression for identifying subpopulations. Previously validated cancer-relevant genes are also highly ranked, confirming the significance of the method. Moreover, SSrGE is capable of analyzing coupled DNA-seq and RNA-seq data from the same single cells, demonstrating its value in integrating multi-omics single cell techniques. In summary, SNV features from scRNA-seq data have merits for both subpopulation identification and linkage of genotype-phenotype relationship.
Project description:BACKGROUND:Classic dendritic cells (cDCs) play a central role in the immune system by processing and presenting antigens to activate T cells, and consist of two major subsets: CD141+ cDC (cDC1) and CD1c+ cDC (cDC2). Migratory precursor cells, the pre-cDCs, are the immediate precursors of both cDC subsets. Previous studies showed that there are two pre-committed pre-cDC subpopulations. However, the key molecular drivers of pre-commitment in human pre-cDCs have not been investigated. RESULTS:To identify the key molecular drivers of pre-commitment in human pre-cDCs, we performed single-cell RNA sequencing (RNA-Seq) of two cDC subsets and pre-cDCs, and bulk RNA-Seq of pre-cDCs and cDCs from human peripheral blood. We found that pre-cDC subpopulations cannot be separated by either variable genes within pre-cDCs or differentially expressed genes between cDC1 and cDC2. In contrast, they were separated by 16 transcription factors that are themselves differentially expressed or have regulated targets enriched in the differentially expressed genes between bulk cDC1 and cDC2, with one subpopulation close to cDC1 and the other close to cDC2. More importantly, these two pre-cDC subpopulations correlate more strongly with the ratio of IRF8 to IRF4 expression than with the expression level of either factor alone. We also verified these findings using three recently published datasets. CONCLUSIONS:In this study, we demonstrate that single-cell transcriptome profiling can reveal the pre-cDC differentiation map, and our results suggest that the combinatorial dose of transcription factors determines cell differentiation fate.
Project description:Emerging single-cell technologies (e.g. single-cell ATAC-seq, DNase-seq or ChIP-seq) have made it possible to assay the regulome of individual cells. Single-cell regulome data are highly sparse and discrete, which makes them challenging to analyze, and user-friendly software tools are still lacking. We present SCRAT, a Single-Cell Regulome Analysis Toolbox with a graphical user interface, for studying cell heterogeneity using single-cell regulome data. SCRAT can be used to conveniently summarize regulatory activities according to different features (e.g. gene sets, transcription factor binding motif sites, etc.). Using these features, users can identify cell subpopulations in a heterogeneous biological sample, infer cell identities of each subpopulation, and discover distinguishing features such as gene sets and transcription factors that show different activities among subpopulations. SCRAT is freely available at https://zhiji.shinyapps.io/scrat as an online web service and at https://github.com/zji90/SCRAT as an R package. Supplementary data are available at Bioinformatics online.
Project description:Intratumor heterogeneity (heterogeneity of cancer cells within a single tumor) is considered one of the most problematic factors in treatment. Genetic heterogeneity, such as in somatic mutations and chromosome aberrations, is a common characteristic of human solid tumors and is probably the basis of biological heterogeneity. Using mutations in APC, TP53 and KRAS as markers to identify distinct colorectal cancer subpopulations, we analyzed a total of 42 primary colorectal cancer tissues and six paired liver metastases with multipoint microsampling, which enabled analysis of mutation patterns and allelic imbalances with a resolution of 0.01 mm² (about 200 cells). There was usually more than one subpopulation in each primary tumor. Only two of 15 (13.3%) cases with three gene mutations and eight of 27 (29.6%) cases with two gene mutations had a single subpopulation. Cells with mutations in all of the examined genes usually constituted the major population. Multipoint microsampling of six primary and metastatic tumor pairs revealed that the majority of discrepancies in mutation patterns found with the bulk tissue analysis were due to loss of subpopulations in the metastatic tissues. In addition, multipoint microsampling uncovered substantial changes in subpopulations that were not detected with bulk tissue analysis. Specifically, the proportion of KRAS mutation-negative subpopulations increased in the metastatic tumors of four cases. Because KRAS mutation status is linked to cetuximab/panitumumab efficacy, subpopulation dynamics could lead to differences in response to cetuximab/panitumumab in primary versus metastatic tumors.
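The case counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only recomputes the percentages from the counts given in the text; the variable names are illustrative.

```python
# (cases with a single subpopulation, total cases), as quoted in the abstract.
cases = {
    "three gene mutations": (2, 15),
    "two gene mutations": (8, 27),
}

# Percentage of cases with a single subpopulation, rounded to one decimal.
percentages = {
    group: round(100 * single / total, 1)
    for group, (single, total) in cases.items()
}

# The two groups together account for all primary tumors analyzed.
total_primary = sum(total for _, total in cases.values())

for group, pct in percentages.items():
    print(f"{group}: {pct}% with a single subpopulation")
print(f"total primary tumors: {total_primary}")
```

The recomputed values agree with the reported 13.3% and 29.6%, and the group sizes sum to the 42 primary tumors stated earlier in the abstract.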
Project description:BACKGROUND:Bone marrow stromal cells (BMSCs) are a heterogeneous population that participates in wound healing, immune modulation and tissue regeneration. Next generation sequencing was used to analyze transcripts from single BMSCs in order to better characterize BMSC subpopulations. METHODS:Cryopreserved passage 2 BMSCs from one healthy subject were cultured through passage 10. The transcriptomes of bulk BMSCs from designated passages were analyzed with microarrays and RNA sequencing (RNA-Seq). For some passages, single BMSCs were separated using microfluidics and their transcriptomes were analyzed by RNA-Seq. RESULTS:Transcriptome analysis by microarray and RNA-Seq of unseparated BMSCs from passages 2, 4, 6, 8, 9 and 10 yielded similar results; both data sets grouped passages 4 and 6 and passages 9 and 10 together and genes differentially expressed among these early and late passage BMSCs were similar. 3D Diffusion map visualization of single BMSCs from passages 3, 4, 6, 8 and 9 clustered passages 3 and 9 into two distinct groups, but there was considerable overlap for passages 4, 6 and 8 cells. Markers for early passage, FGFR2, and late passage BMSCs, PLAT, were able to identify three subpopulations within passage 3 BMSCs; one that expressed high levels of FGFR2 and low levels of PLAT; one that expressed low levels of FGFR2 and high levels of PLAT and one that expressed intermediate levels of FGFR2 and low levels of PLAT. CONCLUSIONS:Single BMSCs can be separated by microfluidics and their transcriptome analyzed by next generation sequencing. Single cell analysis of early passage BMSCs identified a subpopulation of cells expressing high levels of FGFR2 that might include skeletal stem cells.
Project description:Rationale: Single-cell RNA sequencing (scRNA-seq) has provided an unbiased assessment of specific profiling of cell populations at the single-cell level. Conventional renal biopsy and bulk RNA-seq only average out the underlying differences, while the extent of chronic kidney transplant rejection (CKTR) and how it is shaped by cells and states in the kidney remain poorly characterized. Here, we analyzed cells from CKTR and matched healthy adult kidneys at single-cell resolution. Methods: High-quality transcriptomes were generated from three healthy human kidneys and two CKTR biopsies. Unsupervised clustering analysis of biopsy specimens was performed to identify fifteen distinct cell types, including major immune cells, renal cells and a few types of stromal cells. Single-sample gene set enrichment (ssGSEA) algorithm was utilized to explore functional differences between cell subpopulations and between CKTR and normal cells. Results: Natural killer T (NKT) cells formed five subclasses, representing CD4+ T cells, CD8+ T cells, cytotoxic T lymphocytes (CTLs), regulatory T cells (Tregs) and natural killer cells (NKs). Memory B cells were classified into two subtypes, representing reverse immune activation. Monocytes formed a classic CD14+ group and a nonclassical CD16+ group. We identified a novel subpopulation [myofibroblasts (MyoF)] in fibroblasts, which express collagen and extracellular matrix components. The CKTR group was characterized by increased numbers of immune cells and MyoF, leading to increased renal rejection and fibrosis. Conclusions: By assessing functional differences of subtype at single-cell resolution, we discovered different subtypes that correlated with distinct functions in CKTR. This resource provides deeper insights into CKTR biology that will be helpful in the diagnosis and treatment of CKTR.
Project description:Stem cell differentiation is a complex biological process. Cellular heterogeneity, such as the co-existence of different cell subpopulations within a population, partly hampers our understanding of this process. The modern single-cell gene expression technologies, such as single-cell RT-PCR and RNA-seq, have enabled us to elucidate such heterogeneous cell subpopulations. However, the identification of a transcriptional regulatory network (TRN) for each cell subpopulation within a population and genes determining specific cell fates (lineage specifiers) remains a challenge due to the slower development of appropriate computational and experimental workflows. Here, we propose a computational differential network analysis approach for predicting lineage specifiers in binary-fate differentiation events. The proposed method is based on a model that considers each stem cell subpopulation being in a stable state maintained by its specific TRN stability core, and cell differentiation involves changes in these stability cores between parental and daughter cell subpopulations. The method first reconstructs topologically different cell-subpopulation specific TRNs from single-cell gene expression data, literature knowledge and transcription factor (TF)-DNA binding-site prediction. Then, it systematically predicts lineage specifiers by identifying genes in the TRN stability cores in both parental and daughter cell subpopulations. Application of this method to different stem cell differentiation systems was able to predict known and putative novel lineage specifiers. These examples include the differentiation of inner cell mass into either primitive endoderm or epiblast, different progenitor cells in the hematopoietic system, and the lung alveolar bipotential progenitor into either alveolar type 1 or alveolar type 2. The method is generally applicable to any binary-fate differentiation system, for which single-cell gene expression data are available. Therefore, it should aid in understanding stem cell lineage specification, and in the development of experimental strategies for regenerative medicine.