SCENIC: single-cell regulatory network inference and clustering.
ABSTRACT: We present SCENIC, a computational method for simultaneous gene regulatory network reconstruction and cell-state identification from single-cell RNA-seq data (http://scenic.aertslab.org). On a compendium of single-cell data from tumors and brain, we demonstrate that cis-regulatory analysis can be exploited to guide the identification of transcription factors and cell states. SCENIC provides critical biological insights into the mechanisms driving cellular heterogeneity.
Project description:We present cisTopic, a probabilistic framework used to simultaneously discover coaccessible enhancers and stable cell states from sparse single-cell epigenomics data (http://github.com/aertslab/cistopic). Using a compendium of single-cell ATAC-seq datasets from differentiating hematopoietic cells, brain and transcription factor perturbations, we demonstrate that topic modeling can be exploited for robust identification of cell types, enhancers and relevant transcription factors. cisTopic provides insight into the mechanisms underlying regulatory heterogeneity in cell populations.
Project description:Few people would deny an intuitive sense of increased wellbeing when spending time in beautiful locations. Here, we ask: can we quantify the relationship between environmental aesthetics and human health? We draw on data from Scenic-Or-Not, a website that crowdsources ratings of "scenicness" for geotagged photographs across Great Britain, in combination with data on citizen-reported health from the Census for England and Wales. We find that inhabitants of more scenic environments report better health, across urban, suburban and rural areas, even when taking core socioeconomic indicators of deprivation into account, such as income, employment and access to services. Our results provide evidence in line with the striking hypothesis that the aesthetics of the environment may have quantifiable consequences for our wellbeing.
Project description:Does spending time in beautiful settings boost people's happiness? The answer to this question has long remained elusive due to a paucity of large-scale data on environmental aesthetics and individual happiness. Here, we draw on two novel datasets: first, individual happiness data from the smartphone app, Mappiness, and second, crowdsourced ratings of the "scenicness" of photographs taken across England from the online game Scenic-Or-Not. We find that individuals are happier in more scenic locations, even when we account for a range of factors such as the activity the individual was engaged in at the time, weather conditions and the income of local inhabitants. Crucially, this relationship holds not only in natural environments, but in built-up areas too, even after controlling for the presence of green space. Our results provide evidence that the aesthetics of the environments that policymakers choose to build or demolish may have consequences for our everyday wellbeing.
Project description:Plant landscapes are fundamental components of the green space of urban parks and are often dynamic, changing throughout the year. Winter is a season with poor plant landscape effects in urban park green spaces. However, plant community landscapes in the winter in urban park green spaces could be further optimized. Here, we conducted scenic beauty estimation (SBE) of the landscape factors in 29 winter plant communities in four typical urban parks in Yangzhou, China using partial correlation analysis and multiple linear regression. The standard SBE values of the 29 plant communities ranged from -0.981 to 1.209. Complex plant community landscapes with abundant plant species, beautiful plant community morphology and obvious seasonal changes generally received high scenic beauty scores. Six landscape factors, including the diversity of plant species, the proportion of evergreen tree species, the morphological characteristics of plants, the ground cover rate, the overall sense of harmony and the color composition, greatly influenced the scenic beauty of the plant landscape in the winter. Generally, the results of this study provide insight into how the plant community landscape in urban parks could be improved.
Project description:We present cisTopic, a probabilistic framework to simultaneously discover co-accessible enhancers and stable cell states from sparse single-cell epigenomics data (http://github.com/aertslab/cistopic). On a compendium of single-cell ATAC-seq datasets from differentiating hematopoietic cells, brain, and transcription-factor perturbation dynamics, we demonstrate that topic modelling can be exploited for a robust identification of cell types, enhancers, and relevant transcription factors. cisTopic provides insight into the mechanisms underlying regulatory heterogeneity within cell populations. Overall design: Time series scATAC-seq (Fluidigm C1) and bulk OmniATAC-seq of SOX10 knockdown-induced phenotype switching in two melanoma cell lines. ChIP-seq of H3K27Ac on four melanoma cell lines.
Project description:We present Clustering and Lineage Inference in Single-Cell Transcriptional Analysis (CALISTA), a numerically efficient and highly scalable toolbox for an end-to-end analysis of single-cell transcriptomic profiles. CALISTA includes four essential single-cell analyses for cell differentiation studies: single-cell clustering, reconstruction of cell lineage specification, transition gene identification, and cell pseudotime ordering, which can be applied individually or in a pipeline. In these analyses, we employ a likelihood-based approach where single-cell mRNA counts are described by a probabilistic distribution function associated with stochastic gene transcriptional bursts and random technical dropout events. We illustrate the efficacy of CALISTA using single-cell gene expression datasets from different single-cell transcriptional profiling technologies, ranging from a few hundred to tens of thousands of cells. CALISTA is freely available at https://www.cabselab.com/calista.
Project description:Transcriptional regulatory network inference (TRNI) from large compendia of DNA microarrays has become a fundamental approach for discovering transcription factor (TF)-gene interactions at the genome-wide level. In correlation-based TRNI, network edges can in principle be evaluated using standard statistical tests. However, while such tests nominally assume independent microarray experiments, we expect dependency between the experiments in microarray compendia, due to both project-specific factors (e.g., microarray preparation, environmental effects) in the multi-project compendium setting and effective dependency induced by gene-gene correlations. Herein, we characterize the nature of dependency in an Escherichia coli microarray compendium and explore its consequences on the problem of determining which and how many arrays to use in correlation-based TRNI. We present evidence of substantial effective dependency among microarrays in this compendium, and characterize that dependency with respect to experimental condition factors. We then introduce a measure n_eff of the effective number of experiments in a compendium, and find that corresponding to the dependency observed in this particular compendium there is a huge reduction in effective sample size, i.e., n_eff = 14.7 versus n = 376. Furthermore, we found that the n_eff of select subsets of experiments actually exceeded n_eff of the full compendium, suggesting that the adage 'less is more' applies here. Consistent with this latter result, we observed improved performance in TRNI using subsets of the data compared to results using the full compendium. We identified experimental condition factors that trend with changes in TRNI performance and n_eff, including growth phase and media type. Finally, using the set of known E. coli genetic regulatory interactions from RegulonDB, we demonstrated that false discovery rates (FDR) derived from n_eff-adjusted p-values were well-matched to FDR based on the RegulonDB truth set. These results support utilization of n_eff as a potent descriptor of microarray compendia. In addition, they highlight a straightforward correlation-based method for TRNI with demonstrated meaningful statistical testing for significant edges, readily applicable to compendia from any species, even when a truth set is not available. This work facilitates a more refined approach to construction and utilization of mRNA expression compendia in TRNI.
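To illustrate why the effective number of experiments matters for edge testing, the sketch below computes a two-sided p-value for a TF-gene correlation with the standard Fisher z-transform, once with the nominal array count and once with the much smaller effective count quoted above. The abstract does not state how the effective number is estimated or how the p-values are adjusted, so substituting it for n in the Fisher test is an assumption made here for illustration only; the function name `fisher_z_pvalue` and the example correlation of 0.5 are hypothetical.

```python
import math


def fisher_z_pvalue(r: float, n: float) -> float:
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transform.

    `n` may be fractional, so an effective sample size can be passed
    in place of the nominal number of arrays (an assumption of this
    sketch, not the published adjustment). Requires n > 3.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z approximation")
    # atanh(r) is approximately normal with standard error 1/sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3.0)
    # Two-sided tail probability of a standard normal variate.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical edge with correlation 0.5, tested under both sample sizes.
p_nominal = fisher_z_pvalue(0.5, 376)     # nominal number of arrays
p_effective = fisher_z_pvalue(0.5, 14.7)  # effective number from the text
print(f"nominal n:   p = {p_nominal:.3g}")
print(f"effective n: p = {p_effective:.3g}")
```

Under these assumptions, the same correlation that looks overwhelmingly significant with 376 nominally independent arrays becomes borderline once the dependency among experiments is taken into account.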
Project description:Transcription control plays a crucial role in establishing a unique gene expression signature for each of the hundreds of mammalian cell types. Though gene expression data have been widely used to infer cellular regulatory networks, existing methods mainly infer correlations rather than causality. We developed statistical models and likelihood-ratio tests to infer causal gene regulatory networks using enhancer RNA (eRNA) expression information as a causal anchor and applied the framework to eRNA and transcript expression data from the FANTOM Consortium. Predicted causal targets of transcription factors (TFs) in mouse embryonic stem cells, macrophages and erythroblastic leukaemia overlapped significantly with experimentally-validated targets from ChIP-seq and perturbation data. We further improved the model by taking into account that some TFs might act in a quantitative, dosage-dependent manner, whereas others might act predominantly in a binary on/off fashion. We predicted TF targets from concerted variation of eRNA and TF and target promoter expression levels within a single cell type, as well as across multiple cell types. Importantly, TFs with high-confidence predictions were largely different between these two analyses, demonstrating that variability within a cell type is highly relevant for target prediction of cell type-specific factors. Finally, we generated a compendium of high-confidence TF targets across diverse human cell and tissue types.
Project description:The diversity of cell types and regulatory states in the brain, and how these change during aging, remains largely unknown. We present a single-cell transcriptome atlas of the entire adult Drosophila melanogaster brain sampled across its lifespan. Cell clustering identified 87 initial cell clusters that are further subclustered and validated by targeted cell-sorting. Our data show high granularity and identify a wide range of cell types. Gene network analyses using SCENIC revealed regulatory heterogeneity linked to energy consumption. During aging, RNA content declines exponentially without affecting neuronal identity in old brains. This single-cell brain atlas covers nearly all cells in the normal brain and provides the tools to study cellular diversity alongside other Drosophila and mammalian single-cell datasets in our unique single-cell analysis platform: SCope (http://scope.aertslab.org). These results, together with SCope, allow comprehensive exploration of all transcriptional states of an entire aging brain.
Project description:BACKGROUND: The availability of large collections of microarray datasets (compendia), or knowledge about grouping of genes into pathways (gene sets), is typically not exploited when training predictors of disease outcome. These can be useful since a compendium increases the number of samples, while gene sets reduce the size of the feature space. This should be favorable from a machine learning perspective and result in more robust predictors. METHODOLOGY: We extracted modules of regulated genes from gene sets and compendia. Through supervised analysis, we constructed predictors which employ modules predictive of breast cancer outcome. To validate these predictors we applied them to independent data from the same institution (intra-dataset) and from other institutions (inter-dataset). CONCLUSIONS: We show that modules derived from single breast cancer datasets achieve better performance on the validation data than gene-based predictors. We also show that there is a trend in compendium specificity and predictive performance: modules derived from a single breast cancer dataset or a breast cancer-specific compendium perform better than those derived from a human cancer compendium. Additionally, the module-based predictor provides a much richer insight into the underlying biology. Frequently selected gene sets are associated with processes such as cell cycle, E2F regulation, DNA damage response, proteasome and glycolysis. We analyzed two modules, related to cell cycle and to the OCT1 transcription factor, respectively. On an individual basis, these modules provide a significant separation in survival subgroups on the training and independent validation data.