Project description: One of the major challenges in genomics is to build computational models that accurately predict genome-wide gene expression from the sequences of regulatory elements. Promoters play a key role in gene regulation, yet their regulatory logic remains incompletely understood. Here, we present PARM, a cell-type-specific deep learning model trained on specially designed massively parallel reporter assays that query human promoter sequences. PARM is computationally lightweight and reliably predicts autonomous promoter activity across the genome from DNA sequence alone, in multiple cell types. PARM can also design purely synthetic strong promoters. We leveraged PARM to systematically identify transcription factor (TF) binding sites that are likely to contribute to the activity of each natural human promoter, and to detect the rewiring of these regulatory interactions when cells are exposed to various stimuli. We also uncovered and experimentally confirmed striking positional preferences of TFs that differ between activating and repressive regulatory functions, as well as a complex grammar of motif-motif interactions. Our approach provides a foundation for a deeper understanding of the dynamic regulation of human promoters by TFs.
Project description: Across a range of biological processes, cells undergo coordinated changes in gene expression, resulting in transcriptome dynamics that unfold within a low-dimensional manifold. Single-cell RNA-sequencing (scRNA-seq) only measures temporal snapshots of gene expression, yet information on the underlying low-dimensional dynamics can be extracted using RNA velocity, which models unspliced and spliced RNA abundances to estimate the rate of change of gene expression. Available RNA velocity algorithms can be fragile and rely on heuristics that lack statistical control. Moreover, the estimated vector field is not dynamically consistent with the traversed gene expression manifold. Here, we develop a generative model of RNA velocity and a Bayesian inference approach that solves these problems. Our model couples velocity field and manifold estimation in a reformulated, unified framework, so as to coherently identify the parameters of an autonomous dynamical system. Focusing on the cell cycle, we implemented VeloCycle to study gene regulation dynamics on one-dimensional periodic manifolds and used live imaging to validate its ability to infer actual cell cycle periods. We benchmarked RNA velocity inference with sensitivity analyses and demonstrated one- and multiple-sample testing. We also conducted Markov chain Monte Carlo inference on the model, uncovering key relationships between gene-specific kinetics and our gene-independent velocity estimate. Finally, we applied VeloCycle to in vivo samples and in vitro genome-wide Perturb-seq, revealing regionally defined proliferation modes in neural progenitors and the effect of gene knockdowns on cell cycle speed. Ultimately, VeloCycle expands the scRNA-seq analysis toolkit with a modular and statistically rigorous RNA velocity inference framework.
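For orientation, the idea of estimating a rate of change from unspliced and spliced abundances can be illustrated with the generic textbook splicing model, ds/dt = beta*u - gamma*s. The sketch below is not VeloCycle and does not reproduce its Bayesian inference; it is a minimal heuristic steady-state fit of the kind the description characterizes as lacking statistical control. The function names, the choice beta = 1, the toy data, and the assumption of a constant angular velocity on the cycle are all illustrative assumptions, not taken from the project.

```python
import numpy as np


def steady_state_velocity(u, s):
    """Heuristic per-gene velocity under ds/dt = beta*u - gamma*s with beta = 1.

    u, s: arrays of shape (cells, genes) with unspliced and spliced abundances.
    gamma is the least-squares slope (through the origin) of u against s.
    """
    gamma = (u * s).sum(axis=0) / (s * s).sum(axis=0)
    velocity = u - gamma * s  # positive: induction, negative: repression
    return velocity, gamma


def period_from_angular_velocity(omega):
    """Period of a cycle traversed at constant angular velocity omega (rad per unit time)."""
    return 2.0 * np.pi / omega


# Toy data (hypothetical): three genes sitting exactly at steady state.
rng = np.random.default_rng(0)
s = rng.uniform(1.0, 10.0, size=(50, 3))
gamma_true = np.array([0.2, 0.5, 1.0])
u = gamma_true * s

velocity, gamma = steady_state_velocity(u, s)
period = period_from_angular_velocity(2.0 * np.pi / 24.0)
```

In this toy case the fitted slopes recover `gamma_true` and the velocities vanish, because every gene was generated at steady state. The second helper only restates the relation between a constant phase velocity and the period of a one-dimensional periodic trajectory.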
Project description: Although the cell cycle is a fundamental process of life, a detailed quantitative understanding of gene regulation dynamics throughout the cell cycle is far from complete. Single-cell RNA-sequencing (scRNA-seq) technology gives access to these dynamics without externally perturbing the cell. Here, by generating scRNA-seq libraries in different cell systems, we observe cycling patterns in the unspliced-spliced RNA space of cell cycle-related genes. Since existing methods for analyzing scRNA-seq data do not efficiently measure cycling gene dynamics, we propose a deep learning approach (DeepCycle) to fit these patterns and build a high-resolution map of the entire cell cycle transcriptome. Characterizing the cell cycle in embryonic and somatic cells, we identify major waves of transcription during the G1 phase and systematically study the stages of the cell cycle. Our work will facilitate the study of the cell cycle in multiple cellular models and different biological contexts.