Project description: Non-homologous end-joining (NHEJ) plays an important role in double-strand break (DSB) repair of DNA. Recent studies have shown that the error patterns of NHEJ are strongly biased by sequence context, but these studies were based on relatively few templates. To investigate this more thoroughly, we systematically profiled ~1.16 million independent mutational events resulting from CRISPR/Cas9-mediated cleavage and NHEJ-mediated DSB repair of 6,872 synthetic target sequences, introduced into a human cell line via lentiviral infection. We find that: 1) insertions are dominated by 1 bp events templated by sequence immediately upstream of the cleavage site, 2) deletions are predominantly associated with microhomology, and 3) targets exhibit variable but reproducible diversity with respect to the number and relative frequency of the mutational outcomes to which they give rise. From these data, we trained a model (Lindel) that uses local sequence context to predict the distribution of mutational outcomes. Exploiting the bias of NHEJ outcomes towards microhomology-mediated events, we demonstrate the programming of deletion patterns by introducing microhomology at specific locations in the vicinity of the DSB site. We anticipate that our results will inform investigations of DSB repair mechanisms as well as the design of CRISPR/Cas9 experiments for diverse applications including genome-wide screens, gene therapy, lineage tracing and molecular recording.
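Findings (1) and (2) above can be made concrete with a small sketch. This is not the Lindel model or the authors' analysis code; the function names, the coordinate convention (the cut lies between `seq[cut - 1]` and `seq[cut]`) and the example sequence are all assumptions made for illustration. The first function builds the product of a 1 bp insertion that duplicates the base immediately upstream of the cut, and the second measures the microhomology shared by the two junctions of a deletion.

```python
def templated_insertion(seq: str, cut: int) -> str:
    """Product of a 1 bp insertion copying the base just upstream of the cut.

    Assumed convention: the cut lies between seq[cut - 1] and seq[cut].
    """
    if not 0 < cut <= len(seq):
        raise ValueError("cut must have at least one base upstream")
    return seq[:cut] + seq[cut - 1] + seq[cut:]


def microhomology_length(seq: str, start: int, end: int) -> int:
    """Microhomology (in bp) at the junctions of deleting seq[start:end].

    Counts how far the deletion window can slide right and left without
    changing the repair product, capped at the deletion length.
    """
    if not 0 <= start < end <= len(seq):
        raise ValueError("require 0 <= start < end <= len(seq)")
    n = end - start
    # Rightward: first deleted bases match the bases following the deletion.
    right = 0
    while right < n and end + right < len(seq) and seq[start + right] == seq[end + right]:
        right += 1
    # Leftward: bases preceding the deletion match the last deleted bases.
    left = 0
    while (left < n - right and start - 1 - left >= 0
           and seq[start - 1 - left] == seq[end - 1 - left]):
        left += 1
    return left + right


# Hypothetical target with the repeat "ACGG" on both sides of a cut at index 6.
target = "TTACGGATCACGGCC"
insertion_product = templated_insertion(target, 6)
deletion_product = target[:2] + target[9:]
mh = microhomology_length(target, 2, 9)
```

Because the count slides in both directions, the same microhomology length is reported for every equivalent way of writing a deletion, for example `(2, 9)` and `(4, 11)` in the target above.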
Project description: G-quadruplexes (G4) are non-canonical DNA structures that have gained increasing attention for their potential roles in gene regulation, with implications in neurodegenerative diseases and cancer. Despite their biological significance, G4 structures have not been studied systematically across tissues and cell types. In this study, we employ G4 single-cell CUT&Tag (G4 scCUT&Tag) to characterize G4 landscapes in postnatal mouse brain cells, leveraging single-cell analytical approaches commonly used for scRNA-Seq and scATAC-Seq datasets. Using conventional single-cell omics workflows to process and explore our data, we distinguished different cell types based on G4 heterogeneity. Furthermore, we performed uncoupled multi-omics integration of G4 scCUT&Tag data with scRNA-Seq gene expression profiles, using both a covariance-based technique (canonical correlation analysis) and a transfer learning-based deep learning approach. These integrations not only revealed significant co-enrichment of G4 and gene expression signals, but also demonstrated that G4 scCUT&Tag enables detailed examination of G4 heterogeneity in complex tissues and supports integrative analysis of G4 profiles with other omics layers, offering new insights into the epigenomic landscapes of the developing central nervous system.
Project description: E12.5 mouse whole embryo and E12.5 placenta total RNA were pooled to create 25:75, 50:50, and 75:25 ratio mixtures, based on Bioanalyzer quantitation. These samples, along with the original unmixed RNAs, were used as templates for duplicate linear amplification labeling reactions. cRNA target mixtures were hybridized against a Universal Mouse Reference (Stratagene). Pairwise comparison using the NIA Microarray Analysis (ANOVA) software produced log ratios, which were compared to the expected log ratios for genes showing statistically significant (FDR < 0.05) differential expression between unmixed embryo and placenta. Keywords: cell type comparison design, development or differentiation design, normalization testing design, reference design
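The description does not say how the expected log ratios were derived. One plausible reading, sketched below, is that the signal of a mixture is the mass-weighted average of the two unmixed signals on the linear scale, so the expected log ratio is the log of that average. The function name, the base-2 logarithm and the example values are assumptions for illustration, not details taken from the study.

```python
import math


def expected_log_ratio(embryo_ratio: float, placenta_ratio: float,
                       embryo_fraction: float) -> float:
    """Expected log2 ratio (mixture vs. reference) under linear mixing.

    embryo_ratio, placenta_ratio: linear-scale ratios of each unmixed RNA
    against the common reference for one gene (hypothetical inputs).
    embryo_fraction: fraction of embryo RNA in the mixture, e.g. 0.25.
    """
    if not 0.0 <= embryo_fraction <= 1.0:
        raise ValueError("embryo_fraction must lie in [0, 1]")
    if embryo_ratio <= 0 or placenta_ratio <= 0:
        raise ValueError("ratios must be positive")
    mixed = embryo_fraction * embryo_ratio + (1.0 - embryo_fraction) * placenta_ratio
    return math.log2(mixed)


# Hypothetical gene: 4-fold above the reference in embryo, equal to it in placenta.
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{fraction:.2f} embryo -> {expected_log_ratio(4.0, 1.0, fraction):.3f}")
```

Under this assumption the expected value is not linear in the mixing fraction on the log scale: a 50:50 mixture of a 4-fold and a 1-fold signal gives log2(2.5), not the midpoint of the two log ratios.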
Project description: Transcriptional enhancers act as docking stations for combinations of transcription factors and thereby regulate spatiotemporal activation of their target genes. A single enhancer, a few hundred base pairs in length, can drive cell-type-specific expression of a gene or transgene autonomously and independently of its location and orientation. It has been a long-standing goal in the field to decode the regulatory logic of an enhancer and to understand the details of how spatiotemporal gene expression is encoded in an enhancer sequence. Recently, deep learning models have yielded unprecedented insight into the enhancer code, and well-trained models are reaching a level of understanding that may be close to complete. As a consequence, we hypothesized that deep learning models can be used to guide the directed design of synthetic, cell-type-specific enhancers, and that this process would allow for a detailed tracing of all enhancer features at nucleotide-level resolution. Here we implemented and compared three different design strategies, each built on a deep learning model: (1) directed sequence evolution; (2) directed iterative motif implanting; and (3) generative design. We evaluated the function of fully synthetic enhancers to specifically target Kenyon cells or glial cells in the fruit fly brain using transgenic animals. We then exploited this concept further by creating “dual-code” enhancers that target two cell types, and minimal enhancers shorter than 50 base pairs that are fully functional. By examining the trajectories followed during state space searches towards functional enhancers, we could accurately define the enhancer code as the optimal strength, combination, and relative distance of TF activator motifs, and the absence of TF repressor motifs. Finally, we applied the same three strategies to successfully design human enhancers, finding design principles highly similar to those in Drosophila.
In conclusion, enhancer design guided by deep learning leads to better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.
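Strategy (1), directed sequence evolution, can be illustrated with a greedy search in which every single-nucleotide substitution is scored at each step and the best one is kept. This is a minimal sketch under stated assumptions rather than the authors' procedure: `toy_score` is a hypothetical stand-in for a trained deep learning model (it simply counts an arbitrary motif), and the greedy acceptance rule is one possible choice among several.

```python
from typing import Callable, List, Tuple

BASES = "ACGT"


def toy_score(seq: str) -> float:
    """Hypothetical scorer standing in for a trained model.

    Counts occurrences of an arbitrary motif; a real application would
    return a model's predicted activity for the target cell type.
    """
    motif = "GATA"
    return float(sum(seq[i:i + len(motif)] == motif
                     for i in range(len(seq) - len(motif) + 1)))


def evolve(seq: str, score: Callable[[str], float],
           steps: int) -> List[Tuple[str, float]]:
    """Greedy in silico evolution; returns the (sequence, score) trajectory."""
    trajectory = [(seq, score(seq))]
    for _ in range(steps):
        current, best_score = trajectory[-1]
        best_seq = current
        # Score every single-nucleotide substitution of the current sequence.
        for pos in range(len(current)):
            for base in BASES:
                if base == current[pos]:
                    continue
                candidate = current[:pos] + base + current[pos + 1:]
                candidate_score = score(candidate)
                if candidate_score > best_score:
                    best_seq, best_score = candidate, candidate_score
        if best_seq == current:
            break  # no substitution improves the score: local optimum
        trajectory.append((best_seq, best_score))
    return trajectory


trajectory = evolve("GAAAGAAA", toy_score, steps=10)
```

Keeping the whole trajectory, rather than only the final sequence, mirrors the idea in the text of examining the path taken through sequence space; with a real model, each accepted mutation could be inspected for the motif it creates or destroys.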