Project description:Non-homologous end-joining (NHEJ) plays an important role in double-strand break (DSB) repair of DNA. Recent studies have shown that the error patterns of NHEJ are strongly biased by sequence context, but these studies were based on relatively few templates. To investigate this more thoroughly, we systematically profiled ~1.16 million independent mutational events resulting from CRISPR/Cas9-mediated cleavage and NHEJ-mediated DSB repair of 6,872 synthetic target sequences, introduced into a human cell line via lentiviral infection. We find that: 1) insertions are dominated by 1 bp events templated by sequence immediately upstream of the cleavage site, 2) deletions are predominantly associated with microhomology, and 3) targets exhibit variable but reproducible diversity with respect to the number and relative frequency of the mutational outcomes to which they give rise. From these data, we trained a model (Lindel) that uses local sequence context to predict the distribution of mutational outcomes. Exploiting the bias of NHEJ outcomes towards microhomology-mediated events, we demonstrate the programming of deletion patterns by introducing microhomology to specific locations in the vicinity of the DSB site. We anticipate that our results will inform investigations of DSB repair mechanisms as well as the design of CRISPR/Cas9 experiments for diverse applications including genome-wide screens, gene therapy, lineage tracing and molecular recording.
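To make the notion of a microhomology-associated deletion concrete, the following minimal Python sketch enumerates deletions that span a cut site and are flanked by a short repeat. It is an illustration only and is not the Lindel model or the authors' pipeline; the function names, the 0-based `cut` coordinate (the break lies between `seq[cut-1]` and `seq[cut]`), the `max_del` limit, and the choice to cap the microhomology length at the deletion length are all assumptions made for this example.

```python
def leftmost(seq, start, end):
    """Shift the deletion seq[start:end] left while the repair product is unchanged."""
    while start > 0 and seq[start - 1] == seq[end - 1]:
        start -= 1
        end -= 1
    return start, end


def microhomology_length(seq, start, end):
    """Bases shared by the start of the deleted segment and the sequence right after it.

    The count is capped at the deletion length (an assumption of this sketch).
    """
    k = 0
    while start + k < end and end + k < len(seq) and seq[start + k] == seq[end + k]:
        k += 1
    return k


def microhomology_deletions(seq, cut, max_del=30, min_mh=1):
    """Map each distinct deletion spanning `cut` to its microhomology length.

    Deletions are keyed by their leftmost equivalent (start, end) coordinates,
    so alignments that yield the same product are counted once.
    """
    found = {}
    for start in range(max(0, cut - max_del), cut + 1):
        for end in range(max(cut, start + 1), min(len(seq), start + max_del) + 1):
            s, e = leftmost(seq, start, end)
            k = microhomology_length(seq, s, e)
            if k >= min_mh:
                found[(s, e)] = k
    return found


# Hypothetical target with a "GAC" repeat on both sides of the cut.
target = "TTGACCCGACTT"
deletions = microhomology_deletions(target, cut=5)
print(deletions[(2, 7)])  # microhomology length for the deletion of "GACCC"
```

Keying on the leftmost coordinates reflects the fact that a microhomology-flanked deletion can be aligned at several positions that all yield the same repaired sequence.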
Project description:The development of modern genome editing and DNA synthesis has enabled researchers to edit DNA sequences with high precision but has left unsolved the problem of designing these edits. We introduce Ledidi, a computational method that rephrases the discrete design task of choosing which edits to make as an easily solvable continuous optimization problem. Ledidi can use any pre-trained deep learning model to guide the optimization, yielding an edited sequence that exhibits the desired outcome while explicitly minimizing the number of edits. When applied in dozens of settings, we find that Ledidi's designs can precisely control transcription factor binding, chromatin accessibility, transcription, and enhancer activity in silico. By using several deep learning models simultaneously, we design cell type-specific enhancers and experimentally validate them in cellulo. Finally, we introduce the concept of an "affinity catalog", where the design task is repeated multiple times across continuous variants of the design target. We demonstrate how these catalogs can be used to interpret deep learning models and the impact of starting template sequences, and also to design regulatory elements that control transcriptional dosage in a cell type-specific fashion.
2025-12-05 | GSE312234 | GEO
Project description:Protein-Nucleic Acid constrained Language Model-assisted Design of Precise and Compact Adenine Base Editor
Project description:G-quadruplexes (G4) are non-canonical DNA structures that have gained increasing attention for their potential roles in gene regulation, with implications in neurodegenerative diseases and cancer. Despite their biological significance, G4 structures have not been studied systematically across tissues and cell types. In this study, we employ G4 single-cell CUT&Tag (G4 scCUT&Tag) to characterize G4 landscapes in postnatal mouse brain cells, leveraging single-cell analytical approaches commonly used in scRNA-Seq and scATAC-Seq datasets. Using conventional single-cell omics workflows to process and explore our data, we distinguished different cell types based on G4 heterogeneity. Furthermore, we performed uncoupled multi-omics integration of G4 scCUT&Tag data with scRNA-Seq gene expression profiles, using both a covariance-based technique (canonical correlation analysis) and a transfer learning-based deep learning approach. These integrations not only revealed significant co-enrichment of G4 and gene expression signals, but also demonstrated that G4 scCUT&Tag enables detailed examination of G4 heterogeneity in complex tissues and supports integrative analysis of G4 profiles with other omics layers, offering new insights into the epigenomic landscapes of the developing central nervous system.
Project description:Gene expression profile Predictor on chemical Structures (GPS): Deep Learning-based platform to screen and design novel therapeutics
Project description:E12.5 mouse whole embryo and E12.5 placenta total RNA were pooled to create 25:75, 50:50, and 75:25 ratio mixtures, based on Bioanalyzer quantitation. These samples, along with the original unmixed RNAs, were used as templates for duplicate linear amplification labeling reactions. cRNA target mixtures were hybridized against a Universal Mouse Reference (Stratagene). Pairwise comparison using the NIA Microarray Analysis (ANOVA) software produced log ratios, which were compared to the expected log ratios for genes showing statistically significant (FDR < 0.05) differential expression between unmixed embryo and placenta. Keywords: cell type comparison design, development or differentiation design, normalization testing design, reference design
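The description does not state how the expected log ratios were derived. One plausible approach, sketched below in Python, assumes that signal mixes linearly in proportion to the RNA fraction and that log ratios against the reference are base 2. Both assumptions, along with the function name and the example values, are hypothetical and are not taken from the record.

```python
import math


def expected_log_ratio(embryo_log2, placenta_log2, embryo_fraction):
    """Expected log2 ratio (sample vs. reference) for an embryo:placenta RNA mixture.

    Assumes linear mixing: the mixture's ratio to the reference is the
    fraction-weighted mean of the two unmixed ratios on the linear scale.
    """
    if not 0.0 <= embryo_fraction <= 1.0:
        raise ValueError("embryo_fraction must lie between 0 and 1")
    embryo_ratio = 2.0 ** embryo_log2      # back-transform to linear scale
    placenta_ratio = 2.0 ** placenta_log2
    mixed = embryo_fraction * embryo_ratio + (1.0 - embryo_fraction) * placenta_ratio
    return math.log2(mixed)


# Hypothetical gene: log2 ratio of 2.0 in unmixed embryo, 0.0 in unmixed placenta.
for fraction in (0.25, 0.50, 0.75):
    print(fraction, round(expected_log_ratio(2.0, 0.0, fraction), 3))
```

Because the averaging happens on the linear scale, the expected log ratio of a 50:50 mixture is not the midpoint of the two unmixed log ratios, which is why a comparison of observed and expected values is informative for normalization testing.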