Project description:DNA-Diffusion is a novel generative approach leveraging diffusion probabilistic models for the design of cell type-specific DNA regulatory sequences. To evaluate the capacity of DNA-Diffusion sequences to endogenously alter AXIN2 transcription, we employed a novel MPRA-like system that uses Cre recombinase-mediated cassette exchange and long-read sequencing to measure the gene’s transcriptional output in response to enhancer sequences several kilobases away. We evaluated a total of 100 sequences in the AXIN2 endogenous replacement experiment, drawn from the following sequence groups: GM12878 DNA-Diffusion, GM12878 Positive Controls, GM12878 Negative Controls, K562 DNA-Diffusion, HepG2 DNA-Diffusion, and shuffled GM12878 DNA-Diffusion. Our findings show that DNA-Diffusion sequences are effective in an endogenous setting, supporting the potential of DNA-Diffusion for designing therapeutic sequences.
Project description:High-performance promoters are essential tools for precisely regulating gene expression, yet their rational design within the vast combinatorial sequence space remains a major challenge. Here, we present a hybrid framework that integrates a large language model (LLM) with a diffusion model to enable data-driven and interpretable promoter design. The fine-tuned LLM predicts promoter strength with high accuracy and, through pseudo-sequence mutations, identifies biologically essential core motifs. A diffusion model is then conditioned on these motifs to reconstruct non-core regions and generate complete promoter sequences. We experimentally validated this approach in E. coli by high-throughput barcoded promoter activity sequencing: over 90% of the generated promoters showed measurable activity, and the best variants achieved approximately 20-fold higher expression than the benchmark promoter (BBa_J23119). By explicitly coupling interpretability with generative design, this strategy provides a generalizable path to accelerate synthetic biology efforts and advance large-scale regulatory sequence engineering.
Project description:To test how DNA-Diffusion sequences can induce transcription, we selected 2150 sequences for each cell type (K562, HepG2, and GM12878), including DNA-Diffusion synthetic sequences and naturally occurring DHS sites, and inserted them into STARR-Seq plasmids (N = 6450 sequences in total). All synthetic and naturally occurring sequences were combined into a single library, and this same library was tested experimentally using STARR-Seq in each of the three cell lines (K562, HepG2, and GM12878).
Project description:RNA-Sequencing is a transformative method that captures the quantitative dynamics of a transcriptome with exquisite sensitivity and single-base resolution. There are, however, few computational pipelines for RNA-Seq whose statistical tests have the robustness and power demanded by the difficult combination of small sample sizes and high variability in sequence read counts. To this end, we developed GENE-counter, a complete software pipeline for analyzing RNA-Seq data for genome-wide expression differences between replicated treatment groups. One important component of GENE-counter is a statistical test based on the NBP parameterization of the negative binomial distribution for identifying differentially expressed genome features. We used GENE-counter to analyze RNA-Seq data derived from Arabidopsis thaliana infected with a strain of defense-eliciting bacteria. We identified 308 genes that were differentially induced. Using alternative methods, we provided support for the induced expression and biological relevance of a substantial proportion of the genes. These results suggest that the NBP parameterization of the negative binomial distribution is well suited for explaining RNA-Seq data and that the statistical test makes GENE-counter a powerful pipeline for studying genome-wide expression changes. GENE-counter is freely available at http://changlab.cgrb.oregonstate.edu/. Our RNA-Seq data are available in the NCBI Short Read Archive (SRA) under accession SRA025952.
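As a rough illustration of the NBP parameterization named in the GENE-counter description: the NBP model is usually written with a mean-variance relationship of the form variance = mu + phi * mu^alpha, which reduces to the familiar negative binomial (NB2) form when alpha = 2. The sketch below assumes that form. The function names and parameter values are hypothetical, chosen only for illustration, and this is not GENE-counter code.

```python
import math


def nbp_variance(mu, phi, alpha):
    """Variance of a read count with mean mu, assuming var = mu + phi * mu**alpha."""
    return mu + phi * mu ** alpha


def nb_size(mu, phi, alpha):
    """Negative binomial size parameter r that gives the same variance.

    Follows from matching var = mu + mu**2 / r to the NBP form above.
    """
    return mu ** (2 - alpha) / phi


# Illustrative (made-up) dispersion values: compare how the extra-Poisson
# variance grows with the mean for two choices of alpha.
for mu in (10.0, 100.0, 1000.0):
    v2 = nbp_variance(mu, phi=0.1, alpha=2.0)
    v15 = nbp_variance(mu, phi=0.1, alpha=1.5)
    print(f"mean={mu:g}  var(alpha=2)={v2:g}  var(alpha=1.5)={v15:g}")
```

The extra parameter alpha lets the dispersion change with expression level, which is the kind of flexibility that matters when few replicates are available and read counts are highly variable.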
Project description:Microfluidic devices provide a low-input and efficient platform for single-cell RNA-seq (scRNA-seq). Here we present microfluidic diffusion-based RNA-seq (MID-RNA-seq) for conducting scRNA-seq with a diffusion-based reagent swapping scheme. The device incorporates cell trapping, lysis, reverse transcription, and PCR amplification in a single microfluidic chamber. MID-RNA-seq provides data quality comparable to that of existing scRNA-seq methods while using a simple device design that permits multiplexing. The robustness and scalability of the MID-RNA-seq device will be important for transcriptomic studies of scarce cell samples.