{"database":"GEO","file_versions":[{"headers":{"Content-Type":["application/json"]},"body":{"files":{"Other":["ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE318nnn/GSE318900/"]},"type":"primary"},"statusCode":"OK","statusCodeValue":200}],"scores":null,"additional":{"omics_type":["Transcriptomics"],"species":["Escherichia coli DH5[alpha]"],"gds_type":[" Other","Expression profiling by high throughput sequencing"],"full_dataset_link":["https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE318900"],"repository":["GEO"],"entry_type":["GSE"],"additional_accession":[]},"is_claimable":false,"name":"Decoding and Rewiring Promoter Architecture Using Large Language Models and Diffusion Frameworks: High-Throughput Promoter Strength Sequencing Source Data","description":"High-performance promoters are essential tools for precisely regulating gene expression, yet their rational design within the vast combinatorial sequence space remains a major challenge. Here, we present a hybrid framework that integrates a large language model (LLM) with a diffusion model to enable data-driven and interpretable promoter design. The fine-tuned LLM predicts promoter strength with high accuracy and, through pseudo-sequence mutations, identifies biologically essential core motifs. A diffusion model is then conditioned on these motifs to reconstruct non-core regions and generate complete promoter sequences. We experimentally validated this approach in E. coli by high-throughput barcoded promoter activity sequencing: over 90% of the generated promoters showed measurable activity, and the best variants achieved approximately 20-fold higher expression than the benchmark promoter (BBa_J23119). By explicitly coupling interpretability with generative design, this strategy provides a generalizable path to accelerate synthetic biology efforts and advance large-scale regulatory sequence engineering.","dates":{"publication":"2026/02/16"},"accession":"GSE318900","cross_references":{"GSM":["GSM9505671","GSM9505672","GSM9505675","GSM9505676","GSM9505673","GSM9505674"],"GPL":["36593"],"GSE":["318900"],"taxon":["Escherichia coli DH5[alpha]"]}}