<HashMap><database>GEO</database><file_versions><headers><Content-Type>application/xml</Content-Type></headers><body><files><Other>ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE318nnn/GSE318900/</Other></files><type>primary</type></body><statusCode>OK</statusCode><statusCodeValue>200</statusCodeValue></file_versions><scores/><additional><omics_type>Transcriptomics</omics_type><species>Escherichia coli DH5[alpha]</species><gds_type> Other</gds_type><gds_type>Expression profiling by high throughput sequencing</gds_type><full_dataset_link>https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE318900</full_dataset_link><repository>GEO</repository><entry_type>GSE</entry_type></additional><is_claimable>false</is_claimable><name>Decoding and Rewiring Promoter Architecture Using Large Language Models and Diffusion Frameworks: High-Throughput Promoter Strength Sequencing Source Data</name><description>High-performance promoters are essential tools for precisely regulating gene expression, yet their rational design within the vast combinatorial sequence space remains a major challenge. Here, we present a hybrid framework that integrates a large language model (LLM) with a diffusion model to enable data-driven and interpretable promoter design. The fine-tuned LLM predicts promoter strength with high accuracy and, through pseudo-sequence mutations, identifies biologically essential core motifs. A diffusion model is then conditioned on these motifs to reconstruct non-core regions and generate complete promoter sequences. We experimentally validated this approach in E. coli by high-throughput barcoded promoter activity sequencing: over 90% of the generated promoters showed measurable activity, and the best variants achieved approximately 20-fold higher expression than the benchmark promoter (BBa_J23119).
By explicitly coupling interpretability with generative design, this strategy provides a generalizable path to accelerate synthetic biology efforts and advance large-scale regulatory sequence engineering.</description><dates><publication>2026/02/16</publication></dates><accession>GSE318900</accession><cross_references><GSM>GSM9505671</GSM><GSM>GSM9505672</GSM><GSM>GSM9505675</GSM><GSM>GSM9505676</GSM><GSM>GSM9505673</GSM><GSM>GSM9505674</GSM><GPL>36593</GPL><GSE>318900</GSE><taxon>Escherichia coli DH5[alpha]</taxon></cross_references></HashMap>