{"database":"GEO","file_versions":[{"headers":{"Content-Type":["application/json"]},"body":{"files":{"Other":["ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE313nnn/GSE313397/"]},"type":"primary"},"statusCode":"OK","statusCodeValue":200}],"scores":null,"additional":{"omics_type":["Other"],"species":["Saccharomyces cerevisiae"],"gds_type":["Other"],"full_dataset_link":["https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE313397"],"repository":["GEO"],"entry_type":["GSE"],"additional_accession":[]},"is_claimable":false,"name":"Hierarchy of grammar rules for the language of transcriptional activation domains","description":"Transcriptional activation domains (ADs) of eukaryotic gene activators have remained enigmatic for decades as short, consensus-less, extremely variable amino acid sequences that lack a specific structure and interact fuzzily with an uncertain number of targets. Understanding AD sequence grammar is critical for solving the enigma. Using rational design of AD sequences and high-throughput in vivo experimentation combined with bioinformatic analysis and machine learning, we refined grammar rules for AD sequences, calculated the relative importance of each rule, and linked them to the biochemical features essential for biological function. The key feature – redundant representation in the sequence of aromatic and acidic residues – is consistent with the novel idea that ADs function as acidic-hydrophobic surfactants, which is crucial for understanding eukaryotic gene regulation and the function of intrinsically disordered protein regions.","dates":{"publication":"2026/06/05"},"accession":"GSE313397","cross_references":{"GSM":["GSM9367397","GSM9367396","GSM9367399","GSM9367398","GSM9367393","GSM9367392","GSM9367395","GSM9367394","GSM9367391","GSM9367390","GSM9367389"],"GPL":["34541"],"GSE":["313397"],"taxon":["Saccharomyces cerevisiae"]}}