<HashMap><database>GEO</database><scores/><additional><omics_type>Other</omics_type><species>Chlorocebus aethiops</species><species>Cricetulus griseus</species><species>Homo sapiens</species><gds_type>Other</gds_type><full_dataset_link>https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE324577</full_dataset_link><repository>GEO</repository><entry_type>GSE</entry_type></additional><is_claimable>false</is_claimable><name>Decoding the sequence requirements for translation initiation</name><description>Accurate selection of start codons by ribosomes is a fundamental determinant of proteome composition. Although the ‘Kozak sequence’, an 8-nucleotide sequence flanking the start codon, has long been viewed as the primary determinant of initiation in eukaryotes, it fails to explain the diversity of start codon usage across transcripts. Here we combine massively parallel reporter assays, bioinformatics, machine learning, single-molecule imaging and cryo-electron microscopy to define the extended translation initiation sequence (eTIS), an ~80-nucleotide sequence surrounding the start codon that governs initiation efficiency. A deep-learning model trained on eTIS features accurately predicts translation initiation across transcripts. Unexpectedly, we also find that the Kozak sequence is not, as widely presumed, optimal for initiation, and we identify the origin of this long-standing misconception. eTIS nucleotides that promote efficient initiation are enriched in the human transcriptome and are evolutionarily conserved, underscoring their general functional importance. Biophysical and structural analyses reveal that specific eTIS residues, including the critical +6 position and residues in the mRNA entry and exit channels, engage ribosomal proteins, rRNA and initiation factors, promoting ribosomal pausing at the start codon and enhancing the conformational transitions required for initiation.
Finally, optimization of the eTIS markedly enhances translational fidelity and protein output from therapeutic mRNAs, highlighting its practical utility. Together, these findings redefine the sequence logic of translation initiation and establish a framework for precise control of protein expression.</description><dates><publication>2026/05/13</publication></dates><accession>GSE324577</accession><cross_references><GSM>GSM9579658</GSM><GSM>GSM9579659</GSM><GSM>GSM9579656</GSM><GSM>GSM9579657</GSM><GSM>GSM9579698</GSM><GSM>GSM9579699</GSM><GSM>GSM9579696</GSM><GSM>GSM9579697</GSM><GSM>GSM9579694</GSM><GSM>GSM9579695</GSM><GSM>GSM9579692</GSM><GSM>GSM9579693</GSM><GSM>GSM9579690</GSM><GSM>GSM9579691</GSM><GSM>GSM9579706</GSM><GSM>GSM9579707</GSM><GSM>GSM9579704</GSM><GSM>GSM9579705</GSM><GSM>GSM9579702</GSM><GSM>GSM9579669</GSM><GSM>GSM9579703</GSM><GSM>GSM9579667</GSM><GSM>GSM9579700</GSM><GSM>GSM9579701</GSM><GSM>GSM9579668</GSM><GSM>GSM9579665</GSM><GSM>GSM9579666</GSM><GSM>GSM9579663</GSM><GSM>GSM9579664</GSM><GSM>GSM9579661</GSM><GSM>GSM9579662</GSM><GSM>GSM9579660</GSM><GSM>GSM9579678</GSM><GSM>GSM9579679</GSM><GSM>GSM9579676</GSM><GSM>GSM9579677</GSM><GSM>GSM9579674</GSM><GSM>GSM9579675</GSM><GSM>GSM9579672</GSM><GSM>GSM9579673</GSM><GSM>GSM9579670</GSM><GSM>GSM9579671</GSM><GSM>GSM9579708</GSM><GSM>GSM9579709</GSM><GSM>GSM9579689</GSM><GSM>GSM9579687</GSM><GSM>GSM9579688</GSM><GSM>GSM9579685</GSM><GSM>GSM9579686</GSM><GSM>GSM9579683</GSM><GSM>GSM9579684</GSM><GSM>GSM9579681</GSM><GSM>GSM9579682</GSM><GSM>GSM9579680</GSM><GPL>30173</GPL><GPL>18573</GPL><GPL>36701</GPL><GPL>36702</GPL><GSE>324577</GSE><taxon>Chlorocebus aethiops</taxon><taxon>Cricetulus griseus</taxon><taxon>Homo sapiens</taxon></cross_references></HashMap>