Project description: HLA-I molecules bind short peptides and present them to CD8+ T cells for TCR recognition. HLA-I ligands are typically 8 to 12 amino acids long, but length preferences vary considerably between alleles. Here we used recent HLA peptidomics data to analyze, in an unbiased manner, peptide length distributions across 85 different HLA-I alleles. Our results revealed clear clustering of HLA-I alleles with distinct peptide length distributions. This enabled us to unravel some of the molecular determinants of these distributions and to predict peptide length distributions from HLA-I sequences alone. We further took advantage of our collection of curated HLA peptidomics studies to investigate multiple specificities in HLA-I molecules, and we validated these observations with binding assays. Explicitly modeling peptide length distributions and multiple specificities significantly improved predictions of naturally presented HLA-I ligands, as demonstrated in an independent benchmark based on ten newly generated HLA peptidomics datasets from meningioma samples.
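To make "peptide length distribution" concrete, the sketch below computes, for each allele, the fraction of eluted ligands at each length, and compares two alleles with a simple distance. It is a minimal illustration under stated assumptions, not the method used in this study: the 8 to 14 length window, the function names `length_distribution` and `l1_distance`, and the choice of an L1 distance (as one possible input to clustering) are all hypothetical, and the peptide lists are arbitrary toy sequences rather than real peptidomics data.

```python
from collections import Counter


def length_distribution(peptides, min_len=8, max_len=14):
    """Return {length: fraction} over [min_len, max_len].

    Assumption: peptides outside the window are ignored, and the
    fractions are normalized over the peptides that remain.
    """
    lengths = [len(p) for p in peptides if min_len <= len(p) <= max_len]
    if not lengths:
        raise ValueError("no peptides in the requested length range")
    counts = Counter(lengths)
    total = len(lengths)
    return {n: counts.get(n, 0) / total for n in range(min_len, max_len + 1)}


def l1_distance(d1, d2):
    """Sum of absolute differences between two distributions over the same lengths.

    Hypothetical choice of metric; any distance between distributions could be
    used as input to a clustering step.
    """
    return sum(abs(d1[n] - d2[n]) for n in d1)


# Toy inputs: arbitrary sequences, not real peptidomics data for any allele.
allele_a = ["SIINFEKL", "GILGFVFTL", "NLVPMVATV", "YLQPRTFLL"]
allele_b = ["KRWIILGLNK", "ELAGIGILTV", "GLCTLVAML", "RAKFKQLL"]

dist_a = length_distribution(allele_a)
dist_b = length_distribution(allele_b)
print(dist_a[9], dist_b[10], l1_distance(dist_a, dist_b))  # 0.75 0.5 1.0
```

Here the first toy "allele" prefers 9-mers and the second shifts toward 10-mers, which yields a nonzero distance. Computing such distances for all allele pairs would give a matrix that a standard clustering routine could group.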
Project description: Prediction of HLA epitopes is important for the development of cancer immunotherapies and vaccines. However, current prediction algorithms have limited predictive power, in part because they were not trained on high-quality epitope datasets covering a broad range of HLA alleles. To enable prediction of endogenous HLA class I-associated peptides across a large fraction of the human population, we used mass spectrometry to profile >185,000 peptides eluted from 95 HLA-A, -B, -C and -G mono-allelic cell lines. We identified canonical peptide motifs per HLA allele, unique and shared binding submotifs across alleles, and distinct motifs associated with different peptide lengths. By integrating these data with transcript abundance and peptide processing, we developed HLAthena, which provides allele- and length-specific as well as pan-allele, pan-length prediction models for endogenous peptide presentation. These models predicted endogenous HLA class I-associated ligands with a 1.5-fold improvement in positive predictive value compared with existing tools, and correctly identified >75% of the HLA-bound peptides observed experimentally in 11 patient-derived tumor cell lines.
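Positive predictive value (PPV) is the fraction of predicted positives that are true positives. In HLA ligand benchmarks it is often computed by ranking observed ligands mixed with decoy peptides by predicted score and taking the top N, where N equals the number of observed ligands. The sketch below follows that common convention as an assumption; the exact benchmarking protocol behind the 1.5-fold figure above is not described here, and the function name `ppv_top_n` and the scores are hypothetical.

```python
def ppv_top_n(scores, labels):
    """PPV among the top-N scored candidates, with N = number of positives.

    scores: predicted presentation scores (higher = more likely presented)
    labels: 1 for an experimentally observed ligand, 0 for a decoy
    Assumption: ties are broken by input order (Python's sort is stable).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n_pos]
    return sum(label for _, label in top) / n_pos


# Hypothetical scores for six candidates, three of which are observed ligands.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 1]
ppv = ppv_top_n(scores, labels)
print(ppv)  # 2 of the top 3 are ligands
```

Under this convention, a "1.5-fold improvement" means the ratio of one model's PPV to another's on the same candidate set, for example 0.6 versus 0.4.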