Marek Prachar:
Predicting peptide-MHC stability and immunogenicity with deep learning

Date: 02-01-2024    Supervisors: Frederik O. Bagger, Sune Justesen & Ole Winther

The accurate prediction of peptide-Major Histocompatibility Complex (MHC) binding is a critical endeavor in immunology, with profound implications for vaccine development, cancer immunotherapy, and understanding autoimmune diseases. The ability of MHC molecules to present peptide antigens to T cells forms a foundation of the adaptive immune response, dictating the specificity and efficacy of immune recognition. Despite the significance of peptide-MHC interactions, current prediction tools often fall short when confronted with novel pathogens, such as SARS-CoV-2, and when predicting for less common MHC alleles, revealing a gap between reported accuracies and real-world performance. This discrepancy may stem from biases in training data or a lack of generalisability in the models, underscoring the need for improved methodologies that can anticipate and adapt to previously unseen immunological events.

This thesis addresses the pressing need for reliable peptide-MHC binding predictions by exploring the limitations of existing tools and proposing innovative solutions. The research presented herein systematically investigates multiple facets of peptide-MHC binding: it evaluates the stability of peptides with C-terminal modifications through experimental assays, benchmarks common prediction tools against novel pathogen data, analyses the role of peptide protrusions in MHC class I molecules and their empirical connection to antigen presentation, and introduces a patented method that improves peptide-MHC (pMHC) prediction performance through transfer learning.

Additionally, this work combined new stability data with a range of other data types, applying thorough optimisation and machine learning methods to improve prediction accuracy. This led to the creation of PrDx, our new prediction tool. When evaluated against challenging targets, PrDx demonstrated superior performance in stability prediction and correlated more closely with immunogenicity than existing tools. However, this evaluation also underscored the complexity of accurately predicting peptide-MHC interactions, with the tools' performance varying across different MHC alleles. The findings suggest that a re-evaluation of current prediction models is necessary, with a focus on integrating various data types and optimising artificial neural network architectures to better capture the nuances of peptide-MHC interactions.