Protein variant effect prediction using machine learning

Research output: Book/Report › Ph.D. thesis › Research

Standard

Protein variant effect prediction using machine learning. / Blaabjerg, Lasse Møller.

Department of Biology, Faculty of Science, University of Copenhagen, 2024. 164 p.

Harvard

Blaabjerg, LM 2024, Protein variant effect prediction using machine learning. Department of Biology, Faculty of Science, University of Copenhagen.

APA

Blaabjerg, L. M. (2024). Protein variant effect prediction using machine learning. Department of Biology, Faculty of Science, University of Copenhagen.

Vancouver

Blaabjerg LM. Protein variant effect prediction using machine learning. Department of Biology, Faculty of Science, University of Copenhagen, 2024. 164 p.

Author

Blaabjerg, Lasse Møller. / Protein variant effect prediction using machine learning. Department of Biology, Faculty of Science, University of Copenhagen, 2024. 164 p.

Bibtex

@phdthesis{4a8eb80921524ee8803205a71d14c71e,

title = "Protein variant effect prediction using machine learning",

abstract = "Predicting how amino acid changes in a protein can affect different protein properties is an ongoing area of research with applications in studies of the molecular mechanisms behind evolution, human disease, protein engineering and more. In recent years, machine learning-based methods have emerged as powerful tools for modeling such variant effects. In this thesis, we show specific examples of how existing methods for computational variant effect prediction can be improved using modern machine learning techniques. This thesis is centered around one publication and two manuscripts. In the first publication, we show that a combination of self-supervised and supervised machine learning can be used to develop a fast predictor of protein stability changes, RaSP, that is suitable for large-scale variant effect analysis. We validate and test the model using experimental and clinical data. We exemplify the large-scale application by generating stability change predictions for almost all single amino acid changes in the human proteome, corresponding to ∼230 million predictions. In the first manuscript, we modify our RaSP model into a new model, mRaSP, that is specifically designed for membrane proteins. Membrane proteins are difficult to characterize both experimentally and computationally. However, we show that a relatively simple but specialized model is able to make variant effect predictions at a level comparable, and sometimes superior, to an existing method based on Rosetta. In the second manuscript, we explore how combined information from protein sequence and structure inputs can be used to generate robust variant effect predictions. We introduce a novel self-supervised method, SSEmb, which combines two existing unimodal models into a single unified model that can be trained end-to-end. We show that the multimodal model is able to generate predictions that are well correlated with experimental data even in the case where information from one of the inputs, in this case the MSA, is scarce. Furthermore, we show that the SSEmb embeddings contain rich information that can be used to train task-specific downstream models; in our case exemplified by the development of a downstream model to predict protein-protein binding sites at high accuracy. Finally, we summarize our findings from each research project in the conclusion and point to ways in which our methods and results could be useful in future research.",

author = "Blaabjerg, {Lasse M{\o}ller}",

year = "2024",

language = "English",

school = "Department of Biology, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Protein variant effect prediction using machine learning

AU - Blaabjerg, Lasse Møller

PY - 2024

Y1 - 2024

N2 - Predicting how amino acid changes in a protein can affect different protein properties is an ongoing area of research with applications in studies of the molecular mechanisms behind evolution, human disease, protein engineering and more. In recent years, machine learning-based methods have emerged as powerful tools for modeling such variant effects. In this thesis, we show specific examples of how existing methods for computational variant effect prediction can be improved using modern machine learning techniques. This thesis is centered around one publication and two manuscripts. In the first publication, we show that a combination of self-supervised and supervised machine learning can be used to develop a fast predictor of protein stability changes, RaSP, that is suitable for large-scale variant effect analysis. We validate and test the model using experimental and clinical data. We exemplify the large-scale application by generating stability change predictions for almost all single amino acid changes in the human proteome, corresponding to ∼230 million predictions. In the first manuscript, we modify our RaSP model into a new model, mRaSP, that is specifically designed for membrane proteins. Membrane proteins are difficult to characterize both experimentally and computationally. However, we show that a relatively simple but specialized model is able to make variant effect predictions at a level comparable, and sometimes superior, to an existing method based on Rosetta. In the second manuscript, we explore how combined information from protein sequence and structure inputs can be used to generate robust variant effect predictions. We introduce a novel self-supervised method, SSEmb, which combines two existing unimodal models into a single unified model that can be trained end-to-end. We show that the multimodal model is able to generate predictions that are well correlated with experimental data even in the case where information from one of the inputs, in this case the MSA, is scarce. Furthermore, we show that the SSEmb embeddings contain rich information that can be used to train task-specific downstream models; in our case exemplified by the development of a downstream model to predict protein-protein binding sites at high accuracy. Finally, we summarize our findings from each research project in the conclusion and point to ways in which our methods and results could be useful in future research.

AB - Predicting how amino acid changes in a protein can affect different protein properties is an ongoing area of research with applications in studies of the molecular mechanisms behind evolution, human disease, protein engineering and more. In recent years, machine learning-based methods have emerged as powerful tools for modeling such variant effects. In this thesis, we show specific examples of how existing methods for computational variant effect prediction can be improved using modern machine learning techniques. This thesis is centered around one publication and two manuscripts. In the first publication, we show that a combination of self-supervised and supervised machine learning can be used to develop a fast predictor of protein stability changes, RaSP, that is suitable for large-scale variant effect analysis. We validate and test the model using experimental and clinical data. We exemplify the large-scale application by generating stability change predictions for almost all single amino acid changes in the human proteome, corresponding to ∼230 million predictions. In the first manuscript, we modify our RaSP model into a new model, mRaSP, that is specifically designed for membrane proteins. Membrane proteins are difficult to characterize both experimentally and computationally. However, we show that a relatively simple but specialized model is able to make variant effect predictions at a level comparable, and sometimes superior, to an existing method based on Rosetta. In the second manuscript, we explore how combined information from protein sequence and structure inputs can be used to generate robust variant effect predictions. We introduce a novel self-supervised method, SSEmb, which combines two existing unimodal models into a single unified model that can be trained end-to-end. We show that the multimodal model is able to generate predictions that are well correlated with experimental data even in the case where information from one of the inputs, in this case the MSA, is scarce. Furthermore, we show that the SSEmb embeddings contain rich information that can be used to train task-specific downstream models; in our case exemplified by the development of a downstream model to predict protein-protein binding sites at high accuracy. Finally, we summarize our findings from each research project in the conclusion and point to ways in which our methods and results could be useful in future research.

M3 - Ph.D. thesis

BT - Protein variant effect prediction using machine learning

PB - Department of Biology, Faculty of Science, University of Copenhagen

ER -

ID: 387428532