Maher Mahmoud Kassem:
Towards Protein Structure Determination from Energetics, Experiments & Evolution

Date: 15-08-2018    Supervisor: Kresten Lindorff-Larsen

In this thesis, I mainly describe the development and application of computational methods that use and combine individually incomplete or non-traditional sources of information, with the end goal of determining all-atom 3D structures of proteins. The sources of information at the focus of this thesis are (I) computational energy functions, (II) experimental NMR backbone chemical shifts, and (III) amino acid to amino acid contacts inferred by searching for pairwise constraints in the evolutionary history of protein families (coevolution). The central idea is that the three sources of information are complementary and can, given the right circumstances, be used to determine accurate novel protein structures. In contrast to traditional structure determination, this approach is not particularly time-consuming or labor-intensive. It is also often feasible for protein targets that are experimentally difficult to solve by traditional means, such as membrane proteins and proteins that take on a filamentous structure.

First, I describe a method that enhances the precision of inferred contacts obtained from coevolutionary analyses of multiple sequence alignments. By simultaneously performing rapid low-resolution structure calculations and contact assignment, the method is able to detect falsely predicted contacts, thereby improving the precision of the set of initially inferred contacts. The results show that the improved precision also translates into more accurate structures, especially for proteins with only a small number of sequences, for which the initial contact inference is likely to be poor. The benefit is highlighted by the fact that protein families with only a small number of sequences are less likely to have any experimentally solved structures; where such structures do exist, homology modeling is more likely to provide a better basis for structural modeling.

Secondly, I describe the accurate structure determination of a novel bacterial cytoskeleton protein, named Bactofilin, by combining a physical-chemical force field with backbone chemical shifts obtained from solid-state NMR and amino acid contacts predicted from a coevolutionary analysis of Bactofilin's protein sequence family. The structure determination was blind; however, validation against an experimental structure was enabled by the release of a solid-state NMR structure during publication, which resulted in an all-heavy-atom root-mean-square deviation of 2.2 Å. The structure was revealed to be a right-handed β-helix with a tightly packed hydrophobic core.

Finally, I describe neural-network-based methods trained to accurately predict backbone dihedral angles and backbone dihedral angle distributions from measured NMR backbone chemical shifts. In particular, I describe a neural network that predicts dihedral angles with an accuracy that rivals the state of the art, without the commonly used ad hoc approach of searching a large protein fragment database for chemical shift similarities. Moreover, I describe a trained neural network that accurately predicts dihedral angle distributions that can take on multi-modal form. The results indicate that multi-modal distributions alleviate issues that arise from the under-determined nature of predicting dihedral angles from backbone chemical shifts.