Hamelryck Group
One of the major unsolved problems in molecular biology today is the protein folding problem: given an amino acid sequence, predict the overall three-dimensional structure of the corresponding protein. It has been known since the seminal work of Christian B. Anfinsen in the early seventies that the sequence of a protein encodes its structure, but the exact details of the encoding still remain elusive.
Since the protein folding problem is of enormous practical, theoretical and medical importance - and in addition forms a fascinating intellectual challenge - it is often called the holy grail of bioinformatics. The Statistical Structural Biology group focuses on Bayesian, probabilistic models of protein structure and their application to protein structure prediction, protein design and protein structure determination from experimental data (NMR, SAXS), including data obtained from protein ensembles. Recently, we started working on probabilistic models of protein structure evolution.
We are tackling the protein structure prediction problem from an original angle. Our group develops sophisticated probabilistic models that describe various aspects of protein structure, and uses these models in prediction, design and structure determination. We also extended our statistical approach to RNA 3D structure. Currently, our probabilistic models are mainly based on three key ingredients:
- Graphical models (including dynamic Bayesian networks), which are powerful machine learning methods that can be interpreted in the language of statistical physics.
- Directional statistics, the statistics of angles, directions and orientations. When combined with graphical models, this allows the formulation of efficient and flexible probabilistic models of protein structure.
- Probabilistic programming and deep learning: modern probabilistic programming languages such as Stan, Edward, Pyro and PyMC3 offer unprecedented opportunities for formulating probabilistic models of protein structure. Incorporating deep learning architectures in these models allows Bayesian modelling to be combined with powerful machine learning methods.
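As a rough illustration of how graphical models and directional statistics fit together, the sketch below samples backbone dihedral angles (phi, psi) from a toy two-state hidden Markov model with von Mises emissions. It is a minimal sketch, not the group's actual model: the function name `sample_dihedrals`, the two hidden states, the transition probabilities and all angle parameters are assumptions chosen for illustration, and phi and psi are drawn independently here for simplicity.

```python
import numpy as np


def sample_dihedrals(n_residues, seed=0):
    """Sample (phi, psi) pairs from a toy HMM with von Mises emissions.

    All parameters below are illustrative assumptions, not fitted values.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical hidden states: 0 is helix-like, 1 is strand-like.
    start = np.array([0.5, 0.5])
    transition = np.array([[0.9, 0.1],
                           [0.2, 0.8]])

    # Mean angles (radians) and concentrations for (phi, psi) in each state.
    mu = np.radians([[-60.0, -45.0],
                     [-120.0, 130.0]])
    kappa = np.array([[8.0, 8.0],
                      [4.0, 4.0]])

    states = np.empty(n_residues, dtype=int)
    angles = np.empty((n_residues, 2))

    state = rng.choice(2, p=start)
    for i in range(n_residues):
        states[i] = state
        # The von Mises distribution is a circular analogue of the Gaussian;
        # samples lie in [-pi, pi].
        angles[i] = rng.vonmises(mu[state], kappa[state])
        state = rng.choice(2, p=transition[state])
    return states, angles


states, angles = sample_dihedrals(10)
print(np.degrees(angles).round(1))
```

Because the emissions are circular distributions, the sampled angles wrap correctly around the periodic boundary at ±180 degrees, which a Gaussian on the real line would not respect.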
Our probabilistic view on protein structure prediction, simulation and inference features prominently in the book "Bayesian methods in structural bioinformatics" (Springer, April, 2012).
- For more information on our statistical approach to protein structure prediction, see our articles on probabilistic models of protein structure that appeared in PLoS Computational Biology (2006) and PNAS (2008) and a review on probabilistic methods in structural bioinformatics (2009). See J. Magn. Reson. (2011) and PNAS (2014) for applications to NMR data.
- Our probabilistic model of side chain conformations, Basilisk (BMC Bioinformatics, 2010), abolishes the need for the use of discrete side chain rotamers in conformational sampling.
- We also developed a probabilistic model of RNA structure in atomic detail, see PLoS Computational Biology (2009).
- After 20 years of controversy and countless publications on the subject, we finally settle the discussion on the validity of so-called potentials of mean force (PMFs) as proposed by Sippl in 1990. Moreover, our results point to important new applications and solve the classic problem of the reference state. See PLoS ONE (2010) for an outline of the reference ratio method and how it explains potentials of mean force, and Proteins (2013) for a proof-of-concept application to Bayesian protein structure prediction. Essentially, Sippl's PMFs are an application of Jeffrey's conditioning or probability kinematics, a little-known variant of Bayesian updating.
- Proteins are flexible molecules, and therefore, often experimental data represent an average over an ensemble of protein structures. We developed a Bayesian method to infer protein ensembles, which is different from most other approaches in that it does not require simulating a discrete set of molecular conformations. See PLoS ONE (2013) for a description of the method, JCTC (2014) for an application to NMR data and PCCP (2016) for an application to SAXS data.
- In collaboration with KU's Department of Mathematical Sciences and the Department of Statistics, University of Oxford, we developed a probabilistic model of protein evolution (Mol. Biol. Evol., 2017).
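The reference ratio method mentioned in the list above can be illustrated with a small discrete example. The idea, as we read it, is to correct a reference distribution q(x) over fine-grained states so that a coarse-grained variable y = f(x) follows a desired distribution p(y), giving p(x) = q(x) p(y) / q(y). The sketch below is a toy under stated assumptions: the function name `reference_ratio`, the four fine-grained states and all probabilities are hypothetical and chosen only to make the arithmetic visible.

```python
import numpy as np


def reference_ratio(q_x, coarse_of, p_y):
    """Reweight q(x) so that the coarse variable y = f(x) has marginal p(y).

    q_x       : reference probabilities of the fine-grained states x
    coarse_of : index of the coarse-grained state y = f(x) for each x
    p_y       : desired probabilities of the coarse-grained states
    """
    q_x = np.asarray(q_x, dtype=float)
    coarse_of = np.asarray(coarse_of)
    p_y = np.asarray(p_y, dtype=float)

    # Marginal q(y) of the coarse variable under the reference distribution.
    q_y = np.bincount(coarse_of, weights=q_x, minlength=len(p_y))

    # Reference ratio: p(x) = q(x) * p(f(x)) / q(f(x)).
    return q_x * p_y[coarse_of] / q_y[coarse_of]


# Toy example (all numbers are illustrative assumptions).
q_x = np.array([0.1, 0.3, 0.2, 0.4])   # reference distribution over x
coarse_of = np.array([0, 0, 1, 1])     # y = f(x)
p_y = np.array([0.7, 0.3])             # desired distribution over y

p_x = reference_ratio(q_x, coarse_of, p_y)
print(p_x)                                   # reweighted distribution over x
print(np.bincount(coarse_of, weights=p_x))   # its marginal over y
```

The reweighted distribution keeps the conditional q(x | y) of the reference unchanged and only replaces the marginal over y, which is the pattern of Jeffrey's conditioning.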
Selected peer reviewed articles (2005-now)
- Hamelryck T. (2005) An amino acid has two sides: A new 2D measure provides a different view of solvent exposure. Proteins Struct. Func. Bioinf. 59, 38-48. PDF
- Boomsma, W., Hamelryck, T. (2005) Full Cyclic Coordinate Descent: Solving the protein loop closure problem in Calpha space, BMC Bioinformatics 6:159 Abstract&PDF@BioMed
- Hamelryck, T., Kent, J., Krogh, A. (2006) Sampling realistic protein conformations using local structural bias. PLoS Comp. Biol. 2(9): e131 PDF@PLoS
- Paluszewski, M., Hamelryck, T. and Winter, P. (2006) Reconstructing protein structure from solvent exposure using Tabu Search. Algorithms Mol. Biol. 1:20. PDF@AlgMolBiol.
- Won, KJ., Hamelryck, T., Prugel-Bennett, A. and Krogh, A. (2007) An evolving method for learning HMM Structure: prediction of protein secondary structure. BMC Bioinformatics 8, 357 PDF@BMC Bioinformatics
- Boomsma, W., Mardia, KV., Taylor, CC., Ferkinghoff-Borg, J., Krogh, A. and Hamelryck, T. (2008) A generative, probabilistic model of local protein structure. Proc. Natl. Acad. Sci. USA 105, 8932-8937 PDF@PNAS, Video lecture by Wouter Boomsma
- Hamelryck, T. (2009) Probabilistic models and machine learning in structural bioinformatics. Statistical Methods in Medical Research Review. 18, 505-526. PDF
- Cock, P., Antao, T., Chang, J., Chapman, B., Cox, C., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11),1422-1423. Free PDF@Bioinformatics
- Frellsen, J., Moltke, I., Thiim, M., Mardia, KV., Ferkinghoff-Borg, J., Hamelryck, T. (2009) A probabilistic model of RNA conformational space. PLoS Comp. Biol. 5(6), e1000406 Free PDF@PLOS, Video of a presentation by Jes Frellsen
- Paluszewski, M., Hamelryck, T. (2010) Mocapy++ - A toolkit for inference and learning in dynamic Bayesian networks. BMC Bioinformatics 11:126. Free PDF@BMC
- Harder, T., Boomsma, W., Paluszewski, M., Frellsen, J., Johansson, KE., Hamelryck, T. (2010) Beyond rotamers: A generative, probabilistic model of side chains in proteins. BMC Bioinformatics 11:306. Free PDF@BMC
- Stovgaard, K., Andreetta, C., Ferkinghoff-Borg, J., Hamelryck, T. (2010) Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics 11:429. PDF@BMC Bioinformatics
- Hamelryck, T., Borg, M., Paluszewski, M., Paulsen, J., Frellsen, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J. (2010) Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS ONE 5(11): e13714. PDF@PLoS ONE, Preprint@arXiv
- Olsson, S., Boomsma, W., Frellsen, J., Bottaro, S., Harder, T., Ferkinghoff-Borg, J., Hamelryck, T. (2011) Generative probabilistic models extend the scope of inferential structure determination. J. Magn. Reson. 213(1), 182-6. PDF
- Harder, T., Borg, M., Boomsma, W., Røgen, P., Hamelryck, T. (2012) Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics 28, 510-515. PDF@Bioinformatics.
- Bottaro, S., Boomsma, W., Johansson, K.E., Andreetta, C., Hamelryck, T., Ferkinghoff-Borg, J. (2012) Subtle Monte Carlo updates in dense molecular systems. J. Chem. Theory Comput. 8, 695–702. PDF@ACS
- Harder, T., Borg, M., Bottaro, S., Boomsma, W., Olsson, S., Ferkinghoff-Borg, J., Hamelryck, T. (2012) An efficient null model for conformational fluctuations in proteins. Structure, 20, 1028-1039. PDF@Structure.
- Mardia, KV., Kent, JT., Zhang, Z., Taylor, C., Hamelryck, T. (2012) Mixtures of concentrated multivariate sine distributions with applications to bioinformatics. J. Appl. Stat. 39, 2475-2492. PDF
- Johansson, KE., Hamelryck, T. (2013) A simple probabilistic model of multibody interactions in proteins. Proteins 81, 1340-50.
- Boomsma, W., Frellsen, J., Harder, T., Bottaro, S., Johansson, KE., Tian, P., Stovgaard, K., Andreetta, C., Olsson, S., Valentin, J., Antonov, L., Christensen, A., Borg, M., Jensen, J., Lindorff-Larsen, K., Ferkinghoff-Borg, J., Hamelryck, T. (2013) PHAISTOS: A framework for Markov chain Monte Carlo simulation and inference of protein structure. J. Comput. Chem. 34, 1697-705. PDF
- Valentin, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J., Frellsen, J., Mardia, KV, Tian, P., Hamelryck, T. (2013) Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins 82:288–299. PDF@Proteins
- Olsson, S., Frellsen, J., Boomsma, W., Mardia, KV., Hamelryck, T. (2013) Inference of structure ensembles of flexible biomolecules from sparse, averaged data. PLoS ONE. 8(11): e79439. Article@PLoS ONE
- Christensen, AS., Linnet, TE., Borg, M., Boomsma, W., Lindorff-Larsen, K., Hamelryck, T., Jensen, J. (2013) Protein structure validation and refinement using amide proton chemical shifts derived from quantum mechanics. PLoS ONE. 8(12):e84123. Article@PLoS ONE
- Christensen AS., Hamelryck T., Jensen JH. (2014) FragBuilder: An efficient Python library to setup quantum chemistry calculations on peptide models. PeerJ. 2:e277 Article@PeerJ
- Olsson, S., Vögeli, B., Cavalli, A., Boomsma, W., Ferkinghoff-Borg, J., Lindorff-Larsen, K., Hamelryck, T. (2014) Probabilistic approach to the determination of native state ensembles of proteins. J. Chem. Theory Comput. 10(8):3484-3491. Article@JCTC
- Boomsma, W., Tian, P., Ferkinghoff-Borg, J., Hamelryck, T., Lindorff-Larsen, K., Vendruscolo, M. (2014) Equilibrium simulations of proteins using molecular fragment replacement and NMR chemical shifts. Proc. Natl. Acad. Sci. USA. 111(38):13852-13857. Article@PNAS
- Bratholm, LA., Christensen, AS., Hamelryck, T., Jensen, JH. (2015) Bayesian inference of protein structure from chemical shift data. PeerJ. 3:e861; DOI 10.7717/peerj.861
- Antonov, LD., Olsson, S., Boomsma, W., Hamelryck, T. (2016) Bayesian inference of protein ensembles from SAXS data. Phys. Chem. Chem. Phys. DOI: 10.1039/C5CP04886A Article@PCCP
- Johansson, KE., Johansen, NT., Christensen, S., Horowitz, S., Bardwell, JC., Olsen, JG., Willemoës, M., Lindorff-Larsen, K., Ferkinghoff-Borg, J., Hamelryck, T. and Winther, JR. (2016) Computational redesign of thioredoxin is hypersensitive toward minor conformational changes in the backbone template. J. Mol. Biol., 428:4361-4377.
- Golden, M., Garcia-Portugues, E., Sørensen, M., Mardia, KV., Hamelryck, T., and Hein, J. (2017) A generative angular model of protein structure evolution. Mol. Biol. Evol. 34:2085–2100 Article@MBE
- Postic, G., Hamelryck, T., Chomilier, J., Stratmann, D. (2018) MyPMFs: a simple tool for creating statistical potentials to assess protein structural models. Biochimie. 151:37–41 Article@Biochimie
- Garcia-Portugues, E., Sørensen, M., Mardia, KV. and Hamelryck, T. (2019) Langevin diffusions on the torus: estimation and applications. Statistics and Computing. 29:1-22 Article@Stat Comput
Selected conference proceedings
- Won, KJ., Hamelryck, T., Prugel-Bennett, A., Krogh, A. (2005) Evolving hidden Markov models for protein secondary structure prediction. Proceedings of the 2005 IEEE Congress on Evolutionary Computation, pp. 33-40, Edinburgh. PDF
- Kent, J.T., Hamelryck, T. (2005) Using the Fisher-Bingham distribution in stochastic models for protein structure. In S. Barber, P.D. Baxter, K.V.Mardia, & R.E. Walls (Eds.), LASR 2005 - quantitative biology, shape analysis, and wavelets, pp. 57-60. Leeds university press, Leeds, UK. PDF@LASR
- Boomsma, W., Kent, J.T., Mardia, K.V., Taylor, C.C. & Hamelryck, T. (2006) Graphical models and directional statistics capture protein structure. In S. Barber, P.D. Baxter, K.V.Mardia, & R.E. Walls (Eds.), LASR 2006 - Interdisciplinary statistics and bioinformatics, pp. 91-94. Leeds university press, UK. PDF@LASR
- Boomsma, W., Borg, M., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Krogh, A., Mardia, KV. and Hamelryck, T. (2008) PHAISTOS: protein structure prediction using a probabilistic model of local structure. Proceedings of CASP8, Cagliari, Sardinia, Italy, December 3-7 2008. pp 82-83. PDF@CASP8
- Borg, M., Mardia, KV., Boomsma, W., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Røgen, P., Hamelryck, T. A probabilistic approach to protein structure prediction: PHAISTOS in CASP9. LASR 2009 - Statistical tools for challenges in bioinformatics, pp. 65-70. Leeds university press, Leeds, UK. PDF@LASR
- Paulsen, J., Paluszewski, M., Mardia, KV., Hamelryck, T. (2010) A probabilistic model of hydrogen bond geometry in proteins. LASR 2010 - High-throughput sequencing, proteins and statistics, pp. 61-64. Leeds university press, Leeds, UK. PDF@LASR
- Mardia, KV., Frellsen, J., Borg, M., Ferkinghoff-Borg, J., Hamelryck, T. (2011) A statistical view on the reference ratio method, LASR 2011 - High-throughput sequencing, proteins and statistics, pp. 55-61. Leeds university press, Leeds, UK. PDF@LASR
- Antonov, L., Andreetta, C., Hamelryck, T. (2012) An efficient parallel GPU evaluation of small angle X-ray scattering profiles. In BIOSTEC 2012, 5th Int'l Joint Conf. on Biomedical Engineering Systems and Technologies, 102-108, Algarve, Portugal. PDF
- Hamelryck, T., Haslett, J., Mardia, K., Kent, JT., Valentin, J., Frellsen, J., Ferkinghoff-Borg, J. (2013) On the reference ratio method and its application to statistical protein structure prediction. LASR 2013 - Statistical models and methods for non-Euclidean data with current scientific applications. Leeds university press, Leeds, UK. PDF@LASR
- Olsson, S., Hamelryck, T. (2013) On the significance of the reference ratio method in inferential structure determination of biomolecules. LASR 2013 - Statistical models and methods for non-Euclidean data with current scientific applications. Leeds university press, Leeds, UK. PDF@LASR
- Frellsen, J., Hamelryck, T., Ferkinghoff-Borg, J. (2013) Combining the multicanonical ensemble with generative probabilistic models of local biomolecular structure. 59th ISI World Statistics Congress. Hong Kong, China. 25-30 August, 2013. PDF
- Al-Sibahi, AS., Hamelryck, T., Henglein, F. (2018) Probabilistic programming for voucher information extraction. PROBPROG 2018, MIT, Cambridge, MA, USA, 4-6 October, 2018.
Books and book chapters
- Chang, J., Chapman, B., Friedberg, I., Hamelryck, T., de Hoon, M., Cock, P., Antao, T., Talevich, E., Wilczyński, B. (2012) Biopython tutorial and cookbook. Biopython project. PDF@Biopython.org
- Boomsma, W., Bottaro, S., Hamelryck, T., Frellsen, J., Andreetta, C., Borg, M., Harder, T., Johansson, KE., Stovgaard, K., Tian, P. (2012) Phaistos user manual (version 1.0). University of Copenhagen. PDF@SourceForge
- Paluszewski, M., Frellsen, J., Hamelryck, T. (2009) Mocapy++: A C++ toolkit for inference and learning in dynamic Bayesian networks. University of Copenhagen. PDF
- Hamelryck, T., Mardia, KV., Ferkinghoff-Borg, J., Editors. (2012) Bayesian methods in structural bioinformatics. Book in the Springer series "Statistics for biology and health", 385 pages, 13 chapters. Springer Verlag, March, 2012. Book description at Springer.
- Hamelryck, T. (2012) An overview of Bayesian inference and graphical models. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
- Borg, M., Hamelryck, T., Ferkinghoff-Borg, J. (2012) On the physical relevance and statistical interpretation of knowledge based potentials. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
- Frellsen, J., Mardia, KV., Borg, M., Ferkinghoff-Borg, J., Hamelryck, T. (2012) Towards a probabilistic model of protein structure: The reference ratio method. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
- Boomsma, W., Frellsen, J., Hamelryck, T. (2012) Probabilistic models of local biomolecular structure and their applications. In T. Hamelryck et al. (eds). Bayesian methods in structural bioinformatics. Statistics for Biology and Health. Springer-Verlag, Berlin, Heidelberg.
- Antonov, LD., Andreetta, C., Hamelryck, T. (2013) Parallel GPGPU evaluation of small angle X-ray scattering profiles in a Markov chain Monte Carlo framework. In J. Gabriel et al. (eds.). BIOSTEC 2012, CCIS, 357, 222-235. PDF@Springer
- Hamelryck, T., Boomsma, W., Ferkinghoff-Borg, J., Foldager, J., Frellsen, J., Haslett, J., Theobald, D. (2015). Proteins, physics and probability kinematics: a Bayesian formulation of the protein folding problem. In Geometry Driven Statistics, Wiley.
Some public outreach
- One step closer to green chemistry and improved pharmaceuticals. Press release, KU, June, 2008.
- Designerenzymer til grøn kemi. Press release, Det Frie Forskningsråd (DFF), June, 2009.
- Machine Learning & Molecules conference, Copenhagen, November 9-10th 2017
- Flere unge skal have en fremtid med AI og Machine Learning. TechSavvy, 2018
Our dynamic Bayesian network toolkit Mocapy++ (BMC Bioinformatics, 2010) - which was used to formulate the probabilistic models of protein and RNA structure - is freely available from SourceForge.
PHAISTOS version 1.0, our Markov chain Monte Carlo framework for protein structure simulation, is available from SourceForge.
In 2018, we obtained funding for applying probabilistic programming to automatic information extraction from invoices (Innovationsfonden) and to ancestral protein structure prediction (Danish Research Council).
Since August 2016, Thomas Hamelryck has been employed 50% at the Department of Biology and 50% at the Department of Computer Science in order to promote new initiatives between the two departments. The emphasis lies on bioinformatics, machine learning, probabilistic programming and deep learning.
The structure group is part of the research initiative Dynamical Systems Interdisciplinary Network, led by Prof. Susanne Ditlevsen and funded by UCPH 2016. The project involves 7 teams from the University of Copenhagen. The network will consolidate existing inter-disciplinary collaboration and initiate new collaboration across faculties.
- Danish Research Council for Technology and Production Sciences (FTP), "Data driven protein structure prediction" Feb 2007-Feb 2010. 3,800,000 DKK (510,200 EUR).
- Danish Research Council for Strategic Research (NABIIT), "Simulating proteins on a millisecond time-scale" Sep 2006-Feb 2010. 7,800,000 DKK (1,047,037 EUR). PI: Prof Anders Krogh. In collaboration with Novozymes.
- Danish Research Council for Technology and Production Sciences (FTP), "Protein design: Development of molecular biology and bioinformatics tools" Sep. 2007-Sep. 2010. 5,600,000 DKK (750,900 EUR). Partner in a project of Jakob R. Winther, Department of Biology, University of Copenhagen.
- Danish Research Council for Technology and Production Sciences (FTP), "Protein structure ensembles from mathematical models - with application to Parkinson's alpha-synuclein", April 2010-August 2013, 4,280,930 DKK
- Dynamical Systems Interdisciplinary Network, UCPH 2016 initiative (PI: Prof. Susanne Ditlevsen), August 2014-July 2017, one PhD student and one postdoc
- VELUX Visiting Professor Programme 2015-2016, 6 months stay for Assoc. Prof. Douglas Theobald (Brandeis University), 336,861 DKK
- Innovationsfonden, "Intelligent accounting document management using probabilistic programming", March 2018-February 2020, 888,000 DKK
- Danish Research Council for Technology and Production Sciences (FTP), "Resurrecting ancestral proteins in silico to understand how cancer drugs work", September 2018-August 2021, 2,441,543 DKK
Members
Name | Title | Phone
---|---|---
Thomas Wim Hamelryck | Associate professor | +45 23 96 06 13
Contact
Associate Professor
Thomas Hamelryck
Computational and RNA Biology
Ole Maaløes Vej 5, room 1.2.20
DK-2200 Copenhagen N
thamelry@bio.ku.dk
Phone: +45 23 96 06 13