Algorithms for Protein Structure Prediction

Research output: Book/ReportPh.D. thesis

Documents

  • Martin Paluszewski
The problem of predicting the three-dimensional structure of a protein given its
amino acid sequence is one of the most important open problems in bioinformatics.
One of the carbon atoms in amino acids is the C-atom and the overall
structure of a protein is often represented by a so-called C-trace.
Here we present three different approaches for reconstruction of C-traces
from predictable measures. In our first approach [63, 62], the C-trace is positioned
on a lattice and a tabu-search algorithm is applied to find minimum
energy structures. The energy function is based on half-sphere-exposure (HSE)
and contact number (CN) measures only. We show that the HSE measure is
much more information-rich than CN, nevertheless, HSE does not appear to provide
enough information to reconstruct the C-traces of real-sized proteins. Our
experiments also show that using tabu search (with our novel tabu definition)
is more robust than standard Monte Carlo search.
In the second approach for reconstruction of C-traces, an exact branch and
bound algorithm has been developed [67, 65]. The model is discrete and makes
use of secondary structure predictions, HSE, CN and radius of gyration. We
show how to compute good lower bounds for partial structures very fast. Using
these lower bounds, we are able to find global minimum structures in a huge
conformational space in reasonable time. We show that many of these global
minimum structures are of good quality compared to the native structure. Our
branch and bound algorithm is competitive in quality and speed with other
state-of-the-art decoy generation algorithms.
Our third C-trace reconstruction approach is based on bee-colony optimization
[24]. We demonstrate why this algorithm has some important properties
that makes it suitable for protein structure prediction.
Our approach for model quality assessment (MQA) [64] makes use of distance
constraints extracted from alignments to templates. We show how to use CN
probabilities in an optimization algorithm for selecting good distance constraints
and we introduce the concept of non-contacts. When comparing our algorithm
with state-of-the-art MQA algorithms on the CASP7 benchmark, our algorithm
is among the top-ranked algorithms. We are currently participating in CASP8
MQA with this algorithm.
Original languageEnglish
Place of PublicationKøbenhavn
PublisherDepartment of Computer Science, University of Copenhagen
Number of pages196
Publication statusPublished - 2008

Number of downloads are based on statistics from Google Scholar and www.ku.dk


No data available

ID: 14772122