This document describes a probabilistic ab initio method for protein structure prediction that uses a coarse-grained force field. It introduces a probabilistic score function that treats sequence, structure, and solvation as mixtures of probability distributions, without energies. A biased fragment-assembly search generates conformations by combining fragments from a statistical fragment library. Results on several proteins show that the method generates protein-like folds. Future work will introduce hydrogen bonding as a probabilistic term to improve results.
2. Overview
• Introduction
• Probabilistic Ab Initio – Standard
– Score Function
– Search Method
– Results
• Probabilistic Ab Initio – Extended
– Score Function: Introducing Solvation
– Search Method: Bias Fix
– Results
• Outlook
• Summary
3. “All the information required by a protein to adopt its final conformation is encoded in its sequence”
Christian B. Anfinsen (1916 - 1995). Source: http://nobelprize.org/
• the information he referred to has not been decoded yet
• interestingly, these days we also know about proteins like ‘prions’
4. [Figure: number of structures (N) against time (year) for the experimental methods: X-ray crystallography, NMR spectroscopy, cryo-EM]
5. [Figure: number of structures (N) against time (year) for the experimental methods: X-ray crystallography, NMR spectroscopy, cryo-EM]
• More than 3 decades and only 60,000+ structures
7. [Diagram: overview of structure determination methods]
• Experimental methods: X-ray crystallography, NMR spectroscopy, cryo-EM
• Computational methods: homology modeling, fold recognition, ab initio modeling
• Other labels on the diagram: experimental data, PDB, accuracy, computation cost, PDB dependence, physical principles
8. Ab Initio Methods
• Monte Carlo Methods
• Molecular Dynamics
• Physics-based
– Best but most difficult (force fields)
– Computationally expensive
• Statistics-based
– Boltzmann distributions: $P_i = e^{-\Delta E / k_B T}$
– Statistical mechanical ensembles
• We use Descriptive Statistics
– Bayesian formulation
– No hidden approximations
– No energies; find distributions instead
9. Our Ab Initio Method
• Coarse-grained
– reduced dimensionality
– relies on dihedral angles
– no side chains
– 5-atom representation
• Simulated Annealing / Monte Carlo
– Move set: biased & unbiased
– Acceptance criterion: ratio of probabilities
• Fragment Assembly
• Purely Probabilistic Force Field
– Mixture of probabilities: sequence, structure, solvation
– No energies
– No Boltzmann statistics
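The fragment assembly step listed on this slide can be sketched as a single move on the dihedral-angle representation. This is a minimal sketch, not the authors' implementation: the names `insert_fragment`, `conformation`, and `library`, the data layout, and the uniform (unbiased) choice of fragment and position are all assumptions made for illustration.

```python
import random

def insert_fragment(conformation, library, rng=random):
    """Return a copy of `conformation` with one window replaced by a library fragment.

    conformation: list of (phi, psi) tuples, one per residue (hypothetical layout).
    library: list of fragments, each a list of (phi, psi) tuples.
    """
    fragment = rng.choice(library)             # unbiased fragment choice (assumption)
    start = rng.randrange(len(conformation) - len(fragment) + 1)
    proposal = list(conformation)              # the current conformation is not modified
    proposal[start:start + len(fragment)] = fragment
    return proposal
```

A biased move set would replace the uniform `rng.choice` with a draw weighted by how well each fragment matches the local sequence; the slides do not give that weighting, so it is left out here.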
11. 1. Sequence
• Multi-way Bernoulli
[Figure: amino acid one-letter codes]
2. Structure
• Representation: reduced, simplified
– 5 atoms per amino acid
– dihedral angles (phi, psi)
• Bivariate Gaussian
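To make the two terms concrete, the sketch below scores one fragment under one class by adding a multi-way Bernoulli (categorical) log-term for each residue and a bivariate Gaussian log-density for each (phi, psi) pair. It is an illustration under stated assumptions: the function names, the `class_model` layout and its keys are hypothetical, sequence and structure are treated as independent given the class, and the periodicity of the angles is ignored.

```python
import math

def log_bivariate_gaussian(phi, psi, mean, cov):
    """Log-density of (phi, psi) under a bivariate Gaussian.

    mean: (mu_phi, mu_psi); cov: ((s11, s12), (s12, s22)), symmetric positive definite.
    Angle periodicity is ignored in this sketch.
    """
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = phi - mean[0], psi - mean[1]
    # Quadratic form d^T cov^{-1} d, using the closed-form 2x2 inverse.
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * quad - math.log(2.0 * math.pi) - 0.5 * math.log(det)

def log_fragment_probability(sequence, angles, class_model):
    """Sum the sequence and structure log-terms over the positions of one fragment.

    class_model: one dict per position with hypothetical keys
    "residue_probs" (one-letter code -> probability), "mean", and "cov".
    """
    total = 0.0
    for residue, (phi, psi), pos in zip(sequence, angles, class_model):
        total += math.log(pos["residue_probs"][residue])   # multi-way Bernoulli term
        total += log_bivariate_gaussian(phi, psi, pos["mean"], pos["cov"])
    return total
```

Working with log-probabilities keeps the product over many positions from underflowing; that is a common implementation choice, not something the slides specify.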
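12. [Figure, panels (A) to (C): overlapping 7-residue fragments at positions i, i+1, i+2, … along the chain; label "1.5 × 10^6"; each fragment pairs a sequence with two rows of structure values]
Fragment i: sequence A S T C W R I; structure -3.1 -2.0 -0.5 -1.7 -2.0 -1.5 -2.2 / -1.1 -0.9 -0.7 -0.5 -0.3 -0.8 -1.0
Fragment i+1: sequence S T C W R I M; structure -2.0 -0.5 -1.7 -2.0 -1.5 -2.2 -1.1 / -0.9 -0.7 -0.5 -0.3 -0.8 -1.0 -1.1
Fragment i+2: sequence T C W R I M F; structure -0.5 -1.7 -2.0 -1.5 -2.2 -1.1 -2.1 / -0.7 -0.5 -0.3 -0.8 -1.0 -1.1 -0.4
…
Fragment N: sequence P L E N R R V; structure 3.1 2.0 1.5 1.7 -2.0 -1.5 -1.2 / 1.1 0.9 -2.5 2.3 -0.9 -1.2 -0.8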
14. [Figure: 4-residue fragments, each pairing a sequence (for example A S L T) with structure values (for example 20 05 -32 80 / 87 -71 15 -07), are sorted into classes 0 to 6. Classified examples shown: DCWF .., GAEG .., WFDC .., GGGG .., ACAD .., CCAD .., WFTG .., STDC .., STST..]
16. [Figure: search path from an initial (random) conformation through steps (i-1) and (i) to the final model; labels: Probability, Conformational space]
• Relative probabilities: $P_i = \frac{p(x_i)}{p(x_{i-1})}$
• Normal methods: $P_i = e^{-\Delta E / k_B T}$
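The acceptance rule on this slide, a ratio of probabilities in place of the Boltzmann factor, could be coded along the following lines. This is a hedged sketch: the function name `accept`, the use of log-probabilities, and the Metropolis-style "always accept if more probable" convention are assumptions, and any annealing schedule is left out.

```python
import math
import random

def accept(log_p_new, log_p_old, rng=random):
    """Accept or reject a move using the ratio p(x_i) / p(x_{i-1}).

    The ratio replaces exp(-dE / kT); no energy or temperature appears.
    """
    log_ratio = log_p_new - log_p_old
    if log_ratio >= 0.0:
        return True                        # proposal is at least as probable
    return rng.random() < math.exp(log_ratio)
```

In a search loop, `log_p_new` and `log_p_old` would come from the score function evaluated on the proposed and current conformations.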
29. 1. Sequence
• Multi-way Bernoulli
[Figure: amino acid one-letter codes]
2. Structure
• Representation: reduced, simplified
– 5 atoms per amino acid
– dihedral angles (phi, psi)
• Bivariate Gaussian
3. Solvation
• Simple Gaussian
30. • Mixture Models: Re-Classified
[Diagram: PDB, Fragment Library, Expectation Maximization, Statistical Models, Bayesian Classifier; fragment descriptors labelled Connections, Residues, Geometry, Location in protein; re-classified examples: ACAD .., CCAD .., WFTG .., STST.., STDC .., WFDC .., DCWF .., GAEG .., GAEG .., GGGG ..]
Fragment A S L T: structure -3.1 -2.0 -0.5 -1.7 / -1.1 -0.9 -0.7 -0.5; solvation 12 07 08 11
Fragment S L T I: structure -2.0 -0.5 -1.7 -1.2 / -0.9 -0.7 -0.5 -0.4; solvation 07 08 11 09
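The Bayesian classification step can be illustrated with Bayes' rule over the mixture classes: given per-class log-likelihoods of a fragment and the class weights, compute the posterior probability of each class. The same quantity is the responsibility computed in the expectation step of EM. The function name and argument layout are assumptions for illustration only.

```python
import math

def class_posteriors(log_likelihoods, weights):
    """Posterior p(class | fragment) from per-class log-likelihoods and mixture weights."""
    joint = [math.log(w) + ll for w, ll in zip(weights, log_likelihoods)]
    top = max(joint)
    # log-sum-exp keeps the normalisation stable for very small likelihoods
    log_evidence = top + math.log(sum(math.exp(j - top) for j in joint))
    return [math.exp(j - log_evidence) for j in joint]
```

Assigning each fragment to the class with the largest posterior would give a hard re-classification; keeping the full posterior gives a soft one. The slides do not say which variant is used.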
39. Future Outlook
• Introduce hydrogen bonds as a probabilistic term
• Hydrogen bond energies have a normal distribution
• Use a simple Gaussian model
[Figure: distribution of hydrogen bond energy (kcal/mol); vertical axis N]
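Written out, a simple Gaussian model for the hydrogen bond energy would take the form below. The symbols $E_{hb}$, $\mu$, and $\sigma$ are introduced here for illustration; the slides give no fitted values for the mean or the standard deviation.

```latex
p(E_{hb}) = \frac{1}{\sigma\sqrt{2\pi}}
            \exp\!\left(-\frac{(E_{hb}-\mu)^{2}}{2\sigma^{2}}\right)
```

Under this form, the term would enter the score function as one more probability distribution alongside sequence, structure, and solvation.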
40. Summary
• Purely probabilistic approach for protein structure prediction
• Score function consists of a set of probability distributions
• Conformation probabilities: mixture of probabilities, no energies at all
• Generates protein/protein-like conformations
• Long-range interactions not well represented
• In future, a hydrogen bond term could improve results
• Application to sequence optimization
• Rapid sampling: combine with other score functions