2. Common questions
There is no known structure for my protein. What can I do?
How can I see which portions of my macromolecule are charged?
Solvent accessible? Hydrophobic?
I found a mutation in a protein causing drug resistance in a patient.
How does this change affect function?
How are two proteins interacting with each other?
Which amino acid residue should I change to alter protein stability?
How can I create pretty pictures for publication?
3. Computational results must be verified with real-world experiments.
molecular biologist / medicinal chemist
bioinformatician / computational biologist
4. There is no known structure for my protein. What can I do?
Experimental structure determination: X-ray crystallography, NMR
Sources: http://bit.ly/2k4pgZg, http://www.langelab.ch.tum.de/
5. Protein homology (comparative) modeling
- constructing an atomic-resolution model of the "target" protein from its amino
acid sequence and an experimental three-dimensional structure of a
related homologous protein (the "template").
Source: https://www.mpibpc.mpg.de/9607405/Dynasome
10. Pre-computed homology models
ModBase - database of comparative protein structure models
https://modbase.compbio.ucsf.edu
Uses ModPipe, an automated modeling pipeline that relies on the programs
PSI-BLAST and MODELLER
sequence identity >30%, >4 million models, >1 million sequences
Genomic Threading Database - for detecting remote homology
between protein sequences and known folds
http://bioinf.cs.ucl.ac.uk/GTD
sequence identity 10-30%, >1 million sequences
11. Iterative Threading ASSEmbly Refinement (I-TASSER)
• online platform for protein structure and function prediction (a standalone
version can also be downloaded)
• a hierarchical approach
- structural templates are first identified from the PDB by the multiple
threading approach LOMETS
- full-length atomic models are then constructed by iterative
template fragment assembly simulations
- functional insights into the target are derived by threading the 3D
models through the protein function database BioLiP
• Consistently ranked at or near the top in the community-wide Critical
Assessment of Structure Prediction (CASP) experiments
- I-TASSER was ranked as the No. 1 server for protein structure
prediction in CASP7, CASP8, CASP9, CASP10, CASP11 and
CASP12
15. Examine the results
C-score is a confidence score for estimating the quality of models.
• calculated based on the significance of threading template alignments
and the convergence parameters of the structure assembly simulations
• C-score typically lies in the range [-5, 2]; a higher C-score signifies
higher confidence in the model, and a lower C-score signifies lower confidence.
TM-score - measures the structural similarity of two structures and avoids the
sensitivity of RMSD to local errors
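The difference between the two scores can be sketched in a few lines of Python. This is a simplified illustration, not the I-TASSER or TM-score program: it assumes the two structures are already superimposed and that the per-residue distances (in Å) are given, whereas the real TM-score also searches for the best superposition. The formula used is the commonly cited one, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L - 15)^(1/3) - 1.8; the function names and the example distances are hypothetical.

```python
import math


def d0(target_length):
    """Length-dependent distance scale (in Angstrom) used by the TM-score."""
    # Simplification for this sketch: only chains long enough for d0 to be
    # comfortably positive are handled.
    if target_length <= 21:
        raise ValueError("sketch only covers chains longer than 21 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score for ONE fixed superposition (no search over superpositions)."""
    scale = d0(target_length)
    # Each residue contributes at most 1, so a single badly placed residue
    # can lower the score only by a bounded amount.
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


def rmsd(distances):
    """RMSD computed from the same per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue model: every residue is 1 A from the reference ...
well_fit = [1.0] * 100
# ... versus the same model with one misplaced 10-residue loop (20 A off).
bad_loop = [1.0] * 90 + [20.0] * 10

for name, dists in (("well fit", well_fit), ("bad loop", bad_loop)):
    print(name, "RMSD = %.2f" % rmsd(dists), "TM = %.2f" % tm_score(dists, 100))
```

In this toy case the misplaced loop multiplies the RMSD several times over, while the TM-score drops only modestly, because squared distances are unbounded and the TM-score terms are not.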
16. Factors determining model quality
• % sequence identity to templates
• coverage
• steric or electrostatic clashes
• agreement with bench data
• agreement with general protein structure knowledge
• scoring (RMSD, C-score, TM-score, others)
% ID      Confidence?
> 30      good to great
25 - 30   low to maybe?
< 25      low
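The rule of thumb above can be applied to a pairwise alignment with a short Python sketch. The function names are hypothetical, and the identity convention is an assumption: identical positions are divided by the number of aligned columns in which neither sequence has a gap, although other denominators are also in use. The thresholds are the ones from the table on this slide.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over aligned, gap-free columns ('-' marks a gap)."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_target.upper(), aligned_template.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def template_confidence(identity):
    """Map % identity to the rough confidence classes from the slide."""
    if identity > 30:
        return "good to great"
    if identity >= 25:
        return "low to maybe"
    return "low"


# Made-up aligned fragments, for illustration only.
target = "MKT-AYIAKQR"
template = "MKSLAYLAK-R"
pid = percent_identity(target, template)
print("%.1f%% identity: %s" % (pid, template_confidence(pid)))
```

Sequence identity is only one of the factors listed above, so a label from this function is a starting point for judging a model, not a verdict.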
17. Root-mean-square deviation (RMSD)
The root-mean-square deviation of atomic positions measures the average
distance between corresponding atoms (usually backbone atoms) of superimposed proteins.
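Written out as a formula, the definition above takes the following standard form. The symbols are assumptions introduced here, not taken from the slides: N is the number of atom pairs compared and δ_i is the distance between atom i in one structure and its counterpart in the other after superposition.

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^{2}}
```

As a small worked case, two atom pairs separated by 1 Å and 3 Å give RMSD = sqrt((1 + 9) / 2) = sqrt(5) ≈ 2.24 Å, which is larger than the plain average distance of 2 Å because squaring weights the larger deviation more heavily.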
18. An aside: Other I-TASSER features
I-TASSER accepts three types of user-specified input:
• inter-residue contact and distance restraints
• template structures and template-target alignment
• secondary structure assignment
• Special algorithm for GPCR modeling
21. PDB: Protein Data Bank
The Protein Data Bank (PDB) archive is the single
worldwide repository of information about the 3D
structures of large biological molecules, including
proteins and nucleic acids.
www.rcsb.org
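Entries in the archive can also be read programmatically. The sketch below is a minimal illustration in Python, not an official RCSB client: the download URL pattern (files.rcsb.org/download/<ID>.pdb) and the restriction to classic four-character IDs are assumptions not stated on this slide, the function names are hypothetical, and the column positions follow the fixed-width ATOM record layout of the legacy PDB file format.

```python
def rcsb_download_url(pdb_id):
    """Build the download URL for an entry in legacy PDB format (assumed pattern)."""
    pdb_id = pdb_id.strip().upper()
    # Assumption: classic four-character alphanumeric identifiers only.
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID, e.g. 1CRN")
    return "https://files.rcsb.org/download/%s.pdb" % pdb_id


def parse_atom_line(line):
    """Extract a few fields from one fixed-width ATOM record."""
    return {
        "atom": line[12:16].strip(),     # atom name, columns 13-16
        "residue": line[17:20].strip(),  # residue name, columns 18-20
        "chain": line[21],               # chain identifier, column 22
        "resseq": int(line[22:26]),      # residue number, columns 23-26
        "xyz": (
            float(line[30:38]),          # x, columns 31-38
            float(line[38:46]),          # y, columns 39-46
            float(line[46:54]),          # z, columns 47-54
        ),
    }


def ca_atoms(pdb_text):
    """Return the parsed alpha-carbon (CA) ATOM records of a PDB-format file."""
    return [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]


print(rcsb_download_url("1crn"))
```

To work with a real entry, the text of the file could be fetched with `urllib.request.urlopen(rcsb_download_url("1CRN")).read().decode()` and passed to `ca_atoms`; this requires network access and is left out of the block above.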