Intro to homology modeling

Bioinformatics for beginners
Homology modeling
Michael A. Dolan, Ph.D.
Myoglobin (image source: AzaToth)

Common questions
There is no known structure for my protein. What can I do?
How can I see which portions of my macromolecule are charged? Solvent accessible? Hydrophobic?
I found a mutation in a protein causing drug resistance in a patient. How does this change affect function?
How are two proteins interacting with each other?
Which amino acid residue should I change to alter protein stability?
How can I create pretty pictures for publication?

Computational results must be verified with real-world experiments.
[Figure: molecular biologist / medicinal chemist and bioinformatician / computational biologist]

There is no known structure for my protein. What can I do?
X-ray crystallography, NMR
Sources: http://bit.ly/2k4pgZg, http://www.langelab.ch.tum.de/

Protein homology (comparative) modeling
- constructing an atomic-resolution model of the "target" protein from its amino
acid sequence and an experimental three-dimensional structure of a
related homologous protein (the "template").
Source: https://www.mpibpc.mpg.de/9607405/Dynasome

Source: https://www.unil.ch/pmf/en/home/menuinst/technologies/homology-modeling.html

Phyre2: Good, fast homology models
www.sbg.bio.ic.ac.uk/phyre2/

Hands-on exercise: Phyre2
http://www.sbg.bio.ic.ac.uk/phyre2/phyre2_output/f3a1760696e74a26/summary.html
http://bit.ly/2EjbpnF

Pre-computed homology models
ModBase - database of comparative protein structure models
https://modbase.compbio.ucsf.edu
Uses ModPipe, automated modeling pipeline relying on the programs
PSI-BLAST and MODELLER
>30% sequence ID, >4 million models, >1 million sequences
Genomic Threading Database - for detecting remote homology
between protein sequences and known folds
http://bioinf.cs.ucl.ac.uk/GTD
seq ID 10-30%, > 1 million sequences

Iterative Threading ASSEmbly Refinement (I-TASSER)
• online platform for protein structure and function prediction (a standalone
version can also be downloaded)
• a hierarchical approach
- structural templates are first identified from the PDB by the
multiple-threading approach LOMETS
- full-length atomic models are then constructed by iterative
template fragment assembly simulations
- functional insights into the target are derived by threading the 3D
models through the protein function database BioLiP
• Consistently ranked at or near the top in the community-wide Critical
Assessment of protein Structure Prediction (CASP) experiments
- I-TASSER was ranked as the No. 1 server for protein structure
prediction in CASP7, CASP8, CASP9, CASP10, CASP11 and
CASP12

I-TASSER pipeline
Check out this video: http://www.jove.com/video/3259

Hands-on exercise: I-TASSER
https://zhanglab.ccmb.med.umich.edu/I-TASSER/output/S379018/

Examine the results
C-score is a confidence score for estimating the quality of models.
• calculated based on the significance of threading template alignments
and the convergence parameters of the structure assembly simulations
• C-score is typically in the range [-5, 2]; a higher C-score signifies a
model with higher confidence, and a lower C-score a model with lower confidence.
TM-score - a length-normalized measure of global structural similarity that
is less sensitive to local errors than RMSD
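To make the contrast with RMSD concrete, here is a minimal Python sketch of the TM-score sum for a fixed superposition. It assumes the per-residue distances (typically between C-alpha atoms) are already known; the real TM-score additionally searches for the superposition that maximizes this sum, which is not done here. The function names tm_d0 and tm_score_fixed are hypothetical and do not belong to I-TASSER or any other tool.

```python
import math


def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by TM-score."""
    # The formula only yields a positive d0 for chains of 19+ residues;
    # rejecting shorter chains is a simplification made for this sketch.
    if target_length < 19:
        raise ValueError("d0 formula is not positive for chains shorter than 19 residues")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed(distances, target_length):
    """TM-score sum for one fixed superposition (no search over superpositions).

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the target protein.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# 100-residue model: 99 residues placed perfectly, one residue off by 30 angstroms.
distances = [0.0] * 99 + [30.0]
score = tm_score_fixed(distances, target_length=100)
plain_rmsd = math.sqrt(sum(d * d for d in distances) / len(distances))
print(f"TM-score (fixed superposition): {score:.3f}")
print(f"RMSD over the same distances:   {plain_rmsd:.1f}")
```

Because each residue contributes at most 1 to the sum, a single badly placed residue barely lowers the score, whereas the squared distance in RMSD lets that one residue dominate the result.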

Factors determining model quality
• % sequence identity to templates
• coverage
• steric or electrostatic clashes
• agreement with bench data
• agreement with general protein structure knowledge
• scoring (RMSD, C-score, TM-score, others...)
% sequence ID    Confidence?
> 30             good to great
25 - 30          low to maybe?
< 25             low
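As a rough illustration of the rule-of-thumb bands above, the following Python sketch computes percent identity from a pairwise alignment and maps it to a band. It assumes identity is counted over aligned, non-gap columns only (other definitions use a different denominator), and that the two strings are already aligned with "-" as the gap character. The function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identity over the non-gap columns of a pairwise alignment."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def confidence_band(pct_id):
    """Map % sequence identity to the rule-of-thumb bands from the slide."""
    if pct_id > 30:
        return "good to great"
    if pct_id >= 25:
        return "low to maybe"
    return "low"


# Made-up aligned fragments, for illustration only.
target = "MKT-AYIAK"
template = "MKTLAY-GK"
pct = percent_identity(target, template)
print(f"{pct:.1f}% identity -> {confidence_band(pct)}")
```

Identity over a short fragment says little by itself; coverage of the target by the template matters as well, as listed among the factors above.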

root-mean-square deviation (RMSD)
The root-mean-square deviation of atomic positions measures the average
distance between corresponding atoms of superimposed proteins.
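A minimal Python sketch of this calculation follows. It assumes the two structures are already superimposed and that the coordinate lists are in the same atom order (atom i in one list corresponds to atom i in the other); finding the optimal superposition is a separate step not shown here. The function name rmsd and the coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and that the atoms
    are listed in corresponding order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Three atoms; only the last one differs, displaced by 3 angstroms along y.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]
print(f"RMSD = {rmsd(model, reference):.3f}")
```

Note that the distances are squared before averaging, so a few large deviations weigh more heavily than many small ones.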

An aside: Other I-TASSER features
I-TASSER accepts the following types of user-specified restraints:
• inter-residue contact and distance restraints
• template structures and template-target alignment
• secondary structure assignment
* Special algorithm for GPCR modeling

Homology modeling of Fab fragments
http://rosie.rosettacommons.org/antibody

Hands-on exercise: Antibody modeling
http://rosie.rosettacommons.org/antibody/viewjob/42648

PDB: Protein Data Bank
The Protein Data Bank (PDB) archive is the single
worldwide repository of information about the 3D
structures of large biological molecules, including
proteins and nucleic acids.
www.rcsb.org
