SlideShare a Scribd company logo
1 of 60
Download to read offline
Mark Berjanskii, July 27th, 2012
Humans, aliens, and eHarmony
or
why there is no such thing as a free lunch in protein
structure determination from sparse experimental data
… dedicated to all experimentalists
who lost their way
Outline
►Purposes and origins of protein
structural models
►Theoretical models of protein structure
►Models from non-sparse experimental
data
►Models from sparse experimental data
►Recent developments
Humans, aliens, and
Humans, aliens, and eHarmony
Humans, aliens, and eHarmony
Typical human face
by David Tood
What is enough for aliens, not enough for
eHarmony
Typical human face
by David Tood
http://www.trood.dk/blog/the-face-of-humanity/
“Accurate” human faces
Typical ≠ Accurate
Take-home message
Typical ≠ Accurate
Warning!!! Most protein structure validation tools check how typical or
normal your protein model is, not how accurate your protein model is.
Typical
Accurate
Accurate
Normal
The level of required protein
structural accuracy also
depends on its purpose
Why do we need to know
protein structures?
1) Prediction of protein function from 3D structure (e.g. fold,
motifs, active site prediction)
2) Sequence-to-function prediction
3) Mechanism of protein function (e.g. enzyme catalysis,
structural effect of known mutations).
4) Rational drug design and structure design
5) Design of novel proteins with novel function
1) Ubiquitin
- degradation by the proteasome,
2) Ubiquitin-like modifiers
- function regulation by post-translation
modification
When do we need to do a structural experiment?
1) Structure is not known
2) Structure is known but can not be used to answer your scientific question
a) structure is incomplete
b) structure was determined at different conditions
SIV protease
African swine fever virus DNA polymerase X
50mM salt vs 500 mM salt
d) different post-tranlational modification
Calmodulin
myosin phosphatase inhibitor
c) different liganded state
e) mutations
3 mutations
GA95 GB95 f) Insufficient experimental data
1Å 3.5ÅX-Ray resolution
Salt
Tom Cruse under stress conditions
pH 5 pH 1
Prion protein
pH
1) Some biological questions (e.g. prediction of function from protein fold)
may not require high structural accuracy
2) Some biologically interesting proteins are too difficult to study by any
high-res method:
- proteins with extended flexible regions
-large proteins
- fibrillar and membrane proteins
Why use incomplete experimental data for protein
structure determination?
Ubiquitin
Flexible Fibrils Membrane proteins Large proteins
Protein structures from the point
of view of an experimentalist
Do not trust TrustNot sure
Structures with
no experimental data
Structures from
sparse experimental data s
Structures from
large amount of
experimental
data
What is expert’s opinion?
Roman Laskowski
Research Scientist at European Bioinformatics Institute
Jenny Gu (Editor),
Philip E. Bourne (Editor)
ISBN: 978-0-470-18105-8
Hardcover
1067 pages
John Wiley & Sons, Inc.
Non-experimental
structures
Protein structures from the point
of view of an experimentalist
Do not trust TrustNot sure
Structures with
no experimental data
What does Roman Laskowski
write about theoretical structures?
Roman Laskowski
Research Scientist at European Bioinformatics Institute
What are the criteria of success in
protein structure determination?
1) In method development tests:
- Global accuracy: RMSD, TM-score
- Agreement with experimental data (when
available)
- Agreement with protein quality metrics
2) In real-life research:
- Global accuracy
- Agreement with experimental data (when
available)
- Agreement with protein quality metrics
Limitations:
1) Requires powerful hardware or
computing time
2) Limited to small/simple
proteins
3) Can not take into account
chaperone action
4) Criteria for success???
Ab-Initio structures by molecular dynamics
SuperComputer “Anton” for MD simulations
D.E. Shaw Research
How Fast-Folding Proteins Fold
Kresten Lindorff-Larsen, Stefano Piana, Ron O. Dror, David E. Shaw;
Science 28 October 2011: Vol. 334 no. 6055 pp. 517-520
MD force-field
Dihydrofolate reductase:
Anton 512 cores: 15 μs/day
Desmond 512 cores: 0.5 μs/day
Amber-GPU 64 cores: 80 ns/day
Amber 48 cores: 20ns/day
Gromacs 8 cores: ~5-10ns/day
Folding time-scales:
Newton equation of motion
Fragment-based ab initio structures
(Non-experimental structures)
Rosetta
Sequence Secondary Structure
David Baker
3- and 9-residue fragments
Low-resolution folding
Best low-res decoy selection
Full-atom side-chain restoration
Developed by 12 labs
50 people
Model quality evaluation
Side chain
optimization
Backbone
optimization
Small backbone moves
vdW repulsive
Full-atom refinement
Fragment-based ab initio structures
(Non-experimental structures)
Rosetta
David Baker
Scoring function
Rosetta can fold small proteins (<100 residues)
Native
Native
Native Native
Rosetta
Rosetta
Rosetta Rosetta
Rosetta limitations
1) Does not fold well proteins above 100 residues (sampling
problem)
2) Biased by fragment structure
3) Implicit solvation score is too simplistic and only weakly
disfavors buried unsatisfied polar groups.
4) Hydrogen bond potential neglects the effects of charged
atoms, (anti-) cooperativity within H-bond networks .
5) Ignores electrostatic interactions (besides H-bonds) and
their screening,
6) Does not permit rigorous estimation of a model's free
energy.
7) Does not fold properly some very small proteins and RNA
Rosetta forces protein normality
Typical
Accurate
Accurate
Normal
Fragment idealizationRosetta scoring function
Typical ≠ Accurate
Comparative or template-based modeling
Template database
Alignment score based on
residue identity or
similarity
Alignment score based statistical
properties:
mutation potential,
environment fitness potential,
pairwise potential,
secondary structure compatibilities
Homology modelingThreading
In a real-life scenario, success of homology modeling is
judged based on model normality, not model accuracy.
Theoretical scores for protein quality
assessment may have wrong energetic minima.
Limited conformational sampling may not always
yield the native conformation
Typical
Accurate
Accurate
Normal
Typical ≠ Accurate
Some people may think that any structure can be determined
via homology modeling because “all” folds of
NMRable and XRAYable proteins are “known”
But can simple folds provide
all necessary information to
define domain orientation in
and overall structure of
complex proteins?
SV40 T-antigen
NO!
Accuracy of template-based modeling
http://swissmodel.expasy.org/workspace/tutorial/eva.html Curr Protoc Bioinformatics. 2006 Oct;Chapter 5:Unit 5.6.
Comparative protein structure modeling using Modeller.
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY,
Pieper U, Sali A.
Errors in template-based models:
Wrong side-chain
packing
Conformational
changes No-template segments Mis-alignment Wrong template
Do you want to take chance to get this
as your high-accuracy structure?
Template-based models have strong 3D bias to
the template
PrP
pH 5 pH 1
African swine fever virus DNA polymerase X
50mM salt vs 500 mM salt
d) Different post-tranlational modification
Calmodulin
myosin phosphatase inhibitor
c) Different liganded state
e) mutations
3 mutations
GA95 GB95 f) Insufficient experimental data of template
1Å 3.5ÅX-Ray resolution
Ad Bax homology-modeled from
a George Clooney template
Even 100% identical proteins may have very different
structures due to:
1) Different pHs 2) Different ionic strength
Dealing with 3D bias
TASSER
Jeffrey
Skolnick
Rosetta
David
Baker
Pros:
-Models are less
biased by template
Cons:
- Models are more
dependent on
imperfect folding
scores
Distance restraints
Protein structures from the point
of view of an experimentalist
Do not trust TrustNot sure
Structures with
no experimental data:
Homology Modeling,
Threading,
Fragment-based,
Ab Initio
s
Structures from
large amount of
experimental
data
Experimental methods of
high-resolution structure
determination
X-ray crystallography
X-Ray crystallography
Quality metrics:
1) Experimental data:
-Number of reflections
-Signal to noise ratio
2) Model-to-experiment agreement:
- R factor
-R free factor
3) Coordinate uncertainty:
- B-factor
4) Stereo-chemical normality:
- backbone torsion angles
(Ramachandran plot)
- bond length, angles
- side-chain torsion angles
X-RayResolution
Minimum spacing (d)
of crystal lattice planes
that still provide
measurable diffraction
of X-rays.
Minimum distance
between structural
features that can
be distinguished in the
electron-density maps.
High resolution Low resolution
High resolution
Low resolution
200,000 reflections
500 reflectionsMany reflections Few reflections
Resolution and protein quality
Blow, D. (2002). Outline of Crystallographyfor
Biologists (New York: Oxford University Press).
High resolutionLow resolution
When X-ray data is incomplete, you have to rely on
other sources of structural information: knowledge-based
parameters. Imagination, etc.
NMR spectroscopy
Protein NMR spectroscopy
Experiment Spectra processing Spectra assignment
NOE assignment
Distance restraints
Model generation
What is typical NMR experimental
data?
Groups of protein quality parameters:
1) Quality of experimental observables that were used in
structure determination*
2) Agreement between the structure and experimental
observables*
3) Agreement between local geometry of the new structure and
parameters of existing high-quality structures
4) Structural uncertainty*
* Method-dependent
What is non-sparse NMR
data?
Macromolecular NMR spectroscopy for the non-spectroscopist. Kwan AH, Mobli M, Gooley PR, King GF, Mackay JP.
FEBS J. 2011 Mar;278(5):687-703
Protein structures from the point of
view of an experimentalist
Do not trust TrustNot sure
Structures with
no experimental data:
Homology Modeling,
Threading,
Fragment-based,
Ab Initio
s
Structures from large
amount of
experimental data
XRAYNMR
[Equivalent] Resolution
XRAY
NMR
Sparse experimental data
Protein structures from the point of
view of an experimentalist
Do not trust TrustNot sure
Structures with
no experimental data:
Homology Modeling,
Threading,
Fragment-based,
Ab Initio
s
Structures from large
amount of
experimental data
XRAYNMR
Structures
from sparse
experimental data
What is sparse experimental data?
In XRAY:
- When you do not have enough XRAY reflections
Low resolution
500 reflections
In NMR:
- When you do not have enough NOEs (e.g. no side-chain NOEs) or any NOEs
Sparse data from other methods:
-Distance restraints from cross-linking and mass-spectroscopy
-Distance restraints from spin-labeling and electron paramagnetic resonance (EPR)
spectroscopy
- Protein size, shape, radius of gyration from small angle Xray scattering (SAXS)
1) Some biological questions (e.g. prediction of function from protein fold)
may not require high structural accuracy
2) Some biologically interesting proteins are too difficult to study by any
high-res method:
- proteins with extended flexible regions
-large proteins
- fibrillar and membrane proteins
Why use incomplete experimental data for protein
structure determination?
Ubiquitin
Flexible Fibrils Membrane proteins Large proteins
Protein structures from the point of
view of an experimentalist
Do not trust TrustNot sure
Homology Modeling,
Threading,
Fragment-based,
Ab Initio
sExperimental methods
XRAYNMR
Structures
from sparse
experimental data
EPR
SAXS
Mass
spec
Regular NMR structure determination
Complete
NOEs
Standard
Force-field
Dihedral restraints
Sparse NMR structure determination
Sparse
NOEs
Dihedral restraints
Standard
Force-field
Sparse NMR structure determination
Sparse
NOEs
Dihedral restraints
Advanced
Force-field:
solvation term
full electrostatic
Sparse NMR structure determination
Sparse
NOEs
Dihedral restraints
Advanced
Force-field:
solvation term
full electrostatic
Knowledge-based
potentials
Fragment
information
Sparse NMR structure determination
Sparse
NOEs
Dihedral restraints
Advanced
Force-field:
solvation term
full electrostatic
knowledge-based
potentials
Fragment
information
Homology
information
Pushing the boundaries of protein structure
determination
Klaus
Schulten
University
of Illinois
NAMD
Yang Zhang
University of
Michigan
iTASSER
Andrzej
Kolinski
Warsaw
University
CABS-NMR
Jeffrey Skolnick
Georgia Institute
of Technology
TOUCHSTONEX
TASSER
Lewis Kay
University of
Toronto
CHESHIRE
Rosetta
Julia
Forman-Kay
University
of Toronto
ENSEMBLE
Charles L.
Brooks III
University
of Michigan
CHARMM
Hans
Kalbitzer
Universität
Regensburg
PERMOL
George Rose
Johns
Hopkins
University
LINUS
David Baker
University of
Washington
Rosetta
Ad Bax
NIH
CS-Rosetta
Gaetano
Montelione
Rutgers
University
CS-DP-Rosetta
Jens Meiler
Vanderbilt
University
RosettaEPR
David
Wishart
University
of Alberta
CS23D
Michele
Vendruscolo
University of
Cambridge
CHESHIRE
Sequence Secondary Structure
David Baker
3- and 9-residue fragments
Low-resolution folding
Best low-res decoy selection
Full-atom side-chain restoration
Developed by 12 labs
50 people
Rosetta-family methods
Model quality evaluation
Side chain
optimization
Backbone
optimization
Small backbone moves
vdW repulsive
Full-atom refinement
Rosetta-family methods
Method Year Restraints
Rosetta 1996-1999
Rosetta-NMR 2000 NOEs
Rosetta-NMR-RDC 2002 NOEs, RDCs
CS-Rosetta 2008 CS
CS-DP-Rosetta 2010 CS, unassigned NOEs
iterative-CS-RDC-
NOE Rosetta
2010 CS, RDC, backbone
NOEs
PCS-ROSETTA 2011 Pseudo-contact shifts
Rosetta-EPR 2011 EPR data
CS-HM Rosetta 2012 CS, homology
RASREC Rosetta 2011-2012 CS, methyl NOEs, RDCs
Sequence Secondary Structure
David Baker
3- and 9-residue fragments
Low-resolution folding Best low-res decoy selection
Full-atom side-chain restoration
Side chain
optimization
Backbone
optimization
Small backbone moves
vdW repulsive
Full-atom refinement
How is sparse data used in Rosetta-family methods?
Model quality evaluation
Chemical shifts NOEs, RDCs, PCSs, Homology, etc
Performance of iterative Rosetta
for backbone-only NMR data
What are the criteria of Rosetta simulation success?
1) RMSD with respect to the lowest/best score model should be within 2Å
for more than 60% of models
Successful simulation Failed simulation
2) The converged structures should be clearly lower in energy than all
significantly different (RMSD greater than 7 Å
Effect of experimental data
3) The structures generated with experimental data should be at least as
low in energy as those generated without experimental data or even
lower/better
Problems with validation of Rosetta models.
1) It is not clear if the Rosetta success criteria are universal for all scenarios
2) Agreement with experimental data is not very meaningful because
the data is sparse.
3) Rfree like validation is difficult (and also not meaningful) because
experimental data is sparse
4) Independent experimental data for validation will likely be unavailable
5) Normality-based scores for model validation (e.g. ResProx) will likely fail
to detect inaccurate but highly-idealized Rosetta models
Need for developing an independent model validation protocol
Problem with informational content of
protein models from sparse data
Fragment idealization
Bias by fragments from other
proteins
Biased by restraints from
homologous proteins
Rosetta scoring function
VS.
Sparse experimental data
The experimental data is over-powered by knowledge based information
Sparse data
Typical male
Model
Typical
AccurateAccurate
Normal
Protein structures from the point of
view of an experimentalist
Do not trust Trust
Wait for scintific
community to validate
Structures with
no experimental data:
Homology Modeling,
Threading,
Fragment-based,
Ab Initio
s
Structures from large
amount of
experimental data
XRAYNMR
Structures
from sparse
experimental data
Rosetta, etc.
What’s up with the “no free lunch” thing?
1) There is no substitute for a large amount of experimental data. If you do not do
experiment, you do not get the information relevant to your specific experimental
conditions (e.g. protein construct, sample conditions, etc).
You can not get the same level of accuracy with sparse data or theoretical models
2) If you have an easy protein, do a full-blown structure determination
3) If you have no choice other than using sparse data, do not over-interpret your
structure model.
You can not build an accurate high-resolution model of protein structure without
getting high-quality experimental data with your sweat and blood
This is not gonna happen any time soon
Success of theoretical methods is still limited to very small proteins.
Many theoretical models are biased, over-normalized, low-resolution, or
simply inaccurate.
Accuracy and high-resolution of models from sparse data is questionable.

More Related Content

Recently uploaded

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 

Recently uploaded (20)

Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 

Featured

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 

Featured (20)

Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 

Humans Aliens And E-Harmony Or Why There Is No Such Thing As A Free Lunch In Protein Structure Determination From Sparse Experimental Data

  • 1. Mark Berjanskii, July 27th, 2012 Humans, aliens, and eHarmony or why there is no such thing as a free lunch in protein structure determination from sparse experimental data
  • 2. … dedicated to all experimentalists who lost their way
  • 3. Outline ►Purposes and origins of protein structural models ►Theoretical models of protein structure ►Models from non-sparse experimental data ►Models from sparse experimental data ►Recent developments
  • 6. Humans, aliens, and eHarmony Typical human face by David Tood
  • 7. What is enough for aliens, not enough for eHarmony Typical human face by David Tood http://www.trood.dk/blog/the-face-of-humanity/ “Accurate” human faces Typical ≠ Accurate
  • 8. Take-home message Typical ≠ Accurate Warning!!! Most protein structure validation tools check how typical or normal your protein model is, not how accurate your protein model is. Typical Accurate Accurate Normal
  • 9. The level of required protein structural accuracy also depends on its purpose
  • 10. Why do we need to know protein structures? 1) Prediction of protein function from 3D structure (e.g. fold, motifs, active site prediction) 2) Sequence-to-function prediction 3) Mechanism of protein function (e.g. enzyme catalysis, structural effect of known mutations). 4) Rational drug design and structure design 5) Design of novel proteins with novel function 1) Ubiquitin - degradation by the proteasome, 2) Ubiquitin-like modifiers - function regulation by post-translation modification
  • 11. When do we need to do a structural experiment? 1) Structure is not known 2) Structure is known but can not be used to answer your scientific question a) structure is incomplete b) structure was determined at different conditions SIV protease African swine fever virus DNA polymerase X 50mM salt vs 500 mM salt d) different post-tranlational modification Calmodulin myosin phosphatase inhibitor c) different liganded state e) mutations 3 mutations GA95 GB95 f) Insufficient experimental data 1Å 3.5ÅX-Ray resolution Salt Tom Cruse under stress conditions pH 5 pH 1 Prion protein pH
  • 12. 1) Some biological questions (e.g. prediction of function from protein fold) may not require high structural accuracy 2) Some biologically interesting proteins are too difficult to study by any high-res method: - proteins with extended flexible regions -large proteins - fibrillar and membrane proteins Why use incomplete experimental data for protein structure determination? Ubiquitin Flexible Fibrils Membrane proteins Large proteins
  • 13. Protein structures from the point of view of an experimentalist Do not trust TrustNot sure Structures with no experimental data Structures from sparse experimental data s Structures from large amount of experimental data
  • 14. What is expert’s opinion? Roman Laskowski Research Scientist at European Bioinformatics Institute Jenny Gu (Editor), Philip E. Bourne (Editor) ISBN: 978-0-470-18105-8 Hardcover 1067 pages John Wiley & Sons, Inc.
  • 16. Protein structures from the point of view of an experimentalist Do not trust TrustNot sure Structures with no experimental data
  • 17. What does Roman Laskowski write about theoretical structures? Roman Laskowski Research Scientist at European Bioinformatics Institute
  • 18. What are the criteria of success in protein structure determination? 1) In method development tests: - Global accuracy: RMSD, TM-score - Agreement with experimental data (when available) - Agreement with protein quality metrics 2) In real-life research: - Global accuracy - Agreement with experimental data (when available) - Agreement with protein quality metrics
  • 19. Limitations: 1) Requires powerful hardware or computing time 2) Limited to small/simple proteins 3) Can not take into account chaperone action 4) Criteria for success??? Ab-Initio structures by molecular dynamics SuperComputer “Anton” for MD simulations D.E. Shaw Research How Fast-Folding Proteins Fold Kresten Lindorff-Larsen, Stefano Piana, Ron O. Dror, David E. Shaw; Science 28 October 2011: Vol. 334 no. 6055 pp. 517-520 MD force-field Dihydrofolate reductase: Anton 512 cores: 15 μs/day Desmond 512 cores: 0.5 μs/day Amber-GPU 64 cores: 80 ns/day Amber 48 cores: 20ns/day Gromacs 8 cores: ~5-10ns/day Folding time-scales: Newton equation of motion
  • 20. Fragment-based ab initio structures (Non-experimental structures) Rosetta Sequence Secondary Structure David Baker 3- and 9-residue fragments Low-resolution folding Best low-res decoy selection Full-atom side-chain restoration Developed by 12 labs 50 people Model quality evaluation Side chain optimization Backbone optimization Small backbone moves vdW repulsive Full-atom refinement
  • 21. Fragment-based ab initio structures (Non-experimental structures) Rosetta David Baker Scoring function Rosetta can fold small proteins (<100 residues) Native Native Native Native Rosetta Rosetta Rosetta Rosetta Rosetta limitations 1) Does not fold well proteins above 100 residues (sampling problem) 2) Biased by fragment structure 3) Implicit solvation score is too simplistic and only weakly disfavors buried unsatisfied polar groups. 4) Hydrogen bond potential neglects the effects of charged atoms, (anti-) cooperativity within H-bond networks . 5) Ignores electrostatic interactions (besides H-bonds) and their screening, 6) Does not permit rigorous estimation of a model's free energy. 7) Does not fold properly some very small proteins and RNA
  • 22. Rosetta forces protein normality Typical Accurate Accurate Normal Fragment idealizationRosetta scoring function Typical ≠ Accurate
  • 23. Comparative or template-based modeling Template database Alignment score based on residue identity or similarity Alignment score based statistical properties: mutation potential, environment fitness potential, pairwise potential, secondary structure compatibilities Homology modelingThreading
  • 24. In a real-life scenario, success of homology modeling is judged based on model normality, not model accuracy. Theoretical scores for protein quality assessment may have wrong energetic minima. Limited conformational sampling may not always yield the native conformation Typical Accurate Accurate Normal Typical ≠ Accurate
  • 25. Some people may think that any structure can be determined via homology modeling because “all” folds of NMRable and XRAYable proteins are “known” But can simple folds provide all necessary information to define domain orientation in and overall structure of complex proteins? SV40 T-antigen NO!
  • 26. Accuracy of template-based modeling http://swissmodel.expasy.org/workspace/tutorial/eva.html Curr Protoc Bioinformatics. 2006 Oct;Chapter 5:Unit 5.6. Comparative protein structure modeling using Modeller. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Errors in template-based models: Wrong side-chain packing Conformational changes No-template segments Mis-alignment Wrong template Do you want to take chance to get this as your high-accuracy structure?
  • 27. Template-based models have strong 3D bias to the template PrP pH 5 pH 1 African swine fever virus DNA polymerase X 50mM salt vs 500 mM salt d) Different post-tranlational modification Calmodulin myosin phosphatase inhibitor c) Different liganded state e) mutations 3 mutations GA95 GB95 f) Insufficient experimental data of template 1Å 3.5ÅX-Ray resolution Ad Bax homology-modeled from a George Clooney template Even 100% identical proteins may have very different structures due to: 1) Different pHs 2) Different ionic strength
  • 28. Dealing with 3D bias TASSER Jeffrey Skolnick Rosetta David Baker Pros: -Models are less biased by template Cons: - Models are more dependent on imperfect folding scores Distance restraints
  • 29. Protein structures from the point of view of an experimentalist Do not trust TrustNot sure Structures with no experimental data: Homology Modeling, Threading, Fragment-based, Ab Initio s Structures from large amount of experimental data
  • 30. Experimental methods of high-resolution structure determination
  • 32. X-Ray crystallography Quality metrics: 1) Experimental data: -Number of reflections -Signal to noise ratio 2) Model-to-experiment agreement: - R factor -R free factor 3) Coordinate uncertainty: - B-factor 4) Stereo-chemical normality: - backbone torsion angles (Ramachandran plot) - bond length, angles - side-chain torsion angles
  • 33. X-RayResolution Minimum spacing (d) of crystal lattice planes that still provide measurable diffraction of X-rays. Minimum distance between structural features that can be distinguished in the electron-density maps. High resolution Low resolution High resolution Low resolution 200,000 reflections 500 reflectionsMany reflections Few reflections
  • 34. Resolution and protein quality Blow, D. (2002). Outline of Crystallographyfor Biologists (New York: Oxford University Press). High resolutionLow resolution When X-ray data is incomplete, you have to rely on other sources of structural information: knowledge-based parameters. Imagination, etc.
  • 36. Protein NMR spectroscopy Experiment Spectra processing Spectra assignment NOE assignment Distance restraints Model generation
  • 37. What is typical NMR experimental data? Groups of protein quality parameters: 1) Quality of experimental observables that were used in structure determination* 2) Agreement between the structure and experimental observables* 3) Agreement between local geometry of the new structure and parameters of existing high-quality structures 4) Structural uncertainty* * Method-dependent
  • 38. What is non-sparse NMR data? Macromolecular NMR spectroscopy for the non-spectroscopist. Kwan AH, Mobli M, Gooley PR, King GF, Mackay JP. FEBS J. 2011 Mar;278(5):687-703
  • 39. Protein structures from the point of view of an experimentalist Do not trust TrustNot sure Structures with no experimental data: Homology Modeling, Threading, Fragment-based, Ab Initio s Structures from large amount of experimental data XRAYNMR [Equivalent] Resolution XRAY NMR
  • 41. Protein structures from the point of view of an experimentalist Do not trust TrustNot sure Structures with no experimental data: Homology Modeling, Threading, Fragment-based, Ab Initio s Structures from large amount of experimental data XRAYNMR Structures from sparse experimental data
  • 42. What is sparse experimental data? In XRAY: - When you do not have enough XRAY reflections Low resolution 500 reflections In NMR: - When you do not have enough NOEs (e.g. no side-chain NOEs) or any NOEs Sparse data from other methods: -Distance restraints from cross-linking and mass-spectroscopy -Distance restraints from spin-labeling and electron paramagnetic resonance (EPR) spectroscopy - Protein size, shape, radius of gyration from small angle Xray scattering (SAXS)
  • 43. 1) Some biological questions (e.g. prediction of function from protein fold) may not require high structural accuracy 2) Some biologically interesting proteins are too difficult to study by any high-res method: - proteins with extended flexible regions -large proteins - fibrillar and membrane proteins Why use incomplete experimental data for protein structure determination? Ubiquitin Flexible Fibrils Membrane proteins Large proteins
  • 44. Protein structures from the point of view of an experimentalist Do not trust TrustNot sure Homology Modeling, Threading, Fragment-based, Ab Initio sExperimental methods XRAYNMR Structures from sparse experimental data EPR SAXS Mass spec
  • 45. Regular NMR structure determination Complete NOEs Standard Force-field Dihedral restraints
  • 46. Sparse NMR structure determination Sparse NOEs Dihedral restraints Standard Force-field
  • 47. Sparse NMR structure determination Sparse NOEs Dihedral restraints Advanced Force-field: solvation term full electrostatic
  • 48. Sparse NMR structure determination Sparse NOEs Dihedral restraints Advanced Force-field: solvation term full electrostatic Knowledge-based potentials Fragment information
  • 49. Sparse NMR structure determination Sparse NOEs Dihedral restraints Advanced Force-field: solvation term full electrostatic knowledge-based potentials Fragment information Homology information
  • 50. Pushing the boundaries of protein structure determination Klaus Schulten University of Illinois NAMD Yang Zhang University of Michigan iTASSER Andrzej Kolinski Warsaw University CABS-NMR Jeffrey Skolnick Georgia Institute of Technology TOUCHSTONEX TASSER Lewis Kay University of Toronto CHESHIRE Rosetta Julia Forman-Kay University of Toronto ENSEMBLE Charles L. Brooks III University of Michigan CHARMM Hans Kalbitzer Universität Regensburg PERMOL George Rose Johns Hopkins University LINUS David Baker University of Washington Rosetta Ad Bax NIH CS-Rosetta Gaetano Montelione Rutgers University CS-DP-Rosetta Jens Meiler Vanderbilt University RosettaEPR David Wishart University of Alberta CS23D Michele Vendruscolo University of Cambridge CHESHIRE
  • 51. Sequence Secondary Structure David Baker 3- and 9-residue fragments Low-resolution folding Best low-res decoy selection Full-atom side-chain restoration Developed by 12 labs 50 people Rosetta-family methods Model quality evaluation Side chain optimization Backbone optimization Small backbone moves vdW repulsive Full-atom refinement
  • 52. Rosetta-family methods Method Year Restraints Rosetta 1996-1999 Rosetta-NMR 2000 NOEs Rosetta-NMR-RDC 2002 NOEs, RDCs CS-Rosetta 2008 CS CS-DP-Rosetta 2010 CS, unassigned NOEs iterative-CS-RDC- NOE Rosetta 2010 CS, RDC, backbone NOEs PCS-ROSETTA 2011 Pseudo-contact shifts Rosetta-EPR 2011 EPR data CS-HM Rosetta 2012 CS, homology RASREC Rosetta 2011-2012 CS, methyl NOEs, RDCs
  • 53. Sequence Secondary Structure David Baker 3- and 9-residue fragments Low-resolution folding Best low-res decoy selection Full-atom side-chain restoration Side chain optimization Backbone optimization Small backbone moves vdW repulsive Full-atom refinement How is sparse data used in Rosetta-family methods? Model quality evaluation Chemical shifts NOEs, RDCs, PCSs, Homology, etc
  • 54. Performance of iterative Rosetta for backbone-only NMR data
  • 55. What are the criteria of Rosetta simulation success? 1) RMSD with respect to the lowest/best score model should be within 2Å for more than 60% of models Successful simulation Failed simulation 2) The converged structures should be clearly lower in energy than all significantly different (RMSD greater than 7 Å Effect of experimental data 3) The structures generated with experimental data should be at least as low in energy as those generated without experimental data or even lower/better
  • 56. Problems with validation of Rosetta models. 1) It is not clear if the Rosetta success criteria are universal for all scenarios 2) Agreement with experimental data is not very meaningful because the data is sparse. 3) Rfree like validation is difficult (and also not meaningful) because experimental data is sparse 4) Independent experimental data for validation will likely be unavailable 5) Normality-based scores for model validation (e.g. ResProx) will likely fail to detect inaccurate but highly-idealized Rosetta models Need for developing an independent model validation protocol
  • 57. Problem with informational content of protein models from sparse data Fragment idealization Bias by fragments from other proteins Biased by restraints from homologous proteins Rosetta scoring function VS. Sparse experimental data The experimental data is over-powered by knowledge based information Sparse data Typical male Model Typical AccurateAccurate Normal
  • 58. Protein structures from the point of view of an experimentalist Do not trust Trust Wait for scintific community to validate Structures with no experimental data: Homology Modeling, Threading, Fragment-based, Ab Initio s Structures from large amount of experimental data XRAYNMR Structures from sparse experimental data Rosetta, etc.
  • 59. What’s up with the “no free lunch” thing? 1) There is no substitute for a large amount of experimental data. If you do not do experiment, you do not get the information relevant to your specific experimental conditions (e.g. protein construct, sample conditions, etc). You can not get the same level of accuracy with sparse data or theoretical models 2) If you have an easy protein, do a full-blown structure determination 3) If you have no choice other than using sparse data, do not over-interpret your structure model. You can not build an accurate high-resolution model of protein structure without getting high-quality experimental data with your sweat and blood
  • 60. This is not gonna happen any time soon Success of theoretical methods is still limited to very small proteins. Many theoretical models are biased, over-normalized, low-resolution, or simply inaccurate. Accuracy and high-resolution of models from sparse data is questionable.