SlideShare une entreprise Scribd logo
1  sur  60
Télécharger pour lire hors ligne
Humans, aliens, and eHarmony
                           or
  why there is no such thing as a free lunch in protein
structure determination from sparse experimental data




                    Mark Berjanskii, July 27th, 2012
… dedicated to all experimentalists
       who lost their way
Outline
► Purposes  and origins of protein
  structural models
► Theoretical models of protein structure
► Models from non-sparse experimental
  data
► Models from sparse experimental data
► Recent developments
Humans, aliens, and
Humans, aliens, and eHarmony
Humans, aliens, and eHarmony
         Typical human face
           by David Tood




        http://www.trood.dk/blog/the-face-of-humanity/
What is enough for aliens, not enough for
                    eHarmony
 Typical ≠ Accurate
                                                 “Accurate” human faces
 Typical human face
   by David Tood




http://www.trood.dk/blog/the-face-of-humanity/
Take-home message
                Typical ≠ Accurate


                                            Typical




                                                      Accurate
           Accurate




                                 Normal
Warning!!! Most protein structure validation tools check how typical or
normal your protein model is, not how accurate your protein model is.
The level of required protein
  structural accuracy also
  depends on its purpose
Why do we need to know
        protein structures?
1) Prediction of protein function from 3D structure (e.g. fold,
   motifs, active site prediction)

                                 1) Ubiquitin
                                    - degradation by the proteasome,


                                 2) Ubiquitin-like modifiers
                                    - function regulation by post-translation
                                    modification

2) Sequence-to-function prediction
3) Mechanism of protein function (e.g. enzyme catalysis,
   structural effect of known mutations).
4) Rational drug design and structure design
5) Design of novel proteins with novel function
When do we need to do a structural experiment?
 1) Structure is not known
 2) Structure is known but can not be used to answer your scientific question
       a) structure is incomplete                                                                  SIV protease
       b) structure was determined at different conditions
Tom Cruse under stress conditions           Prion protein
                                       pH


                                                                        Salt
                                    pH 5                    pH 1
                                                                African swine fever virus DNA polymerase X
         c) different liganded state                                     50mM salt vs 500 mM salt

                                                            d) different post-tranlational modification



                          Calmodulin
    e) mutations                                                myosin phosphatase inhibitor


GA95               GB95                                     f) Insufficient experimental data


            3 mutations                                                 X-Ray resolution
                                                                   1Å                                   3.5Å
Why use incomplete experimental data for protein
           structure determination?
1) Some biological questions (e.g. prediction of function from protein fold)
may not require high structural accuracy


                                                 Ubiquitin



2) Some biologically interesting proteins are too difficult to study by any
high-res method:
- proteins with extended flexible regions
-large proteins
- fibrillar and membrane proteins




  Flexible        Fibrils              Membrane proteins        Large proteins
Protein structures from the point
   of view of an experimentalist


                                                  Structures from
                           Structures from
  Structures with      sparse experimental data   large amount of
                                                         s
no experimental data                                experimental
                                                        data




    Do not trust              Not sure               Trust
What is expert’s opinion?


         Roman Laskowski
         Research Scientist at European Bioinformatics Institute




               Jenny Gu (Editor),
               Philip E. Bourne (Editor)
               ISBN: 978-0-470-18105-8
               Hardcover
               1067 pages
               John Wiley & Sons, Inc.
Non-experimental
   structures
Protein structures from the point
   of view of an experimentalist


  Structures with
no experimental data




    Do not trust       Not sure   Trust
What does Roman Laskowski
  write about theoretical structures?


Roman Laskowski
Research Scientist at European Bioinformatics Institute
What are the criteria of success in
protein structure determination?
1) In method development tests:
 - Global accuracy: RMSD, TM-score
- Agreement with experimental data (when
   available)
- Agreement with protein quality metrics


2) In real-life research:
 - Global accuracy
- Agreement with experimental data (when
   available)
-   Agreement with protein quality metrics
Ab-Initio structures by molecular dynamics
SuperComputer “Anton” for MD simulations

       D.E. Shaw Research




                            Dihydrofolate reductase:
                            Anton 512 cores: 15 µs/day
                            Desmond 512 cores: 0.5 µs/day
                            Amber-GPU 64 cores: 80 ns/day
          MD force-field    Amber 48 cores: 20ns/day
                            Gromacs 8 cores: ~5-10ns/day

                                                            How Fast-Folding Proteins Fold
                            Folding time-scales:            Kresten Lindorff-Larsen, Stefano Piana, Ron O. Dror, David E. Shaw;
                                                            Science 28 October 2011: Vol. 334 no. 6055 pp. 517-520


                                                                   Limitations:
                                                                   1) Requires powerful hardware or
                                                                       computing time
                                                                    2) Limited to small/simple
                                                                       proteins
Newton equation of motion                                          3) Can not take into account
                                                                       chaperone action
                                                                   4) Criteria for success???
Fragment-based ab initio structures
                            (Non-experimental structures)


                                      Rosetta               Developed by 12 labs
Sequence                                                    50 people
              Secondary Structure
                                              Low-resolution folding

  3- and 9-residue fragments
                                                                                   David Baker

                                                                           Best low-res decoy selection




 Model quality evaluation
                                      Full-atom refinement

                                                                       Full-atom side-chain restoration
                                      vdW repulsive

                                   Small backbone moves


                               Side chain        Backbone
                               optimization      optimization
Fragment-based ab initio structures
                              (Non-experimental structures)

                                  Rosetta                                 Rosetta limitations
                                                  1)   Does not fold well proteins above 100 residues (sampling
    Scoring function                                   problem)
                                                  2)   Biased by fragment structure
                                                  3)   Implicit solvation score is too simplistic and only weakly
                                                       disfavors buried unsatisfied polar groups.
                                                                                        David Baker
                                                  4)   Hydrogen bond potential neglects the effects of charged
                                                       atoms, (anti-) cooperativity within H-bond networks .
                                                  5)   Ignores electrostatic interactions (besides H-bonds) and
                                                       their screening,
                                                  6)   Does not permit rigorous estimation of a model's free
                                                       energy.
                                                  7)   Does not fold properly some very small proteins and RNA
Rosetta can fold small proteins (<100 residues)




                         Native      Rosetta
Native   Rosetta




Native     Rosetta         Native    Rosetta
Rosetta forces protein normality
Rosetta scoring function              Fragment idealization




                               Typical ≠ Accurate



                                                Typical



                                                          Accurate
                           Accurate




                                       Normal
Comparative or template-based modeling

      Threading                                       Homology modeling



                                  Template database



Alignment score based statistical                 Alignment score based on
          properties:                                 residue identity or
            mutation potential,                            similarity
      environment fitness potential,
            pairwise potential,
    secondary structure compatibilities
In a real-life scenario, success of homology modeling is
judged based on model normality, not model accuracy.
                                                        Typical ≠ Accurate



                                                                        Typical



                                                                                  Accurate
                                           Accurate




                                                               Normal




                   Theoretical scores for protein quality
                   assessment may have wrong energetic minima.

                   Limited conformational sampling may not always
                   yield the native conformation
Some people may think that any structure can be determined
      via homology modeling because “all” folds of
      NMRable and XRAYable proteins are “known”

                                But can simple folds provide
                                all necessary information to
                                define domain orientation in
                                and overall structure of
                                complex proteins?
                                           SV40 T-antigen




                                               NO!
Accuracy of template-based modeling




                Do you want to take chance to get this
                as your high-accuracy structure?




http://swissmodel.expasy.org/workspace/tutorial/eva.html        Curr Protoc Bioinformatics. 2006 Oct;Chapter 5:Unit 5.6.
                                                                Comparative protein structure modeling using Modeller.
                                                                Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY,
                                                                Pieper U, Sali A.

    Errors in template-based models:




      Wrong side-chain     Conformational
          packing             changes          No-template segments     Mis-alignment                     Wrong template
Template-based models have strong 3D bias to
                 the template
   Even 100% identical proteins may have very different
   structures due to:
       1) Different pHs          2) Different ionic strength
                                                                              Ad Bax homology-modeled from
                                                                                a George Clooney template
                    PrP




        pH 5                    pH 1
                                       African swine fever virus DNA polymerase X
       c) Different liganded state              50mM salt vs 500 mM salt

                                                d) Different post-tranlational modification



                       Calmodulin
   e) mutations                                       myosin phosphatase inhibitor


GA95            GB95                           f) Insufficient experimental data of template


         3 mutations                                           X-Ray resolution
                                                         1Å                               3.5Å
Dealing with 3D bias
                    TASSER

                                         Rosetta




Jeffrey
Skolnick

                             Distance restraints
Pros:                                              David
-Models are less                                   Baker
biased by template
Cons:
- Models are more
dependent on
imperfect folding
scores
Protein structures from the point
   of view of an experimentalist


   Structures with                 Structures from
no experimental data:              large amount of
                                          s
Homology Modeling,                   experimental
      Threading,
                                         data
  Fragment-based,
       Ab Initio




    Do not trust        Not sure      Trust
Experimental methods of
high-resolution structure
     determination
X-ray crystallography
X-Ray crystallography

              Quality metrics:

              1) Experimental data:
              -Number of reflections
              -Signal to noise ratio

              2) Model-to-experiment agreement:
              - R factor
              -R free factor

              3) Coordinate uncertainty:
              - B-factor

              4) Stereo-chemical normality:
              - backbone torsion angles
              (Ramachandran plot)
              - bond length, angles
              - side-chain torsion angles
X-RayResolution


Minimum spacing (d)
                                                               High resolution
of crystal lattice planes   High resolution Low resolution
that still provide
measurable diffraction
of X-rays.

Minimum distance
between structural
features that can
be distinguished in the                                        200,000 reflections
electron-density maps.
                                                                  Low resolution




                            Many reflections Few reflections      500 reflections
Resolution and protein quality
                                                  When X-ray data is incomplete, you have to rely on
                                                  other sources of structural information: knowledge-based
                                                  parameters. Imagination, etc.




                                                           Low resolution              High resolution




Blow, D. (2002). Outline of Crystallographyfor
Biologists (New York: Oxford University Press).
NMR spectroscopy
Protein NMR spectroscopy
Experiment             Spectra processing            Spectra assignment




         Model generation                                NOE assignment
                               Distance restraints
What is typical NMR experimental
              data?



Groups of protein quality parameters:

1) Quality of experimental observables that were used in
structure determination*

2) Agreement between the structure and experimental
observables*

3) Agreement between local geometry of the new structure and
parameters of existing high-quality structures

4) Structural uncertainty*

* Method-dependent
What is non-sparse NMR
                    data?




Macromolecular NMR spectroscopy for the non-spectroscopist. Kwan AH, Mobli M, Gooley PR, King GF, Mackay JP.
FEBS J. 2011 Mar;278(5):687-703
Protein structures from the point of
      view of an experimentalist
                                     Structures from large
   Structures with                        amount of
                                                  s
no experimental data:                  experimental data
Homology Modeling,
      Threading,
  Fragment-based,
       Ab Initio                   NMR                       XRAY



   Do not trust         Not sure                 Trust


                                          XRAY
                                          NMR




                                         [Equivalent] Resolution
Sparse experimental data
Protein structures from the point of
      view of an experimentalist

                                              Structures from large
   Structures with                                 amount of
                                                         s
no experimental data:                           experimental data
                           Structures
Homology Modeling,        from sparse
      Threading,        experimental data
  Fragment-based,
       Ab Initio                            NMR                       XRAY



   Do not trust               Not sure                    Trust
What is sparse experimental data?
In XRAY:                                                        Low resolution
- When you do not have enough XRAY reflections




                                                                   500 reflections
In NMR:
- When you do not have enough NOEs (e.g. no side-chain NOEs) or any NOEs




Sparse data from other methods:
-Distance restraints from cross-linking and mass-spectroscopy
-Distance restraints from spin-labeling and electron paramagnetic resonance (EPR)
spectroscopy
- Protein size, shape, radius of gyration from small angle Xray scattering (SAXS)
Why use incomplete experimental data for protein
           structure determination?
1) Some biological questions (e.g. prediction of function from protein fold)
may not require high structural accuracy


                                                 Ubiquitin



2) Some biologically interesting proteins are too difficult to study by any
high-res method:
- proteins with extended flexible regions
-large proteins
- fibrillar and membrane proteins




  Flexible        Fibrils              Membrane proteins        Large proteins
Protein structures from the point of
     view of an experimentalist


                                               Experimental methods
                                                       s
Homology Modeling,      Structures
    Threading,         from sparse
 Fragment-based,     experimental data
     Ab Initio
                                         NMR                     XRAY



   Do not trust            Not sure                     Trust


                                                                  EPR


                                                     Mass
                                                                  SAXS
                                                     spec
Regular NMR structure determination




                                   Standard
 Complete                          Force-field
 NOEs




             Dihedral restraints
Sparse NMR structure determination




                               Standard
Sparse                         Force-field
NOEs




             Dihedral restraints
Sparse NMR structure determination


                       Advanced
                       Force-field:
                       solvation term
                       full electrostatic
Sparse
NOEs




             Dihedral restraints
Sparse NMR structure determination


                    Advanced
                    Force-field:
                    solvation term
                    full electrostatic
Sparse              Knowledge-based
NOEs                potentials




            Fragment
            information


              Dihedral restraints
Sparse NMR structure determination


                         Advanced
         Homology        Force-field:
         information     solvation term
                         full electrostatic
Sparse                   knowledge-based
NOEs                     potentials




                Fragment
                information


                   Dihedral restraints
Pushing the boundaries of protein structure
                          determination




 David Baker        Ad Bax   Gaetano Jens Meiler            David        Michele George Rose Klaus
                           Montelione                      Wishart     Vendruscolo          Schulten
  University of    NIH                                                              Johns
                                        Vanderbilt
  Washington                 Rutgers University           University                  Hopkins      University
                                                                       University of University    of Illinois
                            University                    of Alberta
                CS-Rosetta                                              Cambridge
    Rosetta                            RosettaEPR          CS23D
                          CS-DP-Rosetta                                               LINUS          NAMD
                                                                        CHESHIRE




Jeffrey Skolnick     Yang Zhang       Andrzej       Hans          Lewis Kay         Julia         Charles L.
                                      Kolinski     Kalbitzer                     Forman-Kay       Brooks III
Georgia Institute    University of
                                                                 University of                     University
 of Technology        Michigan                     Universität
                                      Warsaw                       Toronto       University
                                                  Regensburg                                      of Michigan
TOUCHSTONEX           iTASSER        University                                  of Toronto
                                                   PERMOL        CHESHIRE
   TASSER                            CABS-NMR                     Rosetta        ENSEMBLE         CHARMM
Rosetta-family methods
                                                       Developed by 12 labs
Sequence                                               50 people
              Secondary Structure
                                          Low-resolution folding

  3- and 9-residue fragments
                                                                               David Baker

                                                                       Best low-res decoy selection




   Model quality evaluation
                                      Full-atom refinement

                                                                   Full-atom side-chain restoration
                                      vdW repulsive

                                    Small backbone moves


                               Side chain      Backbone
                               optimization    optimization
Rosetta-family methods
Method                Year            Restraints
Rosetta             1996-1999
Rosetta-NMR           2000      NOEs
Rosetta-NMR-RDC       2002      NOEs, RDCs
CS-Rosetta            2008      CS
CS-DP-Rosetta         2010      CS, unassigned NOEs
iterative-CS-RDC-     2010      CS, RDC, backbone
NOE Rosetta                     NOEs
 PCS-ROSETTA          2011      Pseudo-contact shifts
Rosetta-EPR           2011      EPR data
CS-HM Rosetta         2012      CS, homology
RASREC Rosetta      2011-2012   CS, methyl NOEs, RDCs
How is sparse data used in Rosetta-family methods?
Sequence      Secondary Structure        Low-resolution folding       Best low-res decoy selection



  3- and 9-residue fragments



                                                                              David Baker




       Chemical shifts                  NOEs, RDCs, PCSs, Homology, etc


   Model quality evaluation           Full-atom refinement

                                                                  Full-atom side-chain restoration
                                      vdW repulsive

                                    Small backbone moves


                               Side chain      Backbone
                               optimization    optimization
Performance of iterative Rosetta
  for backbone-only NMR data
What are the criteria of Rosetta simulation success?
1) RMSD with respect to the lowest/best score model should be within 2Å
   for more than 60% of models               Effect of experimental data
Successful simulation   Failed simulation




2) The converged structures should be clearly lower in energy than all
   significantly different (RMSD greater than 7 Å

3) The structures generated with experimental data should be at least as
   low in energy as those generated without experimental data or even
   lower/better
Problems with validation of Rosetta models.



1) It is not clear if the Rosetta success criteria are universal for all scenarios
2) Agreement with experimental data is not very meaningful because
the data is sparse.
3) Rfree like validation is difficult (and also not meaningful) because
   experimental data is sparse
4) Independent experimental data for validation will likely be unavailable
5) Normality-based scores for model validation (e.g. ResProx) will likely fail
to detect inaccurate but highly-idealized Rosetta models



  Need for developing an independent model validation protocol
Problem with informational content of
     protein models from sparse data
  The experimental data is over-powered by knowledge based information

Rosetta scoring function                         Fragment idealization

                                                 Bias by fragments from other
                                                 proteins

                                                 Biased by restraints from
                                                 homologous proteins


                                                      VS.

                               Typical male      Sparse experimental data

                                              Model              Sparse data

                Typical


Accurate            Accurate




           Normal
Protein structures from the point of
      view of an experimentalist
                                                Structures from large
                                                     amount of
   Structures with                                         s
no experimental data:                             experimental data
                            Structures
Homology Modeling,         from sparse
      Threading,         experimental data
  Fragment-based,          Rosetta, etc.
       Ab Initio                             NMR                        XRAY


                          Wait for scintific
   Do not trust                                             Trust
                        community to validate
What’s up with the “no free lunch” thing?




You can not build an accurate high-resolution model of protein structure without
       getting high-quality experimental data with your sweat and blood

  1) There is no substitute for a large amount of experimental data. If you do not do
     experiment, you do not get the information relevant to your specific experimental
     conditions (e.g. protein construct, sample conditions, etc).
  You can not get the same level of accuracy with sparse data or theoretical models
  2) If you have an easy protein, do a full-blown structure determination
  3) If you have no choice other than using sparse data, do not over-interpret your
       structure model.
This is not gonna happen any time soon




Success of theoretical methods is still limited to very small proteins.

Many theoretical models are biased, over-normalized, low-resolution, or
simply inaccurate.

Accuracy and high-resolution of models from sparse data is questionable.

Contenu connexe

Tendances

Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23Sage Base
 
Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequen...
Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequen...Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequen...
Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequen...Copenhagenomics
 
Infographic CV MB Long
Infographic CV MB LongInfographic CV MB Long
Infographic CV MB LongMetin Bilgin
 
Stephen Friend ICR UK 2012-06-18
Stephen Friend ICR UK 2012-06-18Stephen Friend ICR UK 2012-06-18
Stephen Friend ICR UK 2012-06-18Sage Base
 
Cancer and cell biology research concepts
Cancer and cell biology research conceptsCancer and cell biology research concepts
Cancer and cell biology research conceptsAngela Alexander, PhD
 
Plant Sciences Applications Resource Guide
Plant Sciences Applications Resource GuidePlant Sciences Applications Resource Guide
Plant Sciences Applications Resource GuideThermo Fisher Scientific
 
Friend WIN Symposium 2012-06-28
Friend WIN Symposium 2012-06-28Friend WIN Symposium 2012-06-28
Friend WIN Symposium 2012-06-28Sage Base
 
Computational Enzymology of Ribozymes (from metal-ion to nucleobase catalysis...
Computational Enzymology of Ribozymes (from metal-ion to nucleobase catalysis...Computational Enzymology of Ribozymes (from metal-ion to nucleobase catalysis...
Computational Enzymology of Ribozymes (from metal-ion to nucleobase catalysis...Fabrice Leclerc
 

Tendances (12)

Neuroscience research brochure
Neuroscience research brochureNeuroscience research brochure
Neuroscience research brochure
 
Infographic CV MB
Infographic CV MBInfographic CV MB
Infographic CV MB
 
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
Stephen Friend Complex Traits: Genomics and Computational Approaches 2012-02-23
 
Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequen...
Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequen...Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequen...
Discovery of Cow Rumen Biomass-Degrading Genes and Genomes through DNA Sequen...
 
Infographic CV MB Long
Infographic CV MB LongInfographic CV MB Long
Infographic CV MB Long
 
Stephen Friend ICR UK 2012-06-18
Stephen Friend ICR UK 2012-06-18Stephen Friend ICR UK 2012-06-18
Stephen Friend ICR UK 2012-06-18
 
Cancer and cell biology research concepts
Cancer and cell biology research conceptsCancer and cell biology research concepts
Cancer and cell biology research concepts
 
Nuclic acid analysis in nematode systematic
Nuclic acid analysis in nematode systematicNuclic acid analysis in nematode systematic
Nuclic acid analysis in nematode systematic
 
Plant Research with Life Technologies
Plant Research with Life TechnologiesPlant Research with Life Technologies
Plant Research with Life Technologies
 
Plant Sciences Applications Resource Guide
Plant Sciences Applications Resource GuidePlant Sciences Applications Resource Guide
Plant Sciences Applications Resource Guide
 
Friend WIN Symposium 2012-06-28
Friend WIN Symposium 2012-06-28Friend WIN Symposium 2012-06-28
Friend WIN Symposium 2012-06-28
 
Computational Enzymology of Ribozymes (from metal-ion to nucleobase catalysis...
Computational Enzymology of Ribozymes (from metal-ion to nucleobase catalysis...Computational Enzymology of Ribozymes (from metal-ion to nucleobase catalysis...
Computational Enzymology of Ribozymes (from metal-ion to nucleobase catalysis...
 

En vedette

Usuário 3.0 - Um olhar para os hábitos de consumo
Usuário 3.0 - Um olhar para os hábitos de consumoUsuário 3.0 - Um olhar para os hábitos de consumo
Usuário 3.0 - Um olhar para os hábitos de consumoRenato Wajnberg
 
Diabetic and Hypertensive Retinopathy30 3-2011-121109075116-phpapp01 (1)
Diabetic and Hypertensive Retinopathy30 3-2011-121109075116-phpapp01 (1)Diabetic and Hypertensive Retinopathy30 3-2011-121109075116-phpapp01 (1)
Diabetic and Hypertensive Retinopathy30 3-2011-121109075116-phpapp01 (1)saiful islam
 
Partes de la computadora
Partes de la computadoraPartes de la computadora
Partes de la computadoraJazdaly Meza
 
Conhecendo nossa terra 8º ano
Conhecendo nossa terra 8º anoConhecendo nossa terra 8º ano
Conhecendo nossa terra 8º anoIvonilde Lima
 
MBJ_April_2011-1
MBJ_April_2011-1MBJ_April_2011-1
MBJ_April_2011-1Renee Lau
 
It's Not About Technology (pdf with Notes)
It's Not About Technology (pdf with Notes)It's Not About Technology (pdf with Notes)
It's Not About Technology (pdf with Notes)Tim O'Reilly
 
PhRMA STEM Education Factsheet 2014
PhRMA STEM Education Factsheet 2014PhRMA STEM Education Factsheet 2014
PhRMA STEM Education Factsheet 2014PhRMA
 
CV_Ruchi_GSA Project details
CV_Ruchi_GSA Project detailsCV_Ruchi_GSA Project details
CV_Ruchi_GSA Project detailsRuchi Mishra
 
Construction of a simply supported structure with central charge
Construction of a simply supported structure with central chargeConstruction of a simply supported structure with central charge
Construction of a simply supported structure with central chargeJosep M Carbonell Oyonarte
 
Sold! Event - Aug. Presentation
Sold! Event - Aug. PresentationSold! Event - Aug. Presentation
Sold! Event - Aug. PresentationSoldEvents
 
July 2012 Ruby Tuesday - Lana Lodge - Refactoring Lighting Talk
July 2012 Ruby Tuesday - Lana Lodge - Refactoring Lighting TalkJuly 2012 Ruby Tuesday - Lana Lodge - Refactoring Lighting Talk
July 2012 Ruby Tuesday - Lana Lodge - Refactoring Lighting Talkottawaruby
 
Startups and managing startups
Startups and managing startupsStartups and managing startups
Startups and managing startupsPavan Kumar Vijay
 
Motivational quotes for your business success
Motivational quotes for your business successMotivational quotes for your business success
Motivational quotes for your business successGlenn Silvestre
 
Bad designs can happen to good people
Bad designs can happen to good peopleBad designs can happen to good people
Bad designs can happen to good peopleJim Streisel
 

En vedette (20)

Usuário 3.0 - Um olhar para os hábitos de consumo
Usuário 3.0 - Um olhar para os hábitos de consumoUsuário 3.0 - Um olhar para os hábitos de consumo
Usuário 3.0 - Um olhar para os hábitos de consumo
 
Diabetic and Hypertensive Retinopathy30 3-2011-121109075116-phpapp01 (1)
Diabetic and Hypertensive Retinopathy30 3-2011-121109075116-phpapp01 (1)Diabetic and Hypertensive Retinopathy30 3-2011-121109075116-phpapp01 (1)
Diabetic and Hypertensive Retinopathy30 3-2011-121109075116-phpapp01 (1)
 
Partes de la computadora
Partes de la computadoraPartes de la computadora
Partes de la computadora
 
Conhecendo nossa terra 8º ano
Conhecendo nossa terra 8º anoConhecendo nossa terra 8º ano
Conhecendo nossa terra 8º ano
 
10 начина да подобрите атмосферата в спалнята
10 начина да подобрите атмосферата в спалнята10 начина да подобрите атмосферата в спалнята
10 начина да подобрите атмосферата в спалнята
 
MBJ_April_2011-1
MBJ_April_2011-1MBJ_April_2011-1
MBJ_April_2011-1
 
It's Not About Technology (pdf with Notes)
It's Not About Technology (pdf with Notes)It's Not About Technology (pdf with Notes)
It's Not About Technology (pdf with Notes)
 
PhRMA STEM Education Factsheet 2014
PhRMA STEM Education Factsheet 2014PhRMA STEM Education Factsheet 2014
PhRMA STEM Education Factsheet 2014
 
CV_Ruchi_GSA Project details
CV_Ruchi_GSA Project detailsCV_Ruchi_GSA Project details
CV_Ruchi_GSA Project details
 
3414097.2010 3
3414097.2010 33414097.2010 3
3414097.2010 3
 
Transistor
TransistorTransistor
Transistor
 
Construction of a simply supported structure with central charge
Construction of a simply supported structure with central chargeConstruction of a simply supported structure with central charge
Construction of a simply supported structure with central charge
 
Sold! Event - Aug. Presentation
Sold! Event - Aug. PresentationSold! Event - Aug. Presentation
Sold! Event - Aug. Presentation
 
Chap005
Chap005Chap005
Chap005
 
July 2012 Ruby Tuesday - Lana Lodge - Refactoring Lighting Talk
July 2012 Ruby Tuesday - Lana Lodge - Refactoring Lighting TalkJuly 2012 Ruby Tuesday - Lana Lodge - Refactoring Lighting Talk
July 2012 Ruby Tuesday - Lana Lodge - Refactoring Lighting Talk
 
Startups and managing startups
Startups and managing startupsStartups and managing startups
Startups and managing startups
 
Motivational quotes for your business success
Motivational quotes for your business successMotivational quotes for your business success
Motivational quotes for your business success
 
Leadership lessons-from-obama-
Leadership lessons-from-obama-Leadership lessons-from-obama-
Leadership lessons-from-obama-
 
Consultoria BI
Consultoria BIConsultoria BI
Consultoria BI
 
Bad designs can happen to good people
Bad designs can happen to good peopleBad designs can happen to good people
Bad designs can happen to good people
 

Similaire à Humans Aliens And E Harmony Or Why There Is No Such Thing As A Free Lunch In Protein Structure Determination From Sparse Experimental Data

Next-generation sequencing course, part 1: technologies
Next-generation sequencing course, part 1: technologiesNext-generation sequencing course, part 1: technologies
Next-generation sequencing course, part 1: technologiesJan Aerts
 
Protein Structure Determination
Protein Structure DeterminationProtein Structure Determination
Protein Structure DeterminationAmjad Ibrahim
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...RussellHanson
 
Identification of pathological mutations from the single-gene case to exome p...
Identification of pathological mutations from the single-gene case to exome p...Identification of pathological mutations from the single-gene case to exome p...
Identification of pathological mutations from the single-gene case to exome p...Vall d'Hebron Institute of Research (VHIR)
 
DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification Senthil Natesan
 
Vienna afp2011
Vienna afp2011Vienna afp2011
Vienna afp2011Iddo
 
David Jones AFP/CAFA2011
David Jones AFP/CAFA2011David Jones AFP/CAFA2011
David Jones AFP/CAFA2011Iddo
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017David Cook
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wdWagied Davids
 
Curation Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopCuration Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopMonica Munoz-Torres
 
Next-generation genomics: an integrative approach
Next-generation genomics: an integrative approachNext-generation genomics: an integrative approach
Next-generation genomics: an integrative approachHong ChangBum
 
Apollo Workshop AGS2017 Introduction
Apollo Workshop AGS2017 IntroductionApollo Workshop AGS2017 Introduction
Apollo Workshop AGS2017 IntroductionMonica Munoz-Torres
 
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...Natalio Krasnogor
 
Proteomics in VSC for crop improvement programme
Proteomics in VSC for crop improvement programmeProteomics in VSC for crop improvement programme
Proteomics in VSC for crop improvement programmeSumanthBT1
 

Similaire à Humans Aliens And E Harmony Or Why There Is No Such Thing As A Free Lunch In Protein Structure Determination From Sparse Experimental Data (20)

RML NCBI Resources
RML NCBI ResourcesRML NCBI Resources
RML NCBI Resources
 
Complete Human Genome Sequencing
Complete Human Genome SequencingComplete Human Genome Sequencing
Complete Human Genome Sequencing
 
15 arrays
15 arrays15 arrays
15 arrays
 
Next-generation sequencing course, part 1: technologies
Next-generation sequencing course, part 1: technologiesNext-generation sequencing course, part 1: technologies
Next-generation sequencing course, part 1: technologies
 
DNA Barcoding
DNA BarcodingDNA Barcoding
DNA Barcoding
 
Bioinformatica t7-protein structure
Bioinformatica t7-protein structureBioinformatica t7-protein structure
Bioinformatica t7-protein structure
 
Protein Structure Determination
Protein Structure DeterminationProtein Structure Determination
Protein Structure Determination
 
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
Synthetic Biology and Data-Driven Synthetic Biology for Personalized Medicine...
 
Identification of pathological mutations from the single-gene case to exome p...
Identification of pathological mutations from the single-gene case to exome p...Identification of pathological mutations from the single-gene case to exome p...
Identification of pathological mutations from the single-gene case to exome p...
 
DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification
 
Vienna afp2011
Vienna afp2011Vienna afp2011
Vienna afp2011
 
David Jones AFP/CAFA2011
David Jones AFP/CAFA2011David Jones AFP/CAFA2011
David Jones AFP/CAFA2011
 
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
scRNA-Seq Lecture - Stem Cell Network RNA-Seq Workshop 2017
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wd
 
DNA Daily
DNA DailyDNA Daily
DNA Daily
 
Curation Introduction - Apollo Workshop
Curation Introduction - Apollo WorkshopCuration Introduction - Apollo Workshop
Curation Introduction - Apollo Workshop
 
Next-generation genomics: an integrative approach
Next-generation genomics: an integrative approachNext-generation genomics: an integrative approach
Next-generation genomics: an integrative approach
 
Apollo Workshop AGS2017 Introduction
Apollo Workshop AGS2017 IntroductionApollo Workshop AGS2017 Introduction
Apollo Workshop AGS2017 Introduction
 
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
P
 Systems 
Model 
Optimisation 
by
 Means 
of 
Evolutionary 
Based 
Search
 ...
 
Proteomics in VSC for crop improvement programme
Proteomics in VSC for crop improvement programmeProteomics in VSC for crop improvement programme
Proteomics in VSC for crop improvement programme
 

Humans Aliens And E Harmony Or Why There Is No Such Thing As A Free Lunch In Protein Structure Determination From Sparse Experimental Data

  • 1. Humans, aliens, and eHarmony or why there is no such thing as a free lunch in protein structure determination from sparse experimental data Mark Berjanskii, July 27th, 2012
  • 2. … dedicated to all experimentalists who lost their way
  • 3. Outline ► Purposes and origins of protein structural models ► Theoretical models of protein structure ► Models from non-sparse experimental data ► Models from sparse experimental data ► Recent developments
  • 6. Humans, aliens, and eHarmony Typical human face by David Tood http://www.trood.dk/blog/the-face-of-humanity/
  • 7. What is enough for aliens, not enough for eHarmony Typical ≠ Accurate “Accurate” human faces Typical human face by David Tood http://www.trood.dk/blog/the-face-of-humanity/
  • 8. Take-home message Typical ≠ Accurate Typical Accurate Accurate Normal Warning!!! Most protein structure validation tools check how typical or normal your protein model is, not how accurate your protein model is.
  • 9. The level of required protein structural accuracy also depends on its purpose
  • 10. Why do we need to know protein structures? 1) Prediction of protein function from 3D structure (e.g. fold, motifs, active site prediction) 1) Ubiquitin - degradation by the proteasome, 2) Ubiquitin-like modifiers - function regulation by post-translation modification 2) Sequence-to-function prediction 3) Mechanism of protein function (e.g. enzyme catalysis, structural effect of known mutations). 4) Rational drug design and structure design 5) Design of novel proteins with novel function
  • 11. When do we need to do a structural experiment? 1) Structure is not known 2) Structure is known but can not be used to answer your scientific question a) structure is incomplete SIV protease b) structure was determined at different conditions Tom Cruse under stress conditions Prion protein pH Salt pH 5 pH 1 African swine fever virus DNA polymerase X c) different liganded state 50mM salt vs 500 mM salt d) different post-tranlational modification Calmodulin e) mutations myosin phosphatase inhibitor GA95 GB95 f) Insufficient experimental data 3 mutations X-Ray resolution 1Å 3.5Å
  • 12. Why use incomplete experimental data for protein structure determination? 1) Some biological questions (e.g. prediction of function from protein fold) may not require high structural accuracy Ubiquitin 2) Some biologically interesting proteins are too difficult to study by any high-res method: - proteins with extended flexible regions -large proteins - fibrillar and membrane proteins Flexible Fibrils Membrane proteins Large proteins
  • 13. Protein structures from the point of view of an experimentalist Structures from Structures from Structures with sparse experimental data large amount of s no experimental data experimental data Do not trust Not sure Trust
  • 14. What is expert’s opinion? Roman Laskowski Research Scientist at European Bioinformatics Institute Jenny Gu (Editor), Philip E. Bourne (Editor) ISBN: 978-0-470-18105-8 Hardcover 1067 pages John Wiley & Sons, Inc.
  • 15. Non-experimental structures
  • 16. Protein structures from the point of view of an experimentalist Structures with no experimental data Do not trust Not sure Trust
  • 17. What does Roman Laskowski write about theoretical structures? Roman Laskowski Research Scientist at European Bioinformatics Institute
  • 18. What are the criteria of success in protein structure determination? 1) In method development tests: - Global accuracy: RMSD, TM-score - Agreement with experimental data (when available) - Agreement with protein quality metrics 2) In real-life research: - Global accuracy - Agreement with experimental data (when available) - Agreement with protein quality metrics
  • 19. Ab-Initio structures by molecular dynamics SuperComputer “Anton” for MD simulations D.E. Shaw Research Dihydrofolate reductase: Anton 512 cores: 15 µs/day Desmond 512 cores: 0.5 µs/day Amber-GPU 64 cores: 80 ns/day MD force-field Amber 48 cores: 20ns/day Gromacs 8 cores: ~5-10ns/day How Fast-Folding Proteins Fold Folding time-scales: Kresten Lindorff-Larsen, Stefano Piana, Ron O. Dror, David E. Shaw; Science 28 October 2011: Vol. 334 no. 6055 pp. 517-520 Limitations: 1) Requires powerful hardware or computing time 2) Limited to small/simple proteins Newton equation of motion 3) Can not take into account chaperone action 4) Criteria for success???
  • 20. Fragment-based ab initio structures (Non-experimental structures) Rosetta Developed by 12 labs Sequence 50 people Secondary Structure Low-resolution folding 3- and 9-residue fragments David Baker Best low-res decoy selection Model quality evaluation Full-atom refinement Full-atom side-chain restoration vdW repulsive Small backbone moves Side chain Backbone optimization optimization
  • 21. Fragment-based ab initio structures (Non-experimental structures) Rosetta Rosetta limitations 1) Does not fold well proteins above 100 residues (sampling Scoring function problem) 2) Biased by fragment structure 3) Implicit solvation score is too simplistic and only weakly disfavors buried unsatisfied polar groups. David Baker 4) Hydrogen bond potential neglects the effects of charged atoms, (anti-) cooperativity within H-bond networks . 5) Ignores electrostatic interactions (besides H-bonds) and their screening, 6) Does not permit rigorous estimation of a model's free energy. 7) Does not fold properly some very small proteins and RNA Rosetta can fold small proteins (<100 residues) Native Rosetta Native Rosetta Native Rosetta Native Rosetta
  • 22. Rosetta forces protein normality Rosetta scoring function Fragment idealization Typical ≠ Accurate Typical Accurate Accurate Normal
  • 23. Comparative or template-based modeling Threading Homology modeling Template database Alignment score based statistical Alignment score based on properties: residue identity or mutation potential, similarity environment fitness potential, pairwise potential, secondary structure compatibilities
  • 24. In a real-life scenario, success of homology modeling is judged based on model normality, not model accuracy. Typical ≠ Accurate Typical Accurate Accurate Normal Theoretical scores for protein quality assessment may have wrong energetic minima. Limited conformational sampling may not always yield the native conformation
  • 25. Some people may think that any structure can be determined via homology modeling because “all” folds of NMRable and XRAYable proteins are “known” But can simple folds provide all necessary information to define domain orientation in and overall structure of complex proteins? SV40 T-antigen NO!
  • 26. Accuracy of template-based modeling Do you want to take chance to get this as your high-accuracy structure? http://swissmodel.expasy.org/workspace/tutorial/eva.html Curr Protoc Bioinformatics. 2006 Oct;Chapter 5:Unit 5.6. Comparative protein structure modeling using Modeller. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Errors in template-based models: Wrong side-chain Conformational packing changes No-template segments Mis-alignment Wrong template
  • 27. Template-based models have strong 3D bias to the template Even 100% identical proteins may have very different structures due to: 1) Different pHs 2) Different ionic strength Ad Bax homology-modeled from a George Clooney template PrP pH 5 pH 1 African swine fever virus DNA polymerase X c) Different liganded state 50mM salt vs 500 mM salt d) Different post-tranlational modification Calmodulin e) mutations myosin phosphatase inhibitor GA95 GB95 f) Insufficient experimental data of template 3 mutations X-Ray resolution 1Å 3.5Å
  • 28. Dealing with 3D bias TASSER Rosetta Jeffrey Skolnick Distance restraints Pros: David -Models are less Baker biased by template Cons: - Models are more dependent on imperfect folding scores
  • 29. Protein structures from the point of view of an experimentalist Structures with Structures from no experimental data: large amount of s Homology Modeling, experimental Threading, data Fragment-based, Ab Initio Do not trust Not sure Trust
  • 30. Experimental methods of high-resolution structure determination
  • 32. X-Ray crystallography Quality metrics: 1) Experimental data: -Number of reflections -Signal to noise ratio 2) Model-to-experiment agreement: - R factor -R free factor 3) Coordinate uncertainty: - B-factor 4) Stereo-chemical normality: - backbone torsion angles (Ramachandran plot) - bond length, angles - side-chain torsion angles
  • 33. X-RayResolution Minimum spacing (d) High resolution of crystal lattice planes High resolution Low resolution that still provide measurable diffraction of X-rays. Minimum distance between structural features that can be distinguished in the 200,000 reflections electron-density maps. Low resolution Many reflections Few reflections 500 reflections
  • 34. Resolution and protein quality When X-ray data is incomplete, you have to rely on other sources of structural information: knowledge-based parameters. Imagination, etc. Low resolution High resolution Blow, D. (2002). Outline of Crystallographyfor Biologists (New York: Oxford University Press).
  • 36. Protein NMR spectroscopy Experiment Spectra processing Spectra assignment Model generation NOE assignment Distance restraints
  • 37. What is typical NMR experimental data? Groups of protein quality parameters: 1) Quality of experimental observables that were used in structure determination* 2) Agreement between the structure and experimental observables* 3) Agreement between local geometry of the new structure and parameters of existing high-quality structures 4) Structural uncertainty* * Method-dependent
  • 38. What is non-sparse NMR data? Macromolecular NMR spectroscopy for the non-spectroscopist. Kwan AH, Mobli M, Gooley PR, King GF, Mackay JP. FEBS J. 2011 Mar;278(5):687-703
  • 39. Protein structures from the point of view of an experimentalist Structures from large Structures with amount of s no experimental data: experimental data Homology Modeling, Threading, Fragment-based, Ab Initio NMR XRAY Do not trust Not sure Trust XRAY NMR [Equivalent] Resolution
  • 41. Protein structures from the point of view of an experimentalist Structures from large Structures with amount of s no experimental data: experimental data Structures Homology Modeling, from sparse Threading, experimental data Fragment-based, Ab Initio NMR XRAY Do not trust Not sure Trust
  • 42. What is sparse experimental data? In XRAY: Low resolution - When you do not have enough XRAY reflections 500 reflections In NMR: - When you do not have enough NOEs (e.g. no side-chain NOEs) or any NOEs Sparse data from other methods: -Distance restraints from cross-linking and mass-spectroscopy -Distance restraints from spin-labeling and electron paramagnetic resonance (EPR) spectroscopy - Protein size, shape, radius of gyration from small angle Xray scattering (SAXS)
  • 43. Why use incomplete experimental data for protein structure determination? 1) Some biological questions (e.g. prediction of function from protein fold) may not require high structural accuracy Ubiquitin 2) Some biologically interesting proteins are too difficult to study by any high-res method: - proteins with extended flexible regions -large proteins - fibrillar and membrane proteins Flexible Fibrils Membrane proteins Large proteins
  • 44. Protein structures from the point of view of an experimentalist Experimental methods s Homology Modeling, Structures Threading, from sparse Fragment-based, experimental data Ab Initio NMR XRAY Do not trust Not sure Trust EPR Mass SAXS spec
  • 45. Regular NMR structure determination Standard Complete Force-field NOEs Dihedral restraints
  • 46. Sparse NMR structure determination Standard Sparse Force-field NOEs Dihedral restraints
  • 47. Sparse NMR structure determination Advanced Force-field: solvation term full electrostatic Sparse NOEs Dihedral restraints
  • 48. Sparse NMR structure determination Advanced Force-field: solvation term full electrostatic Sparse Knowledge-based NOEs potentials Fragment information Dihedral restraints
  • 49. Sparse NMR structure determination Advanced Homology Force-field: information solvation term full electrostatic Sparse knowledge-based NOEs potentials Fragment information Dihedral restraints
  • 50. Pushing the boundaries of protein structure determination David Baker Ad Bax Gaetano Jens Meiler David Michele George Rose Klaus Montelione Wishart Vendruscolo Schulten University of NIH Johns Vanderbilt Washington Rutgers University University Hopkins University University of University of Illinois University of Alberta CS-Rosetta Cambridge Rosetta RosettaEPR CS23D CS-DP-Rosetta LINUS NAMD CHESHIRE Jeffrey Skolnick Yang Zhang Andrzej Hans Lewis Kay Julia Charles L. Kolinski Kalbitzer Forman-Kay Brooks III Georgia Institute University of University of University of Technology Michigan Universität Warsaw Toronto University Regensburg of Michigan TOUCHSTONEX iTASSER University of Toronto PERMOL CHESHIRE TASSER CABS-NMR Rosetta ENSEMBLE CHARMM
  • 51. Rosetta-family methods Developed by 12 labs Sequence 50 people Secondary Structure Low-resolution folding 3- and 9-residue fragments David Baker Best low-res decoy selection Model quality evaluation Full-atom refinement Full-atom side-chain restoration vdW repulsive Small backbone moves Side chain Backbone optimization optimization
  • 52. Rosetta-family methods Method Year Restraints Rosetta 1996-1999 Rosetta-NMR 2000 NOEs Rosetta-NMR-RDC 2002 NOEs, RDCs CS-Rosetta 2008 CS CS-DP-Rosetta 2010 CS, unassigned NOEs iterative-CS-RDC- 2010 CS, RDC, backbone NOE Rosetta NOEs PCS-ROSETTA 2011 Pseudo-contact shifts Rosetta-EPR 2011 EPR data CS-HM Rosetta 2012 CS, homology RASREC Rosetta 2011-2012 CS, methyl NOEs, RDCs
  • 53. How is sparse data used in Rosetta-family methods? Sequence Secondary Structure Low-resolution folding Best low-res decoy selection 3- and 9-residue fragments David Baker Chemical shifts NOEs, RDCs, PCSs, Homology, etc Model quality evaluation Full-atom refinement Full-atom side-chain restoration vdW repulsive Small backbone moves Side chain Backbone optimization optimization
  • 54. Performance of iterative Rosetta for backbone-only NMR data
  • 55. What are the criteria of Rosetta simulation success? 1) RMSD with respect to the lowest/best score model should be within 2Å for more than 60% of models Effect of experimental data Successful simulation Failed simulation 2) The converged structures should be clearly lower in energy than all significantly different (RMSD greater than 7 Å 3) The structures generated with experimental data should be at least as low in energy as those generated without experimental data or even lower/better
  • 56. Problems with validation of Rosetta models. 1) It is not clear if the Rosetta success criteria are universal for all scenarios 2) Agreement with experimental data is not very meaningful because the data is sparse. 3) Rfree like validation is difficult (and also not meaningful) because experimental data is sparse 4) Independent experimental data for validation will likely be unavailable 5) Normality-based scores for model validation (e.g. ResProx) will likely fail to detect inaccurate but highly-idealized Rosetta models Need for developing an independent model validation protocol
  • 57. Problem with informational content of protein models from sparse data The experimental data is over-powered by knowledge based information Rosetta scoring function Fragment idealization Bias by fragments from other proteins Biased by restraints from homologous proteins VS. Typical male Sparse experimental data Model Sparse data Typical Accurate Accurate Normal
  • 58. Protein structures from the point of view of an experimentalist Structures from large amount of Structures with s no experimental data: experimental data Structures Homology Modeling, from sparse Threading, experimental data Fragment-based, Rosetta, etc. Ab Initio NMR XRAY Wait for scintific Do not trust Trust community to validate
  • 59. What’s up with the “no free lunch” thing? You can not build an accurate high-resolution model of protein structure without getting high-quality experimental data with your sweat and blood 1) There is no substitute for a large amount of experimental data. If you do not do experiment, you do not get the information relevant to your specific experimental conditions (e.g. protein construct, sample conditions, etc). You can not get the same level of accuracy with sparse data or theoretical models 2) If you have an easy protein, do a full-blown structure determination 3) If you have no choice other than using sparse data, do not over-interpret your structure model.
  • 60. This is not gonna happen any time soon Success of theoretical methods is still limited to very small proteins. Many theoretical models are biased, over-normalized, low-resolution, or simply inaccurate. Accuracy and high-resolution of models from sparse data is questionable.