1. B1 219 ROBERT M. STROUD Understanding crystallography and structures Interactive – Bring conundra – Laboratory course: Crystallize a protein Determine structure Visit ‘Advanced Light Source’ (ALS) for data
2. Determining Atomic Structure. X-ray crystallography = optics with $\lambda \approx 1.5$ Å (no lenses). Bond lengths ~1.4 Å. Electrons scatter X-rays; ergo X-rays ‘see electrons’. Resolution: best is $\lambda/2$; typical is 1 to 3 Å. Accuracy of atom center positions: ±1/10 of the resolution. [Figure label: scattering angle $2\theta$]
3. Where do X-rays come from? From accelerating or decelerating electrons.
4. Resources: http://www.msg.ucsf.edu. Computing: calculation software, all you will ever need. Online course for some items: http://www-structmed.cimr.cam.ac.uk/course.html. Dr Chris Waddling, msg.ucsf.edu; Dr James Holton, UCSF/LBNL. Book notes: crystallography made accessible to readers with no prior knowledge of the field or its mathematical basis; the most comprehensive and concise reference; Rhodes uses visual and geometric models to help readers understand the basis of X-ray crystallography. Movies: http://bl831.als.lbl.gov/~jamesh/movies/
6. Topics Summary: Resources. 1. Crystal lattice; optical analogues; photons as waves/particles. 2. Wave addition; complex exponential. 3. Argand diagram. 4. Repetition = sampling; fringe function. 5. Molecular Fourier transform; Fourier inversion theorem; sampling the transform as a product. 6. Geometry of diffraction. 7. The phase problem: heavy atom Multiple Isomorphous Replacement (MIR); anomalous dispersion; Multi-wavelength Anomalous Diffraction (MAD/SAD). 8. Difference maps and errors. 9. Structure (= phase) refinement: thermal factors; least squares; maximum likelihood methods. 10. Symmetry: basis, consequences.
7. Topics 11. X-ray sources: storage rings, Free Electron Lasers (FELs). 12. Detector systems. 13. Errors, and BIG ERRORS! Retractions. 14. Sources of disorder. 15. X-ray sources: storage rings, Free Electron Lasers (FELs).
8. The UCSF beamline 8.3.1. UCSF Mission Bay. If it is automated, why are there errors? What do I trust? Examples of errors: sequence traced backwards, misassignment of helices, etc.
12. The universe of protein structures: our knowledge about protein structures is increasing. 65,271 protein structures were deposited in the PDB as of 2/15/2010. This number is growing by more than ~7000 a year, with growing input from Structural Genomics high-throughput (HT) structure determination (>1000 structures a year).
13. X-Ray Crystallography for Structure Determination. Goals: 1. How does it work? 2. Understand how to judge where errors may lurk. 3. Understand what is implied by, and contained in, the Protein Data Bank (PDB), http://www.pdb.org/pdb/home/home.do. Resolution: be suspicious at resolutions worse than 3 Å (>3 Å). R factor and Rfree: a statistical ‘holdout test’. Wavelength ~ atom size. Scattering from electrons = electron density. Adding atoms? How? Observations: intensity $I(h,k,l) = F(hkl) \cdot F^*(hkl)$. Determine phases $\phi(hkl)$. Inverse Fourier transform = electron density. Judging electron density: how to interpret? Accuracy versus reliability.
16. The process is iterative, and should converge, but only so far! Crystal; intensities $I(h,k,l)$; electron density $\rho(x,y,z)$; calculated $I(h,k,l)$. Known: amino acid sequence, ligands, bond lengths and angles, constraints on geometry. Phases $\phi(h,k,l)$: experimental (heavy atom labels, selenium for sulfur) or trial and error (similar structure). Atom positions $(x,y,z)$.
17. Resolution: $d_{\min} = \lambda / (2 \sin\theta_{\max})$ differs from the Rayleigh criterion. $d_{\min}$ is the wavelength of the shortest wave used to construct the density map.
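A minimal sketch of the resolution formula on this slide. The function name `d_min` and the example wavelength and angles are illustrative assumptions, not values from the lecture.

```python
import math

def d_min(wavelength, theta_max_deg):
    """Smallest spacing d_min = lambda / (2 sin(theta_max)), in the units of wavelength."""
    return wavelength / (2.0 * math.sin(math.radians(theta_max_deg)))

# Illustrative values only: 1.5 A X-rays, data measured out to theta_max = 30 degrees.
print(d_min(1.5, 30.0))   # approximately 1.5 A

# The best case is theta_max = 90 degrees, where d_min = lambda / 2.
print(d_min(1.5, 90.0))   # approximately 0.75 A
```

The second call reproduces the "best is λ/2" limit quoted on slide 2.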
18. The Rayleigh Criterion. The Rayleigh criterion is the generally accepted criterion for the minimum resolvable detail: the imaging process is said to be diffraction-limited when the first diffraction minimum of the image of one source point coincides with the maximum of another. Compare with $\sin\theta_{\max} = \lambda / (2 d_{\min})$.
19. How do we judge the quality of a structure? 2. Overall quality criteria: agreement of observations with diffraction calculated from the interpreted structure. 3. Since we refine the structure to match the $I_{hkl}$, is there overfitting? Define Rfree for a ‘hold-out’ set of observations. 4. OK? R < 20%, Rfree < 25%. 5. But the experimental errors in measuring $F_o$ are ~3%; the remainder reflects inadequate models of solvent, atom motion, and anharmonicity. 6. Accuracy ~ 0.5 × resolution × R.
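A small sketch of the quality measures named on this slide, assuming the conventional definition R = Σ| |Fo| − |Fc| | / Σ|Fo|, which the slide does not spell out. The amplitudes below are made up for illustration, and `coordinate_error` simply encodes the slide's rule of thumb (accuracy ~ 0.5 × resolution × R).

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the supplied reflections."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

def coordinate_error(resolution, r):
    """Rule of thumb from the slide: accuracy ~ 0.5 * resolution * R."""
    return 0.5 * resolution * r

# Made-up amplitudes. The 'work' set is used in refinement; the 'free'
# (hold-out) set is never refined against, so it exposes overfitting.
work_obs, work_calc = [100.0, 80.0, 60.0, 40.0], [90.0, 85.0, 66.0, 36.0]
free_obs, free_calc = [70.0, 50.0], [60.0, 57.0]

r_work = r_factor(work_obs, work_calc)
r_free = r_factor(free_obs, free_calc)
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")

# At 2.0 A resolution with R = 20%, the rule of thumb gives about 0.2 A.
print(coordinate_error(2.0, 0.20))
```

Rfree coming out higher than R, as it does here, is the expected pattern; a large gap between them is the warning sign for overfitting.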
20. Crystal lattice is made up of many ‘Unit Cells’. Unit cell dimensions are 3 distances a, b, c and the angles between them $\alpha, \beta, \gamma$. Repetition in ‘real space’ causes sampling in ‘scattering space’. [Figure: a ‘section’ through the scattering pattern of a crystal at l = 0, with axes h and k. Note the symmetry, and the absences for h = even, k = even.]
21. Crystal lattice is made up of many ‘Unit Cells’. Unit cell dimensions are 3 distances a, b, c and the angles between them $\alpha, \beta, \gamma$. $V_{abc} = |a \cdot (b \times c)| = |b \cdot (c \times a)| = |c \cdot (a \times b)|$. Repetition in ‘real space’ causes sampling in ‘scattering space’. [Figure: axes h and k]
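The volume formula can be checked numerically. This sketch assumes one common Cartesian convention (a along x, b in the xy-plane) for turning a, b, c, α, β, γ into vectors; the convention, the helper names, and the example cells are assumptions for illustration.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cell edge vectors with a along x and b in the xy-plane (angles in degrees)."""
    al, be, ga = (math.radians(t) for t in (alpha, beta, gamma))
    va = (a, 0.0, 0.0)
    vb = (b * math.cos(ga), b * math.sin(ga), 0.0)
    cx = c * math.cos(be)
    cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return va, vb, (cx, cy, cz)

def volume(va, vb, vc):
    """V = |a . (b x c)|."""
    return abs(dot(va, cross(vb, vc)))

va, vb, vc = cell_vectors(10.0, 10.0, 20.0, 90.0, 90.0, 120.0)  # hexagonal-like example
print(volume(va, vb, vc))   # a . (b x c)
print(volume(vb, vc, va))   # b . (c x a), same value
print(volume(vc, va, vb))   # c . (a x b), same value
```

All three cyclic orderings give the same number, which is the point of the three equivalent expressions on the slide.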
22. Scattering. Adding up the scattering of atoms: ‘interference’ of waves. Waves add out of phase by $2\pi\,[\text{extra path}/\lambda]$.
23. In general they add up to something with an amplitude in between $-2f$ and $+2f$. For n atoms
27. Optical equivalent, e.g. a slide projector: leave out the lens. Optical diffraction = X-ray diffraction. [Figure: with the lens, object → image of the object; remove the lens and observe the scattering pattern of the object on the film.]
30. Vectors revisited. Vectors have magnitude and direction. Position in a unit cell: $r = xa + yb + zc$, where a, b, c are vectors and x, y, z are scalars with $0 < x < 1$. Dot product: $a \cdot b = |a||b| \cos\theta$, the projection of a onto b (times |b|). Cross product: $a \times b$ is a vector perpendicular to a and b, with magnitude $|a||b| \sin\theta$ equal to the area of the parallelogram they span. Volume of unit cell $= (a \times b) \cdot c = (b \times c) \cdot a = (c \times a) \cdot b = -(b \times a) \cdot c = -(c \times b) \cdot a$. Addition is commutative: $a + b = b + a$. If $r = xa + yb + zc$ and $s = ha^* + kb^* + lc^*$, then $r \cdot s = (xa + yb + zc) \cdot (ha^* + kb^* + lc^*) = xh\,a \cdot a^* + xk\,a \cdot b^* + xl\,a \cdot c^* + yh\,b \cdot a^* + yk\,b \cdot b^* + yl\,b \cdot c^* + zh\,c \cdot a^* + zk\,c \cdot b^* + zl\,c \cdot c^*$. As we will see, the reciprocal lattice can be represented in terms of $a^*, b^*, c^*$, where $a \cdot a^* = 1$, $b \cdot b^* = 1$, $c \cdot c^* = 1$ and $a \cdot b^* = 0$, $a \cdot c^* = 0$, etc., so $r \cdot s = xh + yk + zl$.
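The result r·s = xh + yk + zl can be verified numerically. This sketch assumes the standard construction a* = (b × c)/V, b* = (c × a)/V, c* = (a × b)/V, which satisfies the conditions a·a* = 1, a·b* = 0 stated on the slide; the cell vectors and indices are arbitrary illustrative values.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def combine(p, a, q, b, r, c):
    """Return p*a + q*b + r*c for scalars p, q, r and vectors a, b, c."""
    return tuple(p * ai + q * bi + r * ci for ai, bi, ci in zip(a, b, c))

def reciprocal(a, b, c):
    """a* = (b x c)/V, b* = (c x a)/V, c* = (a x b)/V with V = a . (b x c)."""
    v = dot(a, cross(b, c))
    return (tuple(t / v for t in cross(b, c)),
            tuple(t / v for t in cross(c, a)),
            tuple(t / v for t in cross(a, b)))

# An arbitrary non-orthogonal cell (in A) chosen for illustration.
a, b, c = (10.0, 0.0, 0.0), (2.0, 12.0, 0.0), (1.0, 3.0, 15.0)
a_s, b_s, c_s = reciprocal(a, b, c)

x, y, z = 0.1, 0.2, 0.3      # fractional coordinates
h, k, l = 2, -1, 4           # reflection indices

r = combine(x, a, y, b, z, c)
s = combine(h, a_s, k, b_s, l, c_s)
print(dot(r, s), h * x + k * y + l * z)   # the two numbers agree
```

Because the cross terms vanish, the phase 2πr·s depends only on the fractional coordinates and the indices, whatever the shape of the cell.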
31. Adding up the scattering of atoms: ‘interference’ of waves. [Figure label: phase angle $2\pi r \cdot S$]
32. Adding up the scattering of atoms: ‘interference’ of waves. [Figure labels: resultant $F(S)$; phase angle $2\pi r \cdot S$]
38. Revision notes on Taylor's theorem (Maclaurin's theorem is the special case a = 0). It allows any sufficiently smooth function f(x) to be expressed in terms of its value at some point x = a, i.e. f(a), and the derivatives of f(x) at x = a, namely f′(a), f″(a), f‴(a), etc.
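One use of this expansion, assuming (from the slide order, not from the slide text) that it is revised here in order to reach the complex exponential of the Argand diagram slides: summing the Maclaurin series of e^z at z = iθ reproduces cos θ + i sin θ. The function name and the choice of 30 terms are illustrative.

```python
import math

def exp_series(z, terms=30):
    """Partial sum of the Maclaurin series of exp(z): sum over n of z**n / n!."""
    total = 0j
    term = 1 + 0j               # z**0 / 0!
    for n in range(terms):
        total += term
        term *= z / (n + 1)     # next term: z**(n+1) / (n+1)!
    return total

theta = 1.0                     # radians, arbitrary test angle
approx = exp_series(1j * theta)
exact = complex(math.cos(theta), math.sin(theta))
print(approx, exact)            # agree to rounding error
```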
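45. Argand diagram. $F(S) = |F(S)| e^{i\theta}$. Intensity $= |F(S)|^2$. How to represent I(S)? $|F(S)|^2 = F(S) \cdot F^*(S)$. Proof? Recall that $F^*(S) = |F(S)| e^{-i\theta}$ is the complex conjugate of F(S), so $F(S) \cdot F^*(S) = |F(S)| [\cos\theta + i \sin\theta] \cdot |F(S)| [\cos\theta - i \sin\theta] = |F(S)|^2 [\cos^2\theta + \sin^2\theta] = |F(S)|^2$, as was required to prove (R.T.P.). [Figure labels: $F(S)$ at phase angle $\theta$; $2\pi r \cdot S$]
46. Argand diagram. $F(S) = |F(S)| e^{i\theta}$. Intensity $= |F(S)|^2$. How to represent I(S)? $I(S) = |F(S)|^2 = F(S) \cdot F^*(S)$. Proof? Here $F^*(S) = |F(S)| e^{-i\theta}$ is defined to be the ‘complex conjugate’ of F(S), so $F(S) \cdot F^*(S) = |F(S)| [\cos\theta + i \sin\theta] \cdot |F(S)| [\cos\theta - i \sin\theta] = |F(S)|^2 [\cos^2\theta + \sin^2\theta] = |F(S)|^2$, as was required to prove (R.T.P.). [Figure labels: $F(S)$ at phase angle $\theta$; $2\pi r \cdot S$]
47. $F^*(S) = |F(S)| e^{-i\theta}$ is the complex conjugate of F(S). Writing $c = \cos\theta$ and $s = \sin\theta$: $(c + is)(c - is) = \cos^2\theta - i\cos\theta\sin\theta + i\cos\theta\sin\theta + \sin^2\theta = 1$, so $|F(S)|^2 = F(S) \cdot F^*(S)$. [Figure labels: $F^*(S)$ at phase angle $-\theta$; $-2\pi r \cdot S$]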
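A quick numerical check of I = F·F*, using Python's built-in complex numbers. The amplitude of 313 and the two phase angles are arbitrary illustrative values.

```python
import cmath

def intensity(F):
    """I = F . F*, which is real and equals |F|**2."""
    return (F * F.conjugate()).real

amplitude = 313.0
f_a = cmath.rect(amplitude, 0.4)   # |F| = 313, phase 0.4 rad
f_b = cmath.rect(amplitude, 2.5)   # same amplitude, different phase

print(intensity(f_a), intensity(f_b), amplitude ** 2)
```

The two structure factors differ, yet their intensities are identical: the measured intensity carries the amplitude but not the phase.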
49. Origin position is arbitrary (proof). So the origin is chosen by: a) the conventional choice in each space group, e.g. often on a major symmetry axis, but for strong reasons (see the ‘symmetry’ section). Even so, there are typically 4 equivalent major symmetry axes per unit cell. b) the position of the first heavy metal (or selenium) atom that we fix; everything becomes relative to that. c) the placement of a similar molecule for ‘molecular replacement’, a trial-and-error solution assuming similarity in structure.
51. and why do we care? How much difference will it make to the average intensity? average amplitude? if we add a single Hg atom?
52. The ‘Random Walk’ problem? (p33.1-33.3) What is the average sum of n steps in random directions? (What is the average amplitude <|F(s)|> from an n atom structure?) -AND why do we care?!........ How much difference from adding a mercury atom (f=80).
53. The average intensity for an n-atom structure, each atom of f electrons, is $\langle I \rangle = n f^2$. The average (root-mean-square) amplitude is $\sqrt{n}\, f$.
54. and why do we care? How much difference will 10 electrons make to the average intensity? average amplitude? average difference in amplitude? average difference in intensity? if we add a single Hg atom?
55. And why do we care? How much difference will 10 electrons make? Average intensity: 98,000 e². Average amplitude: 313 e. Average difference in amplitude? Average difference in intensity? And if we add a single Hg atom?
56. And why do we care? How much difference will 10 electrons make? Average intensity: 98,000 e². Average amplitude: 313 e. Average difference in amplitude: 2.2% of each amplitude! Average difference in intensity: 98,100 − 98,000 = 100 e² (0.1%). And if we add a single Hg atom?
57. And why do we care? For an Hg atom, f = 80 e: how much difference will 80 electrons make? Average intensity: 98,000 e². Average amplitude: 313 e. Average difference in amplitude: 18% of each amplitude! Average difference in intensity: 104,400 − 98,000 = 6,400 e² (6.5%).
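The numbers on these slides can be reproduced from ⟨I⟩ = nf². The values n = 2000 atoms and f = 7 electrons are an assumption inferred from 98,000 = 2000 × 7²; the slides do not state them. The factor 1/√2 in the amplitude difference (the root-mean-square projection of the added atom's vector onto F) is likewise an assumption, chosen because it matches the quoted 2.2% and 18%.

```python
import math

def heavy_atom_effect(n, f, f_heavy):
    """Random-walk estimates for adding one atom of f_heavy electrons to n atoms of f electrons."""
    mean_intensity = n * f ** 2                    # <I> = n f^2
    rms_amplitude = math.sqrt(mean_intensity)      # sqrt(n) * f
    d_intensity = f_heavy ** 2                     # <I> grows by f_heavy^2
    d_amplitude = f_heavy / math.sqrt(2.0)         # rms projection onto F (assumed)
    return {
        "mean_intensity": mean_intensity,
        "rms_amplitude": rms_amplitude,
        "intensity_change_pct": 100.0 * d_intensity / mean_intensity,
        "amplitude_change_pct": 100.0 * d_amplitude / rms_amplitude,
    }

for f_heavy in (10, 80):   # a 10-electron atom, then Hg
    res = heavy_atom_effect(2000, 7, f_heavy)
    print(f_heavy, {key: round(val, 2) for key, val in res.items()})
```

The contrast is the reason we care: one mercury atom changes the average intensity by only a few percent, yet shifts individual amplitudes by almost a fifth, which is measurable.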
59. WILSON STATISTICS. What is the expected intensity of scattering, versus the observed, for proteins of i atoms, on average, as a function of resolution |s|?
61. Bottom lines: this plot should provide the overall scale factor (to intercept at y = 1) and the overall B factor. In practice, for proteins it has bumps in it; they correspond to predominant or strong repeat distances in the protein. For proteins these are at 6 Å (helices), 3 Å (sheets), and 1.4 Å (bonded atoms).
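A sketch of how a scale factor and B factor come out of such a plot, assuming the usual Wilson form ln(⟨I_obs⟩ / Σf²) = ln K − 2B sin²θ/λ²; the slide itself does not give the equation, and scale conventions vary. The data are synthetic, generated from known K and B so the fit can be checked; real protein data would show the bumps described above.

```python
import math

def wilson_fit(x_values, y_values):
    """Least-squares line y = ln K - 2 B x; returns (K, B)."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope / 2.0

# Synthetic, noise-free data: x = sin^2(theta)/lambda^2 in 1/A^2.
k_true, b_true = 2.5, 20.0
xs = [0.01 * i for i in range(1, 11)]
ys = [math.log(k_true) - 2.0 * b_true * x for x in xs]

k_fit, b_fit = wilson_fit(xs, ys)
print(k_fit, b_fit)   # recovers the K and B used to generate the data
```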
62. Topic: Building up a crystal in 1 dimension. Scattering from an array of points is the same as scattering from one point, SAMPLED at distances ‘inverse’ to the repeat distance in the object: the fringe function. Scattering from an array of objects is the same as scattering from one object, SAMPLED at distances ‘inverse’ to the repeat distance in the object, e.g. DNA.
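The sampling statement can be seen in a few lines: the summed wave from N equally spaced points is large only where s is a multiple of 1/a. The repeat distance of 10 Å and N = 20 are illustrative assumptions.

```python
import cmath
import math

def lattice_amplitude(n_points, a, s):
    """|sum over m of exp(2 pi i m a s)| for n_points points spaced a apart."""
    return abs(sum(cmath.exp(2j * math.pi * m * a * s) for m in range(n_points)))

a = 10.0          # repeat distance in A (illustrative)
n_points = 20

for s in (0.0, 0.05, 0.1, 0.15, 0.2):
    print(s, round(lattice_amplitude(n_points, a, s), 6))
```

At s = 0, 1/a, 2/a the waves add in phase and the amplitude is N; halfway between, they cancel. Increasing N sharpens the peaks, so for a crystal the transform is effectively sampled at the reciprocal lattice points.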
72. Transform of two horizontal lines defined by $y = \pm y_1$: $F(s) = \int_{x=0}^{\infty} \left[ e^{2\pi i (xa + y_1 b) \cdot s} + e^{2\pi i (xa - y_1 b) \cdot s} \right] dV_r = 2\cos(2\pi y_1\, b \cdot s) \int_{x=0}^{\infty} e^{2\pi i\, x\, a \cdot s}\, dV_r$. For $a \cdot s = 0$, the integral equals the total electron content of the line; for $a \cdot s \neq 0$, the integral is 0. Hence the transform is confined to the line $a \cdot s = 0$, parallel to b, with $F(s) = 2\cos(2\pi y_1\, b \cdot s)$ along a vertical line perpendicular to the horizontal lines. Transform of a bilayer: b = 53 Å.
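116. Adding up the scattering of atoms: ‘interference’ of waves. $I(S) = F(S) \cdot F^*(S)$. [Figure labels: incident direction $s_0$, scattered direction $s_1$, scattering vector $S$, sphere of radius $1/\lambda$; resultant $F(S)$ at phase angle $2\pi r \cdot S$]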
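119. Sum of 7 atoms scattering. Result is a wave of amplitude $|F(S)|$ and phase $\phi(S)$. [Figure labels: $f_1 = 6$ electrons at phase angle $2\pi r_1 \cdot S$, through $f_7 = 8$ electrons at phase angle $2\pi r_7 \cdot S$; resultant $F(S)$]
120. Sum of 7 atoms scattering. Result is a wave of amplitude $|F(S)|$ and phase $\phi(S)$. With $i = \sqrt{-1}$: $e^{i\theta} = \cos\theta + i\sin\theta$. [Figure labels: $f_1 = 6$ electrons at phase angle $2\pi r_1 \cdot S$, through $f_7 = 8$ electrons at phase angle $2\pi r_7 \cdot S$; resultant $F(S)$ with components $\cos\theta$ and $\sin\theta$]
121. Sum of 7 atoms scattering. Result is a wave of amplitude $|F(S)|$ and phase $\phi(S)$. With $i = \sqrt{-1}$: $e^{i\theta} = \cos\theta + i\sin\theta$, so $F(S) = f_1 e^{2\pi i\, r_1 \cdot S} + f_2 e^{2\pi i\, r_2 \cdot S} + \dots$ [Figure labels: $f_1 = 6$ electrons at phase angle $2\pi r_1 \cdot S$, through $f_7 = 8$ electrons at phase angle $2\pi r_7 \cdot S$; resultant $F(S)$ with components $\cos\theta$ and $\sin\theta$]
122. Sum of 7 atoms scattering. Result is a wave of amplitude $|F(S)|$ and phase $\phi(S)$. With $i = \sqrt{-1}$: $e^{i\theta} = \cos\theta + i\sin\theta$, so $F(S) = \sum_j f_j e^{2\pi i\, r_j \cdot S}$. [Figure labels: $f_1 = 6$ electrons at phase angle $2\pi r_1 \cdot S$, through $f_7 = 8$ electrons at phase angle $2\pi r_7 \cdot S$; resultant $F(S)$ with components $\cos\theta$ and $\sin\theta$]
123. Sum of 7 atoms scattering. [Figure labels: $f_1 = 6$ electrons at phase angle $2\pi r_1 \cdot S$, through $f_7 = 8$ electrons at phase angle $2\pi r_7 \cdot S$; resultant $F(S)$]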
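The sum F(S) = Σ f_j exp(2πi r_j·S) translates directly into code. The atom positions and the scattering vector below are made-up illustrative values (only the electron counts 6 and 8 echo the slide), and the f_j are treated as constants, ignoring their fall-off with angle.

```python
import cmath
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def structure_factor(atoms, S):
    """F(S) = sum over atoms of f_j * exp(2 pi i r_j . S).

    atoms: list of (f, r) with f in electrons and r = (x, y, z) in A.
    S: scattering vector in 1/A.
    """
    return sum(f * cmath.exp(2j * math.pi * dot(r, S)) for f, r in atoms)

# Hypothetical three-atom fragment: two carbons (6 e) and an oxygen (8 e).
atoms = [(6.0, (0.0, 0.0, 0.0)),
         (6.0, (1.5, 0.0, 0.0)),
         (8.0, (2.2, 1.2, 0.0))]
S = (0.10, 0.05, 0.0)

F = structure_factor(atoms, S)
print("amplitude |F(S)| =", abs(F))
print("phase (radians)  =", cmath.phase(F))
```

At S = 0 every wave is in phase and F equals the total electron count; at any other S the amplitude is smaller, exactly as in the vector diagram.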
126. The process is iterative, and should converge, but only so far! Crystal; intensities $I(h,k,l)$; electron density $\rho(x,y,z)$. Known: amino acid sequence, ligands, bond lengths and angles, constraints on geometry. Phases $\phi(h,k,l)$: experimental (heavy atom labels, selenium for sulfur) or trial and error (similar structure). Atom positions $(x,y,z)$.
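127. Scattering pattern is the Fourier transform of the structure: $F(S) = \sum_j f_j e^{2\pi i\, r_j \cdot S}$. Structure is the ‘inverse’ Fourier transform of the scattering pattern: $\rho(r) = \sum_S F(S) e^{-2\pi i\, r \cdot S}$.
141. This is all there is? YES!! Scattering pattern is the Fourier transform (FT) of the structure: $F(S) = \sum_j f_j e^{2\pi i\, r_j \cdot S}$. Structure is the ‘inverse’ Fourier transform ($\mathrm{FT}^{-1}$) of the scattering pattern: $\rho(r) = \sum_S F(S) e^{-2\pi i\, r \cdot S}$. [Figure: FT takes a lattice with spacings a, b to a pattern with spacings 1/a, 1/b; $\mathrm{FT}^{-1}$ takes it back.]
142. But we observe $|F(S)|^2$, and there are ‘phases’. Scattering pattern is the Fourier transform (FT) of the structure: $F(S) = \sum_j f_j e^{2\pi i\, r_j \cdot S}$. Structure is the ‘inverse’ Fourier transform ($\mathrm{FT}^{-1}$) of the scattering pattern: $\rho(r) = \sum_S F(S) e^{-2\pi i\, r \cdot S}$, where F(S) has phase and amplitude. [Figure: FT takes a lattice with spacings a, b to a pattern with spacings 1/a, 1/b; $\mathrm{FT}^{-1}$ takes it back.]
143. This is all there is? Scattering pattern is the Fourier transform (FT) of the structure: $F(S) = \sum_j f_j e^{2\pi i\, r_j \cdot S}$, or in terms of indices $F(h,k,l) = \sum_j f_j e^{2\pi i (hx_j + ky_j + lz_j)}$. Structure is the ‘inverse’ Fourier transform ($\mathrm{FT}^{-1}$) of the scattering pattern: $\rho(r) = \sum_S F(S) e^{-2\pi i\, r \cdot S}$, or $\rho(x,y,z) = \sum_{h,k,l} F(h,k,l) e^{-2\pi i\, r \cdot S}$ with $r \cdot S = hx + ky + lz$. [Figure: lattice with spacings a, b and its pattern with spacings 1/a, 1/b; the reflection h = 15, k = 3 at scattering vector S is marked.]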
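A one-dimensional sketch of the round trip: compute F(h) from atoms, then rebuild the density from F(h), phases included. The two atoms, their fractional positions, and the cutoff |h| ≤ 10 are illustrative assumptions; the density is on an arbitrary scale because the 1/(cell length) normalization is omitted, as on the slide.

```python
import cmath
import math

# Hypothetical 1-D cell: (electrons, fractional coordinate x).
atoms = [(6.0, 0.25), (8.0, 0.70)]
H_MAX = 10   # include reflections h = -10 .. 10 (sets the resolution)

def structure_factor(h):
    """F(h) = sum over atoms of f_j * exp(2 pi i h x_j)."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

def density(x):
    """rho(x) = sum over h of F(h) * exp(-2 pi i h x), arbitrary scale."""
    total = sum(structure_factor(h) * cmath.exp(-2j * math.pi * h * x)
                for h in range(-H_MAX, H_MAX + 1))
    return total.real   # imaginary parts cancel because F(-h) = F*(h)

grid = [i / 100 for i in range(100)]
rho = [density(x) for x in grid]
peak = grid[rho.index(max(rho))]
print("highest density at x =", peak)   # the position of the 8-electron atom
```

With the correct phases the map peaks at the atoms, the heavier one higher. Raising `H_MAX` sharpens the peaks, which is the meaning of resolution in slide 17.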
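144. Relative information in intensities versus phases. [Figure: $\rho(r)$ duck and $\rho(r)$ cat are each transformed with $F(S) = \sum_j f_j e^{2\pi i\, r_j \cdot S}$ to give F(S) duck and F(S) cat. The amplitudes $|F(S)|$ of the duck are combined with the phases $\phi(S)$ of the cat in $\rho(r) = \sum_S F(S) e^{-2\pi i\, r \cdot S}$.] Looks like a …..
145. Relative information in intensities versus phases. [Figure: $\rho(r)$ duck and $\rho(r)$ cat are each transformed with $F(S) = \sum_j f_j e^{2\pi i\, r_j \cdot S}$ to give F(S) duck and F(S) cat. The amplitudes $|F(S)|$ of the duck are combined with the phases $\phi(S)$ of the cat in $\rho(r) = \sum_S F(S) e^{-2\pi i\, r \cdot S}$.] Looks like a CAT. PHASES DOMINATE: incorrect phases = incorrect structure; incorrect model = incorrect structure; incorrect assumption = incorrect structure.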
146. The process is iterative, and should converge, but only so far! Crystal; intensities $I(h,k,l)$; electron density $\rho(x,y,z)$. Known: amino acid sequence, ligands, bond lengths and angles, constraints on geometry. Phases $\phi(h,k,l)$: experimental (heavy atom labels, selenium for sulfur) or trial and error (similar structure). Atom positions $(x,y,z)$.
162. The process is iterative, and should converge, but only so far! Crystal; intensities $I(h,k,l)$; electron density $\rho(x,y,z)$. Known: amino acid sequence, ligands, bond lengths and angles, constraints on geometry. Phases $\phi(h,k,l)$: experimental (heavy atom labels, selenium for sulfur) or trial and error (similar structure). Atom positions $(x,y,z)$.