Computational Protein Design. 1. Challenges in Protein Engineering
1. Computational Protein Design
1. Challenges in Protein Engineering
Pablo Carbonell
pablo.carbonell@issb.genopole.fr
iSSB, Institute of Systems and Synthetic Biology
Genopole, University d’Évry-Val d’Essonne, France
mSSB: December 2010
Pablo Carbonell (iSSB) Computational Protein Design mSSB: December 2010 1 / 40
2. Outline
1 The Protein Design Cycle
2 Locating the Substitutions
3 Types of Protein Interactions
4 Engineering Protein Activity
5 Introducing the Substitutions
6 Screening and Library Creation
4. Protein Engineering
Protein engineering is a technology that alters protein structures in order to
improve their properties in applications such as pharmaceuticals, green chemistry
and biofuels.
The main challenge is to build more accurate models to predict which
substitutions are the best candidates to insert in the parent protein in order to
enhance the desired property.
Both experimental data and in silico predictions can contribute to the model.
5. Protein Engineering
6. The Protein Engineering Cycle
7. Computational Protein Design in the Engineering Cycle
9. Locating the Substitutions
How to select the best residues to mutate in the parent protein?
If detailed structural information on the parent enzyme is available, a rational approach can be applied to the design.
When only partial structural information is available, a semi-rational approach is used.
If no information is available, a random search is used.
10. Choosing the Right Strategy
11. Additivity and Cooperativity Effects
Additivity of the effects of substitutions is rarely seen when screening mutants.
To avoid dead ends, a screening strategy is typically based on building libraries with simultaneous mutations, in order to find cooperativity effects.
Testing for simultaneous mutations comes at the cost of a larger screening effort.
Natural evolution, however, has favored beneficial single-step mutations, although neutral drift has probably allowed for a larger search in sequence space.
Figure: Additivity/cooperativity experiments searching for high-affinity antibody variants [Chodorge et al., 2008].
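The screening cost of simultaneous mutations can be made concrete with a short count. The sketch below is only an illustration: the function name `library_size` and the choice of 10 candidate positions are assumptions, and it assumes every position may be replaced by any of the 19 non-parent amino acids.

```python
from math import comb


def library_size(n_positions: int, k_simultaneous: int, n_alternatives: int = 19) -> int:
    """Count distinct variants with exactly k simultaneous substitutions.

    n_positions: candidate sites in the parent protein (hypothetical value).
    n_alternatives: non-parent residues allowed per site (19 by assumption).
    """
    # Choose which k sites are mutated, then choose a residue for each site.
    return comb(n_positions, k_simultaneous) * n_alternatives ** k_simultaneous


# Library size grows quickly with the number of simultaneous substitutions.
for k in (1, 2, 3):
    print(k, library_size(10, k))
```

Under these assumptions the number of variants to screen grows by roughly two orders of magnitude for each additional simultaneous substitution, which is the cost the slide refers to.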
13. Types of Protein Interactions
Types of interaction:
  Protein-ligand binding (drug-target, enzyme-substrate)
  Protein-nucleotide (DNA/RNA) binding
  Protein-peptide interaction
  Protein-protein interaction
Protein-protein interactions (protein-protein complexes) can be:
  homo-oligomeric or hetero-oligomeric
  non-obligate or obligate
  transient (weak and strong) or permanent
Adapted from [Perkins et al., 2010] and [Nooren and Thornton, 2003]
14. Protein Specificity and Promiscuity
Multispecificity: broad partner specificity (multiple substrates, proteins, ligands)
  Small-molecule ligands: similar chemical structure, usually with stereoselectivity
  Proteins or peptides: structurally similar motifs rather than sequence motifs
Promiscuity: the ability to participate in a function other than the native one (moonlighting)
Allostery: regulation of the protein by binding of some ligand (the effector) at the allosteric site
Binding models: lock and key [Fischer, 1894], induced fit [Koshland, 1958], conformational selection [Boehr et al., 2009]
15. Protein Specificity and Promiscuity: The Case of PPIs
PPI: any physical binding between proteins that occurs in vivo in the cell
PPI screening methods still have some limitations:
  Y2H: high false-positive rate
  TAP-MS: limited scalability
  Luminescence-based methods, proteome chips, co-immunoprecipitation / MS, real-time analysis (3rd generation DNA-seq)
  Transient and PTM-dependent interactions are often missed
Biological context: developmental stage, co-localization, protein modifications, presence of cofactors, presence of other binding partners
Protein hubs: highly connected proteins, related to essentiality, robustness, modularity, evolvability. Party and date hubs: under debate
Figure: single-interface and multi-interface hubs [Kim et al., 2006]
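As a rough illustration of what "highly connected" means, hubs can be picked out of an interaction list by counting distinct partners. This is a minimal sketch: the protein names, the edge list, and the degree threshold are all hypothetical, and real analyses would start from a curated PPI database.

```python
def degrees(edges):
    """Count distinct interaction partners per protein from undirected pairs."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this simple count
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(found) for protein, found in partners.items()}


def hubs(edges, min_degree):
    """Return proteins with at least min_degree partners (threshold is arbitrary)."""
    return sorted(p for p, d in degrees(edges).items() if d >= min_degree)


# Hypothetical interaction list; the duplicated pair is counted only once.
example_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
                 ("P2", "P3"), ("P4", "P5"), ("P1", "P2")]
print(hubs(example_edges, min_degree=3))
```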
16. Data Sources
Enzymatic activity
BRENDA: experimental parameters
KEGG, MetaCyc: metabolic networks
Catalytic Site Atlas: catalytic sites
Data validation and prediction
GeneMANIA: lists of genes with functionally similar or shared properties
STRING: based on genomic context, HT experiments, co-expression, literature
ComPASS : assign confidence to an interaction detected by MS
Primary PPI databases
DIP, BioGRID, IntAct, MINT
Common languages: PSICQUIC: expression, co-localization, genetic, metabolic,
signaling pathways, experimental data, SBML
Building the network: Cytoscape
18. Overview of Protein Engineering Technology
From a need to adjust enzyme properties for industrial processes ...
... to the challenge of generating novel proteins for therapeutic and biomedical applications
Goals:
  Increased catalytic function relative to the parent
  Altered specificity, stereospecificity, or affinity to interacting partners
  Increased stability
A paradigm shift in the last 2 decades:
  PCR and recombinant gene technologies
  Recreation of evolution in the lab
  Computer algorithms

Property              Parameters
Thermostability       T50
Catalytic activity    kcat, KM, kcat/KM
Binding specificity   (kcat/KM)A / (kcat/KM)B; Kd, KI
Binding affinity      Ka = 1/Kd; ∆G = −RT ln(1/Kd)
19. Goal 1. Increasing the Thermostability
Thermostability quantifies the ability of a protein's secondary and tertiary
structures to withstand high temperatures without denaturing.
Thermostability is typically measured experimentally by T50 , the temperature at
which 50% of the proteins are inactivated in 10 minutes.
Increasing the thermostability can be considered the first step in protein
engineering, in order to make the protein tolerant to a greater range of amino acid
substitutions.
Main design techniques:
Sequence-based design: comparison through multiple alignments
Structure-based approach: assumes that a more rigid protein will be more stable at
high temperatures
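The T50 definition given above can be turned into a simple estimate from residual-activity measurements. The sketch below assumes activity was measured after a fixed incubation (for example 10 minutes) at several temperatures and uses plain linear interpolation between the two points that bracket 50%; the function name and the data are hypothetical, and fitting a sigmoidal curve would be a common alternative.

```python
def estimate_t50(temps, residual_activity):
    """Interpolate the temperature at which residual activity crosses 0.5.

    temps: incubation temperatures in ascending order.
    residual_activity: fraction of initial activity left at each temperature.
    """
    points = list(zip(temps, residual_activity))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        # Look for the interval where activity falls through 50%.
        if a0 >= 0.5 >= a1 and a0 != a1:
            return t0 + (a0 - 0.5) * (t1 - t0) / (a0 - a1)
    raise ValueError("activity does not cross 50% in the measured range")


# Hypothetical measurements (temperature in degrees Celsius).
print(estimate_t50([40, 50, 60, 70], [1.0, 0.9, 0.3, 0.05]))
```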
20. Goal 2. Increasing the Catalytic Activity
How to quantify enzyme activity? Michaelis-Menten model of kinetics:
\[ E + S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} ES \xrightarrow{k_2} E + P \tag{1} \]
\[ \frac{d[ES]}{dt} = k_1 [E][S] - [ES](k_{-1} + k_2) \tag{2} \]
\[ \frac{d[P]}{dt} = k_2 [ES] \tag{3} \]
k2 is also known as kcat or turnover rate (in more complex cases kcat is a function of several rates)
kcat alone is not enough: we also need to quantify the affinity of the enzyme for the substrate
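To see how the rate equations (2) and (3) behave over time, they can be integrated numerically together with the matching mass-action equations for [E] and [S]. The following is a minimal sketch using a fixed-step explicit Euler scheme in plain Python; the function name, rate constants, and concentrations are arbitrary assumptions, and a proper ODE solver would be preferable for stiff parameter sets.

```python
def simulate_michaelis_menten(k1, k_minus1, k2, e0, s0, t_end, dt=1e-3):
    """Integrate E + S <-> ES -> E + P with explicit Euler steps.

    Returns the final concentrations (E, S, ES, P).
    """
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        bind = k1 * e * s          # E + S -> ES
        unbind = k_minus1 * es     # ES -> E + S
        cat = k2 * es              # ES -> E + P
        e += dt * (-bind + unbind + cat)
        s += dt * (-bind + unbind)
        es += dt * (bind - unbind - cat)   # equation (2)
        p += dt * cat                      # equation (3)
    return e, s, es, p


# Arbitrary illustrative parameters (units are not specified here).
print(simulate_michaelis_menten(10.0, 1.0, 1.0, e0=0.1, s0=1.0, t_end=5.0))
```

Because the update terms cancel pairwise, total enzyme ([E] + [ES]) and total substrate material ([S] + [ES] + [P]) stay constant up to rounding, which is a quick sanity check on the integration.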
21. Enzyme Kinetics
Assumptions
First assumption (steady state): the concentration of the substrate-bound enzyme [ES] changes much more slowly than the concentrations of substrate [S] and product [P], so it is approximately constant:
\[ \frac{d[ES]}{dt} = k_1 [E][S] - [ES](k_{-1} + k_2) \approx 0 \tag{4} \]
Second assumption: the total concentration of enzyme [E]_0 does not change with time:
\[ [E]_0 = [E] + [ES] \approx \text{const} \tag{5} \]
22. The Michaelis constant KM
\[ 0 = k_1 [S]([E]_0 - [ES]) - [ES](k_{-1} + k_2) \tag{6} \]
\[ k_1 [S][E]_0 = k_1 [S][ES] + [ES](k_{-1} + k_2) \tag{7} \]
\[ [S][E]_0 = [S][ES] + [ES]\,\frac{k_{-1} + k_2}{k_1} \tag{8} \]
K_M: Michaelis constant
\[ K_M = \frac{k_{-1} + k_2}{k_1} \tag{10} \]
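As a worked step, and assuming the definition of K_M in (10), equation (8) can be solved for [ES]. The result is the expression that is substituted into the flux in (11):

```latex
[S][E]_0 = [ES]\left([S] + K_M\right)
\quad\Longrightarrow\quad
[ES] = \frac{[E]_0\,[S]}{K_M + [S]}
```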
23. The Michaelis Constant KM and the steady-state flux
Rate of product formation (flux):
\[ \frac{d[P]}{dt} = v = k_2 [ES] = k_2 [E]_0 \frac{[S]}{K_M + [S]} \tag{11} \]
With v_max = k_2 [E]_0:
\[ v = \frac{v_{max} [S]}{K_M + [S]} = v_{max} \frac{1}{1 + \frac{K_M}{[S]}} \tag{12} \]
K_M can be measured as the substrate concentration [S] at which the rate of product formation is half of the maximum:
\[ v = \frac{v_{max}}{2} \tag{13} \]
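The rate law in (12) and the half-maximum property in (13) are easy to check numerically. The sketch below uses a hypothetical helper `mm_rate` with arbitrary values for vmax and KM.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = vmax * [S] / (KM + [S])."""
    return vmax * s / (km + s)


vmax, km = 2.0, 0.5  # arbitrary illustrative values

# At [S] = KM the rate is half of vmax.
print(mm_rate(km, vmax, km))

# At saturating substrate the rate approaches vmax from below.
print(mm_rate(1000 * km, vmax, km))
```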
24. Determining KM from the concentration curve
25. Evaluating Enzyme Efficiency
kcat/KM is often used as a specificity constant to compare the relative reaction rates of pairs of substrates transformed by an enzyme.
For an enzyme acting simultaneously on two substrates S_A, S_B at rates v_A, v_B:
\[ \frac{v_A}{v_B} = \frac{(k_{cat}^A / K_M^A)\,[S_A]}{(k_{cat}^B / K_M^B)\,[S_B]} \tag{14} \]
At [S_A] = [S_B], kcat/KM provides a measure of the efficiency of substrate promiscuity
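Equation (14) can be evaluated directly once kcat and KM are known for both substrates. The sketch below uses hypothetical kinetic parameters; the function names are illustrative only.

```python
def specificity_constant(kcat, km):
    """Specificity constant kcat/KM for one substrate."""
    return kcat / km


def rate_ratio(kcat_a, km_a, s_a, kcat_b, km_b, s_b):
    """Ratio vA/vB for two competing substrates, as in equation (14)."""
    return (specificity_constant(kcat_a, km_a) * s_a) / (
        specificity_constant(kcat_b, km_b) * s_b
    )


# Hypothetical parameters: substrate A is the preferred substrate.
# With equal concentrations the ratio reduces to the ratio of kcat/KM values.
print(rate_ratio(kcat_a=50.0, km_a=0.5, s_a=1.0, kcat_b=2.0, km_b=2.0, s_b=1.0))
```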
26. Goal 3. Protein Binding Affinity and Specificity
Proteins can bind to different partners:
  Protein-ligand binding: interaction with a small molecule, such as drug-target or enzyme-substrate
  Protein-nucleotide (DNA/RNA) binding: in transcription regulation, promoters, etc.
  Protein-protein interaction:
    Permanent or obligate: in multi-subunit proteins, where it can have a structural or functional role
    Transient: in signaling, transport, and regulation
27. 3.1. Protein Binding Affinity
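Dissociation constant
\[ A + B \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} AB \tag{15} \]
\[ \frac{d[AB]}{dt} = k_1 [A][B] - k_{-1} [AB] \tag{16} \]
In equilibrium:
\[ 0 = k_1 [A][B] - k_{-1} [AB] \tag{17} \]
\[ k_d = \frac{k_{-1}}{k_1} = \frac{[A][B]}{[AB]} \tag{18} \]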
28. 3.1. Protein Binding Affinity
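Affinity constant
\[ k_a = \frac{1}{k_d} \tag{19} \]
In antibodies:
\[ Ab + Ag \underset{k_{back}}{\overset{k_{forward}}{\rightleftharpoons}} AbAg \tag{20} \]
Binding free energy
\[ \Delta G = -RT \ln k_a = -RT \ln \frac{1}{k_d} \tag{21} \]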
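The relation between the dissociation constant and the binding free energy in (21) can be evaluated numerically. The sketch below assumes that kd is expressed in mol/L relative to a 1 M standard state and that the temperature is 298.15 K; both are assumptions, as is the example value of 1 nM.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def affinity_constant(kd):
    """Affinity constant ka = 1/kd, as in equation (19)."""
    return 1.0 / kd


def binding_free_energy(kd, temperature_k=298.15):
    """Binding free energy in J/mol: dG = -RT ln(ka) = RT ln(kd).

    kd is taken in mol/L relative to a 1 M standard state (assumption).
    """
    return -R * temperature_k * math.log(affinity_constant(kd))


# Hypothetical nanomolar binder: tighter binding gives a more negative dG.
print(binding_free_energy(1e-9) / 1000.0, "kJ/mol")
```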
29. Simplified Thermodynamics of an Enzymatic Reaction
[Jonas and Hollfelder, in Protein Engineering Handbook, (2009)]
Ground-state binding (KM )
Transition-state binding (Ktx )
30. 3.2. Protein Binding Specificity
These concepts are central to modern protein design, in applications such as drug
design, biosynthesis and degradation
Binding specificity to some partner is determined by comparing either kcat/KM, ka, or kd for all partners
KI: inhibition constant, used when an inhibitor competes with a ligand
Multispecificity: the protein has broad partner specificity (multiple substrates, proteins, or ligands)
  Small-molecule ligands: similar chemical structure, usually with stereoselectivity
  Proteins or peptides: structurally similar motifs rather than sequence motifs
Promiscuity: the ability to participate in a function other than the native one
Allostery: regulation of a protein by binding of some ligand (the effector)
31. Thermodynamics of a Reaction with 2 Competing Substrates
[Desari and Miller, in Protein Engineering Handbook, (2009)]
Specificity reflects differences in the absolute heights of the transition states
33. Introducing the Substitutions
Site-directed (saturation) mutagenesis
1 The DNA of interest is cloned into a plasmid vector
2 The plasmid DNA is denatured to produce single strands
3 A synthetic oligonucleotide with the desired mutation (point mutation, deletion, or insertion) is annealed to the target region
4 The mutant oligonucleotide is extended using a plasmid DNA strand as the template
5 The heteroduplex is propagated by transformation in E. coli
Error-prone PCR
Modifications of standard PCR methods, designed to alter and enhance the natural error rate of the polymerase
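The effect of error-prone PCR on a library can be mimicked in silico by copying a parent sequence with a per-base error probability. This is a toy sketch under strong assumptions: substitutions only, a uniform choice among the three other bases, and a hypothetical parent sequence and error rate. Real polymerases show mutational bias that this model ignores.

```python
import random


def error_prone_copy(seq, error_rate, rng):
    """Copy a DNA sequence, substituting each base with probability error_rate."""
    copied = []
    for base in seq:
        if rng.random() < error_rate:
            # Uniform choice among the other three bases (simplifying assumption).
            copied.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def make_library(parent, n_variants, error_rate, seed=0):
    """Generate n_variants mutated copies of the parent sequence."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [error_prone_copy(parent, error_rate, rng) for _ in range(n_variants)]


# Hypothetical parent sequence and error rate.
for variant in make_library("ATGGCTAGCAAAGGAGAA", n_variants=3, error_rate=0.05):
    print(variant)
```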
35. Recombination and DNA-shuffling
A natural approach to making multiple
mutations is recombination
Circular permutation: to alter protein
topology
DNA-shuffling: to perform functional
domain or motif shuffling in vitro
36. Recombinant Protein Folding
E. coli is typically the first choice for expressing a heterologous protein
However, numerous recombinant proteins fail to fold into soluble form when expressed in E. coli
Some misfolding-related issues:
  Multidomain proteins usually require the assistance of folding modulators such as chaperones and/or foldases
  The environment (crowding, pH, osmolarity, etc.)
  Post-translational modifications such as disulfide bond formation or glycosylation (usually confined to extra-cytoplasmic compartments)
Two possible outcomes for a misfolded protein:
  Insoluble aggregation into inclusion bodies
  Degradation: proteolysis
Figure: E. coli expressing human leptin as inclusion bodies
37. Directed Evolution
A remarkable property of proteins is their evolvability: they can adapt under
pressure of selection by changing their behavior, function or even fold
Inspired by natural evolution, directed evolution uses iterative rounds of random
mutation and artificial selection or screening to discover protein variants with novel
functionalities
An iterative process:
Identifying a good starting sequence, usually containing some level of latent
promiscuity
Creation of a library of variants
Selecting variants with improved function (mutation and screening)
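The iterative process above can be sketched as a toy loop of library creation, screening, and selection. Everything here is illustrative: the fitness function simply counts matches to a hypothetical target sequence, standing in for an experimental screen, and each variant carries a single random substitution.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rng):
    """Return a copy of seq with one randomly chosen position replaced."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def toy_fitness(seq, target):
    """Stand-in for a screen: number of positions matching a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(parent, target, rounds, library_size, seed=0):
    """Iterate library creation and selection, keeping only improved variants."""
    rng = random.Random(seed)
    best = parent
    for _ in range(rounds):
        library = [mutate(best, rng) for _ in range(library_size)]
        candidate = max(library, key=lambda s: toy_fitness(s, target))
        if toy_fitness(candidate, target) > toy_fitness(best, target):
            best = candidate  # the improved variant seeds the next round
    return best


print(directed_evolution("AAAAAAAA", "ACDEFGHI", rounds=20, library_size=50))
```

Because only strictly improved variants are accepted, this greedy scheme can stall when no single substitution helps, which mirrors the dead ends that motivate libraries with simultaneous mutations.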
38. From Natural Enzymes to Protein Engineering
to Computational Protein Design
40. Bibliography I
David D. Boehr, Ruth Nussinov, and Peter E. Wright. The role of dynamic conformational ensembles in biomolecular recognition. Nature Chemical Biology, 5(11):789–796, November 2009. ISSN 1552-4469. doi: 10.1038/nchembio.232. URL http://dx.doi.org/10.1038/nchembio.232.
Matthieu Chodorge, Laurent Fourage, Gilles Ravot, Lutz Jermutus, and Ralph Minter. In vitro DNA recombination by L-Shuffling during ribosome display affinity maturation of an anti-Fas antibody increases the population of improved variants. Protein Engineering Design and Selection, 21(5):343–351, May 2008. doi: 10.1093/protein/gzn013. URL http://dx.doi.org/10.1093/protein/gzn013.
Philip M. Kim, Long J. Lu, Yu Xia, and Mark B. Gerstein. Relating three-dimensional structures to protein networks provides evolutionary insights. Science (New York, N.Y.), 314(5807):1938–1941, December 2006. ISSN 1095-9203. doi: 10.1126/science.1136174. URL http://dx.doi.org/10.1126/science.1136174.
D. E. Koshland. Application of a Theory of Enzyme Specificity to Protein Synthesis. Proceedings of the National Academy of Sciences of the United States of America, 44(2):98–104, February 1958. ISSN 0027-8424. URL http://view.ncbi.nlm.nih.gov/pubmed/16590179.
Irene M. Nooren and Janet M. Thornton. Diversity of protein-protein interactions. The EMBO Journal, 22(14):3486–3492, July 2003. ISSN 0261-4189. doi: 10.1093/emboj/cdg359. URL http://dx.doi.org/10.1093/emboj/cdg359.
James R. Perkins, Ilhem Diboun, Benoit H. Dessailly, Jon G. Lees, and Christine Orengo. Transient Protein-Protein Interactions: Structural, Functional, and Network Properties. Structure, 18(10):1233–1243, October 2010. ISSN 0969-2126. doi: 10.1016/j.str.2010.08.007. URL http://dx.doi.org/10.1016/j.str.2010.08.007.