This document discusses using systems biology approaches to predict and explain off-target drug effects. It notes that unexpected secondary drug effects and lack of therapeutic effects are major reasons drug development fails. The document proposes using computational methods to predict all protein targets a drug may affect and using systems biology to model the consequences, including secondary effects, unexpected therapeutic effects via drug repositioning, and unexpected lack of effects. It outlines a master's project to develop models of polypharmacological drug effects by analyzing networks of drug-affected protein targets and retrieving relevant biological annotation to interpret effects.
Systems biology in polypharmacology: explaining and predicting drug secondary effects. - master project
1. Systems biology in polypharmacology:
predicting and explaining off-target
effects
Bourne lab at UCSD
Under the supervision of Pr. Bourne
Under the direction of Pr. Bart Deplancke
Andrei Kucharavy, EPFL SV 2013, Computational Biology minor
2. Problem
Image courtesy of Scannell et al. 2012 : Diagnosing the decline in pharmaceutical R&D efficiency. Nature Reviews. Drug Discovery 11, 191–200
Pharma Big 5 drug design expenditures as of 2012 (Matthew Herper @ Forbes):
● AstraZeneca: 11.8 b$/drug
● GlaxoSmithKline: 8.2 b$/drug
● Sanofi: 7.9 b$/drug
● Roche Holding AG: 7.8 b$/drug
● Pfizer Inc: 7.7 b$/drug
4. One disease – one gene – one drug
● Step 1: find a gene relevant to a disease
● Step 2: design small molecule inhibitor for it
● Step 3: test it on cellular and animal models
● Step 4: discover secondary effects or absence of
therapeutic effect
● Step 5: modify lead to control toxicity
● Repeat steps 3-5 until no more funds available
● If you are lucky, secondary effects are minor to absent
● get more funds and move to human trials.
● Pay attention to unexpected sec. effects
● Pay attention to absence of therapeutic effect in humans
5. Unexpected pharmacological effects
Absence of
therapeutic effect:
– Main cause of rational
drug design failure in
the 1990s
– Has been overcome
with a better
understanding of
biology
Secondary effects:
– Cyt-c : well
understood and
controlled
– Unspecific binding:
Very frequent
Hard to predict
Hard to interpret
6. Polypharmacology
● Specific agonist / antagonist
designs are rare:
● protein sites similarity
● catalytic sites within complexes
● Some drugs owe their
pharmacological action to
their unspecificity:
● Encaptone
● Ibogaine
● Chlorpromazine
● Kanamycin
Polypharmacology:
– Use computational methods to
predict all the targets a small
molecule is likely to perturb
– Use systems biology to predict
consequences of such
perturbation
● Secondary effects
● Unexpected therapeutic effect
(repositioning)
● Unexpected absence of
therapeutic effect (animal model
– human difference)
7. Scope of master project
● Prediction of perturbed
targets set:
– Drugdesigntech since 2009
– Bourne lab since 2007-2008
● Analysis and
interpretation of the
perturbed targets set is
still largely manual. The
goal of this master project
is to curb this.
Image courtesy of Xie et al. (2011). Drug discovery using chemical systems
biology: weak inhibition of multiple kinases may contribute to the anti-cancer
effect of nelfinavir. PLoS Computational Biology
8. Master project environment
– EPFL engineering internship
– Tool for integrative
bioinformatics platform
– Used for biotech consulting
(polypharmacological effect
prediction)
– > 1000 drugs and ~1300 human
proteins
The Bourne Lab
– RCSB PDB, SuperTarget, IEDB,
BioLit
– Reliable pipeline for drug off-target
effect prediction (4530 protein
models, 140 approved drugs)
– 7 publications in polypharmacology
9. Polypharmacological action
mechanisms recovery
● Source:
– List of proteins perturbed by a drug
● Wanted:
– Mechanisms of unexpected pharmacological action,
understandable for a biologist
● Pathway, biological entities, mechanism names
● Ordered by relevance
– Unexpected pharmacological action mechanism
model, usable for prediction on new drugs
11. Global idea
Platelet activation
Immune response onset
Th17 activation
Polypharmacological
effect model suited for
prediction on new
drugs
Polypharmacological
effect mechanism
understandable for
biology expert
12. Devil is in the details
● How to retrieve relevant annotation and sort it
by relevance?
● How to determine which targets are to be
included in the model?
13. Missiuro's information flow and
protein informativity
Image courtesy of Missiuro et al. (2009). Information flow analysis of
interactome networks. PLoS Computational Biology 5, e1000350.
● Each protein transmits some information to all the other proteins within the interactome / set of interest (otherwise evolution would have eliminated it)
● Information can only be transmitted through direct interaction (contact, co-complex, participation in the same biochemical reaction)
● The information conductance of an edge is proportional to the interaction importance or confidence
● The information flow is computed between all pairs of proteins within the set (Kirchhoff's laws + matrix operations).
● A set-specific informativity score is defined for each element of the interactome as the sum of all pairwise information flows
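The informativity score described on this slide can be sketched as follows. This is a minimal illustration, not the Missiuro et al. implementation: the function names, the toy 4-node network, the unit conductances and the use of a pseudo-inverse to solve the singular Kirchhoff system are all assumptions made for the example, and the network is assumed to be connected.

```python
import itertools
import numpy as np

def pair_flow(G, source, sink):
    """Current carried by every node when 1 unit enters at source and leaves at sink.

    G is a symmetric conductance matrix (hypothetical input format).
    """
    n = G.shape[0]
    L = np.diag(G.sum(axis=1)) - G            # Kirchhoff (Laplacian) matrix
    J = np.zeros(n)
    J[source], J[sink] = 1.0, -1.0
    V = np.linalg.pinv(L) @ J                 # potentials, defined up to a constant
    edge_currents = G * (V[:, None] - V[None, :])    # I_ij = G_ij * (V_i - V_j)
    through = 0.5 * np.abs(edge_currents).sum(axis=1)  # inflow = outflow for inner nodes
    through[source] = through[sink] = 1.0     # endpoints carry the full unit current
    return through

def informativity(G, targets):
    """Sum of pairwise flows over all pairs of proteins in the set of interest."""
    score = np.zeros(G.shape[0])
    for s, t in itertools.combinations(targets, 2):
        score += pair_flow(G, s, t)
    return score

# Toy network: chain 0-1-2 with a pendant node 3 attached to node 1.
G = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (1, 3)]:
    G[i, j] = G[j, i] = 1.0
scores = informativity(G, targets=[0, 2])
print(scores)
```

In this toy case node 1 sits on the only route between the two targets and receives the full flow, while the pendant node 3 carries none.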
14. If Time 1:
Math Behind Information Flow
● Kirchhoff's laws:
– For each node except the sink and the source, the sum of entering currents equals the sum of
exiting currents
– For each edge, V = I*R = I/G
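● Example network: nodes 1 to 4, edges G1 (1-2), G2 (1-3), G3 (2-4), G4 (3-4); node 1 is the source, node 4 the sink
● Conductance matrix M, current vector J and voltage vector V:
\[
M = \begin{pmatrix}
G_1 + G_2 & -G_1 & -G_2 & 0 \\
-G_1 & G_1 + G_3 & 0 & -G_3 \\
-G_2 & 0 & G_2 + G_4 & -G_4 \\
0 & -G_3 & -G_4 & G_3 + G_4
\end{pmatrix},
\quad
J = \begin{pmatrix} I_1 \\ I_2 \\ I_3 \\ I_4 \end{pmatrix}
  = \begin{pmatrix} 1 \\ 0 \\ 0 \\ -1 \end{pmatrix},
\quad
V = \begin{pmatrix} V_1 \\ V_2 \\ V_3 \\ V_4 \end{pmatrix}
\]
● Solve M*V=J; use V to determine information flow through each node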
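A minimal numerical sketch of the 4-node example on this slide. The conductance values are assumed (all set to 1, since the slide gives none), and because the rows of M sum to zero the matrix is singular, so the sketch fixes V4 = 0 and solves the reduced system; that grounding choice is an assumption of the example, not something stated in the source.

```python
import numpy as np

# Assumed conductances; the slide does not give numerical values.
G1 = G2 = G3 = G4 = 1.0

M = np.array([
    [G1 + G2, -G1,      -G2,       0.0],
    [-G1,      G1 + G3,  0.0,     -G3],
    [-G2,      0.0,      G2 + G4, -G4],
    [0.0,     -G3,      -G4,       G3 + G4],
])
J = np.array([1.0, 0.0, 0.0, -1.0])   # unit current enters node 1, leaves node 4

# M is singular (rows sum to zero): ground node 4 and solve for V1..V3.
V = np.zeros(4)
V[:3] = np.linalg.solve(M[:3, :3], J[:3])

# Ohm's law on each edge: I = G * (V_from - V_to).
edge_current = {
    (1, 2): G1 * (V[0] - V[1]),
    (1, 3): G2 * (V[0] - V[2]),
    (2, 4): G3 * (V[1] - V[3]),
    (3, 4): G4 * (V[2] - V[3]),
}
# Flow through an intermediate node is the current entering it.
node_flow = {2: edge_current[(1, 2)], 3: edge_current[(1, 3)]}
print(V, node_flow)
```

With equal conductances the unit current splits evenly between the two routes, so nodes 2 and 3 each carry half of it.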
15. Missiuro's information flow and
protein informativity
● Advantage over betweenness
and degree centrality:
– recovers weak multi-hub regulators
– Better at predicting essential genes
– Better at predicting genes essential
for a specific function (organ
development)
● Advantage over stoichiometric
methods:
– No need to solve 64k differential
equations (unstable!)
– Reflects not only metabolism, but
also regulation
Image courtesy of Missiuro et al. (2009). Information flow analysis of
interactome networks. PLoS Computational Biology 5, e1000350.
16. Model creation
● Recover targets affected
by drugs with a given
polypharmacological
effect
● Compute the information
circulation within
interactome for these
drugs
● Include all the targets with
a significant informativity
=> “hidden” targets
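The last step of this slide, keeping every protein with a significant informativity, could look like the sketch below. The slides do not say how "significant" is defined, so the cutoff used here (mean plus 1.5 standard deviations of all scores) is purely an assumption for illustration, as are the function name and the protein identifiers.

```python
from statistics import mean, pstdev

def hidden_targets(informativity, known_targets, z_cutoff=1.5):
    """Proteins that are not predicted drug targets but carry a high informativity.

    informativity: dict protein -> score (hypothetical input format).
    z_cutoff: assumed significance rule, not taken from the source.
    """
    values = list(informativity.values())
    cutoff = mean(values) + z_cutoff * pstdev(values)
    return sorted(
        protein
        for protein, score in informativity.items()
        if score >= cutoff and protein not in known_targets
    )

# Hypothetical scores: T1 is a predicted target, H1 is a strongly traversed neighbour.
scores = {"T1": 9.0, "H1": 8.0}
scores.update({f"P{i}": 1.0 for i in range(1, 9)})
print(hidden_targets(scores, known_targets={"T1"}))
```

A resampling-based cutoff would probably be more defensible than a fixed z-score; the fixed rule is used only to keep the example short.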
18. Not all the GO terms
are equivalent
GO term informativity (~protein info for Missiuro et al.)
– Expand annotations:
T-cell apoptosis regulation → T cell + apoptosis +immune system +...
– Define term informativity:
– Use it to compute the flow through
each term in a pair of proteins:
Informativity = conductance
– Compute total informativity within a group as a sum
of flows through each term in each pair, divided by
the number of targets squared
\[
\mathrm{Inf}_{Term} = \frac{S_{Total}}{S_{Term}} = \frac{k_b \cdot \log(N_{Tot})}{k_b \cdot \log(N_{Term})}
\]
N_{Tot}: total targets
N_{Term}: targets annotated with a given GO term
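The term informativity and the group-level sum on this slide can be sketched as below. The function names and the toy annotations are hypothetical; the sketch assumes a unit voltage between every pair of targets, so that the current through a shared term equals its conductance (its informativity), and it assumes annotations have already been expanded.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def term_informativity(n_total, n_term):
    """Inf_Term = log(N_Tot) / log(N_Term); the k_b factors cancel."""
    if n_term < 2:
        raise ValueError("log(N_Term) is zero or undefined for N_Term < 2")
    return math.log(n_total) / math.log(n_term)

def group_informativity(annotations):
    """Flow through each GO term summed over all target pairs, divided by N_Tot squared.

    annotations: dict target -> set of GO terms (hypothetical input format).
    """
    n_total = len(annotations)
    counts = Counter(term for terms in annotations.values() for term in terms)
    flow = defaultdict(float)
    for a, b in combinations(annotations, 2):
        for term in annotations[a] & annotations[b]:
            # Unit voltage assumed: current through the term = its conductance.
            flow[term] += term_informativity(n_total, counts[term])
    return {term: value / n_total ** 2 for term, value in flow.items()}

toy = {
    "A": {"apoptosis", "T cell"},
    "B": {"apoptosis", "T cell"},
    "C": {"apoptosis"},
    "D": {"apoptosis"},
}
result = group_informativity(toy)
print(sorted(result.items(), key=lambda kv: -kv[1]))
```

A term shared by a pair is by construction annotated on at least two targets, so the N_Term < 2 guard is never hit inside the pair loop.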
19. Same secondary effect might have
distinct mechanisms
● Cluster affected targets by
their annotation similarity
● Compute GO-based
information circulation
within each cluster and
sort GO terms by
informativity
● Use clusters as additional
polypharmacological
action models
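A possible way to cluster targets by annotation similarity is sketched below. The Tanimoto similarity is mentioned later in the deck for GO-based circulation; the clustering rule itself (single linkage above a fixed similarity threshold, implemented with union-find) is an assumption of this example, as are the names and the 0.5 threshold.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of GO terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_targets(annotations, threshold=0.5):
    """Group targets whose annotation similarity reaches the threshold (single linkage)."""
    names = list(annotations)
    parent = {name: name for name in names}

    def find(x):
        # Walk up to the cluster representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if tanimoto(annotations[a], annotations[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

toy = {
    "A": {"x", "y"},
    "B": {"x", "y", "z"},
    "C": {"p", "q"},
    "D": {"p", "q"},
}
print(cluster_targets(toy))
```

Each resulting cluster can then be scored separately with the GO-based informativity to see whether it points to a distinct mechanism.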
21. GO term informativity advantages
Map to the biological concepts
Interpretation by expert biologists => biological sense ?
(cf. Potti 2010 scandal at Duke over “metagene” signature)
Molecular relation databases typically do badly in
some cases:
Systemic effects (T-cell maturation, circadian rhythm, … )
Endocrine regulation
Central Nervous System (GO however isn't the best ontology
for this)
Ability to plug in additional data from literature
analysis (just account for confidence)
22. Implementation: case of pancreatitis
and cirrhosis
● Secondary effects from SIDER (EMBL)
● Drug-target interaction from Bourne lab and
Drugdesigntech simulation results
● Group drugs by secondary effect
● Filter out targets that are frequently affected in
random drug collections (Student's t-test)
Name          Proba non-random   Count   Expected count in random poll   StDev in case of random poll
PYGL_HUMAN    96.58%             4       0.7968                          0.35122176
RHOC_HUMAN    95.77%             4       0.768                           0.442368
GLTP_HUMAN    95.77%             4       0.768                           0.442368
C43BP_HUMAN   95.77%             4       0.768                           0.442368
FUT8_HUMAN    95.77%             4       0.768                           0.442368
RET7_HUMAN    95.77%             4       0.768                           0.442368
CP2E1_HUMAN   94.43%             5       1.4304                          0.70953984
2ABA_HUMAN    93.88%             4       0.9984                          0.55148544
AUHM_HUMAN    93.88%             4       0.9984                          0.55148544
DX39B_HUMAN   93.66%             4       1.3536                          0.44411904
NGF_HUMAN     93.49%             11      5.0112                          2.33312256
NTRK1_HUMAN   93.49%             11      5.0112                          2.33312256
KIF11_HUMAN   93.03%             5       1.5648                          0.82308096
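The filtering step can be illustrated with a resampling sketch. Note the swap: the slide names a Student's t-test, whereas this example estimates the expected count and its standard deviation by drawing random drug groups and reports the empirical fraction of draws that hit the target less often than the observed group. The drug names, the "OTHER" placeholder target and the function names are hypothetical, and the sketch does not attempt to reproduce the numbers in the table.

```python
import random
from statistics import mean, stdev

def random_pool_stats(drug_targets, target, group_size, n_draws=1000, seed=0):
    """Mean and standard deviation of how often `target` is hit in random drug groups."""
    rng = random.Random(seed)
    drugs = list(drug_targets)
    counts = []
    for _ in range(n_draws):
        sample = rng.sample(drugs, group_size)
        counts.append(sum(target in drug_targets[d] for d in sample))
    return mean(counts), stdev(counts), counts

def non_random_fraction(observed, counts):
    """Fraction of random draws in which the target is hit less often than observed."""
    return sum(c < observed for c in counts) / len(counts)

# Hypothetical data: 50 drugs, 5 of which hit PYGL_HUMAN.
drug_targets = {
    f"drug{i}": ({"PYGL_HUMAN"} if i < 5 else {"OTHER"}) for i in range(50)
}
expected, spread, counts = random_pool_stats(drug_targets, "PYGL_HUMAN", group_size=5)
fraction = non_random_fraction(4, counts)
print(expected, spread, fraction)
```

A target would be kept for the secondary-effect group only if its observed count clearly exceeds what random groups of the same size produce.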
27. Backbone for the interactome
information flow computation
● NCI-Nature Pathway Interaction Database
No, too small coverage
● KEGG Pathway database
No, pathway-oriented and not connected for atomic
interactions
● UniPathway
No, too small coverage
● Reactome.org
Yay
28. Reactome.org : idea
● Reactome.org structure:
– BioPAX : xml / RDF / OWL
– Physical entities:
● Proteins, small molecules, Complexes, RNA, DNA
● Fragments of physical entities
– Interaction:
● Degradation / polymerisation / Biochemical reactions
● Molecular interaction
● Genetic interaction
– Pathways, Genes, Post-translational modifications...
29. Reactome.org : reality
● Reality of Reactome.org:
– Main connected component: ~ 22 000 entities, plus 3 others with >50 elements
– Presence of generic classes: groups of objects
– Proteins = mix of proteins, domains, groups, groups of domains…
– 15 000 proteins, 5000 UniProt references
– 156 genes, 56 RNA molecules: translation / transcription regulation is not well described
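The component sizes quoted on this slide are the kind of figure a connected-component scan produces. A minimal sketch, assuming the Reactome.org interactions have already been flattened into a list of (entity, entity) pairs; that edge-list format and the entity names are hypothetical.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:                      # breadth-first traversal from `start`
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)

edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(edges)
print([len(c) for c in components])
```

Information flow can only be computed within one component, which is why the size of the main component matters for coverage.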
31. Verification of pipeline:
Information routing decay
Image courtesy Wintermute et al. (2010). Emergent cooperation in microbial
metabolism. Molecular Systems Biology 6, 407.
32. Verification of pipeline
Predicting target druggability
● 186 oral small-molecule drug targets from
Overington's 2006 “How many drug targets are there?”
● 77 plasma membrane targets
● 1289 total plasma membrane proteins with
UniProt references in Reactome.org
● Use the following to predict druggability:
Overall informativity
GO-term specific informativity
Target abundance (higher abundance, more off-target action
in case of total inhibition)
35. Druggability prediction with
some complexity
● Raw prediction is little better than random:
– 65% specificity, 60% selectivity
● However, if we account for:
– Non-oral, Non small molecule drugs
– Drugs developed or in development since 2006
– GO-specific informativity
– The fact that Reactome.org / HiNT represent
CNS functions poorly
● The prediction results are rather encouraging:
– 75% specificity, 90% selectivity
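For reference, the sketch below shows how such figures are computed from a list of predictions. It reports specificity and sensitivity; reading the slide's "selectivity" as sensitivity (recall) is an assumption, since the slides do not define the term. The boolean lists are made-up data for illustration.

```python
def specificity_sensitivity(predicted, actual):
    """Specificity = TN / (TN + FP); sensitivity = TP / (TP + FN)."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    tn = sum((not p) and (not a) for p, a in pairs)
    fp = sum(p and (not a) for p, a in pairs)
    fn = sum((not p) and a for p, a in pairs)
    return tn / (tn + fp), tp / (tp + fn)

# Made-up example: True = protein predicted / known to be a drug target.
predicted = [True, True, False, False, True, False, False, False]
actual    = [True, True, True,  False, False, False, False, False]
spec, sens = specificity_sensitivity(predicted, actual)
print(spec, sens)
```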
36. Before we can conclude
● The methods required for the information
circulation have been coded
– Information circulation for the target set
– Calculation of the change in information circulation when the
interactome is perturbed
● However, before this project can be deemed
concluded
– model creation and model utilization parts have to be
assembled into a single pipeline (right now they are
separate)
– model creation and prediction have to be run on several secondary
effects with random training / testing set validation
37. Conclusions
● GO-based information circulation method
seems to work well for secondary effect
mechanism retrieval
● Reactome.org / HiNT dataset – based
information circulation method seems to be
potentially useful for computationally assisted
drug design
● Information circulation methods for secondary
effects quantitative prediction must be tested
before this project can be concluded
38. Moving further
Finding datasets and people interested in further
development of the method:
– SNP cumulative effect
Requires ability to project on the protein 3D structure and estimate
protein activity inhibition in different contexts
– Drug Design : secondary effect prediction
Pharmaceutical firms' data stores typically contain far more
information about the toxicity of different compounds and allow much
more finely tuned modeling of pharmacological effects
– Difference between animal and human interactomes:
Predict unexpected polypharmacological effects upon transition
from animal to human trials
39. Acknowledgements
Pr. Philip Bourne
Pr. Bart Deplancke
Cedric Merlot
Li Xie
Spencer Blieven
Roland Diggelmann
Andreas Prlic
Julia Ponomarenko
Lilia Iakoucheva
Jiang Wang
Cole Christie
Audrey Schenker
42. If time: Improvements
● For retrieving statistically significant targets,
– abandon naïve statistical drug target filtering
– build drug-specific information flows
– recover all sufficiently informative proteins for each drug
– use those proteins to get statistically significant targets
=> avoids close miss errors
● When sorting targets:
– Sort the most significant GO terms not by their informativity,
– but by how much information flow associated to them is
perturbed by the given target set
=> avoid need to tune GO term informativity
=> better interpretability
43. If time: Improvements
● When computing the information flow
– Do not consider the information flow between any pair of
proteins as constant
– Consider the associated tension (voltage) as constant instead
– Unrelated proteins are then likely to exchange less
information
● To avoid information circulation distortion due to GO
terms correlation:
– Don't use Tanimoto distance / conductance model for
GO-based term circulation
– Use the real point-to-point routing within the GO terms
graph
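The first improvement (fixed voltage rather than fixed current) can be sketched as follows. Under a unit voltage the total current between two proteins equals the effective conductance of the network between them, so distant or weakly connected pairs exchange less. The use of the Laplacian pseudo-inverse and the toy chain network are assumptions of this example, and the network is assumed to be connected.

```python
import numpy as np

def effective_conductance(G, a, b):
    """Total current between nodes a and b under a unit voltage difference.

    G is a symmetric conductance matrix (hypothetical input format).
    """
    L = np.diag(G.sum(axis=1)) - G                 # Kirchhoff (Laplacian) matrix
    L_pinv = np.linalg.pinv(L)
    resistance = L_pinv[a, a] + L_pinv[b, b] - 2 * L_pinv[a, b]
    return 1.0 / resistance

# Toy chain 0-1-2-3 with unit conductances.
G = np.zeros((4, 4))
for i in range(3):
    G[i, i + 1] = G[i + 1, i] = 1.0

near = effective_conductance(G, 0, 1)
far = effective_conductance(G, 0, 3)
print(near, far)
```

On the toy chain the neighbouring pair exchanges three times more current than the pair at the two ends, which is the behaviour the slide asks for.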
44. If time 1:
Random matrix theory
Molecular evolution:
Adaptive mutations = survival of the fittest
Random mutations = Kimura's drift
Tools to separate the two
Protein interaction network evolution:
Adaptive topology modifications
Random topology artefacts
phosphorylation pattern modification due to random mutations
Separating the two = ????
Nothing in biology makes sense
except in the light of evolution.
Theodosius Dobzhansky
45. If time 1:
Random matrix theory
In sparse matrices (~= graphs):
Random matrices have eigenvalues confined to a characteristic range
All eigenvalues exceeding this range are non-random
Clustering can later be performed in the space generated by
the eigenvectors associated with the non-random eigenvalues
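A small numerical illustration of this idea, under several assumptions that are not in the slides: the network is a synthetic graph with two planted groups, the null model is a set of random graphs with the same edge density, and the random range is taken as the largest second eigenvalue seen in the null graphs (the top eigenvalue of a random graph only reflects its mean density, so it is excluded from the null range).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_graph(prob):
    """Symmetric 0/1 adjacency matrix; edge (i, j) is drawn with probability prob[i, j]."""
    n = prob.shape[0]
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(float)

# Synthetic network with two planted groups of 30 nodes each.
n, half = 60, 30
block = np.full((n, n), 0.05)
block[:half, :half] = block[half:, half:] = 0.5
A = random_graph(block)

# Null model: random graphs with the same overall edge density.
density = A.sum() / (n * (n - 1))
null_prob = np.full((n, n), density)
null_second = [np.linalg.eigvalsh(random_graph(null_prob))[-2] for _ in range(20)]
threshold = max(null_second)

# Eigenvalues above the random range are treated as non-random.
eigvals, eigvecs = np.linalg.eigh(A)          # ascending order
n_signal = int((eigvals > threshold).sum())

# Cluster in the space of the non-random eigenvectors (here: sign of the second one).
labels = eigvecs[:, -2] > 0
separation = abs(labels[:half].mean() - labels[half:].mean())
print(n_signal, separation)
```

In this synthetic case the number of eigenvalues above the random range matches the number of planted groups, and the corresponding eigenvector separates them.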
47. If time 2:
Graph Databases
TinkerPop stack: ~ SQL for graph databases
48. If time 3:
Conclusions – general
● Graph databases are worth a try for systems
biology applications
● We need to assemble one comprehensive,
complete and WELL DOCUMENTED resource
for computational systems biology
Editor's notes
Binding: we have essentially no idea what is going on. The drug was designed to bind a single target, but it often binds many others. Because of protein conformation variation, complex catalytic sites and post-translational modifications of different proteins, predicting off-target binding is a nightmarish job.
Fixed tension between sink and source. Each GO term shared by the sink and the source passes information current.
Render the 100-protein name vectors that bioinformatics produces as “disease signatures” readable and understandable for biologists: cf. the Nature Medicine 2010 retraction scandal. Complementarity with pure information circulation methods for the endocrine system: concepts such as an increase in blood pressure might be good signals interpreted by cell membranes, but are impossible to encode in conventional interactomes.