Bioinformatics in the Bourne Lab.

Philip E. Bourne
pbourne@ucsd.edu

BILD 94
May 3, 2012

August 14, 2009

5/3/12 UCSD BILD 94 1

Some Personal Background …

The Life of One Scientist – The Early Years
So That You Might Not Make the Same Mistakes

• My high school teacher Mr. Wilson said I would be a failure at chemistry
• My PhD is in chemistry
• The opportunity to live in different places shaped my life
• Good friends are forever

40+ Years Later

Ten Simple Rules for Starting a Company
PLoS Comp Biol 2012 8(3): e1002439

PhD in Physical Chemistry

Always Loved Computing

Circa 1974

Postdoctoral Work – The Molecular
Basis of How the Body Works

• Regrets: never learnt another language

Post Doc

Some Things Stay with You Your Whole
Life

Senior Scientist HHMI Columbia
University New York

• Driven not by career but wanting to live in New York City

~1990 Got Involved with the Human Genome

• Was only possible by applying computers to problems in biology

• Developed algorithms to support physical and genetic mapping of Chr 13

Came to UCSD to Apply Computers to
Big Biological Problems

• Possibly the best place in the world to do computational biology

The Protein Kinase Family
• A large family important to signal transduction in eukaryotes and many bacteria.
• Phosphotransferases: transfer phosphate group from ATP to Ser/Thr or Tyr residue on target protein, producing a range of downstream signaling effects.
• PKA: an example of a typical protein kinase (TPK) fold, shown in “open book” format

Sometimes Ya Got to Just Do It Yourself

The Growth of Data is a Major Driver in Biology
[Figure: number of released entries by year]

Demo

Big Research Questions in the Lab
1. Can we improve how science is disseminated and comprehended?
2. What is the ancestry of the protein structure universe and what can we learn from it?
3. Are there alternative ways to represent proteins from which we can learn something new?
4. What really happens when we take a drug?
5. Can we contribute to the treatment of neglected {tropical} diseases?

Studying Evolution
Through Structure

Nature’s Reductionism
There are ~20^300 possible proteins >>>> all the atoms in the Universe

11.2M protein sequences from 10,854 species (source RefSeq)

38,221 protein structures yield 1195 domain folds (SCOP 1.75)
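The size of the sequence space quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the figure refers to chains of 300 residues drawn from the 20 standard amino acids. The value of 10^80 atoms in the observable universe is a commonly quoted order-of-magnitude estimate and is an assumption here, not a number from the slides.

```python
import math

AMINO_ACIDS = 20               # standard amino acid alphabet
CHAIN_LENGTH = 300             # assumed protein length behind the 20^300 figure
ATOMS_IN_UNIVERSE_LOG10 = 80   # assumed order-of-magnitude estimate


def log10_sequence_space(alphabet: int, length: int) -> float:
    """Return log10 of the number of possible sequences (alphabet ** length)."""
    return length * math.log10(alphabet)


sequence_space_log10 = log10_sequence_space(AMINO_ACIDS, CHAIN_LENGTH)
excess_orders = sequence_space_log10 - ATOMS_IN_UNIVERSE_LOG10

print(f"20^300 is about 10^{sequence_space_log10:.0f}")
print(f"That is about {excess_orders:.0f} orders of magnitude above 10^80")
```

Working in logarithms keeps the comparison readable; Python integers could also hold 20**300 exactly.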

Initial Question:
With the current coverage of proteomes
by structure and assuming we know a
high percentage of all folds, is structure
a useful discriminator of species?

Chapter 2 Initial Findings

Song Yang, Post Doc, UC Berkeley; Department of Chemistry and Biochemistry, UCSD
Russ Doolittle, Professor, Center for Molecular Genetics, UCSD

Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8

To Answer this Question We Only Need to
Make Use of Existing Resources

• SCOP – Further catalogs Nature’s reductionism into structural domains, folds, families and superfamilies

• SUPERFAMILY assigns the above to fully sequenced proteomes

The SCOP Hierarchy v1.75
Based on 38221 Structures

Classes: 7
Folds: 1195
Superfamilies: 1962
Families: 3902
Domains: 110800

Is Structure a Useful Discriminator of Species? - Maybe…
Distribution among the three kingdoms as taken from SUPERFAMILY

• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés and Caetano-Anollés (2003) Genome Research 13:1563

[Figure: Venn diagram of the distribution among Eukaryota (650), Archaea (416) and Bacteria (564); SCOP fold (765 total); values given as any genome / all genomes]

Method – Distance Determination

Presence/Absence Data Matrix (rows: SCOP SUPERFAMILY (FSF); columns: organisms)

SCOP SUPERFAMILY (FSF)   C. intestinalis   C. briggsae   F. rubripes
a.1.1                    1                 1             1
a.1.2                    1                 1             1
a.10.1                   0                 0             1
a.100.1                  1                 1             1
a.101.1                  0                 0             0
a.102.1                  0                 1             1
a.102.2                  1                 1             1

Distance Matrix

                  C. intestinalis   C. briggsae   F. rubripes
C. intestinalis   0                 101           109
C. briggsae                         0             144
F. rubripes                                       0
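The step from a presence/absence matrix to a distance matrix can be sketched as follows. The slide does not name the distance measure, so this example assumes a simple Hamming count (the number of superfamilies present in exactly one of the two organisms); the authors' actual measure may differ. It uses only the seven-row excerpt shown on the slide, so the distances are much smaller than the full-matrix values of 101, 109 and 144.

```python
from itertools import combinations

# Presence (1) / absence (0) of each superfamily, copied from the slide excerpt.
SUPERFAMILIES = ["a.1.1", "a.1.2", "a.10.1", "a.100.1", "a.101.1", "a.102.1", "a.102.2"]
PRESENCE = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}


def hamming_distance(a, b):
    """Count superfamilies present in exactly one of the two organisms."""
    if len(a) != len(b):
        raise ValueError("presence vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(presence):
    """Return the upper triangle of the pairwise distance matrix as a dict."""
    return {
        (p, q): hamming_distance(presence[p], presence[q])
        for p, q in combinations(presence, 2)
    }


distances = distance_matrix(PRESENCE)
for (p, q), d in distances.items():
    print(f"{p} vs {q}: {d}")
```

Any distance-based tree-building method could then take such a matrix as input.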

Is Structure a Useful Discriminator of
Species? - Yes

[Figure: species grouped into Archaea, Bacteria and Eukaryota]

The method cleanly placed all species in their correct superkingdoms

The Answer Would Appear to be Yes

• It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies (FSFs) within a given proteome

Environmental Influence

Chris Dupont
Scripps Institution of Oceanography
UCSD
Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827

Consider the Distribution of Disulfide Bonds
among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the Earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~2.0 billion years ago
• Logical deduction – disulfides more prevalent in folds (organisms) that evolved later
• This would seem to hold true
• Can we take this further?

[Figure: Venn diagram of SCOP folds (708 total) among Eukaryota, Archaea and Bacteria, giving the share of folds containing disulfides in each region. Eukaryota only: 31.9% (43/135); Archaea and Eukaryota: 0% (0/10); Bacteria and Eukaryota: 14.4% (17/118); all three: 4.7% (18/387); Archaea only: 0% (0/2); Archaea and Bacteria: 5.9% (1/17); Bacteria only: 16.7% (7/42)]
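The percentages in the figure follow directly from the quoted fold counts. A minimal sketch of that arithmetic, using only the (folds with disulfides, total folds) pairs that appear on the slide; the variable names are illustrative.

```python
# (folds with disulfides, total folds) for each region of the figure, as quoted.
REGION_COUNTS = [(43, 135), (0, 10), (17, 118), (18, 387), (0, 2), (1, 17), (7, 42)]


def percent(with_disulfide: int, total: int) -> float:
    """Share of folds containing disulfides, as a percentage to one decimal."""
    return round(100.0 * with_disulfide / total, 1)


percentages = [percent(n, total) for n, total in REGION_COUNTS]
for (n, total), pct in zip(REGION_COUNTS, percentages):
    print(f"{n}/{total} = {pct}%")
```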

Evolution of the Earth
• 4.5 billion years of change
• 300±50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological changes
• Life has evolved in this time

• The ocean was the “cradle” for 90% of evolution

Theoretical Levels of Trace Metals and Oxygen in the
Deep Ocean Through Earth’s History
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean-solid lines, euxinic ocean-dashed lines).
• The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.

[Figure: concentration of oxygen, zinc, iron, cobalt and manganese (O2 in arbitrary units, Zn and Fe in moles L-1) against billions of years before present (4.5 to 0), with tree symbols for Bacteria, Archaea and Eukarya at the top]

Replotted from Saito et al, 2003
Inorganica Chimica Acta 356: 308-318

Superfamily Distribution As Well As Overall
Content Has Changed
[Figure: two charts, "Bacteria Fe superfamilies" and "Eukaryotic Fe superfamilies", each with the same legend of superfamilies: a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.2.11, a.138.1, a.24.3, a.24.4, a.3.1, a.25.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5]

Hypothesis
• Emergence of cyanobacteria changed oxygen concentrations
• Impacted metal concentrations in the ocean
• Organisms used new metals in new ways to evolve new biological processes, e.g., complex signaling
• This in turn further impacted the environment

Big Research Questions in the Lab
1. Can we improve how science is disseminated and comprehended?
2. What is the ancestry of the protein structure universe and what can we learn from it?
3. Are there alternative ways to represent proteins from which we can learn something new?
4. What really happens when we take a drug?
5. Can we contribute to the treatment of neglected {tropical} diseases?

Our Motivation
• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – natural product – alkaloid – many uses, e.g., antifungal, antihypertensive

Collins and Workman 2006 Nature Chemical Biology 2 689-700

Our Broad Approach
• Involves the fields of:
– Structural bioinformatics
– Cheminformatics
– Biophysics
– Systems biology
– Pharmaceutical chemistry

• L. Xie, L. Xie, S.L. Kinnings and P.E. Bourne 2012 Novel Computational Approaches to Polypharmacology as a Means to Define Responses to Individual Drugs, Annual Review of Pharmacology and Toxicology 52: 361-379
• L. Xie, S.L. Kinnings, L. Xie and P.E. Bourne 2012 Predicting the Polypharmacology of Drugs: Identifying New Uses Through Bioinformatics and Cheminformatics Approaches in Drug Repurposing M. Barrett and D. Frail (Eds.) Wiley and Sons. (available upon request)

Approach - Need to Start with a 3D Drug-Receptor Complex – Either Experimental or Modeled

Drug Name      Other Name           Treatment                             PDB IDs
Lipitor        Atorvastatin         High cholesterol                      1HWK, 1HW8…
Testosterone   Testosterone         Osteoporosis                          1AFS, 1I9J ..
Taxol          Paclitaxel           Cancer                                1JFF, 2HXF, 2HXH
Viagra         Sildenafil citrate   ED, pulmonary arterial hypertension   1TBF, 1UDT, 1XOS..
Digoxin        Lanoxin              Congestive heart failure              1IGJ

A Reverse Engineering Approach to
Drug Discovery Across Gene Families
1. Characterize ligand binding site of primary target (Geometric Potential)
2. Identify off-targets by ligand binding site similarity (Sequence order independent profile-profile alignment)
3. Extract known drugs or inhibitors of the primary and/or off-targets
4. Search for similar small molecules …
5. Dock molecules to both primary and off-targets
6. Statistical analysis of docking score correlations

Xie and Bourne 2009 Bioinformatics 25(12) 305-312

Characterization of the Ligand Binding Site - The Geometric Potential

• Conceptually similar to hydrophobicity or electrostatic potential that is dependent on both global and local environments
– Initially assign each Cα atom a value that is the distance to the environmental boundary
– Update the value with those of surrounding Cα atoms dependent on distances and orientation – atoms within a 10 Å radius define the neighbors i

GP = P + \sum_{i \in neighbors} \frac{P_i}{D_i + 1.0} \times \frac{\cos(\alpha_i) + 1.0}{2.0}

Xie and Bourne 2007 BMC Bioinformatics, 8(Suppl 4):S9
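The update rule on this slide can be illustrated numerically. This is a minimal sketch, reading the formula as GP = P + sum over neighbors of P_i / (D_i + 1.0) × (cos(α_i) + 1.0) / 2.0, where P is the atom's own distance to the environmental boundary, P_i is that of neighbor i, D_i is the distance to neighbor i and α_i is an orientation angle. The function name, the tuple layout and the toy values are hypothetical, and how α_i is measured is not specified here.

```python
import math


def geometric_potential(p, neighbors, cutoff=10.0):
    """Evaluate GP = P + sum_i P_i / (D_i + 1.0) * (cos(alpha_i) + 1.0) / 2.0.

    p         -- the atom's own distance to the environmental boundary
    neighbors -- iterable of (p_i, d_i, alpha_i): neighbor boundary distance,
                 distance to the neighbor, orientation angle in radians
    cutoff    -- neighbors farther than this distance are ignored
    """
    total = p
    for p_i, d_i, alpha_i in neighbors:
        if d_i > cutoff:
            continue  # outside the neighborhood radius
        total += (p_i / (d_i + 1.0)) * ((math.cos(alpha_i) + 1.0) / 2.0)
    return total


# Toy values, for illustration only.
toy_neighbors = [
    (4.0, 3.0, 0.0),           # aligned orientation: contributes 4/4 * 1.0
    (6.0, 5.0, math.pi),       # opposite orientation: contributes ~0
    (2.0, 1.0, math.pi / 2),   # perpendicular: contributes 2/2 * 0.5
    (9.0, 12.0, 0.0),          # beyond the 10 Å cutoff: ignored
]
gp = geometric_potential(2.0, toy_neighbors)
print(round(gp, 6))
```

In this reading, nearby neighbors that are themselves deeply buried and favorably oriented raise the potential the most.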

Discrimination Power of the Geometric
Potential
• Geometric potential can distinguish binding and non-binding sites

[Figure: geometric potential for residue clusters, binding site versus non-binding site, alongside the geometric potential scale (0 to 100)]

Local Sequence-order Independent Alignment with
Maximum-Weight Sub-Graph Algorithm
Xie and Bourne 2008 PNAS, 105(14) 5441

[Figure: Structure A and Structure B, each showing the residue groups LER and VKDL]

• Build an associated graph from the graph representations of the two structures being compared. Each node is assigned a weight from the similarity matrix
• The maximum-weight clique corresponds to the optimum alignment of the two structures
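The idea of aligning two structures through a maximum-weight clique can be sketched on a toy case. This is an illustration only, not the algorithm of Xie and Bourne 2008: it assumes nodes are residue pairs with positive similarity, joins two nodes when the inter-residue distance is preserved within a tolerance, and finds the clique by exhaustive search, which is only feasible for very small graphs. The coordinates, the identity-based similarity and all names are hypothetical.

```python
import math
from itertools import combinations, product


def associated_graph(coords_a, coords_b, similarity, tolerance=1.0):
    """Nodes pair residue i of A with residue j of B; edges mark compatible pairs."""
    nodes = {
        (i, j): similarity[i][j]
        for i, j in product(range(len(coords_a)), range(len(coords_b)))
        if similarity[i][j] > 0
    }
    edges = set()
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a residue may be matched only once
        gap = abs(math.dist(coords_a[i], coords_a[k]) - math.dist(coords_b[j], coords_b[l]))
        if gap <= tolerance:
            edges.add(frozenset([(i, j), (k, l)]))
    return nodes, edges


def max_weight_clique(nodes, edges):
    """Exhaustive search for the clique with the largest total node weight."""
    best, best_weight = (), 0.0
    node_list = list(nodes)
    for size in range(1, len(node_list) + 1):
        for subset in combinations(node_list, size):
            if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
                weight = sum(nodes[n] for n in subset)
                if weight > best_weight:
                    best, best_weight = subset, weight
    return best, best_weight


# Toy structures: the same L-E-R triangle, listed in a different order in B,
# plus one extra L in B that is far from everything else.
types_a, coords_a = "LER", [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
types_b, coords_b = "RLEL", [(0, 4, 0), (0, 0, 0), (3, 0, 0), (10, 10, 10)]
similarity = [[1.0 if x == y else 0.0 for y in types_b] for x in types_a]

nodes, edges = associated_graph(coords_a, coords_b, similarity)
alignment, weight = max_weight_clique(nodes, edges)
print(alignment, weight)
```

Because matches are chosen by geometric compatibility rather than position in the chain, the resulting alignment does not depend on sequence order.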

Similarity Matrix of Alignment

Chemical Similarity
• Amino acid grouping: (LVIMC), (AGSTP), (FYW), and (EDNQKRH)
• Amino acid chemical similarity matrix

Evolutionary Correlation
• Amino acid substitution matrix such as BLOSUM45
• Similarity score between two sequence profiles

d = \sum_i f_a^i S_b^i + \sum_i f_b^i S_a^i

f_a, f_b are the 20 amino acid target frequencies of profiles a and b, respectively
S_a, S_b are the PSSMs of profiles a and b, respectively
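The profile-profile score on this slide can be worked through with small numbers. This is a minimal sketch, reading the formula as d = sum_i f_a^i S_b^i + sum_i f_b^i S_a^i for a single pair of profile positions. A three-letter alphabet stands in for the 20 amino acids to keep the example short, and all values are made up for illustration.

```python
def profile_similarity(f_a, s_a, f_b, s_b):
    """Score two profile positions: sum_i f_a[i]*s_b[i] + sum_i f_b[i]*s_a[i].

    f_a, f_b -- target frequencies at the position in profiles a and b
    s_a, s_b -- PSSM scores at the position in profiles a and b
    """
    if not (len(f_a) == len(s_a) == len(f_b) == len(s_b)):
        raise ValueError("all vectors must cover the same alphabet")
    a_on_b = sum(fa * sb for fa, sb in zip(f_a, s_b))
    b_on_a = sum(fb * sa for fb, sa in zip(f_b, s_a))
    return a_on_b + b_on_a


# Toy three-letter alphabet (a real profile would have 20 entries).
f_a, s_a = [0.5, 0.5, 0.0], [2.0, 1.0, -3.0]
f_b, s_b = [1.0, 0.0, 0.0], [4.0, -1.0, -2.0]

d = profile_similarity(f_a, s_a, f_b, s_b)
print(d)
```

The score is symmetric in a and b, so swapping the two profiles gives the same value.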

The Problem with Tuberculosis
• One third of global population infected
• 1.7 million deaths per year
• 95% of deaths in developing countries
• Anti-TB drugs hardly changed in 40 years
• MDR-TB and XDR-TB pose a threat to human health worldwide
• Development of novel, effective and inexpensive drugs is an urgent priority

The TB-Drugome
1. Determine the TB structural proteome

2. Determine all known drug binding sites from the PDB

3. Determine which of the sites found in 2 exist in 1

4. Call the result the TB-drugome

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976

1. Determine the TB Structural
Proteome

[Figure: of 3,996 proteins in the TB proteome, 284 have solved structures, 1,446 have homology models and 2,266 have neither]

• High quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
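The coverage figures on this slide can be reproduced from the four counts. This is a minimal sketch, assuming 3,996 is the size of the TB proteome, 284 the number of proteins with solved structures and 1,446 the number with homology models; that reading is an assumption, though it is consistent with the quoted percentages and with the remaining count of 2,266.

```python
PROTEOME_SIZE = 3996    # assumed: proteins in the TB proteome
SOLVED = 284            # assumed: proteins with solved structures
MODELED = 1446          # assumed: proteins with homology models


def coverage_percent(covered: int, total: int) -> float:
    """Structural coverage as a percentage, rounded to one decimal."""
    return round(100.0 * covered / total, 1)


solved_coverage = coverage_percent(SOLVED, PROTEOME_SIZE)
combined_coverage = coverage_percent(SOLVED + MODELED, PROTEOME_SIZE)
uncovered = PROTEOME_SIZE - SOLVED - MODELED

print(f"Solved structures only: {solved_coverage}%")
print(f"With homology models:   {combined_coverage}%")
print(f"Proteins without structural coverage: {uncovered}")
```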

2. Determine all Known Drug
Binding Sites in the PDB
• Searched the PDB for protein crystal structures bound with FDA-approved drugs
• 268 drugs bound in a total of 931 binding sites

[Figure: histogram of the number of drugs against the number of drug binding sites (1 to 37); labeled drugs include Acarbose, Darunavir, Alitretinoin, Conjugated estrogens, Chenodiol and Methotrexate]

Map 2 onto 1 – The TB-Drugome
http://funsite.sdsc.edu/drugome/TB/

Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).
