Antony J Williams

http://tinyurl.com/d6wodsl

Mining public domain data as a basis
for drug repurposing

Antony J Williams, Sean Ekins and Valery Tkachenko

ACS Philadelphia August 2012

Drug Repurposing
 Drug repurposing commonly
means data reexamination also!

 Lots of data mining occurs

 Then more screening, which
creates more data…

 LOTS of public databases used
to examine repurposing…

Interlinked on the semantic web

Where do you get your data?
 Databases?
 Patents?
 Papers?
 Your own lab?
 Collaborators?
 All of the above?

 What is likely common to all sources? Data
Quality issues. There is no perfect database.

Public Domain Databases
 Our databases are a mess…
 Non-curated databases are proliferating errors

 We source and deposit data between databases

 Original sources of errors hard to determine

 Curation is time-consuming and challenging

Availability of libraries of FDA drugs

Johns Hopkins Clinical Compound Library: made compounds available at cost

Government Databases Should
Come With a Health Warning

Williams and Ekins, DDT, 16: 747-750 (2011)

Data Errors in the NPC Browser: Analysis of Steroids

Substructure     # of Hits   # of Correct Hits   No stereochemistry   Incomplete stereochemistry   Complete but incorrect stereochemistry
Gonane           34          5                   8                    21                           0
Gon-4-ene        55          12                  3                    33                           7
Gon-1,4-diene    60          17                  10                   23                           10

Williams, Ekins and Tkachenko
Drug Disc Today 17: 685-701 (2012)
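
The counts in the steroid table can be turned into error rates with a few lines of arithmetic. The sketch below simply transcribes the table and assumes that the four category columns partition the hits (which the row sums support); the function and variable names are illustrative only.

```python
# Counts transcribed from the steroid table above (NPC Browser analysis).
# Tuple order: hits, correct hits, no stereochemistry,
# incomplete stereochemistry, complete but incorrect stereochemistry.
steroid_counts = {
    "Gonane": (34, 5, 8, 21, 0),
    "Gon-4-ene": (55, 12, 3, 33, 7),
    "Gon-1,4-diene": (60, 17, 10, 23, 10),
}


def percent_correct(hits, correct):
    """Percentage of substructure hits whose structure is correct."""
    return 100.0 * correct / hits


def summarize(counts):
    """Return percent-correct per substructure plus a pooled 'All' figure."""
    rows = {}
    for name, (hits, correct, none, incomplete, wrong) in counts.items():
        # Sanity check: the four categories should add up to the hit count.
        assert correct + none + incomplete + wrong == hits
        rows[name] = round(percent_correct(hits, correct), 1)
    total_hits = sum(v[0] for v in counts.values())
    total_correct = sum(v[1] for v in counts.values())
    rows["All"] = round(percent_correct(total_hits, total_correct), 1)
    return rows


if __name__ == "__main__":
    for name, pct in summarize(steroid_counts).items():
        print(f"{name}: {pct}% of hits have correct structures")
```

On these counts, fewer than a third of the hits in any substructure class are correct, and pooling all three classes gives 34 correct out of 149.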

NCATS Discovering “New Therapeutic
Uses for Existing Molecules”

58 Molecule names
and identifiers. Where
are the “structures”?

NCATS dataset
• Several groups tried to collate molecules
• Chris Lipinski provided approximately 30 unique molecules

• Simple molecule descriptors show no difference between
compounds classified as discontinued (n = 15) and those in
clinical trials (n = 14).

• Where is the definitive set of publicly accessible molecules
for computational repurposing and analysis?

Drug structure quality is important…
 Many groups ARE doing in silico repositioning

 Integrating or using sets of FDA drugs… and if
structures are incorrect, predictions will be too
 Where is the definitive set of FDA approved
drugs with correct structures?

 Ideally we need linkage between in vitro data
and clinical data

We have a problem…
 Lots of data available but quality is suspect
 Errors proliferate database to database
 Data continues to flow in unabated
 When errors are identified, they are hard to get fixed!
 Data licensing is confusing – “Open Data”
 We are “takers” not “givers” mostly…
 Standards are lacking:
 Data licensing
 Data processing – structure standardization
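
One way to see errors proliferating from database to database is to compare the structure identifiers that different sources hold for the same compound name. The sketch below is a minimal illustration, not a curation tool. It assumes each source supplies a standard InChIKey, and it relies on the key's layout: the first 14-character block is derived from the connectivity layers, and the second block from the remaining layers, including stereochemistry. The source names, compound names and keys are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical records: (source database, compound name, InChIKey-like string).
# These keys are invented placeholders with the InChIKey shape
# (14 chars - 10 chars - 1 char); they are not real identifiers.
records = [
    ("db_a", "compound-1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("db_b", "compound-1", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
    ("db_c", "compound-1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("db_a", "compound-2", "DDDDDDDDDDDDDD-EEEEEEEESA-N"),
    ("db_b", "compound-2", "FFFFFFFFFFFFFF-EEEEEEEESA-N"),
    ("db_a", "compound-3", "GGGGGGGGGGGGGG-HHHHHHHHSA-N"),
    ("db_b", "compound-3", "GGGGGGGGGGGGGG-HHHHHHHHSA-N"),
]


def classify(records):
    """Group keys by compound name and label how the sources disagree."""
    keys_by_name = defaultdict(set)
    for _source, name, key in records:
        keys_by_name[name].add(key)

    report = {}
    for name, keys in keys_by_name.items():
        # First block of the key: hash of the connectivity layers.
        skeletons = {key.split("-")[0] for key in keys}
        if len(keys) == 1:
            report[name] = "consistent"
        elif len(skeletons) == 1:
            # Same skeleton, so the disagreement may be stereochemical.
            report[name] = "same connectivity, other layers differ"
        else:
            report[name] = "different connectivity"
    return report


if __name__ == "__main__":
    for name, verdict in classify(records).items():
        print(f"{name}: {verdict}")
```

A flag of "same connectivity, other layers differ" is only a prompt for manual review: it cannot say which source, if any, holds the correct structure.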

So what needs to happen to improve?
• Let’s agree that collaboration and crowdsourcing
can help
• Provide SIMPLE ways to provide feedback
• Contribute when possible – databases should
provide feedback mechanisms
• Adopt standards for structure handling and
representation
• Adopt standards for data interchange
• Allow machine handling of data – use the
power of the semantic web

Williams, Ekins and Tkachenko, Drug Disc Today 17: 685-701 (2012)

Collaboration on Curation
 Collaborate on curation…share through standards
and open interfaces

Standardize

 Use the SRS as guidance for standardization

“Appify” curation and collaboration

• The data network is complex
• “Appify” collaboration and
curation networks
• Increasing crowdsourcing role
for data analysis

Ekins & Williams, Pharm Res, 27: 393-395, 2010.

Mobile Apps for Drug Discovery

Open Drug Discovery Teams

 Free iOS app used to expose repurposing data
 All of this data has been tweeted
http://tinyurl.com/6l9qy4f

Ekins, Clark and Williams, Mol Informatics, in Press 2012

Simple Rules for licensing “open” data
 Gather stakeholders. Decide if goals are primarily scientific,
commercial or mixed.

 Explore benefits of open licensing and drawbacks of
enclosure. Hold closely to open definitions and standards.
Do not write your own IP licenses!

 Provide simple explanations for terms of use. Use
metadata to indicate licensing terms explicitly - the
Creative Commons Rights Expression Language is a
good tool.

 Do not lock up metadata. If you can’t make the data public
domain, make the metadata public domain.

Williams, Wilbanks and Ekins,
PLoS Comput. Biol., in press, Sept. 2012
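
The rules about stating licensing terms explicitly and keeping metadata open can be illustrated with a small machine-readable record. This is a hedged sketch only: it uses plain JSON rather than the Creative Commons Rights Expression Language itself, and the field names (`title`, `license`, `metadata_license`) and the example dataset title are assumptions made for illustration.

```python
import json

# Creative Commons CC0 public domain dedication.
CC0_URL = "https://creativecommons.org/publicdomain/zero/1.0/"


def make_record(title, license_url, metadata_license_url=CC0_URL):
    """Build a dataset description that always names its license.

    The metadata license defaults to CC0, following the rule that
    metadata should be public domain even when the data is not.
    """
    if not license_url:
        raise ValueError("a dataset record must state its license explicitly")
    return {
        "title": title,
        "license": license_url,
        "metadata_license": metadata_license_url,
    }


if __name__ == "__main__":
    # Hypothetical dataset title, for illustration only.
    record = make_record("Example repurposing dataset", CC0_URL)
    print(json.dumps(record, indent=2))
```

Refusing to build a record without a license is a deliberate choice in this sketch: a missing license is treated as an error rather than as implied openness.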

Open PHACTS Project
 Develop a set of robust standards…
 Implement the standards in a semantic integration hub
 Deliver services to support drug discovery programs
in pharma and public domain
 22 partners, 8 pharmaceutical companies, 3 biotechs
 36-month project

Guiding principle is open access, open usage, open source
- Key to standards adoption -

To facilitate THIS process!
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?

It’s not JUST structures of course…

Taxol: Paclitaxel Bioassay Data
 Most Bioassay data associated with structure
with one ambiguous stereocenter

Measuring data: dispensing dependencies
Data from 2 AstraZeneca patents: Ephrin pharmacophores were
developed using data for 14 compounds with IC50 values. Different
dispensing methods give different results. These differences affect
the resulting hypotheses and could impact drug discovery.

Dispensing process                Hydrophobic features (HPF)   Hydrogen bond acceptor (HBA)   Hydrogen bond donor (HBD)   Observed vs. predicted IC50 (r)
Acoustic mediated process         2                            1                              1                           0.92
Disposable tip mediated process   0                            2                              1                           0.80

Ekins, Olechno and Williams, Submitted 2012
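
The r column compares observed and predicted IC50 values for each pharmacophore. Assuming r is a Pearson correlation coefficient (the slide does not say so explicitly), it could be computed as sketched below. The activity values are invented for illustration and are not the patent data.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squared deviations about the means.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Hypothetical log-scale activities, invented for illustration only.
    observed = [5.1, 6.0, 6.8, 7.4, 8.2]
    predicted = [5.4, 5.9, 7.0, 7.1, 8.0]
    print(round(pearson_r(observed, predicted), 2))
```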

Measuring data: dispensing dependencies
Acoustically-derived IC50 values were 1.5 to 276.5-fold
lower than for tip-based dispensing
• Pharmacophores and other computational models are used
to guide medicinal chemistry.

• Non tip-based methods may improve HTS results and avoid
misleading computational and statistical models.

• No analysis of influence of dispensing processes on data.

• Public databases should annotate metadata to create larger
datasets for comparing different computational methods.
How much data is reproducible, accurate, valid? The
challenge of high-throughput science.
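
The fold difference quoted above is the ratio of the tip-based IC50 to the acoustically derived IC50 for the same compound. A minimal sketch of that calculation follows. The paired values are hypothetical round numbers chosen for illustration, and the function names are assumptions.

```python
def fold_difference(tip_ic50, acoustic_ic50):
    """How many times lower the acoustic IC50 is than the tip-based IC50.

    Both values must be in the same concentration units.
    """
    if tip_ic50 <= 0 or acoustic_ic50 <= 0:
        raise ValueError("IC50 values must be positive")
    return tip_ic50 / acoustic_ic50


def fold_range(pairs):
    """Smallest and largest fold difference over (tip, acoustic) pairs."""
    folds = [fold_difference(tip, acoustic) for tip, acoustic in pairs]
    return min(folds), max(folds)


# Hypothetical (tip-based, acoustic) IC50 pairs in the same units.
hypothetical_pairs = [(3.0, 2.0), (10.0, 0.5), (50.0, 0.25)]

if __name__ == "__main__":
    low, high = fold_range(hypothetical_pairs)
    print(f"acoustic IC50 values are {low}- to {high}-fold lower")
```

If databases recorded the dispensing method as metadata alongside each IC50, a comparison like this could be run across far larger datasets.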

Acknowledgments
 Sean Ekins
 Christopher Lipinski
 Joe Olechno
 John Wilbanks
 Drug Disambiguation project team
 RSC Cheminformatics Team

Thank you

Email: williamsa@rsc.org
Twitter: @chemconnector
Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

Email: ekinssean@yahoo.com
Twitter: collabchem
Blog: http://www.collabchem.com/
