BARD: Chemical Biology Database

The BioAssay Research Database

A Platform to Support the Collection, Management and Analysis of Chemical Biology Data

http://bard.nih.gov

ACS National Meeting, New Orleans, April 7, 2013
@AskTheBARD

Direct Contributors

NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc
Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel
Southall, Henrike Veith
Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua
Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva
Dandapani, Andrea DeSouza, Dan Durkin, David Lahr, Jeri Levine, Judy
McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil
Walzer, Xiaorong Xiang
University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea,
Larry Sklar, Oleg Ursu, Anna Waller, Jeremy Yang
University of Miami – Saminda Abeyruwan, Hande Küküc, Vance
Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan
Schürer, Uma Vempati, Ubbo Visser
Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley, Shaun
Stauffer
Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena
Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass,
Anthony Pinkerton, Derek Stonich
Scripps Research Institute – Yasel Cruz, Mark Southern

BARD: BioAssay Research Database
BARD’s mission is to enable novice and expert scientists to
effectively utilize MLP data to generate new hypotheses
•  Unique collaboration amongst NIH and academic centers
with expertise in screening and software development
•  Developed as an open-source, industrial-strength platform
to support public translational research.
•  Provides an opportunity to address existing cheminformatics barriers
o  Deploy predictive models
o  Foster new methods to interpret chemical biology data
o  Enable private data sharing
o  Develop and adopt an Assay Data Standard with tools to:
   –  Annotate assays to minimum standards and definitions
   –  Integrate and extend existing ontologies for meaningful experiment descriptions
   –  Enable assay creation, registration and modification
o  Provide an easy-to-use portal and an advanced desktop client

Engagement & Milestones

Summer 2011: MLP issues administrative supplement and call for proposals to create the Molecular Libraries Biological Database
January 2012: Inaugural meeting of MLPCN Stakeholders & NIH MLP PT

February 2012: Update on progress: data extraction & annotation, test platform selection, GUI design & test, outreach

March 2012: BARD Program Kick-off

April 2012: Outreach strategy & tactic session at UNM w/ subteam

May – July 2012: Discussions with and reviews of Amgen, Vertex, Novartis, Sanofi assay registration and chem-bio information query systems

November 2012: Conducted multi-level usability interviews on BARD GUI & function w/ Dir. Computation, Informatics/Lab Mgr, TA Lead, Dir. Chem, Med chem, Db developer, Cmpd curator

January 2013: BARD Review by Ext. Sci Panel & Public alpha release (CAP, REST API, Web & Desktop clients)

March 2013: BARD limited beta release, then transition to enabling science

BARD Technology Components

Enable Hypothesis Generation

•  Define & Register Assays
–  Data Dictionary: standard terms
–  Catalog of Assay Protocols
•  High Quality Data & Result Deposition
–  Calculations & Results
–  Project-experiment association
•  Query & Interpret Information
–  Intuitive Guided Queries
–  Cross-assay & SAR-centric views
–  Advanced applications

Audience: novice to expert

Where Are We Today?

•  CAP, Data Dictionary, and Results Deposition data model created & populated
•  CAP UI with view and basic editing
•  Warehouse loaded with all PubChem AIDs and results
•  Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
•  Dictionary defined as OWL using Protégé
•  Annotations for 85% of MLPCN experiments & projects loaded via spreadsheet
•  Manual annotation of AIDs ~70% completed by centers
•  ~95% of PubChem result types mapped to BARD dictionary
•  ~70% of PubChem columns mapped to BARD result types

The BARD Data Warehouse

•  Running on MySQL with replication
•  0.85 TB of data…
–  151M result rows
–  46M compound rows
•  Locally deployed at UNM
•  Planning to build better packaging
–  VM-based deployment

Open Source As Far as Possible

•  http://bard.nih.gov/api
•  Jersey webapps deployed on an HA application server cluster
•  Caching layer
•  ETL database, text search engine, structure search engine

The BARD Public API

•  Java, REST-like, read-only, deployed on a GlassFish cluster
•  Different functionality hosted in different containers (API, plugins, text search, structure search)
–  Maintenance, security
–  Stability
–  Performance
•  Versioned data warehouse
•  Fully documented

API Resources

•  Extensive list of
resources covering
many data types
•  Each resource
supports a variety of
sub-resources
–  Usually linked to
other resources
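Because each resource links to related sub-resources, a client can walk from one entity to the next. The sketch below assumes a hypothetical JSON shape in which a resource lists related relative paths under a `links` key, and hypothetical paths such as `compounds/2244`; neither comes from the BARD documentation. The HTTP GET is injected as a function so the sketch runs offline.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urljoin

# Injected transport: takes a full URL, returns the response body as text.
# In real use it could wrap urllib.request.urlopen.
Fetch = Callable[[str], str]

# Base URL from the slides; any version segment is deliberately left out.
BASE = "http://bard.nih.gov/api/"


def get_resource(fetch: Fetch, path: str) -> Dict:
    """GET a path relative to BASE and decode the JSON body.

    Paths must be relative (no leading "/"), otherwise urljoin would
    resolve them against the host root instead of /api/.
    """
    return json.loads(fetch(urljoin(BASE, path)))


def follow_links(fetch: Fetch, path: str, depth: int = 1) -> Dict[str, Dict]:
    """Fetch a resource plus the sub-resources it links to, up to `depth` hops.

    Assumes (hypothetically) that linked paths are listed under "links".
    Each path is fetched at most once; the result is keyed by path.
    """
    seen: Dict[str, Dict] = {}
    frontier: List[str] = [path]
    for _ in range(depth + 1):
        next_frontier: List[str] = []
        for p in frontier:
            if p in seen:
                continue
            doc = get_resource(fetch, p)
            seen[p] = doc
            next_frontier.extend(doc.get("links", []))
        frontier = next_frontier
    return seen
```

Injecting `fetch` keeps link traversal separate from networking, so the same code can be exercised against canned responses or a live server.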

API Level of Detail

•  Supports different levels of detail
•  Allows clients to trade off detail for speed
•  Good for mobile apps
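One way a client might use the detail levels described above is to request compact views for a whole list and the detailed view only for the few items a user opens. This sketch assumes that an `expand=true` query parameter (the form shown in the BADAPPLE example URIs) selects the detailed view for other resources too, and that `compounds/<id>` is a valid path; both are assumptions, not documented BARD behavior.

```python
from typing import Dict, Iterable, Mapping, Optional, Set
from urllib.parse import urlencode


def resource_url(base: str, path: str, expand: bool = False,
                 params: Optional[Mapping[str, str]] = None) -> str:
    """Build a resource URL, optionally asking for the expanded (detailed) view.

    The expand=true parameter is assumed, by analogy with the BADAPPLE URIs.
    """
    query: Dict[str, str] = dict(params or {})
    if expand:
        query["expand"] = "true"
    url = base.rstrip("/") + "/" + path.lstrip("/")
    return f"{url}?{urlencode(query)}" if query else url


def urls_for_view(base: str, ids: Iterable[int], selected: Set[int]) -> Dict[int, str]:
    """Compact URLs for every ID; expanded URLs only for the selected few.

    The compounds/<id> path is hypothetical.
    """
    return {i: resource_url(base, f"compounds/{i}", expand=(i in selected)) for i in ids}
```

On a mobile client this keeps list screens cheap while still allowing full detail on demand.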

API Caching & Storage

•  Caching is enabled at the resource level
•  The API supports ETags
–  Every request returns an ETag in the header
–  With If-None-Match, supports web caching
•  We also abuse ETags to support persistent
references to collections
•  An ETag can refer to other ETags recursively
–  Allows clients to create and store arbitrarily
complex collections
•  Not permanent, not infinite!
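The ETag mechanics above can be sketched from the client side. `ETagCache` shows standard conditional requests: resend the stored ETag in `If-None-Match` and reuse the cached body on a 304 reply. `resolve_collection` illustrates the "ETags referring to ETags" idea using a hypothetical layout (`{"items": [...], "etags": [...]}`) that is not BARD's actual storage format. Both names are invented for this sketch, and the transport is injected so it runs offline.

```python
from typing import Callable, Dict, List, Mapping, Optional, Set, Tuple

# Injected transport: (url, request_headers) -> (status, response_headers, body).
# In real use it could wrap urllib.request; injecting it keeps the sketch offline.
Transport = Callable[[str, Mapping[str, str]], Tuple[int, Mapping[str, str], str]]


class ETagCache:
    """Client-side cache that revalidates stored responses with If-None-Match."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._entries: Dict[str, Tuple[str, str]] = {}  # url -> (etag, body)

    def get(self, url: str) -> str:
        headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Ask the server to reply 304 if our stored copy is still current.
            headers["If-None-Match"] = cached[0]
        status, resp_headers, body = self._transport(url, headers)
        if status == 304 and cached is not None:
            return cached[1]  # not modified: reuse the stored body
        etag = resp_headers.get("ETag")  # assumes this exact header-name casing
        if status == 200 and etag:
            self._entries[url] = (etag, body)
        return body


def resolve_collection(etag: str, store: Mapping[str, Mapping[str, List[str]]],
                       seen: Optional[Set[str]] = None) -> Set[str]:
    """Flatten a collection whose members may include other collections' ETags.

    Hypothetical layout: store maps ETag -> {"items": [...], "etags": [...]}.
    Unknown ETags (e.g. expired, since references are not permanent) add
    nothing, and `seen` stops infinite recursion on mutual references.
    """
    seen = set() if seen is None else seen
    if etag in seen or etag not in store:
        return set()
    seen.add(etag)
    entry = store[etag]
    members = set(entry.get("items", []))
    for child in entry.get("etags", []):
        members |= resolve_collection(child, store, seen)
    return members
```

Treating a missing ETag as an empty collection mirrors the "not permanent, not infinite" caveat: a client should expect stored references to disappear and re-create them when needed.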

Annotating Data

•  To best exploit the current data set, and
encourage discoverability, we need to
better structure the data
–  Annotate all assays to a minimum standard
–  Integrate and extend existing ontologies to
support meaningful experiment descriptions
–  Develop processes and tools to enable assay registration

[Figure: BARD Assay Definition Hierarchy and BARD Dictionary & Term Hierarchy, drawing on BioAssay Ontology, Gene Ontology, UniProt, Chemical Ontology, Entrez, Disease Ontology and Unit Ontology]
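A term hierarchy pays off at query time: an assay annotated with a narrow term should also match a search on a broader one. The sketch below uses a small, invented is-a table (the term names are illustrative, not actual BARD dictionary or BAO entries) and hypothetical assay IDs.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical is-a edges (child -> parents). Real BARD terms come from the
# BARD dictionary, which draws on BAO, GO, UniProt and other ontologies.
IS_A: Dict[str, List[str]] = {
    "fluorescence polarization": ["fluorescence"],
    "fluorescence": ["detection method"],
    "luminescence": ["detection method"],
}


def ancestors(term: str, is_a: Dict[str, List[str]] = IS_A) -> Set[str]:
    """All broader terms reachable via is-a edges (excluding the term itself)."""
    found: Set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found


def assays_matching(term: str, annotations: Dict[str, Iterable[str]],
                    is_a: Dict[str, List[str]] = IS_A) -> List[str]:
    """Assays annotated with `term` itself or with any narrower term below it."""
    hits = []
    for assay, terms in annotations.items():
        if any(t == term or term in ancestors(t, is_a) for t in terms):
            hits.append(assay)
    return sorted(hits)
```

For example, with annotations `{"AID-1": ["fluorescence polarization"], "AID-2": ["luminescence"]}`, a search for "detection method" returns both assays, while "fluorescence" returns only `AID-1`.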

(Pseudo) Linked Data

•  Full-text search enabled by Solr
–  Enables filtering, faceting, auto-suggest
–  Key entry point for users
–  Type ahead suggestions provide guidance
•  By virtue of manual associations of data
types, we enable “linked data”
–  Allows searches to indicate what matched the
query and how
–  Solr supports sophisticated scoring schemes
•  Doesn’t yet take advantage of ontologies
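The filtering and faceting mentioned above map onto standard Solr request parameters (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`). This sketch builds such a request URL; the core URL and field names (`detection_method`, `target`) are hypothetical, since the BARD Solr schema is not described here.

```python
from typing import Mapping, Optional, Sequence
from urllib.parse import urlencode


def solr_select_url(core_url: str, text: str,
                    filters: Optional[Mapping[str, str]] = None,
                    facet_fields: Sequence[str] = (),
                    rows: int = 10) -> str:
    """Build a Solr /select URL with a free-text query, filters and facets.

    Field names passed in `filters` and `facet_fields` are assumptions about
    the index schema; the parameter names themselves are standard Solr.
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each filter becomes its own fq clause, which Solr caches independently.
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated once per field to facet on.
        params.extend(("facet.field", f) for f in facet_fields)
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"
```

Using a list of pairs rather than a dict lets `fq` and `facet.field` repeat, which is how Solr expects multiple filters and facets.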

Desktop Client

•  Support large datasets
•  Merge private &
public data
•  Examine SAR

Web Client

•  Google-like searching of 4,000+ assays, 35M+ compounds, 300+ projects
•  Amazon-like Query Cart: save items of interest for further analysis
•  Filter on annotations, such as detection method type

Community
Engagement

•  Sustained outreach efforts
–  7 MLPCN sites participating
•  Facilitate access, driven by compelling use cases and stakeholder feedback
–  Assay definition standard is a collaboration with industrial partners in addition to the MLPCN
•  Publish APIs for data access and engage first adopters
•  A ‘BARD App Store’: Enabling new
approaches to data integration, mining
–  Promiscuity calculations
–  CYP450 prediction

Extending BARD with Plugins

•  BARD supports deployment of external code
as part of the core API
•  Plugins can access the data warehouse via
direct calls
–  No need to go via REST API
•  Plugin resources can accept anything
–  Text, JSON, files, links, …
•  Plugin responses can be anything
–  Plain text, JSON, HTML, SVG, …
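Since plugin responses can be of any type, a client calling a plugin has to branch on the response's media type. This is a minimal client-side sketch; the function name is hypothetical, and UTF-8 is assumed for all textual responses rather than read from the charset parameter.

```python
import json
from typing import Any, Mapping


def decode_plugin_response(headers: Mapping[str, str], body: bytes) -> Any:
    """Turn a plugin response into a Python value based on its media type.

    JSON becomes Python objects, textual types (plain text, HTML, SVG) become
    str, and anything else is returned as raw bytes. UTF-8 is assumed.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = headers.get("Content-Type", "text/plain").split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body.decode("utf-8"))
    if media_type in ("text/plain", "text/html", "image/svg+xml"):
        return body.decode("utf-8")
    return body  # unknown or binary type: leave undecoded
```

Returning undecoded bytes for unknown types lets a generic client pass new plugin output through without having to understand it.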

BARD Plugin Development

Plugins have to be deployable on the JVM

BARD - SMARTCyp

•  Predicts sites of metabolism by CYP450
isoforms using 2D structures
•  Developed by Patrik Rydberg and coworkers
•  Released under LGPL
•  BARD plugin exposes two resources
–  Summary HTML view
–  Data view (JSON)

P. Rydberg et al., http://www.farma.ku.dk/smartcyp/

BARD - BADAPPLE
•  BioActivity Data Associative
Promiscuity Pattern Learning Engine
•  Associations via scaffolds for chemical
space navigation

Example URIs* and descriptions:
•  <base>/badapple/prom/cid/752424: for the compound with the specified ID, return scaffold IDs and scores.
•  <base>/badapple/prom/cid/752424?expand=true: additional statistics, scaffold SMILES, and inDrug flag.
•  <base>/badapple/prom/scafid/233: for the scaffold with the specified ID, return statistics and SMILES.
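The three example URI patterns above can be generated by a small helper. The path segments come from the examples; `<base>` is left as a caller-supplied parameter because the deck does not spell it out, and the helper name `badapple_url` is hypothetical.

```python
from typing import Optional


def badapple_url(base: str, cid: Optional[int] = None,
                 scafid: Optional[int] = None, expand: bool = False) -> str:
    """Build a BADAPPLE promiscuity URI for a compound ID or a scaffold ID.

    Exactly one of `cid` or `scafid` must be given. The examples show
    expand=true only for the compound (cid) form.
    """
    if (cid is None) == (scafid is None):
        raise ValueError("give exactly one of cid or scafid")
    root = base.rstrip("/") + "/badapple/prom"
    url = f"{root}/cid/{cid}" if cid is not None else f"{root}/scafid/{scafid}"
    return url + "?expand=true" if expand else url
```

For example, `badapple_url("http://example.org/api", cid=752424, expand=True)` yields `http://example.org/api/badapple/prom/cid/752424?expand=true`, where `http://example.org/api` stands in for the real base URL.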

On the Horizon

•  Reproducibility
–  Be honest with me …

•  Private data in the context of public data
–  Local installs, molecule hashes

•  Mobile
–  Compounds as funny looking QR tags
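One way to read "molecule hashes" is that a local install can check which private compounds already have public data by comparing hashes rather than structures. This sketch makes that idea concrete under stated assumptions. It uses SHA-256 over SMILES strings that are assumed to be already canonicalized by a cheminformatics toolkit; without canonicalization, equivalent SMILES such as `OCC` and `CCO` hash differently. The function names are hypothetical. Plain hashes of known structures can be reproduced by anyone, so this supports matching, not strong secrecy.

```python
import hashlib
from typing import Dict, Iterable, Set


def molecule_hash(canonical_smiles: str) -> str:
    """SHA-256 hex digest of a SMILES string assumed to be canonical already."""
    return hashlib.sha256(canonical_smiles.strip().encode("utf-8")).hexdigest()


def overlap_with_public(private_smiles: Iterable[str], public_hashes: Set[str]) -> Dict[str, bool]:
    """For each private compound, report whether its hash appears in the public set."""
    return {smi: molecule_hash(smi) in public_hashes for smi in private_smiles}
```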


Long-Term Path Forward

•  BARD is not just a data store – it’s a platform
–  Seamlessly interact with users’ preferred tools
–  Allows the community to tailor it to their needs
–  Serve as a meeting ground for experimental and
computational methods
–  Enhance collaboration opportunities
–  Consider cloud deployment
•  Enhance the ability to translate data from
individual experiments to systems level insight
