The BioAssay Research Database (BARD) aims to enable scientists to utilize data from the Molecular Libraries Probe Production Centers Network (MLPCN) to generate new hypotheses. BARD provides a platform for public data sharing and analysis through intuitive query and visualization tools accessible via a web portal or desktop client. BARD integrates data from multiple sources and centers, and aims to improve data annotation and standardization to enable more meaningful experiment descriptions and discovery. The project involves ongoing community engagement and development of new analytical tools through its open API and plugin framework.
BARD: Chemical Biology Database
1. The BioAssay Research Database
A Platform to Support the Collection, Management and Analysis of Chemical Biology Data
http://bard.nih.gov
ACS National Meeting, New Orleans
@AskTheBARD
April 7, 2013
2. Direct Contributors
NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc
Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel
Southall, Henrike Veith
Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua
Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva
Dandapani, Andrea DeSouza, Dan Durkin, David Lahr, Jeri Levine, Judy
McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil
Walzer, Xiaorong Xiang
University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea,
Larry Sklar, Oleg Ursu, Anna Waller, Jeremy Yang
University of Miami – Saminda Abeyruwan, Hande Küküc, Vance
Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan
Schürer, Uma Vempati, Ubbo Visser
Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley, Shaun
Stauffer
Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena
Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass,
Anthony Pinkerton, Derek Stonich
Scripps Research Institute – Yasel Cruz, Mark Southern
3. BARD: BioAssay Research Database
BARD’s mission is to enable novice and expert scientists to
effectively utilize MLP data to generate new hypotheses
• Unique collaboration amongst NIH and academic centers
with expertise in screening and software development
• Developed as an open-source, industrial-strength platform
to support public translational research.
• Provides opportunity to address existing cheminformatics barriers
o Deploy predictive models
o Foster new methods to interpret chemical biology data
o Enable private data sharing
o Develop and adopt an Assay Data Standard with tools to:
o Annotate assays to a minimum standard and set of definitions
o Integrate and extend existing ontologies for meaningful experiment
descriptions
o Enable assay creation, registration and modification
o Provide an easy-to-use portal and an advanced desktop
client
4. Engagement & Milestones
• Summer 2011: MLP issues administrative supplement and call for proposals to create the Molecular Libraries Biological Database
• January 2012: Inaugural meeting of MLPCN Stakeholders & NIH MLP PT
• February 2012: Update on progress: data extraction & annotation, test platform selection, GUI design & test, outreach
• March 2012: BARD Program Kick-off
• April 2012: Outreach strategy & tactic session at UNM w/ subteam
• May – July 2012: Discussions with and reviews of Amgen, Vertex, Novartis, Sanofi assay registration and chem-bio information query systems
• November 2012: Conducted multi-level usability interviews on BARD GUI & function w/ Dir. Computation, Informatics/Lab Mgr, TA Lead, Dir. Chem, Med chem, Db developer, Cmpd curator
• January 2013: BARD Review by Ext. Sci Panel & Public alpha release (CAP, REST API, Web & Desktop clients)
• March 2013: BARD limited beta-release, then transition to enabling science
5. BARD Technology Components
Goal: Enable Hypothesis Generation, for users ranging from novice to expert
• Define & Register Assays
– Data Dictionary (std terms)
– Catalog of Assay Protocols
• High Quality Data & Result Deposition
– Calculations & Results
– Project-experiment association
• Query & Interpret Information
– Intuitive Guided Queries
– Cross Assay & SAR centric views
– Advanced applications
6. Where Are We Today?
• CAP, Data Dictionary, and Results Deposition data model created & populated
• CAP UI with view and basic editing
• Warehouse loaded with all PubChem AIDs and results
• Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
• Dictionary defined as OWL using Protégé
• Annotations for 85% of MLPCN experiments & projects loaded via spreadsheet
• Manual annotation of AIDs ~70% completed by centers
• ~95% of PubChem result types mapped to BARD dictionary
• ~70% of PubChem columns mapped to BARD result types
7. The BARD Data Warehouse
• Running on MySQL with replication
• 0.85 TB of data…
– 151M result rows
– 46M compound rows
• Locally deployed at UNM
• Planning to build better packaging
– VM based deployment
8. Open Source As Far as Possible
http://bard.nih.gov/api
[Architecture diagram: Jersey webapps deployed on an HA application server cluster; caching layer; ETL; database; text search engine; structure search engine]
9. The BARD Public API
• Java, REST-like, read-only, deployed on Glassfish cluster
• Different functionality hosted in different containers
– Maintenance, security
– Stability
– Performance
• Versioned Data Warehouse
• Fully documented
[Diagram: separate containers for API, Plugins, Text Search, Struct Search]
10. API Resources
• Extensive list of
resources covering
many data types
• Each resource
supports a variety of
sub-resources
– Usually linked to
other resources
11. API Level of Detail
• Supports different
levels of detail
• Allows clients to trade off detail for speed
• Good for mobile apps
12. API Caching & Storage
• Caching is enabled at resource level
• The API supports ETags
– Every request returns an ETag in the header
– With If-None-Match, supports web caching
• We also abuse ETags to support persistent
references to collections
• An ETag can refer to other ETags recursively
– Allows clients to create and store arbitrarily
complex collections
• Not permanent, not infinite!
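The ETag-based web caching described above can be sketched on the client side as follows. This is a minimal illustration of standard HTTP revalidation with If-None-Match, not BARD's own client code: the `ETagCache` class, the injectable `fetch` callable, the `make_fake_server` stand-in, and the example URL are all assumptions introduced here so the logic can run without a network.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fetch callable takes (url, request_headers) and returns
# (status, response_headers, body). It is a hypothetical seam for this sketch;
# a real client would wrap an HTTP library behind it.
Response = Tuple[int, Dict[str, str], Optional[bytes]]
Fetch = Callable[[str, Dict[str, str]], Response]


class ETagCache:
    """Client-side cache keyed by URL, revalidated with If-None-Match."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, bytes]] = {}  # url -> (etag, body)

    def get(self, url: str) -> bytes:
        headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            # Ask the server to send the body only if it has changed.
            headers["If-None-Match"] = cached[0]
        status, resp_headers, body = self._fetch(url, headers)
        if status == 304 and cached is not None:
            return cached[1]  # Not Modified: reuse the stored body
        if status != 200 or body is None:
            raise RuntimeError(f"unexpected response status {status}")
        # Assumes the header key is spelled "ETag"; real HTTP header names
        # are case-insensitive.
        etag = resp_headers.get("ETag")
        if etag:
            self._entries[url] = (etag, body)
        return body


def make_fake_server(etag: str, body: bytes) -> Tuple[Fetch, List[Dict[str, str]]]:
    """Stand-in for a server that honours If-None-Match (demo only)."""
    seen: List[Dict[str, str]] = []

    def fetch(url: str, headers: Dict[str, str]) -> Response:
        seen.append(dict(headers))
        if headers.get("If-None-Match") == etag:
            return 304, {"ETag": etag}, None
        return 200, {"ETag": etag}, body

    return fetch, seen
```

On the second request for the same URL the cache sends the stored ETag, and a 304 response lets it reuse the earlier body instead of downloading it again.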
13. Annotating Data
• To best exploit the current data set, and encourage discoverability, we need to better structure the data
– Annotate all assays to a minimum standard
– Integrate and extend existing ontologies to support meaningful experiment descriptions
– Develop processes and tools to enable assay registration
[Diagram: BARD Assay Definition Hierarchy and BARD Dictionary & Term Hierarchy, drawing on BioAssay Ontology, Gene Ontology, Uniprot, Chemical Ontology, Entrez, Disease Ontology, Unit Ontology]
14. (Pseudo) Linked Data
• Full text search enabled by Solr
– Enables filtering, faceting, auto-suggest
– Key entry point for users
– Type ahead suggestions provide guidance
• By virtue of manual associations of data
types, we enable “linked data”
– Allows searches to indicate what matched the
query and how
– Solr supports sophisticated scoring schemes
• Doesn’t yet take advantage of ontologies
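To make the filtering and faceting bullets above concrete, here is a rough sketch of how a faceted full-text query is typically expressed against a Solr `select` handler. It uses only standard Solr request parameters (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`). The core URL and the field name `detection_method_type` are hypothetical, and nothing here is meant to describe how BARD itself exposes Solr to clients.

```python
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlencode


def solr_select_url(
    core_url: str,
    text: str,
    facet_fields: Iterable[str] = (),
    filters: Optional[Dict[str, str]] = None,
    rows: int = 10,
) -> str:
    """Build a Solr /select URL with optional facets and filter queries."""
    params: List[Tuple[str, str]] = [("q", text), ("rows", str(rows)), ("wt", "json")]
    facet_fields = list(facet_fields)
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field whose value counts we want.
        params.extend(("facet.field", name) for name in facet_fields)
    for field, value in (filters or {}).items():
        # fq narrows the result set without affecting relevance scoring.
        # Values containing double quotes would need escaping (not handled).
        params.append(("fq", f'{field}:"{value}"'))
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical core URL and field name, for illustration only.
example_url = solr_select_url(
    "http://example.invalid/solr/assays",
    "kinase inhibitor",
    facet_fields=["detection_method_type"],
    filters={"detection_method_type": "fluorescence"},
)
```

The facet counts returned for each field are what a user interface would render as clickable filters next to the search results.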
15. Desktop Client
• Support large datasets
• Merge private &
public data
• Examine SAR
16. Web Client
• Google-like searching of: 4,000+ assays, 35M+ compounds, 300+ projects
• Amazon-like Query Cart: save items of interest for further analysis
• Filter on annotations, such as detection method type
17. Community Engagement
• Sustained outreach efforts
– 7 MLPCN sites participating
• Facilitate access, driven by compelling use-
cases and stakeholder feedback
– Assay definition standard is a collaboration with
industrial partners in addition to MLPCN
• Publish APIs for data access, first-adopters
• A ‘BARD App Store’: Enabling new
approaches to data integration, mining
– Promiscuity calculations
– CYP450 prediction
18. Extending BARD with Plugins
• BARD supports deployment of external code
as part of core API
• Plugins can access the data warehouse via
direct calls
– No need to go via REST API
• Plugin resources can accept anything
– Text, JSON, files, links, …
• Plugin responses can be anything
– Plain text, JSON, HTML, SVG, …
20. BARD - SMARTCyp
• Predicts site of metabolism by CYP450
isoforms using 2D structures
• Developed by Patrik Rydberg and co-
workers
• Released under LGPL
• BARD plugin exposes two resources
– Summary HTML view
– Data view (JSON)
22. BARD - BADAPPLE
• BioActivity Data Associative
Promiscuity Pattern Learning Engine
• Associations via scaffolds for chemical
space navigation.
Example URI* and description:
• <base>/badapple/prom/cid/752424
– For compound with specified ID, return scaffold IDs and scores.
• <base>/badapple/prom/cid/752424?expand=true
– Additional statistics, scaffold smiles, and inDrug flag.
• <base>/badapple/prom/scafid/233
– For scaffold with specified ID, return statistics and smiles.
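The three BADAPPLE URI patterns above can be assembled programmatically, as in the short sketch below. The slide leaves `<base>` unspecified, so the base is a required argument, and the value used in the example is a placeholder rather than a real endpoint. The function names are introduced here for illustration only.

```python
from urllib.parse import urlencode


def badapple_compound_url(base: str, cid: int, expand: bool = False) -> str:
    """URI for the scaffold IDs and scores of the compound with the given ID.

    With expand=True the request asks for the additional statistics,
    scaffold smiles, and inDrug flag.
    """
    url = f"{base.rstrip('/')}/badapple/prom/cid/{cid}"
    if expand:
        url += "?" + urlencode({"expand": "true"})
    return url


def badapple_scaffold_url(base: str, scafid: int) -> str:
    """URI for the statistics and smiles of the scaffold with the given ID."""
    return f"{base.rstrip('/')}/badapple/prom/scafid/{scafid}"


# Placeholder base: the real value of <base> is not given on the slide.
BASE = "http://example.invalid/plugins"
compound_brief = badapple_compound_url(BASE, 752424)
compound_full = badapple_compound_url(BASE, 752424, expand=True)
scaffold = badapple_scaffold_url(BASE, 233)
```

Keeping `expand` as an explicit flag mirrors the API's level-of-detail idea: the brief form is the default, and the richer response is requested only when needed.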
23. On the Horizon
• Reproducibility
– Be honest with me …
• Private data in the context of public data
– Local installs, molecule hashes
• Mobile
– Compounds as funny looking QR tags
24. Long-Term Path Forward
• BARD is not just a data store – it’s a platform
– Seamlessly interact with users’ preferred tools
– Allows the community to tailor it to their needs
– Serve as a meeting ground for experimental and
computational methods
– Enhance collaboration opportunities
– Consider cloud deployment
• Enhance the ability to translate data from
individual experiments to systems level insight