The BioAssay Research Database
A Platform to Support the Collection, Management and Analysis of Chemical Biology Data

http://bard.nih.gov
ACS National Meeting, New Orleans
@AskTheBARD
April 7, 2013
Direct Contributors
NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc
Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel
Southall, Henrike Veith
Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua
Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva
Dandapani, Andrea DeSouza, Dan Durkin, David Lahr, Jeri Levine, Judy
McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil
Walzer, Xiaorong Xiang
University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea,
Larry Sklar, Oleg Ursu, Anna Waller, Jeremy Yang
University of Miami – Saminda Abeyruwan, Hande Küküc, Vance
Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan
Schürer, Uma Vempati, Ubbo Visser
Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley, Shaun
Stauffer
Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena
Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass,
Anthony Pinkerton, Derek Stonich
Scripps Research Institute – Yasel Cruz, Mark Southern
BARD: BioAssay Research Database
BARD’s mission is to enable novice and expert scientists to
effectively utilize MLP data to generate new hypotheses
•    Unique collaboration amongst NIH and academic centers
     with expertise in screening and software development
•    Developed as an open-source, industrial-strength platform
     to support public translational research.
•    Provides an opportunity to address existing cheminformatics barriers
       o  Deploy predictive models
       o  Foster new methods to interpret chemical biology data
       o  Enable private data sharing
       o  Develop and adopt an Assay Data Standard with tools to:
            o    Annotate assays to minimum standards and definitions
            o    Integrate and extend existing ontologies for meaningful experiment
                 descriptions
            o    Enable assay creation, registration and modification
       o    Provide an easy-to-use portal and an advanced desktop
            client
Engagement & Milestones

Summer 2011      MLP issues administrative supplement and call for proposals to create the Molecular Libraries Biological Database
January 2012     Inaugural meeting of MLPCN Stakeholders & NIH MLP PT
February 2012    Update on progress: data extraction & annotation, test platform selection, GUI design & test, Outreach
March 2012       BARD Program Kick-off
April 2012       Outreach strategy & tactic session at UNM w/ subteam
May–July 2012    Discussions with and reviews of Amgen, Vertex, Novartis, Sanofi assay registration and chem-bio information query systems
November 2012    Conducted multi-level usability interviews on BARD GUI & function w/ Dir. Computation, Informatics/Lab Mgr, TA Lead, Dir. Chem, Med chem, Db developer, Cmpd curator
January 2013     BARD Review by Ext. Sci Panel & Public alpha release (CAP, REST API, Web & Desktop clients)
March 2013       BARD limited beta-release, then transition to enabling science
BARD Technology Components

•  Define & Register Assays
  –  Data Dictionary (standard terms)
  –  Catalog of Assay Protocols
•  High Quality Data & Result Deposition
  –  Calculations & Results
  –  Project-experiment association
•  Query & Interpret Information
  –  Intuitive Guided Queries
  –  Cross Assay & SAR centric views
  –  Advanced applications
•  Goal: Enable Hypothesis Generation, for users from novice to expert
  
Where Are We Today?

•  CAP, Data Dictionary, and Results Deposition data model created & populated
•  CAP UI with view and basic editing
•  Warehouse loaded with all PubChem AIDs and results
•  Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
•  Dictionary defined as OWL using Protégé
•  Annotations for 85% of MLPCN experiments & projects loaded via spreadsheet
•  Manual annotation of AIDs ~70% completed by centers
•  ~95% of PubChem result types mapped to BARD dictionary
•  ~70% of PubChem columns mapped to BARD result types
The BARD Data Warehouse

•  Running on MySQL with replication
•  0.85 TB of data…
  –  151M result rows
  –  46M compound rows
•  Locally deployed at UNM
•  Planning to build better packaging
  –  VM-based deployment
Open Source As Far as Possible

http://bard.nih.gov/api

Architecture, top to bottom:
•  Jersey webapps deployed on an HA application server cluster
•  Caching layer
•  ETL database, text search engine, structure search engine
The BARD Public API

•  Java, REST-like, read-only, deployed on a Glassfish cluster
•  Different functionality hosted in different containers
  –  Maintenance, security
  –  Stability
  –  Performance
•  Versioned
•  Fully documented

Diagram: API plugins, text search and structure search each run in their own container on top of the data warehouse.
API Resources
•  Extensive list of
   resources covering
   many data types
•  Each resource
   supports a variety of
   sub-resources
  –  Usually linked to
     other resources
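The versioned, read-only design and the resource/sub-resource linking described above can be illustrated with a small client-side URL builder. This is a minimal sketch: the `<base>/<version>/<resource>/<id>/<sub-resource>` path layout, the version string `v1` and the resource names `compounds`, `assays` and `experiments` are hypothetical, not taken from the BARD API documentation.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_url(base: str, version: str, resource: str,
              ident: Optional[object] = None, sub: Optional[str] = None,
              **params: object) -> str:
    """Build a GET URL for a versioned, read-only REST resource.

    Hypothetical layout: <base>/<version>/<resource>[/<ident>[/<sub>]][?query]
    """
    parts = [base.rstrip("/"), quote(version), quote(resource)]
    if ident is not None:
        # Percent-encode the identifier so it always stays a single path segment.
        parts.append(quote(str(ident), safe=""))
        if sub is not None:
            # Sub-resources hang off a specific entity, e.g. a compound's experiments.
            parts.append(quote(sub))
    url = "/".join(parts)
    if params:
        # Render Python booleans in lower case, e.g. expand=True -> expand=true.
        query = {k: str(v).lower() if isinstance(v, bool) else v
                 for k, v in params.items()}
        url += "?" + urlencode(query)
    return url


print(build_url("http://bard.nih.gov/api", "v1", "compounds", 2244))
print(build_url("http://bard.nih.gov/api", "v1", "compounds", 2244, sub="experiments"))
print(build_url("http://bard.nih.gov/api/", "v1", "assays", expand=True))
```

Pinning the version in every URL is what lets an existing client keep working after a newer API version is deployed alongside it.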
API Level of Detail

•  Supports different levels of detail
•  Allows clients to trade off detail for speed
•  Good for mobile apps
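One way a server can offer several levels of detail is to keep a named field set per level and trim each record before serialising it. The sketch below assumes three hypothetical levels (`compact`, `summary`, `full`) and a hypothetical record shape; the actual BARD representations, and the parameter a client uses to choose between them, are not shown on the slide.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical field sets per level of detail; a real service defines its own.
DETAIL_FIELDS: Dict[str, Optional[Tuple[str, ...]]] = {
    "compact": ("cid",),
    "summary": ("cid", "name", "smiles"),
    "full": None,  # None means "every field"
}


def render(record: Dict[str, Any], level: str = "summary") -> Dict[str, Any]:
    """Return a copy of `record` trimmed to the requested level of detail."""
    if level not in DETAIL_FIELDS:
        raise ValueError(f"unknown detail level: {level!r}")
    fields = DETAIL_FIELDS[level]
    if fields is None:
        return dict(record)
    # Keep only the fields for this level, in the level's declared order.
    return {k: record[k] for k in fields if k in record}


aspirin = {
    "cid": 2244,
    "name": "aspirin",
    "smiles": "CC(=O)OC1=CC=CC=C1C(=O)O",
    "synonyms": ["acetylsalicylic acid"],
}
print(render(aspirin, "compact"))  # smallest payload, e.g. for a mobile list view
print(render(aspirin, "full"))     # everything, at the cost of a larger response
```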
API Caching & Storage
•  Caching is enabled at resource level
•  The API supports ETags
  –  Every request returns an ETag in the header
  –  With If-None-Match, supports web caching
•  We also abuse ETags to support persistent
   references to collections
•  An ETag can refer to other ETags recursively
  –  Allows clients to create and store arbitrarily
     complex collections
•  Not permanent, not infinite!
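The ETag behaviour on this slide can be sketched in two parts: a conditional GET that answers `304 Not Modified` when the client's `If-None-Match` header still matches, and a store in which a collection is itself identified by an ETag and may contain the ETags of other collections. This is a minimal sketch under stated assumptions: deriving ETags from a SHA-256 hash of the body is one common choice, not necessarily BARD's, and `CollectionStore` is a hypothetical in-memory class. The slide stresses that stored references are neither permanent nor unlimited; this sketch has no expiry or size limit.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple, Union


def make_etag(body: bytes) -> str:
    """Strong ETag derived from a hash of the representation."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes,
                    if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Serve a GET, answering 304 when the client's cached ETag still matches."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match carries "*" or a comma-separated list of ETags.
        # Weak "W/" validators are not handled in this sketch.
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, body


Member = Union[int, str]  # an entity ID, or the ETag of another collection


class CollectionStore:
    """Hypothetical in-memory store of collections keyed by their ETag."""

    def __init__(self) -> None:
        self._collections: Dict[str, List[Member]] = {}

    def put(self, members: List[Member]) -> str:
        for m in members:
            # Nested references must already exist, so they always point to earlier collections.
            if isinstance(m, str) and m not in self._collections:
                raise KeyError(f"unknown collection {m}")
        etag = make_etag(json.dumps(members).encode("utf-8"))
        self._collections[etag] = list(members)
        return etag

    def resolve(self, etag: str) -> List[int]:
        """Flatten a collection, following nested ETag references recursively."""
        flat: List[int] = []
        for m in self._collections[etag]:
            if isinstance(m, str):
                flat.extend(self.resolve(m))
            else:
                flat.append(m)
        return flat


status, headers, _ = conditional_get(b'{"cid": 2244}', None)
status2, _, _ = conditional_get(b'{"cid": 2244}', headers["ETag"])  # 304
```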
Annotating Data

•  To best exploit the current data set, and encourage discoverability, we need to
   better structure the data
  –  Annotate all assays to a minimum standard
  –  Integrate and extend existing ontologies to
     support meaningful experiment descriptions
  –  Develop processes and tools to enable assay registration

Figure: BARD Assay Definition Hierarchy, built on the BARD Dictionary & Term Hierarchy, which draws on the BioAssay Ontology, Gene Ontology, Uniprot, Entrez, Unit Ontology, Chemical Ontology and Disease Ontology.
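Annotating every assay to a minimum standard implies a check that each required annotation is present and drawn from the expected ontology. The sketch below is illustrative only: the required keys, the ontology assigned to each key and the example term labels are assumptions, not the actual BARD assay definition standard.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical minimum standard: each required annotation key maps to the
# ontologies whose terms are accepted as values for that key.
MINIMUM_STANDARD: Dict[str, Set[str]] = {
    "assay format": {"BioAssay Ontology"},
    "detection method type": {"BioAssay Ontology"},
    "target": {"Uniprot", "Entrez"},
    "biological process": {"Gene Ontology"},
}

Annotation = Tuple[str, str]  # (source ontology, term label)


def standard_violations(annotations: Dict[str, Annotation]) -> List[str]:
    """List required keys that are missing or annotated from an unexpected ontology."""
    problems = []
    for key, allowed in MINIMUM_STANDARD.items():
        if key not in annotations:
            problems.append(f"missing: {key}")
        elif annotations[key][0] not in allowed:
            problems.append(f"wrong ontology for {key}: {annotations[key][0]}")
    return problems


assay = {
    "assay format": ("BioAssay Ontology", "cell-based format"),
    "detection method type": ("BioAssay Ontology", "luminescence"),
    "target": ("Gene Ontology", "kinase activity"),  # deliberately from the wrong ontology
}
print(standard_violations(assay))
```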
(Pseudo) Linked Data
•  Full text search enabled by Solr
  –  Enables filtering, faceting, auto-suggest
  –  Key entry point for users
  –  Type ahead suggestions provide guidance
•  By virtue of manual associations of data
   types, we enable “linked data”
  –  Allows searches to indicate what matched the
     query and how
  –  Solr supports sophisticated scoring schemes
•  Doesn’t yet take advantage of ontologies
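Solr provides type-ahead suggestions and faceting through its own components; the sketch below is not Solr and ignores scoring entirely. It only illustrates the two ideas in plain Python over in-memory data: prefix lookup in a sorted vocabulary for auto-suggest, and per-value counts of one field for a facet. The vocabulary and the `detection` field are hypothetical.

```python
import bisect
from collections import Counter
from typing import Callable, Dict, Iterable, List


def build_suggester(terms: Iterable[str]) -> Callable[..., List[str]]:
    """Return a type-ahead function over a sorted, lower-cased vocabulary."""
    vocab = sorted({t.lower() for t in terms})

    def suggest(prefix: str, limit: int = 5) -> List[str]:
        p = prefix.lower()
        i = bisect.bisect_left(vocab, p)  # first entry >= prefix
        out: List[str] = []
        # All terms sharing the prefix sit contiguously after position i.
        while i < len(vocab) and vocab[i].startswith(p) and len(out) < limit:
            out.append(vocab[i])
            i += 1
        return out

    return suggest


def facet_counts(docs: Iterable[Dict[str, str]], field: str) -> Dict[str, int]:
    """Count how many documents carry each value of `field` (one facet)."""
    return dict(Counter(d[field] for d in docs if field in d))


suggest = build_suggester(["Luciferase", "luminescence", "Lipid", "kinase"])
print(suggest("lu"))
print(facet_counts([{"detection": "fluorescence"},
                    {"detection": "luminescence"},
                    {"detection": "fluorescence"},
                    {}], "detection"))
```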
Desktop Client
•  Support large datasets
•  Merge private &
   public data
•  Examine SAR
Web Client

•  Google-like searching of: 4,000+ assays, 35M+ compounds, 300+ projects
•  Amazon-like Query Cart: save items of interest for further analysis
•  Filter on annotations, such as detection method type
Community Engagement

•  Sustained outreach efforts
  –  7 MLPCN sites participating
•  Facilitate access, driven by compelling use cases and stakeholder feedback
  –  Assay definition standard is a collaboration with
     industrial partners in addition to MLPCN
•  Publish APIs for data access and support first adopters
•  A ‘BARD App Store’: enabling new approaches to data integration and mining
  –  Promiscuity calculations
  –  CYP450 prediction
Extending BARD with Plugins
•  BARD supports deployment of external code
   as part of core API
•  Plugins can access the data warehouse via
   direct calls
  –  No need to go via REST API
•  Plugin resources can accept anything
  –  Text, JSON, files, links, …
•  Plugin responses can be anything
  –  Plain text, JSON, HTML, SVG, …
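The plugin contract described above, where a plugin mounts its own resources and may accept and return any format, can be illustrated with a small registry that routes paths to handlers which return a content type along with the payload. Real BARD plugins are deployed on the JVM alongside the core API and can call the data warehouse directly; this Python sketch only mirrors the shape of the idea, and the `PluginRegistry` class, the `/plugins/linecount` path and the toy plugin are hypothetical.

```python
import json
from typing import Callable, Dict, Tuple

# A plugin handler receives the raw request body and returns (content type, payload),
# so each plugin decides for itself what it accepts and what it emits.
Handler = Callable[[bytes], Tuple[str, bytes]]


class PluginRegistry:
    """Hypothetical registry that mounts plugin handlers under resource paths."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def register(self, path: str, handler: Handler) -> None:
        if path in self._routes:
            raise ValueError(f"path already registered: {path}")
        self._routes[path] = handler

    def dispatch(self, path: str, body: bytes = b"") -> Tuple[int, str, bytes]:
        handler = self._routes.get(path)
        if handler is None:
            return 404, "text/plain", b"no such plugin resource"
        content_type, payload = handler(body)
        return 200, content_type, payload


def line_count_plugin(body: bytes) -> Tuple[str, bytes]:
    """Toy plugin: accept plain text (e.g. one SMILES per line), reply with JSON."""
    lines = [ln for ln in body.decode("utf-8").splitlines() if ln.strip()]
    return "application/json", json.dumps({"lines": len(lines)}).encode("utf-8")


registry = PluginRegistry()
registry.register("/plugins/linecount", line_count_plugin)
print(registry.dispatch("/plugins/linecount", b"CCO\nc1ccccc1O\n"))
```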
BARD Plugin Development

•  Plugins have to be deployable on the JVM
BARD - SMARTCyp

•  Predicts site of metabolism by CYP450
   isoforms using 2D structures
•  Developed by Patrik Rydberg and coworkers
•  Released under LGPL
•  BARD plugin exposes two resources
  –  Summary HTML view
  –  Data view (JSON)
BARD - SMARTCyp

P. Rydberg et al., http://www.farma.ku.dk/smartcyp/
BARD - BADAPPLE
    •    BioActivity Data Associative
         Promiscuity Pattern Learning Engine
    •    Associations via scaffolds for chemical
         space navigation.

Example URI*                                  Description
<base>/badapple/prom/cid/752424               For compound with specified ID, return scaffold IDs and scores.
<base>/badapple/prom/cid/752424?expand=true   Additional statistics, scaffold SMILES, and inDrug flag.
<base>/badapple/prom/scafid/233               For scaffold with specified ID, return statistics and SMILES.
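The example URIs in the table can be generated with two small helpers. This sketch keeps `<base>` as a placeholder exactly as the slide does; the function names are hypothetical, and only the path segments and the `expand=true` parameter come from the table.

```python
from urllib.parse import urlencode


def badapple_compound_url(base: str, cid: int, expand: bool = False) -> str:
    """URI for a compound's scaffold IDs and promiscuity scores.

    With expand=True the response adds statistics, scaffold SMILES and the inDrug flag.
    """
    url = f"{base.rstrip('/')}/badapple/prom/cid/{cid}"
    if expand:
        url += "?" + urlencode({"expand": "true"})
    return url


def badapple_scaffold_url(base: str, scafid: int) -> str:
    """URI for a scaffold's statistics and SMILES."""
    return f"{base.rstrip('/')}/badapple/prom/scafid/{scafid}"


base = "<base>"  # left as a placeholder, as on the slide
print(badapple_compound_url(base, 752424))
print(badapple_compound_url(base, 752424, expand=True))
print(badapple_scaffold_url(base, 233))
```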
  
On the Horizon

         •  Reproducibility
           –  Be honest with me …


         •  Private data in the context of public data
           –  Local installs, molecule hashes


         •  Mobile
           –  Compounds as funny-looking QR tags
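One plausible reading of "molecule hashes" is to compare private compounds with public data by exchanging digests of canonical structures instead of the structures themselves. The sketch below assumes the SMILES strings were already canonicalised by the same toolkit on both sides, since identical molecules written differently would otherwise hash differently; a standard hashed identifier such as the InChIKey plays a similar role. A hash only hides structures that cannot be guessed: anyone can hash a list of candidate structures and look for matches.

```python
import hashlib
from typing import Iterable, Set


def molecule_hash(canonical_smiles: str) -> str:
    """SHA-256 digest of a structure string assumed to be canonical already."""
    return hashlib.sha256(canonical_smiles.encode("utf-8")).hexdigest()


def hash_set(smiles: Iterable[str]) -> Set[str]:
    return {molecule_hash(s) for s in smiles}


# Each side hashes locally; only the digests are compared.
private = hash_set(["CCO", "c1ccccc1O"])
public = hash_set(["c1ccccc1O", "CC(=O)O"])
print(len(private & public))  # structures held by both sides
```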


Long-Term Path Forward

•  BARD is not just a data store – it’s a platform
   –  Seamlessly interact with users’ preferred tools
   –  Allows the community to tailor it to their needs
   –  Serve as a meeting ground for experimental and
      computational methods
   –  Enhance collaboration opportunities
   –  Consider cloud deployment
•  Enhance the ability to translate data from
   individual experiments to systems-level insight

The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

BARD: Chemical Biology Database

  • 1. The BioAssay Research Database: A Platform to Support the Collection, Management and Analysis of Chemical Biology Data. http://bard.nih.gov, @AskTheBARD. ACS National Meeting, New Orleans, April 7, 2013.
  • 2. Direct Contributors
     NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
     NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel Southall, Henrike Veith
     Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva Dandapani, Andrea DeSouza, Dan Durkin, David Lahr, Jeri Levine, Judy McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil Walzer, Xiaorong Xiang
     University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea, Larry Sklar, Oleg Ursu, Anna Waller, Jeremy Yang
     University of Miami – Saminda Abeyruwan, Hande Küküc, Vance Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan Schürer, Uma Vempati, Ubbo Visser
     Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley, Shaun Stauffer
     Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass, Anthony Pinkerton, Derek Stonich
     Scripps Research Institute – Yasel Cruz, Mark Southern
  • 3. BARD: BioAssay Research Database
     BARD’s mission is to enable novice and expert scientists to effectively utilize MLP data to generate new hypotheses.
     •  Unique collaboration amongst NIH and academic centers with expertise in screening and software development
     •  Developed as an open-source, industrial-strength platform to support public translational research
     •  Provides an opportunity to address existing cheminformatics barriers:
        o  Deploy predictive models
        o  Foster new methods to interpret chemical biology data
        o  Enable private data sharing
        o  Develop and adopt an Assay Data Standard with tools to:
           o  Annotate assays to minimum standards and definitions
           o  Integrate and extend existing ontologies for meaningful experiment descriptions
           o  Enable assay creation, registration and modification
           o  Provide an easy-to-use portal and an advanced desktop client
  • 4. Engagement & Milestones
     Summer 2011: MLP issues administrative supplement and call for proposals to create the Molecular Libraries Biological Database
     January 2012: Inaugural meeting of MLPCN Stakeholders & NIH MLP PT
     February 2012: Update on progress: data extraction & annotation, test platform selection, GUI design & test, Outreach
     March 2012: BARD Program Kick-off
     April 2012: Outreach strategy & tactic session at UNM w/ subteam
     May – July 2012: Discussions with and reviews of Amgen, Vertex, Novartis, Sanofi assay registration and chem-bio information query systems
     November 2012: Conducted multi-level usability interviews on BARD GUI & function w/ Dir. Computation, Informatics/Lab Mgr, TA Lead, Dir. Chem, Med chem, Db developer, Cmpd curator
     January 2013: BARD Review by Ext. Sci Panel & Public alpha release (CAP, REST API, Web & Desktop clients)
     March 2013: BARD limited beta-release – then transition to enabling science
  • 5. BARD Technology Components
     Diagram, spanning novice to expert users:
     •  Define & Register Assays: Data Dictionary (standard terms), Catalog of Assay Protocols, High Quality Data & Result Deposition, Calculations & Results, Project-experiment association
     •  Query & Interpret Information: Intuitive Guided Queries, Cross-Assay & SAR-centric views, Advanced applications
     •  Enable Hypothesis Generation
  • 6. Where Are We Today?
     •  CAP, Data Dictionary, and Results Deposition
     •  Dictionary defined as OWL using Protégé
     •  Data model created & populated
     •  CAP UI with view and basic editing
     •  Annotations for 85% of MLPCN experiments & projects loaded via spreadsheet
     •  Manual annotation of ~70% completed AIDs by centers
     •  Warehouse loaded with all PubChem AIDs and results
     •  Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
     •  ~95% of PubChem result types mapped to BARD dictionary
     •  ~70% of PubChem columns mapped to BARD result types
  • 7. The BARD Data Warehouse
     •  Running on MySQL with replication
     •  0.85 TB of data…
        –  151M result rows
        –  46M compound rows
     •  Locally deployed at UNM
     •  Planning to build better packaging
        –  VM-based deployment
  • 8. Open Source As Far as Possible
     Architecture diagram: http://bard.nih.gov/api; Jersey webapps deployed on an HA application server cluster; caching layer; ETL; database; text search engine; structure search engine
  • 9. The BARD Public API
     •  Java, REST-like, read-only, deployed on a GlassFish cluster
     •  Different functionality hosted in different containers
        –  Maintenance, security
        –  Stability
        –  Performance
     •  Versioned
     •  Fully documented
     Diagram: API, Plugins, Text Search, Struct Search, Data Warehouse
  • 10. API Resources
     •  Extensive list of resources covering many data types
     •  Each resource supports a variety of sub-resources
        –  Usually linked to other resources
  • 11. API Level of Detail
     •  Supports different levels of detail
     •  Allows clients to trade off detail for speed
     •  Good for mobile apps
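As a rough illustration of the detail-for-speed trade-off, the Python sketch below serializes one record at two levels of detail. The record, its field names, `COMPACT_FIELDS` and `render` are all hypothetical and are not BARD's schema or server code; only the idea of an `expand` switch echoes the `?expand=true` query shown for BADAPPLE.

```python
import json

# Hypothetical assay record; the field names are illustrative, not BARD's schema.
ASSAY = {
    "aid": 1234,
    "name": "Example assay",
    "description": "Long free-text protocol description ...",
    "annotations": [{"key": "detection method type", "value": "luminescence"}],
    "experiments": [101, 102, 103],
}

# Fields kept in the compact (default) view.
COMPACT_FIELDS = ("aid", "name")


def render(record: dict, expand: bool = False) -> str:
    """Serialize a record as JSON, trimmed to COMPACT_FIELDS unless expand=True."""
    if expand:
        body = record
    else:
        body = {k: record[k] for k in COMPACT_FIELDS if k in record}
    return json.dumps(body, separators=(",", ":"))


if __name__ == "__main__":
    compact = render(ASSAY)
    full = render(ASSAY, expand=True)
    print(compact)
    print(f"compact: {len(compact)} chars, expanded: {len(full)} chars")
```

A mobile client would typically request the compact view by default and ask for the expanded form only when the user drills into a record.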
  • 12. API Caching & Storage
     •  Caching is enabled at the resource level
     •  The API supports ETags
        –  Every request returns an ETag in the header
        –  With If-None-Match, supports web caching
     •  We also abuse ETags to support persistent references to collections
     •  An ETag can refer to other ETags recursively
        –  Allows clients to create and store arbitrarily complex collections
     •  Not permanent, not infinite!
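The ETag mechanics can be sketched in Python as follows. This assumes content-derived ETags (a truncated SHA-256 of the body) and an in-memory store. `etag_for`, `conditional_get` and `CollectionStore` are hypothetical names, and the slide does not describe BARD's actual ETag format or persistence layer. `conditional_get` answers 304 Not Modified when the client's `If-None-Match` header still matches. `CollectionStore.resolve` expands a collection whose members may themselves be ETags.

```python
import hashlib
import json
from typing import Dict, List, Optional, Tuple, Union

Member = Union[int, str]  # a compound ID (int) or the ETag of another collection (str)


def etag_for(payload: bytes) -> str:
    """Derive a quoted, content-based ETag from a response body."""
    return '"' + hashlib.sha256(payload).hexdigest()[:16] + '"'


def conditional_get(
    payload: bytes, if_none_match: Optional[str] = None
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body); 304 with an empty body if the client's copy is current."""
    tag = etag_for(payload)
    if if_none_match:
        # The header may list several tags or "*"; If-None-Match uses weak
        # comparison, so a W/ prefix is ignored.
        candidates = set()
        for raw in if_none_match.split(","):
            raw = raw.strip()
            candidates.add(raw[2:] if raw.startswith("W/") else raw)
        if "*" in candidates or tag in candidates:
            return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag}, payload


class CollectionStore:
    """In-memory map from a collection's ETag to its members."""

    def __init__(self) -> None:
        self._collections: Dict[str, List[Member]] = {}

    def create(self, members: List[Member]) -> str:
        """Store a collection and return its ETag, derived from its contents."""
        tag = etag_for(json.dumps(members).encode("utf-8"))
        self._collections[tag] = list(members)
        return tag

    def resolve(self, tag: str) -> List[int]:
        """Flatten a collection depth-first, expanding nested ETags.

        Tags are computed from contents, so collections cannot form cycles and
        the recursion terminates. Unknown or expired tags raise KeyError.
        """
        out: List[int] = []
        for member in self._collections[tag]:
            if isinstance(member, str):
                out.extend(self.resolve(member))
            else:
                out.append(member)
        return out


if __name__ == "__main__":
    body = b'{"cid": 752424}'
    status, headers, _ = conditional_get(body)
    print(status, headers["ETag"])
    print(conditional_get(body, headers["ETag"])[0])  # revalidation: not modified

    store = CollectionStore()
    inner = store.create([2244, 3672])  # arbitrary example IDs
    outer = store.create([inner, 5090])
    print(store.resolve(outer))
```

Deriving the tag from the contents means identical collections share one ETag. A real service would also need an eviction policy, in line with the slide's "not permanent, not infinite".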
  • 13. Annotating Data
     •  To best exploit the current data set, and encourage discoverability, we need to better structure the data
        –  Annotate all assays to a minimum standard
        –  Integrate and extend existing ontologies to support meaningful experiment descriptions
        –  Develop processes and tools to enable assay registration
     Diagram: BARD Assay Definition Hierarchy and BARD Dictionary & Term Hierarchy, drawing on the BioAssay Ontology, Gene Ontology, UniProt, Chemical Ontology, Entrez, Disease Ontology and Unit Ontology
  • 14. (Pseudo) Linked Data
     •  Full-text search enabled by Solr
        –  Enables filtering, faceting, auto-suggest
        –  Key entry point for users
        –  Type-ahead suggestions provide guidance
     •  By virtue of manual associations of data types, we enable “linked data”
        –  Allows searches to indicate what matched the query and how
        –  Solr supports sophisticated scoring schemes
     •  Doesn’t yet take advantage of ontologies
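For readers unfamiliar with Solr, the sketch below shows how a faceted, filtered full-text query is typically expressed with Solr's standard `/select` request parameters (`q`, `fq`, `facet`, `facet.field`, `rows`, `wt`). The core URL and the field names (`detection_method`, `target`) are assumptions for illustration. The slide does not describe BARD's Solr schema or its type-ahead configuration, and the sketch does not cover either.

```python
from typing import Dict, Iterable, Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_select_url(
    core_url: str,
    text: str,
    filters: Optional[Dict[str, str]] = None,
    facets: Iterable[str] = (),
    rows: int = 10,
) -> str:
    """Build a Solr /select URL for a full-text query with filters and facets."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    for field, value in (filters or {}).items():
        # Filter queries restrict the result set without changing relevance scores.
        # Values are quoted as phrases; embedded quotes are not escaped in this sketch.
        params.append(("fq", f'{field}:"{value}"'))
    facet_fields = list(facets)
    if facet_fields:
        # facet.field may repeat: Solr returns one list of counts per field.
        params.append(("facet", "true"))
        params.extend(("facet.field", f) for f in facet_fields)
    return core_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    url = solr_select_url(
        "http://localhost:8983/solr/assays",  # hypothetical core
        "kinase",
        filters={"detection_method": "luminescence"},
        facets=["detection_method", "target"],
    )
    print(url)
    print(parse_qs(urlsplit(url).query))
```

The facet counts returned for each field are what a web client can render as the "filter on annotations" options.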
  • 15. Desktop Client
     •  Support large datasets
     •  Merge private & public data
     •  Examine SAR
  • 16. Web Client
     •  Google-like searching of 4,000+ assays, 35M+ compounds, 300+ projects
     •  Amazon-like Query Cart: save items of interest for further analysis
     •  Filter on annotations, such as detection method type
  • 17. Community Engagement
     •  Sustained outreach efforts
        –  7 MLPCN sites participating
     •  Facilitate access, driven by compelling use cases and stakeholder feedback
        –  The assay definition standard is a collaboration with industrial partners in addition to MLPCN
     •  Publish APIs for data access, first adopters
     •  A ‘BARD App Store’: enabling new approaches to data integration and mining
        –  Promiscuity calculations
        –  CYP450 prediction
  • 18. Extending BARD with Plugins
     •  BARD supports deployment of external code as part of the core API
     •  Plugins can access the data warehouse via direct calls
        –  No need to go via the REST API
     •  Plugin resources can accept anything
        –  Text, JSON, files, links, …
     •  Plugin responses can be anything
        –  Plain text, JSON, HTML, SVG, …
  • 19. BARD Plugin Development
     Plugins have to be deployable on the JVM
  • 20. BARD - SMARTCyp
     •  Predicts site of metabolism by CYP450 isoforms using 2D structures
     •  Developed by Patrik Rydberg and co-workers
     •  Released under LGPL
     •  BARD plugin exposes two resources
        –  Summary HTML view
        –  Data view (JSON)
  • 21. BARD - SMARTCyp
     P. Rydberg et al., http://www.farma.ku.dk/smartcyp/
  • 22. BARD - BADAPPLE
     •  BioActivity Data Associative Promiscuity Pattern Learning Engine
     •  Associations via scaffolds for chemical space navigation
     Example URI* and description:
        –  <base>/badapple/prom/cid/752424: for the compound with the specified ID, return scaffold IDs and scores
        –  <base>/badapple/prom/cid/752424?expand=true: additional statistics, scaffold SMILES, and inDrug flag
        –  <base>/badapple/prom/scafid/233: for the scaffold with the specified ID, return statistics and SMILES
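A client might assemble the three example URIs as shown below. In this sketch, `<base>` is left to the caller because the slide does not spell it out, and the helper names are hypothetical. `fetch_json` assumes the resources return JSON, which the slide does not state, so treat it as an assumption to check against the API documentation.

```python
import json
import urllib.request
from urllib.parse import urlencode


def badapple_compound_uri(base: str, cid: int, expand: bool = False) -> str:
    """URI for a compound's scaffold IDs and scores; expand=True asks for extra statistics."""
    uri = f"{base.rstrip('/')}/badapple/prom/cid/{int(cid)}"
    return (uri + "?" + urlencode({"expand": "true"})) if expand else uri


def badapple_scaffold_uri(base: str, scafid: int) -> str:
    """URI for a scaffold's statistics and SMILES."""
    return f"{base.rstrip('/')}/badapple/prom/scafid/{int(scafid)}"


def fetch_json(uri: str, timeout: float = 30.0):
    """GET a URI and decode the body as JSON (assumes the resource returns JSON)."""
    req = urllib.request.Request(uri, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    base = "https://example.org/api"  # placeholder standing in for <base>
    for uri in (
        badapple_compound_uri(base, 752424),
        badapple_compound_uri(base, 752424, expand=True),
        badapple_scaffold_uri(base, 233),
    ):
        print(uri)
```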
  • 23. On the Horizon
     •  Reproducibility
        –  Be honest with me …
     •  Private data in the context of public data
        –  Local installs, molecule hashes
     •  Mobile
        –  Compounds as funny-looking QR tags
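The slide only names molecule hashes as one way to place private data in the context of public data. One minimal reading is shown below: hash a canonical structure string so that overlapping compounds can be found by comparing digests rather than structures. The sketch makes three assumptions:
- Both sides canonicalize SMILES with the same toolkit; canonicalization is not shown here. "CCO" and "OCC" are both ethanol but hash differently.
- The compound IDs are made up.
- The function names are hypothetical.

A plain hash only hides structures the other party cannot guess. It is not strong protection against someone enumerating likely molecules.

```python
import hashlib
from typing import Dict, Iterable


def molecule_hash(canonical_smiles: str) -> str:
    """SHA-256 hex digest of a canonical SMILES string (surrounding whitespace ignored)."""
    return hashlib.sha256(canonical_smiles.strip().encode("utf-8")).hexdigest()


def build_public_index(public: Dict[str, int]) -> Dict[str, int]:
    """Public side: map each compound's digest to its (hypothetical) compound ID."""
    return {molecule_hash(smiles): cid for smiles, cid in public.items()}


def match_private(private_smiles: Iterable[str], public_index: Dict[str, int]) -> Dict[str, int]:
    """Return private compounds whose digest appears in the public index.

    This compares in one process; in a client/server deployment the private
    site would send only the digests, never the structures.
    """
    digests = {molecule_hash(s): s for s in private_smiles}
    return {digests[d]: public_index[d] for d in digests.keys() & public_index.keys()}


if __name__ == "__main__":
    index = build_public_index({"CCO": 1, "CC(=O)Oc1ccccc1C(=O)O": 2})  # made-up IDs
    print(match_private(["CCO", "c1ccccc1"], index))
```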
  • 24. Long-Term Path Forward
     •  BARD is not just a data store – it’s a platform
        –  Seamlessly interact with users’ preferred tools
        –  Allows the community to tailor it to their needs
        –  Serve as a meeting ground for experimental and computational methods
        –  Enhance collaboration opportunities
        –  Consider cloud deployment
     •  Enhance the ability to translate data from individual experiments to systems-level insight