Q-Factor General Quiz-7th April 2024, Quiz Club NITW
I Don't Care Where My Data and Methods Are: A Web-Service Approach for Distributed Access to Methods, Data and Models
1. A Web-Service
Approach for
Distributed
Access to
Methods, Data
and Models
A Web-Service Approach for Distributed Rajarshi Guha
Geoffrey Fox
Access to Methods, Data and Models Kevin E. Gilbert
Marlon Pierce
David Wild
(I Don’t Care Where My Data and Methods Are)
Overview
Pub3D
Rajarshi Guha Geoffrey Fox Kevin E. Gilbert Model Exchange
Marlon Pierce David Wild
School of Informatics
Indiana University
235th ACS National Meeting
6th March, 2008
2. A Web-Service
Outline Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Marlon Pierce
David Wild
Overview
Overview
Pub3D
Model Exchange
Pub3D
Model Exchange
3. A Web-Service
Local versus Distributed Resources Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Marlon Pierce
David Wild
Overview
Pub3D
Model Exchange
Access arbitrary resources (methods, data, applications)
All resources look like function calls
4. A Web-Service
Combining Data & Functionality Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
But what do we mean by “combining” all these Marlon Pierce
David Wild
resources?
Overview
Different levels of complexity
Pub3D
Keep track of new additions to a database via an RSS
Model Exchange
feed
Use Yahoo Pipes to combine and manipulate output
easily
Write full fledged programs in your language of choice
Web services allow us to support all these activities in a
uniform manner
5. A Web-Service
Web Services - What’s Available Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Cheminformatics functionality Marlon Pierce
David Wild
Molecular descriptors Overview
Similarity (2D, 3D) Pub3D
Format conversion Model Exchange
Depictions
3D coordinate generation
Summarized on ChemBioGrid
Dong, X. et al, J. Chem. Inf. Model., 2007, 47, 1303–1307
http://www.chembiogrid.org/projects/proj_core.html
6. A Web-Service
Web Services for Modeling Approach for
Distributed
Access to
Methods, Data
and Models
Computational web services can be viewed as wrappers Rajarshi Guha
around the actual program Geoffrey Fox
Kevin E. Gilbert
Marlon Pierce
Since predictive models are a common feature in David Wild
cheminformatics we’d like to support them as well Overview
This leads to a number of requirements Pub3D
Ability to develop models Model Exchange
Store (deploy) models
Use the models via the web service infrastructure
We provide a computational infrastructure based on R
which supports
Feature selection
Model development
Model deployment
Arbitrary R code
Guha, R. J. Chem. Inf. Model., 2008, 48, 456–464
7. A Web-Service
What’s Available? Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Regression (OLS, CNN, RF) Geoffrey Fox
Kevin E. Gilbert
Classification (LDA) Marlon Pierce
David Wild
Clustering (k-means)
Overview
Feature selection (stepwise and exhaustive) Pub3D
Automated model generation Model Exchange
Load X, Y data, build linear and non-linear models with
optional LOO CV
Deployed models
Random forests for 60 NCI DTP cell lines
Cytotoxicity
Ames mutagenicity
Wang, H. et al., J. Chem. Inf. Model., 2007, 47, 2063–2076
Guha, R. and Sch¨rer, S., J. Comp. Aid. Molec. Des, 2008, ASAP
u
8. A Web-Service
Web Services - What’s Available Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Marlon Pierce
David Wild
Data sources
Overview
We maintain a number of databases
Pub3D
PubChem mirror (mainly for local research)
Model Exchange
Pub3D
Directly accessible via queries in SQL
We also encapsulate specific queries as web service calls
http://www.chembiogrid.org/projects/proj_db.html
9. A Web-Service
REST Frontends to SOAP Services Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
REST is a network architecture that avoids complex Kevin E. Gilbert
Marlon Pierce
message formats David Wild
No extra libraries required Overview
Access services using URL’s
Pub3D
Much simpler interface compared to SOAP
Model Exchange
We’ve been putting REST interfaces onto a number of
our SOAP services
Current REST services include
Database (3D structure, PubChem mirror)
Molecular descriptors
Depictions
10. A Web-Service
REST 3D Structure Service Approach for
Distributed
Access to
Methods, Data
To get a 3D structure for a PubChem CID and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
http://www.chembiogrid.org/cheminfo/rest/db/pub3d/2244 Marlon Pierce
David Wild
Overview
Pub3D
Model Exchange
11. A Web-Service
REST Depiction Service Approach for
Distributed
Access to
Methods, Data
To get a 2D depiction for arbitrary SMILES and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
http://www.chembiogrid.org/cheminfo/rest/depict/C(=O)OCC Marlon Pierce
David Wild
Overview
Pub3D
Model Exchange
12. A Web-Service
REST Descriptor Service Approach for
Distributed
Access to
Methods, Data
To get descriptors for an arbitrary SMILES string and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
http://www.chembiogrid.org/cheminfo/rest/desc/ Marlon Pierce
descriptors/CC(=O)OCCN David Wild
Overview
Pub3D
Model Exchange
13. A Web-Service
Pub3D Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
A 3D structure database derived from PubChem Kevin E. Gilbert
Marlon Pierce
Current version contains a single 3D structure for 17M David Wild
compounds
Overview
Structures obtained using MMFF94 Pub3D
Not the lowest energy conformer
Model Exchange
Structures can be retrieved in SD format by CID using a
web page or web service interfaces
We also store distance moment shape descriptors
allowing us to perform shape similarity searches
http://www.chembiogrid.org/cheminfo/p3d/
Ballester, P.J. and Graham Richards, W., J. Comp. Chem., 2007, 28, 1711–1723
14. A Web-Service
Pub3D Performance Approach for
Distributed
Access to
Methods, Data
Shape searches can be as fast as 5s, for reasonably large and Models
result sets Rajarshi Guha
Geoffrey Fox
Fast enough for us to explore the “density of space” Kevin E. Gilbert
Marlon Pierce
around a given query compound David Wild
Overview
Similarity
Query Times for 10000 Random CIDs (R = 0.4)
1 0.91 0.83 0.77 0.71 0.67 0.62 Pub3D
3500
140000
Levitra (110634)
Model Exchange
3000
Diazepam (3016)
Taxol (36314)
2500
Nearest Neighbor Count
Didanosine (50599)
100000
Number of Queries
Calcascorbin (6247)
2000
q
1500
60000 q
q
1000
q q
q
20000
q
500
q
q
q
q
q q
q q q q
q q
0
q
0
q q
0 20 40 60 80 0.0 0.1 0.2 0.3 0.4 0.5 0.6
Time (sec) Radius
Guha, R. et al., J. Chem. Inf. Model., 2006, 46, 1713–1722
15. A Web-Service
Pub3D & Clustering Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Marlon Pierce
David Wild
Indexing gives good
performance Overview
Pub3D
As index size increases,
Model Exchange
performance degrades
Could add more RAM
Clustering the database
allows us to scale to
significantly larger collections
16. A Web-Service
Pub3D & Clustering Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Marlon Pierce
David Wild
Indexing gives good
performance Overview
Pub3D
As index size increases,
Model Exchange
performance degrades
Could add more RAM
Clustering the database
allows us to scale to
significantly larger collections
17. A Web-Service
Pub3D and Conformers Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Single, arbitrary conformers aren’t too useful Marlon Pierce
David Wild
We have currently generated conformers for a subset of
PubChem (4 to 10 heavy atoms) Overview
Pub3D
3 Kcal energy window
Model Exchange
243,892 compounds
≈ 2M conformers
Clustering will be vital when conformers are considered
Allows us to handle arbitrary numbers of conformers
May need to consider some sort of compression
Could use cluster information to optimize queries
18. A Web-Service
Model Exchange Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
The literature contains many published models Marlon Pierce
David Wild
No way to utilize them unless we manually rebuild them
In many cases we will not have access to descriptors Overview
Pub3D
Difficult to search for models specifically
Model Exchange
We should be able to do the following
Search for models
Exchange them
Execute them
Is this a “format” issue?
To some extent yes
19. A Web-Service
Model Exchange Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
The literature contains many published models Marlon Pierce
David Wild
No way to utilize them unless we manually rebuild them
In many cases we will not have access to descriptors Overview
Pub3D
Difficult to search for models specifically
Model Exchange
We should be able to do the following
Search for models
Exchange them
Execute them
Is this a “format” issue?
To some extent yes
20. A Web-Service
PMML for Model Exchange Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Predictive Modelling Markup Language Kevin E. Gilbert
Marlon Pierce
A standard supported by the Data Modeling Group David Wild
Allows you to serialize predictive models to XML Overview
Linear, logistic regression Pub3D
Tree models Model Exchange
Neural networks, Na¨ Bayes models
ıve
Association models
Ensemble models (random forests, arbitrary ensembles)
Supported by a number of platforms
IBM, Salford Systems
SAS, SPSS
R
21. A Web-Service
PMML for Model Exchange Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Marlon Pierce
David Wild
Since it’s XML, it’s extensible
Overview
PMML allows us to specify the model itself Pub3D
But we need to add extra information to make a model Model Exchange
truly searchable
Provenance (who made it, when was it made, . . . )
Property (what is being modeled)
Requirements (what are the inputs, how to get them)
22. A Web-Service
PMML for Model Exchange Approach for
Distributed
Access to
Methods, Data
Provenance and Models
Rajarshi Guha
Geoffrey Fox
Easily solved using Dublin Core Kevin E. Gilbert
Marlon Pierce
David Wild
Property Overview
Pub3D
Could be addressed using keywords
Model Exchange
Could reuse pre-existing ontologies
Requirements
This is tricky
Need to identify what descriptors were used
What software, version
How to evaluate the descriptors (if possible)
23. A Web-Service
PMML for Model Exchange Approach for
Distributed
Access to
Methods, Data
Provenance and Models
Rajarshi Guha
Geoffrey Fox
Easily solved using Dublin Core Kevin E. Gilbert
Marlon Pierce
David Wild
Property Overview
Pub3D
Could be addressed using keywords
Model Exchange
Could reuse pre-existing ontologies
Requirements
This is tricky
Need to identify what descriptors were used
What software, version
How to evaluate the descriptors (if possible)
24. A Web-Service
PMML for Model Exchange Approach for
Distributed
Access to
Methods, Data
Provenance and Models
Rajarshi Guha
Geoffrey Fox
Easily solved using Dublin Core Kevin E. Gilbert
Marlon Pierce
David Wild
Property Overview
Pub3D
Could be addressed using keywords
Model Exchange
Could reuse pre-existing ontologies
Requirements
This is tricky
Need to identify what descriptors were used
What software, version
How to evaluate the descriptors (if possible)
25. A Web-Service
Summary Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Pub3D is a shape searchable version of PubChem Marlon Pierce
David Wild
Conformers will have to be considered for it to be useful
Overview
Are searches meaningful? Benchmarks required Pub3D
Model Exchange
Web services provide one approach to model deployment
We should be able to search for models explicitly
PMML is one extensible solution the addresses model
exchange
Automated model execution is more challenging
26. A Web-Service
Summary Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Pub3D is a shape searchable version of PubChem Marlon Pierce
David Wild
Conformers will have to be considered for it to be useful
Overview
Are searches meaningful? Benchmarks required Pub3D
Model Exchange
Web services provide one approach to model deployment
We should be able to search for models explicitly
PMML is one extensible solution the addresses model
exchange
Automated model execution is more challenging
27. A Web-Service
Acknowledgments Approach for
Distributed
Access to
Methods, Data
and Models
Rajarshi Guha
Geoffrey Fox
Kevin E. Gilbert
Marlon Pierce
David Wild
Overview
Pub3D
Kangseok Kim
Model Exchange
NIH