At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. The challenge combines a multitude of non-standard data formats for chemicals, related properties, reactions, spectra and more with the confusion of licensing and embargoing, and with the need for data exchange and integration with services and platforms external to the repository. All this comes at a time when semantic technologies are touted as the fundamental technology for enhancing integration and discoverability. Funding agencies are demanding change, especially a move towards open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation provides an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. It also discusses the benefits of such a repository in providing data to develop prediction models that further enable scientific discovery, and examines the potential impact on the future of scientific publishing.
The UK National Chemical Database Service – an integration of commercial and public chemistry services to support chemists in the United Kingdom
1. UK National Chemical Database Service: An integration of commercial and public chemistry services to support chemists in the United Kingdom
Antony Williams, Valery Tkachenko and Richard Kidd
ACS Dallas, March 2014
2. UK Chemical Database Service
• The National Chemical Database Service is for
UK academics – see later for Rest of World
3. Vision for the Service PART 1
• Provide access to databases and services of
interest to the academic community to serve
their needs. Access to services to include:
• Crystallography data – Organic and inorganic
materials
• Thermophysical data
• Reactions Data including retrosynthetic analysis
• Prediction technologies – name generation,
physicochemical parameters, NMR prediction
5. Service Rollout
• Many services are hosted in the cloud
• Access through login/password, IP
authentication or Shibboleth authentication
• Lots of hard work in a very short time – so
many thanks to all of the service providers
• More providers stepped up to help –
ChemAxon
• Crystallography concern (understatement!)
6. Feedback from Community
• Converted initial public negativity spike on
Twitter pre-release to very positive feedback
post-release
• Training required – onsite training sessions
organized
• Available Chemicals Directory is a big plus!
• Concerns with Retrosynthetic Analysis tool
7. Usage
• Majority of usage is for crystallography data –
previous provider had same bias
• Usage is increasing month-by-month
• Still heavily under-used, with low awareness
in many cases
8. Vision for the Service PART 2
• Response to the call for proposals included
our vision for a 21st Century data repository
• At a time of Open Access, Open Data and
funding agency requirement to make data
public – build a data repository
• Funding is split between licensing content and
services (VAST MAJORITY) and research
and development
9. An Initial “Vague” Vision Set
• Manage “all” of the chemistry data associated
with chemical substances
• Data to be downloadable, reusable, interactive
• Build a platform that enables the scientist
• Data storage, validation, standardization and
curation
• Collaborative data sharing
• Provide data platform that can enable and
enhance publishing of scientific papers
10. Data Repository
• Registration of chemical compounds
• Deposition of chemical syntheses
• Addition of analytical data
• Integration to electronic notebooks
• Rewards and recognition for data sharing
• Document processing
• Hosting of data as private, embargoed or
public
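The private / embargoed / public hosting options can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Deposit` and `Visibility` names are hypothetical, and the rule that an embargoed record becomes visible once its embargo date has passed is assumed rather than taken from the slides.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    EMBARGOED = "embargoed"
    PUBLIC = "public"


@dataclass
class Deposit:
    """Hypothetical record for one deposited data set."""
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # only meaningful when EMBARGOED


def can_view(deposit: Deposit, user: Optional[str], today: date) -> bool:
    """Return True if `user` (None for an anonymous visitor) may see the deposit."""
    if user is not None and user == deposit.owner:
        return True  # owners always see their own data
    if deposit.visibility is Visibility.PUBLIC:
        return True
    if deposit.visibility is Visibility.EMBARGOED:
        # Assumed rule: an embargoed record opens up once its date has passed.
        return deposit.embargo_until is not None and today >= deposit.embargo_until
    return False  # PRIVATE
```

Collaborative sharing would need an additional list of permitted users per deposit; that is left out to keep the sketch small.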
11. What we will deliver for all data
• Simple interfaces for uploading of data
• Embeddable widgets and programming
interfaces to utilize in in-house systems, ELNs
• Automated harvesting approaches – sweeping
directories for data
• Data validation where possible
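As an illustration of “sweeping directories for data”, the sketch below walks a folder tree and yields data files it has not seen before. The extension list and the in-memory `already_seen` set are assumptions; a real harvester would persist its state and would probably check file contents as well as names.

```python
from pathlib import Path
from typing import Iterator, Set

# Assumed extensions: JCAMP-DX, mzML, and MOL/SD files.
DATA_EXTENSIONS = {".jdx", ".dx", ".mzml", ".mol", ".sdf"}


def sweep(root: Path, already_seen: Set[Path]) -> Iterator[Path]:
    """Yield data files under `root` that have not been harvested before."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in DATA_EXTENSIONS:
            continue
        if path in already_seen:
            continue
        already_seen.add(path)  # remember the file so a later sweep skips it
        yield path
```

Running `sweep` on a schedule against an instrument's export folder would pick up only the files added since the previous pass.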
12. Input data pipeline
• Deposition Gateway inputs: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc.; LabTrove and other templated data; documents; API, FTP, etc.
• Raw data → Staging databases → processing modules → Validated data
• Modules: Compounds, Spectra, Reactions, Materials, Textmining
• Databases: Compounds, Reactions, Spectra, Materials, Articles / CSSP
• All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed
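The flow from raw deposits through per-type modules to validated data might be sketched as follows. This is a rough illustration only: the validator functions, the record fields (`molfile`, `technique`, `file`), and the `MODULES` table are all hypothetical and do not describe the actual RSC implementation.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def validate_compound(record: Record) -> List[str]:
    """Hypothetical check: a compound needs a non-empty structure field."""
    return [] if record.get("molfile", "").strip() else ["missing structure"]


def validate_spectrum(record: Record) -> List[str]:
    """Hypothetical check: a spectrum needs a technique and a data file."""
    return [f"missing {key}" for key in ("technique", "file") if not record.get(key)]


# One validator per data type; reactions, materials, and text mining
# would be registered in the same way.
MODULES: Dict[str, Callable[[Record], List[str]]] = {
    "compound": validate_compound,
    "spectrum": validate_spectrum,
}


def process(
    staging: List[Tuple[str, Record]],
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split staged deposits into validated records and rejected ones with reasons."""
    validated: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for data_type, record in staging:
        module = MODULES.get(data_type)
        problems = ["unsupported data type"] if module is None else module(record)
        if problems:
            rejected.append((record, problems))
        else:
            validated.append(record)
    return validated, rejected
```

Keeping the modules in a lookup table means a new data type can be supported by adding one validator, without touching the routing loop.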
13. Compounds upload
• Draw chemicals in the interface (Javascript
editors – PC, Mac, Tablets, Phones)
• Drag and drop of compounds
• Automated generation of properties – Formulae,
Mw, Mi, physchem properties
• Metadata input forms
• Bulk upload
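To make the automated property generation concrete, here is a minimal sketch that derives a molecular weight and a monoisotopic mass from a molecular formula. It assumes “Mw” and “Mi” on the slide mean those two quantities, handles only flat formulae (no brackets, charges, or isotope labels), and carries a deliberately tiny element table; a production service would more likely rely on a cheminformatics toolkit.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# (average atomic mass, mass of the most abundant isotope), in Da.
# A tiny illustrative table covering H, C, N, and O only.
ELEMENTS: Dict[str, Tuple[float, float]] = {
    "H": (1.008, 1.00782503),
    "C": (12.011, 12.0),
    "N": (14.007, 14.0030740),
    "O": (15.999, 15.9949146),
}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat formula such as 'C2H6O' into element counts."""
    counts: Counter = Counter()
    position = 0
    for match in TOKEN.finditer(formula):
        if match.start() != position:
            raise ValueError(f"unexpected text in formula: {formula!r}")
        symbol, number = match.groups()
        if symbol not in ELEMENTS:
            raise ValueError(f"element not in table: {symbol}")
        counts[symbol] += int(number) if number else 1
        position = match.end()
    if position != len(formula) or not counts:
        raise ValueError(f"could not parse formula: {formula!r}")
    return counts


def masses(formula: str) -> Tuple[float, float]:
    """Return (molecular weight, monoisotopic mass) for a flat formula."""
    counts = parse_formula(formula)
    mw = sum(ELEMENTS[s][0] * n for s, n in counts.items())
    mi = sum(ELEMENTS[s][1] * n for s, n in counts.items())
    return mw, mi
```

For ethanol, `masses("C2H6O")` gives a molecular weight of about 46.069 and a monoisotopic mass of about 46.0419.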
18. Reactions
• Hosting of reaction data – standard “document
formats” – full flexibility but limiting – extraction
of data from embedded objects
• Encourage template formats – using ELNs for
example, community agreed templates
22. Electronic Notebook Data
• Development work integrating chemistry into
the Southampton LabTrove notebook
• Stoichiometry table development
• Analytical data integration
• “ChemTrove” rolled out to a small test group
in January
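A stoichiometry table of the kind mentioned above boils down to scaling every reagent from the limiting one. The sketch below is an illustration under assumptions, not the ChemTrove implementation: the `Reagent` fields and the convention that the first reagent in the list is limiting are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Reagent:
    name: str
    molar_mass: float   # g/mol
    equivalents: float  # relative to the limiting reagent


def stoichiometry_table(
    limiting_mass_g: float, reagents: List[Reagent]
) -> List[Dict[str, Union[str, float]]]:
    """Build table rows; the first reagent is treated as the limiting one."""
    limiting = reagents[0]
    limiting_mol = limiting_mass_g / limiting.molar_mass
    rows: List[Dict[str, Union[str, float]]] = []
    for reagent in reagents:
        # Scale each reagent's amount from the limiting reagent's moles.
        moles = limiting_mol * reagent.equivalents / limiting.equivalents
        rows.append({
            "name": reagent.name,
            "equiv": reagent.equivalents,
            "mmol": round(moles * 1000, 3),
            "mass_g": round(moles * reagent.molar_mass, 4),
        })
    return rows
```

With 1.2212 g of benzoic acid (122.12 g/mol, 1 equivalent) and methanol at 5 equivalents (32.04 g/mol), the table reports 10 mmol of acid and 50 mmol (about 1.602 g) of methanol.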
26. Requirements
• Community agreement on acceptable
templates for CSSP/Reactions deposition
• Data Model deposition based on mappings
between template and CSSP model
• Adoption of LabTrove interface for deposition
27. What we will deliver
• Micropublishing platform for submission of
protocols and procedures, reactions, and
safety and hazard data (LATER)
• Template-based submissions of procedures
• Matched to ELN submissions
• Full details for user submission versus
mapped submission into database
30. Spectral Data
• Support for “structure identification” is a must
– “greatest value” for reference and lookup
• Support for data standards primarily – JCAMP,
mzML, SPC
• Want to support ASSIGNED data formats
• Hold binary files but prefer standards – WHY?
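One reason to prefer text-based standards is that their metadata can be read with a few lines of code and no vendor software. The sketch below pulls the header records out of a JCAMP-DX file, relying on the format's `##LABEL=value` records and `$$` comments. It is a simplified reader under assumptions: it stops at the first `XYDATA` or `PEAK TABLE` block, ignores multi-line values, and the function name is hypothetical.

```python
from typing import Dict


def _normalise(label: str) -> str:
    """Upper-case a label and drop spaces, dashes, slashes, and underscores,
    so that 'DATA TYPE' and 'DATATYPE' compare equal."""
    return "".join(ch for ch in label.upper() if ch not in " -/_")


def read_jcamp_header(text: str) -> Dict[str, str]:
    """Collect '##LABEL=value' records up to the first data block or ##END."""
    header: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("$$", 1)[0].strip()  # drop inline comments
        if not line.startswith("##") or "=" not in line:
            continue
        label, value = line[2:].split("=", 1)
        key = _normalise(label)
        if key in ("XYDATA", "PEAKTABLE", "END"):
            break  # header is over once the tabular data starts
        header[key] = value.strip()
    return header
```

A deposition check could then confirm that fields such as `TITLE`, `JCAMPDX`, and `DATATYPE` are present before accepting a spectrum.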
32. 10 years from now…
• Binary file formats generally need original
data processing software to deal with them –
from Bruker, Agilent, Jeol, Thermo, Waters,
blah, blah, blah, blah,…
• While we can store the original raw data files
for posterity, should we? This has been one
focus for data repositories
38. Addition of Analytical Data
• Spectral Container is in development using
componentized widgets for display
• NIST spectra converted into standardized
JCAMP format for deposition – 296,103
spectra deposited
• 10% of remaining NIST spectra need to be
curated as there are obvious structure issues
49. Medicinal Chemist
• Search (against database of properties)
• Source (find vendor)
• Analyse (cluster, dock, screen)
• Synthesize
• Measure activity
Computational Chemist
• Search or develop algorithm
• Run calculations
• Store results
50. Present activities for ACS Fall
• Deposition process development of compounds,
reactions and spectral data by end of Spring
• FTP, DropBox, Web-upload, ELN integration
• Compounds, Reactions, Spectral data search,
display, download
• Data sharing – private, public, collaborative
• Metadata, metadata, metadata standards!
• Open Sourcing Chemical Registry System
including CVSP
51. UK Chemical Database Service
• The National Chemical Database Service is for
UK academics
• What would be necessary to make this
available for “Rest of World”, a single
institution, an organization?
• It’s not really technology…that’s scale out and
can be handled
• It’s negotiation with database providers,
pricing, login/authentication, localization?
52. Acknowledgments
• Jeremy Frey and Simon Coles, University of
Southampton
• Will Dichtel and Leah McEwan, Cornell
University
• Stuart Chalk, University of North Florida
• Bob Hanson and Bob Lancashire, Jmol and
JSpecView
53. Thank you
Email: williamsa@rsc.org
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams