ChemSpider is one of the chemistry community’s primary public compound databases. Containing tens of millions of chemical compounds and their associated data, ChemSpider now serves data to many tens of websites and software applications. This presentation will provide an overview of the expanding reach of the ChemSpider platform and the nature of the solutions that it helps to enable. We will also discuss some of the envisaged future directions for the project and how we intend to continue expanding the impact of the platform.
ChemSpider – disseminating data and enabling an abundance of chemistry platforms
1. ChemSpider – disseminating data
and enabling an abundance of
chemistry platforms
Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey
Pshenichnov, Dmitry Ivanov, Colin Batchelor, Jon Steele
and David Sharpe
ACS New Orleans April 2013
2. ChemSpider
• >28.5 million unique chemicals from >400
data sources
• Focus on improving data quality, enhancing
functionality, integrating and enabling
4. Some usage statistics
• ca. 200 visitors at any one time, ~30,000 visits per day
• Mar 4-Apr 3, 2013
– Visits = 731,656
– Unique Visitors = 527,008
• Independent servers to support other projects
5. Access ChemSpider
• APIs
– Programmatic access used by Mobile Apps, Funded
Consortia projects, many Academic groups
• Widgets
– UI components for embedding in other websites
• Data
– Data access, downloads, reuse, licensing
12. It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known Pathways?
• Working On Now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
13. Open PHACTS
• 3-year knowledge management IMI project
• Integrating chemistry and biology data and delivering
using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharma companies, Publishers….
14. ChemSpider Contributions
• The host of the chemistry services
– Supplier of “standardized” chemical data files
– Chemistry searching (structure, substructure etc)
– Provider of data in RDF format
– Curator and data quality checking
• Now building the Open PHACTS chemical
registration system
15. ChemSpider Contributions
• Supplier of chemistry UI components
• “Quality Police” for data checking
• Chemical Validation and Standardization Platform
• Nanopublications from RSC publications
16. PharmaSea
• FP7 initiative: increasing value and flow in
the marine biodiscovery pipeline
17. PharmaSea
• Dereplication via ChemSpider
• Segregation of natural products datasets
• Analytical data algorithms & integration
– Mass spec searching – predicted fragmentation
– NMR feature searching – NMR prediction
– Computer-assisted structure elucidation
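To illustrate the dereplication idea on this slide, here is a minimal sketch of matching an observed mass against known compounds within a ppm tolerance. It is not ChemSpider's implementation or API: the function names, the four-compound reference table, and the treatment of the observed value as a neutral monoisotopic mass are all assumptions made for illustration.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}

# Hypothetical stand-in for a reference database of known compounds.
REFERENCE = {
    "caffeine": "C8H10N4O2",
    "theobromine": "C7H8N4O2",
    "glucose": "C6H12O6",
    "aspirin": "C9H8O4",
}


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def dereplicate(observed_mass: float, tolerance_ppm: float = 5.0):
    """Return (name, ppm error) for known compounds within the tolerance.

    Assumes observed_mass is a neutral monoisotopic mass, i.e. any adduct
    correction has already been applied.
    """
    hits = []
    for name, formula in REFERENCE.items():
        mass = monoisotopic_mass(formula)
        ppm_error = abs(mass - observed_mass) / mass * 1e6
        if ppm_error <= tolerance_ppm:
            hits.append((name, round(ppm_error, 2)))
    # Best match (smallest error) first.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for name, error in dereplicate(180.0634, tolerance_ppm=10.0):
        print(f"{name}: {error} ppm")
```

A hit means the mass is consistent with a known compound; several hits at the same tolerance are common, which is why mass matching is usually combined with fragmentation or NMR evidence such as the searches listed above.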
18. Integrate to instruments and software
• Integration to analytical instrumentation vendors
already in place
– Agilent, Bruker, Thermo, Waters
• Also, Cheminformatics vendors link to ChemSpider
– Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
19. Natural Products Updates
• Names hard, Structures
“Obvious”
• New content based on
monthly updates of the
database
• Click through to the Natural
Products Updates entry
21. Chemical Database
Service
• National Chemical Database Service
for UK Academics
• Integrating Commercial Databases
and Services
• Chemicals, analytical data,
prediction algorithms
• Development of data repository
23. Publications - a summary of work
• Scientific publications are a summary of work
– Is all work reported?
– How much science is lost to pruning?
– What of value sits in notebooks and is lost?
• How much data is lost?
– How many compounds never reported?
– How many syntheses fail or succeed?
– How many characterization measurements?
24. Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Institutional repositories offer no domain-specific
support
• Develop a community repository for chemistry
data – private, public, embargoed
• Provides data to develop models/algorithms
25. Community Repository for Data
• Automated depositions of data
• DOI’ed data objects for citation purposes
• A database of reference data, but validated by
the community
• National services feeding the repository –
crystallography, mass spectrometry
• Integrate to blogging tools for chemistry
• Integrate to Electronic Lab Notebooks as feeds
26. Model Building with Community Data
• Community data as a basis of model building
– Consume data from available databases, community
data, new publications and build predictive
algorithms for the community
– How many algorithms are reported and lost? How
much repeat work is done in the domain of
algorithmic development?
29. E-Lab Notebooks
• Previous work with IDBS and
University of Cambridge
• Working on LabTrove integration
with U. Southampton
• Integration between ELNs and:
• ChemSpider
• ChemSpider Reactions
• CDS Repository
• Publish data from ELNs and issue DOIs
• Data aggregated into fully indexed
ESI format for publication
30. Support for Chemical Reactions
• Integrating mined reaction data from patents
(Daniel Lowe)
• Will also incorporate and integrate: Methods
of Organic Synthesis, Catalysts and Catalyzed
Reactions and…
34. Inside our Publication Archive
• How much data is in the archive, in the
publications and in the supplementary info?
– How many compounds for ChemSpider?
– How many syntheses for ChemSpider reactions?
– How many characterization measurements?
• Property Data
• Spectral Data
• Graphs and charts to be used for modeling?
35. What if we could capture it all?
Digitally Enhancing the RSC Archive
41. Data Validation and Curation Required
Encouraging Participation with
Rewards and RECOGNITION
42. Manual Curation
• Integrated commenting, curating and validation
platform across ALL eScience and publishing
platforms
• All integrated to a central RSC profile and
feeding the AltMetrics tools
45. Rewards and Recognition
Congratulations! Your 1st CSSP article
has been published. Philosopher Lao
Tzu said “A journey of a thousand
miles begins with a single step”. In the
same way we hope that this will be
the first of many submissions that you
make to CSSP.
The First Step badge is
awarded when a user
submits (& has published)
their 1st
CSSP article.
47. Why is ChemSpider “different”?
• Interfaces for integration
• Sharing of data – and increasingly open
• Open for community participation
– Deposition
– Annotation
– Curation
• We are clear…the world is changing
48. Internet Data
The Future
Commercial Software
Pre-competitive Data
Open Science
Open Data
Publishers
Educators
Open Databases
Chemical Vendors
Small organic molecules
Undefined materials
Organometallics
Nanomaterials
Polymers
Minerals
Particle bound
Links to Biologicals
49. Acknowledgments
• The RSC eScience and infrastructure teams
• Our data providers, depositors, collaborators
and curators
• Daniel Lowe for Reaction Data
• William Brouwer, Penn State
• Software providers – OpenEye, ChemDoodle,
ACD/Labs, GGA Software, Open Source (Jmol,
JSpecView, OpenBabel)