Feeding and consuming data to support Open Notebook Science via the ChemSpider Platform

Antony Williams, Jean-Claude Bradley, Andrew Lang and Valery Tkachenko

ACS Philadelphia August 2012
Setting the Stage
• Chemists want access to tools and data
    ◦ The more capabilities the better
    ◦ The more data the better
    ◦ And give us an API with that…
    ◦ And it should be free…
    ◦ And constantly updated…
    ◦ And all data should be Open…
    ◦ And make it fully Open Source…
    ◦ And it needs to be on my mobile…
Setting the Stage
• Chemists have access to tools and data
    ◦ The more capabilities the better – we’ll see
    ◦ The more data the better – changing daily
    ◦ And give us an API with that… – not just one
    ◦ And it should be free… – sure
    ◦ And constantly updated… – indeed… please help!
    ◦ And all data should be Open… – licensing
    ◦ And make it fully Open Source… – kinda, sorta
    ◦ And it needs to be on my mobile… – sure
Welcome to ChemSpider
• 5 years, 28 million chemicals, linking 400 data sources and growing daily
• Hosted by the Royal Society of Chemistry
• An important part of our long-term strategic vision
• Free to access
• With lots/most/all (?) of the functionality necessary to support chemists and Open Notebook Science…
Why Use ChemSpider?
Why Use ChemSpider? LINKING OUT
What about Syntheses?
ChemSpider SyntheticPages
Work in Progress – 300k Reactions
Storing ONS Reactions
• Working with JC Bradley to host ONS reactions
• Linking directly back to ONS reactions
• What if the links decay?
• Host all related ONS data – benefits of Openness!
• Future applications for RInChIs
What we have been asked for
• “Allow us to grab data”
• “Let us link”
• “Give us web services to integrate”
• “Can we store our data with you?”
• “Can you give us predictions to validate data?”
• “Can you build us an ELN?”
Simple Linking to ChemSpider
• Link using ChemSpiderID
• http://www.chemspider.com/1234567
ChemSpider IDs Proliferating Now
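A minimal sketch of the ID-based linking described on the "Simple Linking" slide. It only formats the URL pattern shown there; the function name `chemspider_link` is hypothetical and not part of any ChemSpider library.

```python
# Base URL taken from the slide; the helper below is illustrative only.
CHEMSPIDER_BASE = "http://www.chemspider.com"


def chemspider_link(csid: int) -> str:
    """Return a record link for a ChemSpider ID, in the form shown on the slide."""
    # Reject booleans, non-integers and non-positive values early,
    # so a bad ID never ends up embedded in a published link.
    if isinstance(csid, bool) or not isinstance(csid, int) or csid <= 0:
        raise ValueError("ChemSpider ID must be a positive integer")
    return f"{CHEMSPIDER_BASE}/{csid}"


if __name__ == "__main__":
    print(chemspider_link(1234567))
```

Keeping the ID as an integer and formatting the link in one place makes it easy to update every link if the URL pattern ever changes.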
Simple Querying Example
• http://www.chemspider.com/Search.aspx?q=InChIKey=XXO
Or InChI, or SMILES
• http://www.chemspider.com/Search.aspx?q=InChI=1S m1/s1
• http://www.chemspider.com/Search.aspx?q=Clc1ccc(cc1)C(O)=C3C(=O)C(=O)N([C@@H]3c2cccc(F)c2)CCc5c4ccccc4nc5
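The query links above can be generated programmatically. The sketch below builds a `Search.aspx?q=` URL and percent-encodes the search term, since SMILES and InChI strings contain characters such as `#`, `=`, `(` and `/` that are not safe in a query string. The helper names are hypothetical, and it is an assumption that the search page accepts percent-encoded terms; the aspirin identifiers are used only as sample input.

```python
from urllib.parse import quote

# Search endpoint as shown on the slides.
SEARCH_URL = "http://www.chemspider.com/Search.aspx"


def search_url(term: str) -> str:
    """Build a search link for an arbitrary term (name, SMILES, InChI...)."""
    # safe="" forces every reserved character to be percent-encoded.
    return f"{SEARCH_URL}?q={quote(term, safe='')}"


def inchikey_search_url(inchikey: str) -> str:
    """Build a search link using the 'InChIKey=' prefix shown on the slide."""
    return search_url(f"InChIKey={inchikey}")


if __name__ == "__main__":
    # Sample inputs (aspirin), not taken from the slides.
    print(search_url("CC(=O)Oc1ccccc1C(=O)O"))
    print(inchikey_search_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```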
Better to provide APIs….
Various Flavors of API
MANY Web Services for integration
Feeding ONS Data into ChemSpider
• ONS data can be deposited into ChemSpider and linked out to the ONS pages
• Simply deposit structure(s) and links
Feeding ONS Data into ChemSpider
• ONS Solubility Challenge
So isn’t ONS all about ELNs?
• Open Notebook Science is about
    ◦ Making records of research publicly available online as it is recorded
• ONS is enabled by software tools and platforms
    ◦ Keep the notebook of the researcher online with all raw and processed data as it is generated (close to or near real time)
    ◦ Notebooks as Wikis, Commercial or Free ELNs published to the web (choose public/private – what data to expose)
Feeding ELN Data into ChemSpider
• Integrate e-Notebooks into ChemSpider
    ◦ IDBS e-Workbook plug-in allows direct deposition of chemical structures
    ◦ Can be extended to more ELN content
        ▪ Spectra
        ▪ Reactions
        ▪ Properties etc.
    ◦ Integration Video: http://tinyurl.com/9xnprqr
How much data is lost?
• How many reactions in a thesis never get published?
• How many spectra of common materials could be shared?
• How many properties are measured and lost?
• What stands in the way of sharing?
    ◦ Is it technology?
    ◦ Permissions? “The Boss”, Licensing?
• And yes – there are data quality issues but there is algorithmic checking and data curation to help
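As one small illustration of what "algorithmic checking" can mean, the sketch below tests whether a string has the general shape of an InChIKey (blocks of 14, 10 and 1 uppercase letters separated by hyphens). This is not ChemSpider's curation pipeline, which the slides do not describe; it is only an example of a cheap syntactic check that could run before deposition. The function name is hypothetical.

```python
import re

# General InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
_INCHIKEY_SHAPE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(value: str) -> bool:
    """Return True if the string matches the InChIKey layout.

    This checks the format only; it cannot tell whether the key
    corresponds to a real or correctly drawn structure.
    """
    return _INCHIKEY_SHAPE.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    # Sample input (aspirin), not taken from the slides.
    print(looks_like_inchikey("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

A format check like this catches truncated or mistyped identifiers early; deeper validation (structure versus name, structure versus spectrum) still needs cheminformatics tools and human curation.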
What could the future look like?
• “Publicly funded” research data flows onto the web
• Licensing is clear and NOT a challenge
• Machines are picking up data and depositing
• EXAMPLE project – Any interest?
    ◦ Put your spectra/structure in folders (Dropbox)
    ◦ ChemSpider robot scoops, processes and deposits – opportunity with JC Bradley
    ◦ While processing also predicts spectra and compares for validation
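The "robot scoops a folder" idea could start from something as simple as the sketch below. It scans a synced folder, pairs each spectrum file with a structure file of the same base name, and hands each pair to a deposit callback. Everything here is an assumption for illustration: the file extensions (`.mol`, `.jdx`), the pairing-by-name convention, and the `deposit` callback, which stands in for whatever deposition service would actually be used. The prediction and comparison step from the slide is not covered.

```python
from pathlib import Path
from typing import Callable, List, Tuple

STRUCTURE_EXT = ".mol"  # assumed structure format (MDL molfile)
SPECTRUM_EXT = ".jdx"   # assumed spectrum format (JCAMP-DX)


def find_pairs(folder) -> List[Tuple[Path, Path]]:
    """Return (structure, spectrum) pairs that share a base file name."""
    folder = Path(folder)
    # Index structure files by base name for quick lookup.
    structures = {p.stem: p for p in folder.glob(f"*{STRUCTURE_EXT}")}
    pairs = []
    for spectrum in sorted(folder.glob(f"*{SPECTRUM_EXT}")):
        structure = structures.get(spectrum.stem)
        if structure is not None:  # skip spectra with no matching structure
            pairs.append((structure, spectrum))
    return pairs


def process_folder(folder, deposit: Callable[[Path, Path], None]) -> int:
    """Call deposit(structure, spectrum) for each pair; return the pair count."""
    pairs = find_pairs(folder)
    for structure, spectrum in pairs:
        deposit(structure, spectrum)
    return len(pairs)


if __name__ == "__main__":
    # Dry run: print what would be deposited from the current directory.
    count = process_folder(".", lambda s, p: print(f"would deposit {s} + {p}"))
    print(f"{count} pair(s) found")
```

Passing the deposit step in as a callback keeps the folder logic testable without network access and leaves the choice of deposition mechanism open.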
Leaving the Stage
• Chemists have access to tools and data
    ◦ The more capabilities the better – what’s missing?
    ◦ The more data the better – anyone want to share?
    ◦ And give us an API with that… – ask us for help
    ◦ And it should be free… – it is
    ◦ And constantly updated… – help annotate/curate
    ◦ And all data should be Open… – licensing
    ◦ And make it fully Open Source… – book chapter
    ◦ And it needs to be on my mobile… – it is
ChemSpider Mobile
New URLs to try out
• ChemSpider Reactions: www.chemspider.com/reactions
• ChemSpider Validation and Standardization Platform: www.chemspider.com/cvsp
• ChemSpider Google: www.chemspider.com/google
ChemSpider Google
Acknowledgments
• RSC Cheminformatics team
• JC Bradley’s lab
• Daniel Lowe – reactions
• Commercial Software – GGA Software, ACD/Labs, OpenEye
• Open Source Components
Thank you

Email: williamsa@rsc.org
Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams
