This is a general presentation about our efforts to build an internet-based community for chemists using ChemSpider. It gives an overview of data quality online, crowdsourced deposition and curation, and our progress in delivering a solution to the community for sourcing data.
RSC ChemSpider – Building An Internet Based Community For Chemists
1. RSC ChemSpider – Building an Internet Based Community for Chemists
2. Where is chemistry online? Encyclopedic articles (Wikipedia); chemical vendor databases; metabolic pathway databases; property databases; patents with chemical structures; drug discovery data; scientific publications; compound aggregators; blogs/wikis and Open Notebook Science.
3. Chemistry on the Internet TODAY: Chemistry searches are generally limited to text-based searches across the internet. Poor quality and little curation/validation work. Too many searches required to source data.
4. What do humans want? As few interfaces as possible.
5. Chemistry on the Internet FUTURE: Search by chemical structure and substructure. Chemistry articles indexed and searchable. Reduced number of searches to find data. Data are integrated – compounds, vendors, syntheses, data, publications and patents.
12. What is ChemSpider? ChemSpider is: Building a Structure Centric Community for Chemists; >23 million compounds, >300 data sources; a deposition and curation platform; a publishing platform for the community. Grows daily – more depositions, more links, more data sources.
13. How Was ChemSpider Built? ChemSpider was a “hobby project”. Housed in a basement and running off three servers – one bought, two built. Sensitive to weather and power stability. Went live at ACS Spring 2007 in Chicago.
25. Answering Questions for Chemists Questions a chemist might ask… What is the melting point of n-butanol? What is the chemical structure of Xanax? Chemically, what is phenolphthalein? What are the stereocenters of cholesterol? Where can I find publications about xylene? What are the different trade names for Ketoconazole? What is the NMR spectrum of Aspirin? What are the safety handling issues for Thymol Blue?
27. ChemSpider is a structure-centric hub: ChemSpider aggregates and links out across the internet. Data aggregate based on “structures and links”. What defines a chemical compound?
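The idea of aggregating data around "structures and links" can be illustrated with a minimal Python sketch. It is not ChemSpider's implementation: the function name `build_hub`, the structure identifiers, the source names and the URLs are all hypothetical placeholders. The only point is that each structure identifier acts as the hub key and every source contributes links to it.

```python
from collections import defaultdict


def build_hub(records):
    """Group (structure_id, source, url) records by structure identifier."""
    hub = defaultdict(list)
    for structure_id, source, url in records:
        hub[structure_id].append((source, url))
    return dict(hub)


# Hypothetical records: identifiers, source names and URLs are placeholders,
# not real ChemSpider data.
RECORDS = [
    ("STRUCTURE-A", "vendor-db", "https://example.org/vendor/1"),
    ("STRUCTURE-A", "encyclopedia", "https://example.org/wiki/a"),
    ("STRUCTURE-B", "patent-db", "https://example.org/patent/9"),
]

hub = build_hub(RECORDS)

if __name__ == "__main__":
    for structure_id, links in hub.items():
        print(structure_id, links)
```

Whatever is chosen as the key decides which records get merged, which is why the question of what defines a chemical compound matters for a hub like this.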
61. Back to Taxol: DrugBank: RCINICONZNJXQF-CLDWUXIMDD; ChEBI: RCINICONZNJXQF-GXKQXQCDDN; Wikipedia: RCINICONZNJXQF-MZXODVADBJ. Which one is correct???
62. InChIKeys for Taxol: DrugBank: RCINICONZNJXQF-CLDWUXIMDD; ChEBI: RCINICONZNJXQF-GXKQXQCDDN; Wikipedia: RCINICONZNJXQF-MZXODVADBJ. ChEBI and Wikipedia are the SAME structure. DrugBank is a DIFFERENT structure – ONE stereocenter differs.
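A short Python sketch can make the comparison of these keys concrete. It assumes the conventional reading of an InChIKey, in which the block before the first hyphen is a hash of the molecular skeleton (connectivity) and the characters after it cover the remaining layers such as stereochemistry. The helper names `split_key`, `same_skeleton` and `group_by_suffix` are made up for illustration; the keys are the ones quoted on the slide.

```python
# InChIKeys quoted on the slide, keyed by the database that reported them.
TAXOL_KEYS = {
    "DrugBank": "RCINICONZNJXQF-CLDWUXIMDD",
    "ChEBI": "RCINICONZNJXQF-GXKQXQCDDN",
    "Wikipedia": "RCINICONZNJXQF-MZXODVADBJ",
}


def split_key(key):
    """Split an InChIKey into (first block, everything after the first hyphen)."""
    first, _, rest = key.partition("-")
    return first, rest


def same_skeleton(keys):
    """Return True when every key shares the same first block."""
    return len({split_key(key)[0] for key in keys.values()}) == 1


def group_by_suffix(keys):
    """Map each distinct suffix to the sorted list of sources reporting it."""
    groups = {}
    for source, key in keys.items():
        groups.setdefault(split_key(key)[1], []).append(source)
    return {suffix: sorted(sources) for suffix, sources in groups.items()}


if __name__ == "__main__":
    print("same first block:", same_skeleton(TAXOL_KEYS))
    for suffix, sources in group_by_suffix(TAXOL_KEYS).items():
        print(suffix, sources)
```

A string comparison like this only flags that sources disagree somewhere beyond the skeleton. It cannot say which record is correct or how many stereocenters differ; that still requires comparing the structures themselves.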
64. Does one stereocenter matter? Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, and Softenon (all trade names for thalidomide).
66. Assertion and Chemical Entities Who says what Taxol is? What is the “timeline” for a molecule? How do we clean up the Public data? The Quality source is Chemical Abstracts Service…
85. Semantic Linking of Structures: What would you want to link off a structure? Chemical suppliers; other publications; analytical data; related reactions; Wikipedia; patents; “everything”.
86. ChemSpider Everywhere: Linked from Wikipedia. Linked from Open Notebook Science sites using EMBED. Linked from blogs using Structure/Spectra EMBED. Integrated into structure drawing packages such as ACD/ChemSketch, Symyx Draw and Open Source applets. Integrated into software offerings from Thermo, Waters, Agilent and Bruker.
92. There are always gaps... What ChemSpider doesn’t deal with yet: Markush structures and other “non-defineds”; materials; minerals; polymers; biological macromolecules.
93. What’s next? Continue the curation effort and keep cleaning. Finish depositions – millions left to deposit. Layer on RDF to allow the semantic web to benefit from our efforts. Integrate RSC content – a massive archive! Integrate RSC publishing workflows and databases.