The Expansive Reach of
ChemSpider as a Resource for
the Chemistry Community
Antony Williams
University of Oregon, April 24th, 2013
The World of Online Chemistry
• Property databases
• Compound aggregators
• Screening assay results
• Scientific publications
• Encyclopedic articles (Wikipedia)
• Metabolic pathway databases
• ADME/Tox data – eTOX for example
• Blogs/Wikis and Open Notebook Science
We Have …Too Much Data!!!
e-Science and Primary Data
• How much data generated in a lab, that COULD go public, is
lost forever?
TotallySynthetic.com
e-Science and Primary Data
• How much data generated in a lab, that COULD go public, is
lost forever?
• Public Domain reference databases of value?
– Syntheses
– Properties
– Spectra
– CIFs
– Images
Collaborative Knowledge Management
e-Science and Primary Data
• How much data generated in a lab, that COULD go public, is
lost forever?
• Public Domain reference databases of value?
– Syntheses
– Properties
– Spectra
– CIFs
– Images
• Much of chemistry is chemical structure-based – where and
how could we host these data?
RSC’s ChemSpider
Crowdsourced “Annotations”
• Users can add
– Descriptions/Syntheses/Commentaries
– Links to PubMed articles
– Links to articles via DOIs
– Add spectral data
– Add Crystallographic Information Files
– Add photos
– Add MP3 files
– Add Videos
Spectra
Chemistry Data online is messy
• We have inherited errors
• All public compound databases, including ours, have
errors
• “Incorrect” structures – assertions, timelines etc
• “Incorrect” names associated with structures
• Properties
• Links
• Publications
• ENORMOUS CHALLENGE
The Structure of Vitamin K?
MeSH
• A lipid cofactor that is required for normal blood clotting.
Several forms of vitamin K have been identified:
VITAMIN K 1 (phytomenadione) derived from plants,
VITAMIN K 2 (menaquinone) from bacteria, and synthetic
naphthoquinone provitamins, VITAMIN K 3 (menadione).
Vitamin K 3 provitamins, after being alkylated in vivo,
exhibit the antifibrinolytic activity of vitamin K. Green
leafy vegetables, liver, cheese, butter, and egg yolk are
good sources of vitamin K
The Structure of Vitamin K1?
What is the Structure of Vitamin
K1?
CAS’s Common Chemistry
Wikipedia
“2-methyl-3-(3,7,11,15-tetramethylhexadec-2-
enyl)naphthalene-1,4-dione”
• Variants of systematic names on PubChem
– 2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl
– 2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl
– 2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl
– 2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl
– 2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl
– 2-methyl-3-[(E)-3,7,11,15-tetramethyl
– 2-methyl-3-(3,7,11,15-tetramethyl
– 2-methyl-3-[(E)-3,7,11,15-tetramethyl
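The name variants listed above differ mainly in the stereochemistry they assert. A minimal sketch, working only from the truncated name fragments as they appear on the slide, that pulls out the leading stereo descriptor from each name and counts how many distinct assertions are being made. The regular expression and the helper name `stereo_descriptor` are illustrative assumptions and not part of any PubChem tooling.

```python
import re
from collections import Counter

# Name fragments exactly as listed on the slide (they are truncated there).
NAMES = [
    "2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl",
    "2-methyl-3-[(E)-3,7,11,15-tetramethyl",
    "2-methyl-3-(3,7,11,15-tetramethyl",
    "2-methyl-3-[(E)-3,7,11,15-tetramethyl",
]

# Matches the parenthesised descriptor that opens the bracketed substituent.
DESCRIPTOR = re.compile(r"\[\(([^)]+)\)")


def stereo_descriptor(name):
    """Return the stereo descriptor text, or None if the name asserts none."""
    match = DESCRIPTOR.search(name)
    return match.group(1) if match else None


# Count how often each distinct stereo assertion occurs.
counts = Counter(stereo_descriptor(name) for name in NAMES)
for descriptor, n in counts.items():
    print(descriptor or "(no stereo asserted)", n)
```

Eight names collapse to seven distinct stereo assertions, which is one way of seeing that the question "what is the structure of Vitamin K1?" has several competing answers online.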
Question Everything online: www.dhmo.org
It’s all on Wikipedia…
Chemistry on The Internet Is Messy
It’s Methane…
What’s Methane?
What ELSE is Methane???
With Great Fanfare…
NPC Browser http://tripod.nih.gov/npc/
Public Domain Databases
• Our databases are a mess…
• Non-curated databases are proliferating errors
• We source and deposit data between databases
• Original sources of errors hard to determine
• Curation is time-consuming and challenging
Stop Whining – Fix it
Crowdsourced Curation
• Crowd-sourced curation: identify/tag errors,
edit names, synonyms, identify records to
deprecate
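The curation actions named above (tag errors, edit names and synonyms, deprecate records) can be pictured as operations on a record. A minimal sketch under stated assumptions: the `Record` class, its status values and the identifier `REC-003` are all hypothetical and do not describe how ChemSpider stores or curates data.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical compound record with curatable synonyms."""
    record_id: str
    synonyms: dict = field(default_factory=dict)   # name -> review status
    error_tags: list = field(default_factory=list)
    deprecated: bool = False

    def add_synonym(self, name):
        # New names start unreviewed until a curator looks at them.
        self.synonyms.setdefault(name, "unreviewed")

    def review_synonym(self, name, approved):
        if name not in self.synonyms:
            raise KeyError(name)
        self.synonyms[name] = "approved" if approved else "rejected"

    def tag_error(self, note):
        self.error_tags.append(note)

    def deprecate(self):
        self.deprecated = True

    def approved_names(self):
        return [n for n, status in self.synonyms.items() if status == "approved"]


record = Record("REC-003")
for name in ("Biotin", "Vitamin H", "Vitamin B12"):  # last name is deliberately wrong
    record.add_synonym(name)

record.review_synonym("Biotin", approved=True)
record.review_synonym("Vitamin H", approved=True)
record.review_synonym("Vitamin B12", approved=False)
record.tag_error("Synonym 'Vitamin B12' does not belong to this structure")

print(record.approved_names())
```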
Search “Vitamin H”
“Curate” Identifiers
Standards : Structure Standardization
The InChI Identifier
Multiple Layers
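To make the layering concrete, here is a minimal sketch that splits an InChI string into its layers using only string handling. The function name `split_inchi` and the human-readable layer labels are assumptions made for illustration; the example string is the standard InChI for ethanol.

```python
# Prefix letters for the common layers of a standard InChI.
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double-bond stereo",
    "t": "tetrahedral stereo",
    "m": "stereo inversion flag",
    "s": "stereo type",
    "i": "isotopes",
}


def split_inchi(inchi):
    """Return the layers of an InChI as a list of (layer name, value) pairs."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = [("version", parts[0])]
    rest = parts[1:]
    # The formula layer has no prefix letter and starts with an element symbol.
    if rest and rest[0][:1].isupper():
        layers.append(("formula", rest[0]))
        rest = rest[1:]
    for part in rest:
        name = LAYER_NAMES.get(part[:1], "other")
        layers.append((name, part[1:]))
    return layers


layers = split_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
for name, value in layers:
    print(f"{name:>12}: {value}")
```

A list of pairs is used rather than a dictionary because a layer prefix can recur, for example a hydrogen layer inside the isotopic section.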
InChIStrings Hash to InChIKeys
Vancomycin – Search the Internet
Vancomycin
Search Molecular
SKELETON
Search Full Molecule
Full Skeleton Search: 104 Hits
Full Molecule Search: 4 Hits
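The slides do not spell out how the two searches are run. One plausible reading, sketched below, is that a skeleton search matches only the first 14-character block of the InChIKey (which encodes connectivity), while a full-molecule search matches the whole 27-character key. The record names, the helper functions and the choice of example compounds are assumptions; the keys are sample values and should be checked against an authoritative source before reuse.

```python
def split_inchikey(key):
    """Split an InChIKey into (skeleton block, remaining blocks)."""
    blocks = key.split("-")
    if len(key) != 27 or len(blocks) != 3 or len(blocks[0]) != 14:
        raise ValueError(f"unexpected InChIKey shape: {key!r}")
    return blocks[0], "-".join(blocks[1:])


# Illustrative records: three forms of alanine plus an unrelated compound.
RECORDS = {
    "L-alanine": "QNAYBMKLOCPYGJ-REOHSIFRSA-N",
    "D-alanine": "QNAYBMKLOCPYGJ-UWTATZPHSA-N",
    "alanine (no stereo)": "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",
    "methane": "VNWKTOKETHGBQD-UHFFFAOYSA-N",
}


def skeleton_search(query_key):
    """Names of all records sharing the query's connectivity block."""
    skeleton, _ = split_inchikey(query_key)
    return sorted(n for n, k in RECORDS.items() if split_inchikey(k)[0] == skeleton)


def full_search(query_key):
    """Names of all records whose complete key equals the query."""
    return sorted(n for n, k in RECORDS.items() if k == query_key)


query = RECORDS["L-alanine"]
print("skeleton hits:", skeleton_search(query))
print("full hits:    ", full_search(query))
```

The skeleton search returns every stereo variant of the same connectivity, which is consistent with it producing far more hits than the full-molecule search.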
Validated Name-Structure Dictionaries
• Chemical name dictionaries are used for:
• Text-mining (publications, patents)
– Used to index PubMed and link to Google Patents
• Linking to other databases – think Biology!
– When structures are not available drug names link
• Searching the web
– Names link to structures link to InChIs
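As a rough illustration of dictionary-based text mining, the sketch below tags chemical names in free text and maps each one to a record. The dictionary contents, the `REC-…` identifiers and the function names are hypothetical; a validated dictionary would be far larger and would point to real structures.

```python
import re

# Hypothetical dictionary: lower-cased name -> made-up record identifier.
NAME_TO_RECORD = {
    "vincristine": "REC-001",
    "vancomycin": "REC-002",
    "biotin": "REC-003",
    "vitamin h": "REC-003",  # synonym pointing at the same record as biotin
}


def build_pattern(names):
    """Compile one case-insensitive pattern, trying longer names first."""
    ordered = sorted(names, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


PATTERN = build_pattern(NAME_TO_RECORD)


def tag_names(text):
    """Return (matched text, record id) pairs in order of appearance."""
    return [(m.group(0), NAME_TO_RECORD[m.group(0).lower()])
            for m in PATTERN.finditer(text)]


hits = tag_names("Vincristine and vitamin H were compared with Vancomycin.")
for found, record_id in hits:
    print(found, "->", record_id)
```

Everything downstream of the lookup inherits the quality of the name-to-structure mapping, which is why the dictionary itself has to be validated.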
I want to know about “Vincristine”
If all algorithms work then everything on the page is correct by
default except the name-structure relationship!
Vincristine: Identifiers and
Properties
Vincristine: Vendors and Sources
Linked by Structure
Vincristine: Patents
Linked by Name
Vincristine: Articles
Linked by Name
ChemSpider Resources for Chemistry
Micropublishing Syntheses
ChemSpider SyntheticPages
Olympicene
So you Want a Profile???
Interactive Data
PharmaSea
• Dereplication via ChemSpider
• Segregation of natural products datasets
• Analytical data algorithms & integration
– Mass spec searching – predicted fragmentation
– NMR feature searching – NMR prediction
– Computer-assisted structure elucidation
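The slide mentions mass spec searching with predicted fragmentation. The sketch below covers only a much simpler step that dereplication workflows commonly start from: looking up candidates whose monoisotopic mass matches an observed neutral mass within a ppm tolerance. The candidate library, the function names and the 10 ppm default are assumptions chosen for illustration.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Hypothetical candidate library: name -> molecular formula.
LIBRARY = {
    "caffeine": "C8H10N4O2",
    "theobromine": "C7H8N4O2",
    "theophylline": "C7H8N4O2",
    "paracetamol": "C8H9NO2",
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def search_by_mass(observed, ppm=10.0):
    """Names of library entries within `ppm` of the observed neutral mass."""
    hits = []
    for name, formula in LIBRARY.items():
        theoretical = monoisotopic_mass(formula)
        error_ppm = abs(observed - theoretical) / theoretical * 1e6
        if error_ppm <= ppm:
            hits.append(name)
    return sorted(hits)


print(round(monoisotopic_mass("C8H10N4O2"), 4))
print(search_by_mass(180.0647))
```

Isomers share a formula and therefore a mass, so a mass match alone cannot tell them apart; that is where fragmentation and NMR data come in.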
It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic
web technologies
• Open source code, open data and open standards
• Academics, Pharma companies, Publishers….
ChemSpider Contributions
• The host of the chemistry services
– Supplier of “standardized” chemical data files
– Chemistry searching (structure, substructure etc)
– Provider of data in RDF format
– Curator and data quality checking
• Now building the Open PHACTS chemical
registration system
ChemSpider Contributions
• Supplier of chemistry UI components
• “Quality Police” for data checking
• Chemical Validation and Standardization Platform
• Nanopublications from RSC publications
Integrate to instruments and software
• Integration to analytical instrumentation vendors
already in place
– Agilent, Bruker, Thermo, Waters
• Also, Cheminformatics vendors link to ChemSpider
– Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
Natural Products Updates
• Names hard, Structures
“Obvious”
• New content based on
monthly updates of the
database
• Click through to the Natural
Products Updates entry
National Chemical Database Service
• National Chemical Database Service
for UK Academics
• Integrating Commercial Databases
and Services
• Chemicals, analytical data,
prediction algorithms
• Development of data repository
Publications - a summary of work
• Scientific publications are a summary of work
– Is all work reported?
– How much science is lost to pruning?
– What of value sits in notebooks and is lost?
• How much data is lost?
– How many compounds never reported?
– How many syntheses fail or succeed?
– How many characterization measurements?
Community Repository for Data
• Funding agencies encourage sharing of data
• Increasing availability of “Open Data”
• Institutional repositories offer no domain-specific
support
• Develop a community repository for chemistry
data – private, public, embargoed
• Provides data to develop models/algorithms
Community Repository for Data
• Automated depositions of data
• DOI’ed data objects for citation purposes
• A database of reference data, but validated by
the community
• National services feeding the repository –
crystallography, mass spectrometry
• Integrate to blogging tools for chemistry
• Integrate to Electronic Lab Notebooks as feeds
Model Building with Community Data
• Community data as a basis of model building
– Consume data from available databases, community
data, new publications and build predictive
algorithms for the community
– How many algorithms are reported and lost? How
much repeat work is done in the domain of
algorithmic development?
Pulling Data from our Archive
• Our contribution to the world of chemistry data
• DERA – digitally enabling the RSC archive
– Text mining
• Find chemicals, reactions, analytical data, properties
– Algorithmic checking
• Validate algorithmically what we can - robots
– “Web 2.0 interfaces” for curating and validating
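As a toy illustration of the text-mining and algorithmic-checking steps listed above, the sketch below extracts melting points from experimental text and applies a basic sanity check. The regular expression, the function names and the sample sentence are assumptions; they do not describe the DERA pipeline itself, and real article text would need far more patterns than this.

```python
import re

# Matches e.g. "mp 123-125 °C" or "m.p. 45 °C" (hyphen or en dash in ranges).
MP_PATTERN = re.compile(
    r"\bm\.?p\.?\s*(?P<low>-?\d+(?:\.\d+)?)"
    r"\s*(?:[-\u2013]\s*(?P<high>\d+(?:\.\d+)?))?\s*°\s*C",
    re.IGNORECASE,
)


def extract_melting_points(text):
    """Return (low, high) pairs in deg C; single values repeat as low == high."""
    results = []
    for match in MP_PATTERN.finditer(text):
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        results.append((low, high))
    return results


def is_plausible(low, high):
    """Robot-style check: ordered range, not below absolute zero."""
    return -273.15 < low <= high


sample = "White solid, mp 123-125 °C; the second fraction had m.p. 45 °C."
melting_points = extract_melting_points(sample)
for low, high in melting_points:
    print(low, high, "ok" if is_plausible(low, high) else "needs curation")
```

Values that fail the check would be candidates for the manual curation interfaces rather than being silently dropped.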
What if we could capture it all?
Digitally Enhancing the RSC Archive
Data Validation and Curation Required
Encouraging Participation with
Rewards and RECOGNITION
Manual Curation
• Integrated commenting, curating and validation
platform across ALL eScience and publishing
platforms
• All integrated to a central RSC profile and
feeding the AltMetrics tools
Structure Review
Maybe Hybrid Man-Machine
Where we are now…
Rewards and Recognition
Congratulations! Your 1st CSSP article
has been published. Philosopher Lao
Tzu said “A journey of a thousand
miles begins with a single step”. In the
same way we hope that this will be
the first of many submissions that you
make to CSSP.
The First Step badge is awarded when a user submits
(& has published) their 1st CSSP article.
Future Recognition in AltMetrics?
ChemSpider
Internet Data
The Future
Commercial Software
Pre-competitive Data
Open Science
Open Data
Publishers
Educators
Open Databases
Chemical Vendors
Small organic molecules
Undefined materials
Organometallics
Nanomaterials
Polymers
Minerals
Particle bound
Links to Biologicals
The Future of Chemistry on the Web?
• Public compound databases federate & build a
linked environment of validated data!
• Data validation needs are not ignored
• Publishers layer on information to make
publications discoverable
• Public-Private databases can be linked
• Open Data proliferate
• The “Semantic Web” in action
Acknowledgments
• Valery Tkachenko and the eScience team
• Our data providers, depositors, collaborators
and curators
• Software providers – OpenEye, ChemDoodle,
ACD/Labs, GGA Software, Open Source (Jmol,
JSpecView, OpenBabel)
Thank you
Email: williamsa@rsc.org
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

The expansive reach of ChemSpider as a resource for the chemistry community

  • 1. The Expansive Reach of ChemSpider as a Resource for the Chemistry Community Antony Williams University of Oregon, April 24th 2013
  • 2. The World of Online Chemistry • Property databases • Compound aggregators • Screening assay results • Scientific publications • Encyclopedic articles (Wikipedia) • Metabolic pathway databases • ADME/Tox data – eTOX for example • Blogs/Wikis and Open Notebook Science
  • 3. We Have …Too Much Data!!!
  • 4. e-Science and Primary Data • How much data generated in a lab, that COULD go public, is lost forever?
  • 6. e-Science and Primary Data • How much data generated in a lab, that COULD go public, is lost forever? • Public Domain reference databases of value? – Syntheses – Properties – Spectra – CIFs – Images
  • 8. e-Science and Primary Data • How much data generated in a lab, that COULD go public, is lost forever? • Public Domain reference databases of value? – Syntheses – Properties – Spectra – CIFs – Images • Much of chemistry is chemical structure-based – where and how could we host these data?
  • 10. Crowdsourced “Annotations” • Users can add – Descriptions/Syntheses/Commentaries – Links to PubMed articles – Links to articles via DOIs – Add spectral data – Add Crystallographic Information Files – Add photos – Add MP3 files – Add Videos
  • 13. Chemistry Data online is messy • We have inherited errors • All public compound databases, including ours, have errors • “Incorrect” structures – assertions, timelines etc • “Incorrect” names associated with structures • Properties • Links • Publications • ENORMOUS CHALLENGE
  • 14. The Structure of Vitamin K?
  • 15. MeSH • A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione). Vitamin K 3 provitamins, after being alkylated in vivo, exhibit the antifibrinolytic activity of vitamin K. Green leafy vegetables, liver, cheese, butter, and egg yolk are good sources of vitamin K
  • 16. The Structure of Vitamin K1?
  • 17. What is the Structure of Vitamin K1?
  • 24. “2-methyl-3-(3,7,11,15-tetramethylhexadec-2-enyl)naphthalene-1,4-dione” • Variants of systematic names on PubChem – 2-methyl-3-[(E,7R,11R)-3,7,11,15-tetramethyl – 2-methyl-3-[(E,7S,11R)-3,7,11,15-tetramethyl – 2-methyl-3-[(E,7R,11S)-3,7,11,15-tetramethyl – 2-methyl-3-[(E,7S,11S)-3,7,11,15-tetramethyl – 2-methyl-3-[(E,11S)-3,7,11,15-tetramethyl – 2-methyl-3-[(E)-3,7,11,15-tetramethyl – 2-methyl-3-(3,7,11,15-tetramethyl – 2-methyl-3-[(E)-3,7,11,15-tetramethyl
  • 26. It’s all on Wikipedia…
  • 27. Chemistry on The Internet Is Messy
  • 31. What ELSE is Methane???
  • 36. Public Domain Databases • Our databases are a mess… • Non-curated databases are proliferating errors • We source and deposit data between databases • Original sources of errors hard to determine • Curation is time-consuming and challenging
  • 38. Crowdsourced Curation • Crowd-sourced curation: identify/tag errors, edit names, synonyms, identify records to deprecate
  • 43. Standards : Structure Standardization
  • 44. Standards : Structure Standardization
  • 45. Standards : Structure Standardization
  • 48. InChI Strings Hash to InChIKeys
  • 49. Vancomycin – Search the Internet
  • 53. Validated Name-Structure Dictionaries • Chemical name dictionaries are used for: • Text-mining (publications, patents) – Used to index PubMed and link to Google Patents • Linking to other databases – think Biology! – When structures are not available drug names link • Searching the web – Names link to structures link to InChIs
  • 54. I want to know about “Vincristine” If all algorithms work then everything on the page is correct by default except the name-structure relationship!
  • 56. Vincristine: Vendors and Sources Linked by Structure
  • 63. So you Want a Profile???
  • 68. PharmaSea • Dereplication via ChemSpider • Segregation of natural products datasets • Analytical data algorithms & integration – Mass spec searching – predicted fragmentation – NMR feature searching – NMR prediction – Computer-assisted structure elucidation
  • 69. It is so difficult to navigate… • What’s the structure? • Are they in our file? • What’s similar? • What’s the target? • Pharmacology data? • Known Pathways? • Working On Now? • Connections to disease? • Expressed in right cell type? • Competitors? • IP?
  • 70. • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open source code, open data and open standards • Academics, Pharma companies, Publishers….
  • 71. ChemSpider Contributions • The host of the chemistry services – Supplier of “standardized” chemical data files – Chemistry searching (structure, substructure etc) – Provider of data in RDF format – Curator and data quality checking • Now building the Open PHACTS chemical registration system
  • 72. ChemSpider Contributions • Supplier of chemistry UI components • “Quality Police” for data checking • Chemical Validation and Standardization Platform • Nanopublications from RSC publications
  • 73. Integrate to instruments and software • Integration to analytical instrumentation vendors already in place – Agilent, Bruker, Thermo, Waters • Also, Cheminformatics vendors link to ChemSpider – Accelrys, ACD/Labs, ChemAxon, iChemLabs, and…
  • 74. Natural Products Updates • Names hard, Structures “Obvious” • New content based on monthly updates of the database • Click through to the Natural Products Updates entry
  • 76. Chemical Database Service • National Chemical Database Service for UK Academics • Integrating Commercial Databases and Services • Chemicals, analytical data, prediction algorithms • Development of data repository
  • 77. Publications - a summary of work • Scientific publications are a summary of work – Is all work reported? – How much science is lost to pruning? – What of value sits in notebooks and is lost? • How much data is lost? – How many compounds never reported? – How many syntheses fail or succeed? – How many characterization measurements?
  • 78. Community Repository for Data • Funding agencies encourage sharing of data • Increasing availability of “Open Data” • Institutional repositories offer no specific domain support • Develop a community repository for chemistry data – private, public, embargoed • Provides data to develop models/algorithms
  • 79. Community Repository for Data • Automated depositions of data • DOI’ed data objects for citation purposes • A database of reference data, but validated by the community • National services feeding the repository – crystallography, mass spectrometry • Integrate to blogging tools for chemistry • Integrate to Electronic Lab Notebooks as feeds
  • 80. Model Building with Community Data • Community data as a basis of model building – Consume data from available databases, community data, new publications and build predictive algorithms for the community – How many algorithms are reported and lost? How much repeat work is done in the domain of algorithmic development?
  • 81. Pulling Data from our Archive • Our contribution to the world of chemistry data • DERA – digitally enabling the RSC archive – Text mining • Find chemicals, reactions, analytical data, properties – Algorithmic checking • Validate algorithmically what we can - robots – “Web 2.0 interfaces” for curating and validating
  • 82. What if we could capture it all? Digitally Enhancing the RSC Archive
  • 83. Data Validation and Curation Required Encouraging Participation with Rewards and RECOGNITION
  • 84. Manual Curation • Integrated commenting, curating and validation platform across ALL eScience and publishing platforms • All integrated to a central RSC profile and feeding the AltMetrics tools
  • 87. Where we are now…
  • 88. Rewards and Recognition Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP. The First Step badge is awarded when a user submits (& has published) their 1st CSSP article.
  • 89. Future Recognition in AltMetrics? ChemSpider
  • 90. The Future • Internet Data • Commercial Software • Pre-competitive Data • Open Science • Open Data • Publishers • Educators • Open Databases • Chemical Vendors • Small organic molecules • Undefined materials • Organometallics • Nanomaterials • Polymers • Minerals • Particle bound • Links to Biologicals
  • 91. The Future of Chemistry on the Web? • Public compound databases federate & build a linked environment of validated data! • Data validation needs are not ignored • Publishers layer on information to make publications discoverable • Public-Private databases can be linked • Open Data proliferate • The “Semantic Web” in action
  • 92. Acknowledgments • Valery Tkachenko and the eScience team • Our data providers, depositors, collaborators and curators • Software providers – OpenEye, ChemDoodle, ACD/Labs, GGA Software, Open Source (Jmol, JSpecView, OpenBabel)
  • 93. Thank you Email: williamsa@rsc.org Twitter: @ChemConnector Personal Blog: www.chemconnector.com SLIDES: www.slideshare.net/AntonyWilliams
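
Slide 48 notes that InChI strings hash to InChIKeys. As a rough illustration of that idea only, the Python sketch below hashes an InChI string into a short, fixed-length, two-block key. It is a simplified stand-in and an assumption of this transcript, not the official InChIKey algorithm: the real InChIKey is a 27-character identifier derived from SHA-256 hashes of the InChI using its own encoding, and the function name `toy_inchi_key`, the layer split, and the letter mapping here are all hypothetical. The keys it produces will not match real InChIKeys.

```python
import hashlib


def _letters(digest: bytes, n: int) -> str:
    """Map the first n bytes of a digest to uppercase letters A-Z."""
    return "".join(chr(ord("A") + b % 26) for b in digest[:n])


def toy_inchi_key(inchi: str) -> str:
    """Return a toy two-block key for an InChI string.

    Hypothetical simplification: NOT the official InChIKey algorithm.
    Block 1 hashes the formula plus the /c and /h layers (the skeleton);
    block 2 hashes every remaining layer (stereo, isotopes, and so on).
    """
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    layers = inchi[len("InChI="):].split("/")
    if len(layers) < 2:
        raise ValueError("InChI string has no formula layer")
    formula, rest = layers[1], layers[2:]
    skeleton = [formula] + [l for l in rest if l[:1] in ("c", "h")]
    detail = [l for l in rest if l[:1] not in ("c", "h")]
    first = _letters(hashlib.sha256("/".join(skeleton).encode()).digest(), 14)
    second = _letters(hashlib.sha256("/".join(detail).encode()).digest(), 8)
    return f"{first}-{second}"


# Two stereoisomers of alanine: same skeleton, different stereo layers.
l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
key_l = toy_inchi_key(l_ala)
key_d = toy_inchi_key(d_ala)
print(key_l)
print(key_d)
```

In this sketch the two alanine stereoisomers share the first block and differ in the second, which loosely mirrors why a fixed-length key is convenient for web searching: the stereo variants of a name such as those listed for vitamin K1 on slide 24 can be grouped by skeleton or distinguished in full.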