SlideShare une entreprise Scribd logo
1  sur  17
[1]
Chemicalize.org, SureChemOpen, PubChem and
the InChIKey: A heavenly conjunction with
transformative utility
Christopher Southan, TW2Informatics, Göteborg, Sweden,
ChemAxon UGM, Budapest, May 2013
Image credit: http://www.eso.org/public/images/yb_vlt_moon_cnn_cc/
[2]
Dr Christopher Southan, Ph.D., M.Sc.,B.Sc.
TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710
Skype: cdsouthan
Email: cdsouthan@hotmail.com
Twitter: http://twitter.com/#!/cdsouthan
Blog: http://cdsouthan.blogspot.com/
LinkedIN: http://www.linkedin.com/in/cdsouthan
Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications
Presentations: http://www.slideshare.net/cdsouthan
[3]
The ChemAxon name-to-struc functionality is not only a component of the SureChem
patent extraction pipeline but also powers chemicalize.org. Both operations are now
submitting sources to PubChem. The former has deposited structures that bring the
patent-extracted total in PC to 14.5 mill. CIDs. The deposition from chemicalize is
~0.3 mill., but has been actively selected by users and is 20% unique. The final
conjunction is that all three sources generate the InChIKey (IK) that turns Google into
a de-facto merge of PubChem and ChemSpider of ~50 mill. structures.
Chemicalize.org users can convert new patents, other external or internal documents
and web based text. Individual results can be Googled, searched against
SurChemOpen and bulk extractions triaged against PubChem. It thus becomes
possible to connect chemistry between patents, papers, abstracts and database
records via exact match or similarity searching. When SureChem and
chemicalize.org update their submissions, relationships with the other ~200 PubChem
sources (including ChEMBL and vendor databases) are re-computed and new CID
links made. The synergy between SureChem and chemicalize.org is powerful because
matches between them (~ 0.15 mill.) via SureChemOpen, give occurrence statistics
and the location of the structure within patents. The applications of chemicalize.org
are extended by web tools such as Venny for determining intersects from multiple
extractions and CheS-Mapper for cluster visualization. These utility expansions will be
illustrated by documents specifying BACE1 inhibitors for Alzheimer’s disease.
Abstract
[4]
Auspicious Conjunctions 2012-13
• PubChem: global chemistry to slice ‘n dice
• SureChemOpen: majority of patent chemistry opened up
• Chemicalize.org : chemistry extractable from any text toombs
• Chemical images: patents extracted in SureChemOpen, OSRA
handles papers
• InChIKey indexing in Google
• ChemSpider: crowdsourcing chemisty quality
• Exapnding toolbox e.g.OPSIN, Venny, Ches-mapper
• SciBite alerts
• Expanding preview and surfacing options e.g. ChEMBLntd, Github,
OSDD, Open Lab Books, figshare etc
• Rise of mobile chemistry
[5]
Databases <> structures < > documents
Abstracts
Patents
Papers
15 mill
0.2 mill (MeSH)
0.8 mill
(ChEMBL)
12K
Google InChIKey ~ 50 million
(47m PubChem + 33m
UniChem + 28m ChemSpider)
[6]
Triaging chemistry from text
• Identify the structure specification types, e.g.
– Semantic names (all sources)
– Code names (press releases, papers and abstracts)
– IUPAC names (papers, patents and abstracts)
– Images (papers, patents, & Google images)
– SMILES (open lab books)
– InChi strings (open lab books)
– SDF files (open lab books, & github)
Convert these to a structure (e.g. SDF, SMILES, InChI) then:
– Search InChIKey in Google
– Search major databases
– Search SureChemOpen
– Compare extracted sets for intersects and diffs
– Extend exact match connectivity with similarity searching
[7]
PubChem Composition
[8]
SureChemOpen Composition (in PubChem)
[9]
Chemicalize.org Composition (in PubChem)
[10]
BACE2 Conjunctions
[11]
BACE2 Conjunctions
[12]
Chemicalise.org Triage
[13]
BACE2 Conjunctions
1. WO2013054291 > chemicalize.org
2. Download 450 structures
3. Upload to PubChem search
[14]
Clustering document extraction sets: CheS-Mapper
[15]
Venny: intersects, diffs, de-dupes and merges
[16]
Conclusions
• Transformative opening up of chemistry > biology via structure >document
connectivity
• Open mining of patent metadata and data
• Expanding toolbox
• Inexorable expansion of open-access publishing
But;
• Journal chemistry extraction > database records still slow
• Text mining of journals still restricted
• Author annotation and direct db submission rare
• Pharmaceutical research publications are still blinding structures (see
PMID: 23159359)
[17]
References
http://www.slideshare.net/cdsouthan/the-patent-chemistry-big-bang-in-pubchem
http://www.slideshare.net/cdsouthan/cs-cax-bioitchemicalizeposter03apr
http://www.ncbi.nlm.nih.gov/pubmed/23399051
http://www.ncbi.nlm.nih.gov/pubmed/23618056
http://www.ncbi.nlm.nih.gov/pubmed/23506624

Contenu connexe

En vedette

EUGM 2013 - Eufrozina Hoffmann (ChemAxon): Marvin extending the scope of usab...
EUGM 2013 - Eufrozina Hoffmann (ChemAxon): Marvin extending the scope of usab...EUGM 2013 - Eufrozina Hoffmann (ChemAxon): Marvin extending the scope of usab...
EUGM 2013 - Eufrozina Hoffmann (ChemAxon): Marvin extending the scope of usab...ChemAxon
 
EUGM 2013 - Andras Stracz (ChemAxon) - ChemAxon Plexus: A desktop application...
EUGM 2013 - Andras Stracz (ChemAxon) - ChemAxon Plexus: A desktop application...EUGM 2013 - Andras Stracz (ChemAxon) - ChemAxon Plexus: A desktop application...
EUGM 2013 - Andras Stracz (ChemAxon) - ChemAxon Plexus: A desktop application...ChemAxon
 
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...ChemAxon
 
EUGM 2013 - Sergio H. Rotstein (Pfizer): What about the “big guys”? The emerg...
EUGM 2013 - Sergio H. Rotstein (Pfizer): What about the “big guys”? The emerg...EUGM 2013 - Sergio H. Rotstein (Pfizer): What about the “big guys”? The emerg...
EUGM 2013 - Sergio H. Rotstein (Pfizer): What about the “big guys”? The emerg...ChemAxon
 
EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of M...
EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of M...EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of M...
EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of M...ChemAxon
 
EUGM 2013 - Steve Hajkowski (Thomson Reuters): Patent analytics - what can Ma...
EUGM 2013 - Steve Hajkowski (Thomson Reuters): Patent analytics - what can Ma...EUGM 2013 - Steve Hajkowski (Thomson Reuters): Patent analytics - what can Ma...
EUGM 2013 - Steve Hajkowski (Thomson Reuters): Patent analytics - what can Ma...ChemAxon
 
EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to ...
EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to ...EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to ...
EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to ...ChemAxon
 
EUGM 2013 - Björn Windshügel (European ScreeningPort): Chemoinformatic tools ...
EUGM 2013 - Björn Windshügel (European ScreeningPort): Chemoinformatic tools ...EUGM 2013 - Björn Windshügel (European ScreeningPort): Chemoinformatic tools ...
EUGM 2013 - Björn Windshügel (European ScreeningPort): Chemoinformatic tools ...ChemAxon
 
EUGM 2013 - Gyorgy Pirok (ChemAxon) - Prediction of Xenobiotic Metabolism
EUGM 2013 - Gyorgy Pirok (ChemAxon) - Prediction of Xenobiotic MetabolismEUGM 2013 - Gyorgy Pirok (ChemAxon) - Prediction of Xenobiotic Metabolism
EUGM 2013 - Gyorgy Pirok (ChemAxon) - Prediction of Xenobiotic MetabolismChemAxon
 
EUGM 2013 - Ian Berry, Bob Marmon (Evotec): Classification and analysis of 21...
EUGM 2013 - Ian Berry, Bob Marmon (Evotec): Classification and analysis of 21...EUGM 2013 - Ian Berry, Bob Marmon (Evotec): Classification and analysis of 21...
EUGM 2013 - Ian Berry, Bob Marmon (Evotec): Classification and analysis of 21...ChemAxon
 
EUGM 2013 - Bernd Rupp (FMP) Chemical Information systems: From compound coll...
EUGM 2013 - Bernd Rupp (FMP) Chemical Information systems: From compound coll...EUGM 2013 - Bernd Rupp (FMP) Chemical Information systems: From compound coll...
EUGM 2013 - Bernd Rupp (FMP) Chemical Information systems: From compound coll...ChemAxon
 
EUGM 2013 - Jon Patterson (ChemAxon) ChemAxon Platform for Scientists
EUGM 2013 - Jon Patterson (ChemAxon) ChemAxon Platform for Scientists EUGM 2013 - Jon Patterson (ChemAxon) ChemAxon Platform for Scientists
EUGM 2013 - Jon Patterson (ChemAxon) ChemAxon Platform for Scientists ChemAxon
 
EUGM 2013 - Anna Tomin (ChemAxon) - Reaction Library Design
EUGM 2013 - Anna Tomin (ChemAxon) - Reaction Library DesignEUGM 2013 - Anna Tomin (ChemAxon) - Reaction Library Design
EUGM 2013 - Anna Tomin (ChemAxon) - Reaction Library DesignChemAxon
 
EUGM 2013 - Timea Polgar (ChemAxon) - 3D visualization for medicinal chemists
EUGM 2013 - Timea Polgar (ChemAxon) - 3D visualization for medicinal chemistsEUGM 2013 - Timea Polgar (ChemAxon) - 3D visualization for medicinal chemists
EUGM 2013 - Timea Polgar (ChemAxon) - 3D visualization for medicinal chemistsChemAxon
 
20 million public patent structures: looking at the gift horse
20 million public patent structures: looking at the gift horse20 million public patent structures: looking at the gift horse
20 million public patent structures: looking at the gift horseChris Southan
 
Southan real drugs_paris_oct_11_2014
Southan real drugs_paris_oct_11_2014Southan real drugs_paris_oct_11_2014
Southan real drugs_paris_oct_11_2014Chris Southan
 

En vedette (17)

EUGM 2013 - Eufrozina Hoffmann (ChemAxon): Marvin extending the scope of usab...
EUGM 2013 - Eufrozina Hoffmann (ChemAxon): Marvin extending the scope of usab...EUGM 2013 - Eufrozina Hoffmann (ChemAxon): Marvin extending the scope of usab...
EUGM 2013 - Eufrozina Hoffmann (ChemAxon): Marvin extending the scope of usab...
 
EUGM 2013 - Andras Stracz (ChemAxon) - ChemAxon Plexus: A desktop application...
EUGM 2013 - Andras Stracz (ChemAxon) - ChemAxon Plexus: A desktop application...EUGM 2013 - Andras Stracz (ChemAxon) - ChemAxon Plexus: A desktop application...
EUGM 2013 - Andras Stracz (ChemAxon) - ChemAxon Plexus: A desktop application...
 
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
 
EUGM 2013 - Sergio H. Rotstein (Pfizer): What about the “big guys”? The emerg...
EUGM 2013 - Sergio H. Rotstein (Pfizer): What about the “big guys”? The emerg...EUGM 2013 - Sergio H. Rotstein (Pfizer): What about the “big guys”? The emerg...
EUGM 2013 - Sergio H. Rotstein (Pfizer): What about the “big guys”? The emerg...
 
EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of M...
EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of M...EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of M...
EUGM 2013 - Peter Englert, Peter Kovacs (ChemAxon) - The Next Generation of M...
 
EUGM 2013 - Steve Hajkowski (Thomson Reuters): Patent analytics - what can Ma...
EUGM 2013 - Steve Hajkowski (Thomson Reuters): Patent analytics - what can Ma...EUGM 2013 - Steve Hajkowski (Thomson Reuters): Patent analytics - what can Ma...
EUGM 2013 - Steve Hajkowski (Thomson Reuters): Patent analytics - what can Ma...
 
EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to ...
EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to ...EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to ...
EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to ...
 
EUGM 2013 - Björn Windshügel (European ScreeningPort): Chemoinformatic tools ...
EUGM 2013 - Björn Windshügel (European ScreeningPort): Chemoinformatic tools ...EUGM 2013 - Björn Windshügel (European ScreeningPort): Chemoinformatic tools ...
EUGM 2013 - Björn Windshügel (European ScreeningPort): Chemoinformatic tools ...
 
EUGM 2013 - Gyorgy Pirok (ChemAxon) - Prediction of Xenobiotic Metabolism
EUGM 2013 - Gyorgy Pirok (ChemAxon) - Prediction of Xenobiotic MetabolismEUGM 2013 - Gyorgy Pirok (ChemAxon) - Prediction of Xenobiotic Metabolism
EUGM 2013 - Gyorgy Pirok (ChemAxon) - Prediction of Xenobiotic Metabolism
 
EUGM 2013 - Ian Berry, Bob Marmon (Evotec): Classification and analysis of 21...
EUGM 2013 - Ian Berry, Bob Marmon (Evotec): Classification and analysis of 21...EUGM 2013 - Ian Berry, Bob Marmon (Evotec): Classification and analysis of 21...
EUGM 2013 - Ian Berry, Bob Marmon (Evotec): Classification and analysis of 21...
 
EUGM 2013 - Bernd Rupp (FMP) Chemical Information systems: From compound coll...
EUGM 2013 - Bernd Rupp (FMP) Chemical Information systems: From compound coll...EUGM 2013 - Bernd Rupp (FMP) Chemical Information systems: From compound coll...
EUGM 2013 - Bernd Rupp (FMP) Chemical Information systems: From compound coll...
 
EUGM 2013 - Jon Patterson (ChemAxon) ChemAxon Platform for Scientists
EUGM 2013 - Jon Patterson (ChemAxon) ChemAxon Platform for Scientists EUGM 2013 - Jon Patterson (ChemAxon) ChemAxon Platform for Scientists
EUGM 2013 - Jon Patterson (ChemAxon) ChemAxon Platform for Scientists
 
EUGM 2013 - Anna Tomin (ChemAxon) - Reaction Library Design
EUGM 2013 - Anna Tomin (ChemAxon) - Reaction Library DesignEUGM 2013 - Anna Tomin (ChemAxon) - Reaction Library Design
EUGM 2013 - Anna Tomin (ChemAxon) - Reaction Library Design
 
EUGM 2013 - Timea Polgar (ChemAxon) - 3D visualization for medicinal chemists
EUGM 2013 - Timea Polgar (ChemAxon) - 3D visualization for medicinal chemistsEUGM 2013 - Timea Polgar (ChemAxon) - 3D visualization for medicinal chemists
EUGM 2013 - Timea Polgar (ChemAxon) - 3D visualization for medicinal chemists
 
biologydriven
biologydrivenbiologydriven
biologydriven
 
20 million public patent structures: looking at the gift horse
20 million public patent structures: looking at the gift horse20 million public patent structures: looking at the gift horse
20 million public patent structures: looking at the gift horse
 
Southan real drugs_paris_oct_11_2014
Southan real drugs_paris_oct_11_2014Southan real drugs_paris_oct_11_2014
Southan real drugs_paris_oct_11_2014
 

Similaire à EUGM 2013 - Christopher Southan (TW2Informatics): Chemicalize.org, SureChemOpen, PubChem and the InChIKey: A heavenly conjunction with transformative utility

A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidatapetermurrayrust
 
Closing the gap between chemistry and biology: Joining between text tombs and...
Closing the gap between chemistry and biology: Joining between text tombs and...Closing the gap between chemistry and biology: Joining between text tombs and...
Closing the gap between chemistry and biology: Joining between text tombs and...Chris Southan
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humanspetermurrayrust
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and HumansTheContentMine
 
Big Data and ContentMining for Libraries
Big Data and ContentMining for LibrariesBig Data and ContentMining for Libraries
Big Data and ContentMining for Librariespetermurrayrust
 
Connecting Bioactive Chemistry Across Documents and Databases
Connecting Bioactive Chemistry Across Documents and Databases Connecting Bioactive Chemistry Across Documents and Databases
Connecting Bioactive Chemistry Across Documents and Databases Chris Southan
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Minepetermurrayrust
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic BiologyTheContentMine
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biologypetermurrayrust
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiDatapetermurrayrust
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiDataTheContentMine
 
Scooteroer pg cert talk introduction to open education by v rolfe sept11
Scooteroer pg cert talk introduction to open education by v rolfe sept11Scooteroer pg cert talk introduction to open education by v rolfe sept11
Scooteroer pg cert talk introduction to open education by v rolfe sept11Vivien Rolfe
 
Berlin 6 Open Access Conference: Tony Hey
Berlin 6 Open Access Conference: Tony HeyBerlin 6 Open Access Conference: Tony Hey
Berlin 6 Open Access Conference: Tony HeyCornelius Puschmann
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themRoss Mounce
 
Open Data HK: open science meets open data. A primer from Scott Edmunds
Open Data HK: open science meets open data. A primer from Scott EdmundsOpen Data HK: open science meets open data. A primer from Scott Edmunds
Open Data HK: open science meets open data. A primer from Scott EdmundsScott Edmunds
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is usefulTheContentMine
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is usefulpetermurrayrust
 
Open Educational Resources (OER) and OpenCourseWare (OCW)
Open Educational Resources (OER) and OpenCourseWare (OCW)Open Educational Resources (OER) and OpenCourseWare (OCW)
Open Educational Resources (OER) and OpenCourseWare (OCW)Anne Arendt
 
Open PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow toolsOpen PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow toolsopen_phacts
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectivepetermurrayrust
 

Similaire à EUGM 2013 - Christopher Southan (TW2Informatics): Chemicalize.org, SureChemOpen, PubChem and the InChIKey: A heavenly conjunction with transformative utility (20)

A Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and WikidataA Global Commons for Scientific Data: Molecules and Wikidata
A Global Commons for Scientific Data: Molecules and Wikidata
 
Closing the gap between chemistry and biology: Joining between text tombs and...
Closing the gap between chemistry and biology: Joining between text tombs and...Closing the gap between chemistry and biology: Joining between text tombs and...
Closing the gap between chemistry and biology: Joining between text tombs and...
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humans
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humans
 
Big Data and ContentMining for Libraries
Big Data and ContentMining for LibrariesBig Data and ContentMining for Libraries
Big Data and ContentMining for Libraries
 
Connecting Bioactive Chemistry Across Documents and Databases
Connecting Bioactive Chemistry Across Documents and Databases Connecting Bioactive Chemistry Across Documents and Databases
Connecting Bioactive Chemistry Across Documents and Databases
 
Paradise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to MineParadise Lost and The Right to Read is the Right to Mine
Paradise Lost and The Right to Read is the Right to Mine
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiData
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiData
 
Scooteroer pg cert talk introduction to open education by v rolfe sept11
Scooteroer pg cert talk introduction to open education by v rolfe sept11Scooteroer pg cert talk introduction to open education by v rolfe sept11
Scooteroer pg cert talk introduction to open education by v rolfe sept11
 
Berlin 6 Open Access Conference: Tony Hey
Berlin 6 Open Access Conference: Tony HeyBerlin 6 Open Access Conference: Tony Hey
Berlin 6 Open Access Conference: Tony Hey
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 
Open Data HK: open science meets open data. A primer from Scott Edmunds
Open Data HK: open science meets open data. A primer from Scott EdmundsOpen Data HK: open science meets open data. A primer from Scott Edmunds
Open Data HK: open science meets open data. A primer from Scott Edmunds
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is useful
 
Why ContentMining is useful
Why ContentMining is usefulWhy ContentMining is useful
Why ContentMining is useful
 
Open Educational Resources (OER) and OpenCourseWare (OCW)
Open Educational Resources (OER) and OpenCourseWare (OCW)Open Educational Resources (OER) and OpenCourseWare (OCW)
Open Educational Resources (OER) and OpenCourseWare (OCW)
 
Open PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow toolsOpen PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow tools
 
Towards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspectiveTowards Responsible Content Mining: A Cambridge perspective
Towards Responsible Content Mining: A Cambridge perspective
 

Plus de ChemAxon

Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?ChemAxon
 
Chemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive modelsChemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive modelsChemAxon
 
Translating data to predictive models
Translating data to predictive modelsTranslating data to predictive models
Translating data to predictive modelsChemAxon
 
Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...ChemAxon
 
Biomolecule structural data management
Biomolecule structural data managementBiomolecule structural data management
Biomolecule structural data managementChemAxon
 
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first releaseCheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first releaseChemAxon
 
Enhanced stereochemistry representation
Enhanced stereochemistry representation Enhanced stereochemistry representation
Enhanced stereochemistry representation ChemAxon
 
Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...ChemAxon
 
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...ChemAxon
 
Patent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug DiscoveryPatent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug DiscoveryChemAxon
 
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...ChemAxon
 
Research data management on the cloud
Research data management on the cloudResearch data management on the cloud
Research data management on the cloudChemAxon
 
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound RegistrationCheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound RegistrationChemAxon
 
Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction ChemAxon
 
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...ChemAxon
 
Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology ChemAxon
 
JChem Microservices
JChem MicroservicesJChem Microservices
JChem MicroservicesChemAxon
 
Migration from joc to jpc or choral
Migration from joc to jpc or choralMigration from joc to jpc or choral
Migration from joc to jpc or choralChemAxon
 
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5ChemAxon
 
Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5ChemAxon
 

Plus de ChemAxon (20)

Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
Akos Tarcsay (ChemAxon): How fast is Chemaxon RDBMS Search?
 
Chemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive modelsChemaxon EU UGM 2022 | Translating data to predictive models
Chemaxon EU UGM 2022 | Translating data to predictive models
 
Translating data to predictive models
Translating data to predictive modelsTranslating data to predictive models
Translating data to predictive models
 
Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...Efficient biomolecular structural data handling and analysis - Webinar with D...
Efficient biomolecular structural data handling and analysis - Webinar with D...
 
Biomolecule structural data management
Biomolecule structural data managementBiomolecule structural data management
Biomolecule structural data management
 
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first releaseCheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
Cheminfo Stories 2021 | Virtual UGM | Marvin Pro: The first release
 
Enhanced stereochemistry representation
Enhanced stereochemistry representation Enhanced stereochemistry representation
Enhanced stereochemistry representation
 
Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...Intellectual property (IP) intelligence solutions designed for the way resear...
Intellectual property (IP) intelligence solutions designed for the way resear...
 
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
 
Patent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug DiscoveryPatent Data for Artificial Intelligence based Drug Discovery
Patent Data for Artificial Intelligence based Drug Discovery
 
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
Cheminfo Stories APAC 2020 - Chemical Descriptors & Standardizers for Machine...
 
Research data management on the cloud
Research data management on the cloudResearch data management on the cloud
Research data management on the cloud
 
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound RegistrationCheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
Cheminfo Stories APAC 2020 - Introducing Design Hub & Compound Registration
 
Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction Cheminfo Stories APAC 2020 - JChem Engines introduction
Cheminfo Stories APAC 2020 - JChem Engines introduction
 
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
Cheminfo Stories APAC 2020 - Database management on desktop with JChem for Of...
 
Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology Cheminfo Stories APAC 2020 -- Markush technology
Cheminfo Stories APAC 2020 -- Markush technology
 
JChem Microservices
JChem MicroservicesJChem Microservices
JChem Microservices
 
Migration from joc to jpc or choral
Migration from joc to jpc or choralMigration from joc to jpc or choral
Migration from joc to jpc or choral
 
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
ChemAxon's Compliance Checker - Cheminfo Stories 2020 Day 5
 
Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5Chemicalize Pro - Cheminfo Stories 2020 Day 5
Chemicalize Pro - Cheminfo Stories 2020 Day 5
 

Dernier

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Dernier (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

EUGM 2013 - Christopher Southan (TW2Informatics): Chemicalize.org, SureChemOpen, PubChem and the InChIKey: A heavenly conjunction with transformative utility

  • 1. [1] Chemicalize.org, SureChemOpen, PubChem and the InChIKey: A heavenly conjunction with transformative utility Christopher Southan, TW2Informatics, Göteborg, Sweden, ChemAxon UGM, Budapest, May 2013 Image credit: http://www.eso.org/public/images/yb_vlt_moon_cnn_cc/
  • 2. [2] Dr Christopher Southan, Ph.D., M.Sc.,B.Sc. TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm Mobile: +46(0)702-530710 Skype: cdsouthan Email: cdsouthan@hotmail.com Twitter: http://twitter.com/#!/cdsouthan Blog: http://cdsouthan.blogspot.com/ LinkedIN: http://www.linkedin.com/in/cdsouthan Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications Presentations: http://www.slideshare.net/cdsouthan
  • 3. [3] The ChemAxon name-to-struc functionality is not only a component of the SureChem patent extraction pipeline but also powers chemicalize.org. Both operations are now submitting sources to PubChem. The former has deposited structures that bring the patent-extracted total in PC to 14.5 mill. CIDs. The deposition from chemicalize is ~0.3 mill., but has been actively selected by users and is 20% unique. The final conjunction is that all three sources generate the InChIKey (IK) that turns Google into a de-facto merge of PubChem and ChemSpider of ~50 mill. structures. Chemicalize.org users can convert new patents, other external or internal documents and web based text. Individual results can be Googled, searched against SurChemOpen and bulk extractions triaged against PubChem. It thus becomes possible to connect chemistry between patents, papers, abstracts and database records via exact match or similarity searching. When SureChem and chemicalize.org update their submissions, relationships with the other ~200 PubChem sources (including ChEMBL and vendor databases) are re-computed and new CID links made. The synergy between SureChem and chemicalize.org is powerful because matches between them (~ 0.15 mill.) via SureChemOpen, give occurrence statistics and the location of the structure within patents. The applications of chemicalize.org are extended by web tools such as Venny for determining intersects from multiple extractions and CheS-Mapper for cluster visualization. These utility expansions will be illustrated by documents specifying BACE1 inhibitors for Alzheimer’s disease. Abstract
  • 4. [4] Auspicious Conjunctions 2012-13 • PubChem: global chemistry to slice ‘n dice • SureChemOpen: majority of patent chemistry opened up • Chemicalize.org : chemistry extractable from any text toombs • Chemical images: patents extracted in SureChemOpen, OSRA handles papers • InChIKey indexing in Google • ChemSpider: crowdsourcing chemisty quality • Exapnding toolbox e.g.OPSIN, Venny, Ches-mapper • SciBite alerts • Expanding preview and surfacing options e.g. ChEMBLntd, Github, OSDD, Open Lab Books, figshare etc • Rise of mobile chemistry
  • 5. [5] Databases <> structures < > documents Abstracts Patents Papers 15 mill 0.2 mill (MeSH) 0.8 mill (ChEMBL) 12K Google InChIKey ~ 50 million (47m PubChem + 33m UniChem + 28m ChemSpider)
  • 6. [6] Triaging chemistry from text • Identify the structure specification types, e.g. – Semantic names (all sources) – Code names (press releases, papers and abstracts) – IUPAC names (papers, patents and abstracts) – Images (papers, patents, & Google images) – SMILES (open lab books) – InChi strings (open lab books) – SDF files (open lab books, & github) Convert these to a structure (e.g. SDF, SMILES, InChI) then: – Search InChIKey in Google – Search major databases – Search SureChemOpen – Compare extracted sets for intersects and diffs – Extend exact match connectivity with similarity searching
  • 13. [13] BACE2 Conjunctions 1. WO2013054291 > chemicalize.org 2. Download 450 structures 3. Upload to PubChem search
  • 15. [15] Venny: intersects, diffs, de-dupes and merges
  • 16. [16] Conclusions • Transformative opening up of chemistry > biology via structure >document connectivity • Open mining of patent metadata and data • Expanding toolbox • Inexorable expansion of open-access publishing But; • Journal chemistry extraction > database records still slow • Text mining of journals still restricted • Author annotation and direct db submission rare • Pharmaceutical research publications are still blinding structures (see PMID: 23159359)