Integrating patent chemistry with public research resources

Andrew Hinton, PhD
Christopher Southan, PhD
Evan Bolton, PhD
Nicko Goncharoff

ICIC 2012, 17 October
SureChem Data Collection
Database of automatically mined structure data from text and images
• 20M annotated US, EP and WO full-text records, plus Japanese patent abstracts
• 12.8M unique chemical structures
• MEDLINE – 19M abstracts (upcoming)

• Free resource for researchers: enables linking to public and proprietary content
• Professional search needs: data export, alerts, patent family search, chemical relevance filters…
• API or data feed access to chemistry and full text: integrate with internal databases and workflows
Chemistry Mining Workflow
Public Patent Chemistry – A Changing Landscape
SureChem Depositing All* Structures into PubChem – Q4 2012
• 1976 to present
• Deposition of structures only
• Currently ‘on hold’
• Will link to patents in SureChemOpen
* After filtering of fragments and highly common chemistry
Compounds Derived from Patents and Literature Found in PubChem, by Molecular Weight Range (MWT) and Source
[Bar chart: compounds in PubChem per source (y-axis 0 to 9,000,000), banded by MWT: 100-200, 200-300, 300-400, 400-500, 500-600, 600-700]
  Source           Compounds in PubChem   Drug-like
  ChEMBL           0.76M                  69%
  IBM              2.36M                  51%
  Thomson Pharma   3.80M                  62%
  SCRIPDB          3.99M                  60%
  SureChem         8.29M*                 66%
  * Provisional numbers
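The chart groups each source's compounds into 100 Da molecular weight bands. The slides do not say how the banding was done; a minimal Python sketch, assuming precomputed molecular weights and half-open bands that include the lower edge and exclude the upper one, could look like this. The function names and example weights are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# Band edges taken from the chart legend (100-200 ... 600-700 Da).
# Assumption: a band includes its lower edge and excludes its upper edge.
LOWEST_EDGE = 100
HIGHEST_EDGE = 700
BAND_WIDTH = 100


def mwt_band(mw: float) -> Optional[str]:
    """Return a band label such as '300-400', or None if mw is outside 100-700 Da."""
    if mw < LOWEST_EDGE or mw >= HIGHEST_EDGE:
        return None
    lower = int(mw // BAND_WIDTH) * BAND_WIDTH
    return f"{lower}-{lower + BAND_WIDTH}"


def band_counts(weights: Iterable[float]) -> Dict[str, int]:
    """Count compounds per MWT band, skipping weights outside the charted range."""
    counts = Counter(mwt_band(w) for w in weights)
    counts.pop(None, None)  # drop out-of-range compounds (e.g. small solvents)
    return dict(counts)


# Hypothetical molecular weights (Da) for five compounds.
print(band_counts([180.16, 151.16, 46.07, 354.4, 653.2]))
# {'100-200': 2, '300-400': 1, '600-700': 1}
```

Keeping the band edges as named constants makes it easy to widen the range if compounds above 700 Da should also be charted.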
SureChem Deposition Pushes PubChem to 40 Million Compounds
Uniques and Overlaps
• SC – SCRIPDB: 1.5M
• SC – IBM: 1.2M
• SC – Thomson Pharma: 0.9M
• SC – ChEMBL: 0.1M
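The pairwise figures compare SureChem (SC) with each of the other patent-derived sources. The slides do not describe the calculation; one plausible approach, sketched here under the assumption that each source is represented by the set of PubChem compound identifiers (CIDs) it contributed, is plain set arithmetic. The source names come from the slide; the CID values and function names are hypothetical.

```python
from typing import Dict, Set


def pairwise_overlap(a: Set[int], b: Set[int]) -> Dict[str, int]:
    """Split two sets of compound IDs into a-only, shared and b-only counts."""
    return {
        "a_only": len(a - b),
        "shared": len(a & b),
        "b_only": len(b - a),
    }


def compare_with_reference(
    reference: Set[int], others: Dict[str, Set[int]]
) -> Dict[str, Dict[str, int]]:
    """Compare one reference source (e.g. SC) against every other source."""
    return {name: pairwise_overlap(reference, cids) for name, cids in others.items()}


# Hypothetical CID sets standing in for each source's PubChem contribution.
sc = {1, 2, 3, 4, 5, 6}
others = {
    "SCRIPDB": {2, 3, 4, 7},
    "IBM": {3, 4, 8, 9},
    "ChEMBL": {6, 10},
}
for name, result in compare_with_reference(sc, others).items():
    print(f"SC - {name}: {result}")
```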
ChEMBL Overlaps with Patent Sources in PubChem
Intersects – Patent Document View (2 Examples – SC & IBM)
Patents shown: US583593, Inhibitors of squalene synthetase and protein farnesyltransferase (Abbott); WO-1994018188-A1, 4-hydroxy-benzopyran-2-ones and 4-hydroxy-cycloalkyl[b]pyran-2-ones, HIV protease inhibitors (Upjohn)
• Venn diagram 1: SureChem total 776, IBM total 527 (478 SureChem only, 298 shared, 229 IBM only)
• Venn diagram 2: SureChem total 832, IBM total 239 (686 SureChem only, 146 shared, 93 IBM only)
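Each Venn diagram splits a patent's structures into SureChem-only, shared and IBM-only regions, and the region counts add up to the stated totals (478 + 298 = 776 and 298 + 229 = 527 in the first diagram). The Python sketch below checks that arithmetic and also computes a Jaccard index (shared / union); the Jaccard index is not used in the talk and is included only as one illustrative single-number summary of agreement. The class name is hypothetical.

```python
from typing import NamedTuple, Tuple


class Venn(NamedTuple):
    """Region counts of a two-set Venn diagram (SureChem vs IBM)."""
    sc_only: int
    shared: int
    ibm_only: int

    def totals(self) -> Tuple[int, int]:
        """Per-source totals: (SureChem total, IBM total)."""
        return self.sc_only + self.shared, self.shared + self.ibm_only

    def union(self) -> int:
        """Distinct structures found by either source."""
        return self.sc_only + self.shared + self.ibm_only

    def jaccard(self) -> float:
        """Shared structures as a fraction of the union."""
        return self.shared / self.union()


# Region counts read from the two Venn diagrams on the slide.
venn_1 = Venn(sc_only=478, shared=298, ibm_only=229)
venn_2 = Venn(sc_only=686, shared=146, ibm_only=93)

assert venn_1.totals() == (776, 527)
assert venn_2.totals() == (832, 239)
print(round(venn_1.jaccard(), 3), round(venn_2.jaccard(), 3))
# 0.297 0.158
```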
Identifying Relevant Chemistry – IC50
US-20120035195-A1, BACE2, Hoffmann-La Roche
Structures with IC50 Values
US-20120035195-A1 (panels: PDF, SureChemOpen, Excel)
Search IC50 Structures in PubChem
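The talk does not say which interface was used to search the IC50 structures in PubChem. As one option, PubChem's PUG REST service can map an InChIKey to compound identifiers (CIDs). The sketch below assumes the structures have already been converted to InChIKeys; the helper names are hypothetical, treating HTTP 404 as "no match" is an assumption about the service's not-found behaviour, and `fetch_cids` needs network access.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import List

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_url(inchikey: str) -> str:
    """Build a PUG REST URL that returns the CIDs matching an InChIKey as plain text."""
    return f"{PUG_REST}/compound/inchikey/{urllib.parse.quote(inchikey)}/cids/TXT"


def fetch_cids(inchikey: str, timeout: float = 30.0) -> List[int]:
    """Return matching PubChem CIDs; an HTTP 404 is assumed to mean 'no match'."""
    try:
        with urllib.request.urlopen(cids_url(inchikey), timeout=timeout) as resp:
            text = resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return []
        raise
    # The TXT output lists one CID per line.
    return [int(token) for token in text.split()]


# Aspirin's InChIKey, used purely as an illustration (no network call is made here).
print(cids_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

InChIKeys contain only letters and hyphens, so they travel safely in a URL path; SMILES strings would need a POST request or careful escaping instead.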
SureChem Unique Contribution
[Venn diagram of the IC50 table structures: 79 in SureChem only; 96 in both SureChem and PubChem (via Thomson Pharma and Chemicalize)]

  Stage                                 No. of structures
  Available from SureChem (SC)          1848
  Pre-existing in PubChem               669
  Pre-existing, not from IC50 table     573
  Pre-existing, from IC50 table         96 (12 from Thomson Pharma + 84 via chemicalize.org)
  Unique to SC, with IC50               79
  Unique to SC, beyond IC50 table       1100
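The stage counts are internally consistent: the pre-existing structures split as 573 + 96 = 669, and the 1848 − 669 = 1179 structures unique to SureChem split as 79 + 1100. A short Python check of that arithmetic, with hypothetical key names, also derives the size of the IC50 table (96 already in PubChem plus 79 unique to SureChem, as the speaker notes describe).

```python
from typing import Dict

# Stage counts from the table for US-20120035195-A1.
STAGES: Dict[str, int] = {
    "available_sc": 1848,
    "pre_exist": 669,
    "pre_exist_not_ic50": 573,
    "pre_exist_ic50": 96,
    "unique_sc_ic50": 79,
    "unique_sc_beyond_ic50": 1100,
}


def derived_counts(s: Dict[str, int]) -> Dict[str, int]:
    """Check the table's internal consistency and return derived totals."""
    # Pre-existing structures split into IC50-table and non-IC50-table structures.
    if s["pre_exist"] != s["pre_exist_not_ic50"] + s["pre_exist_ic50"]:
        raise ValueError("pre-existing rows do not add up")
    # Everything SureChem extracted that was not already in PubChem is unique to SC.
    unique_sc = s["available_sc"] - s["pre_exist"]
    if unique_sc != s["unique_sc_ic50"] + s["unique_sc_beyond_ic50"]:
        raise ValueError("unique-to-SC rows do not add up")
    return {
        "unique_sc": unique_sc,
        "ic50_table_total": s["pre_exist_ic50"] + s["unique_sc_ic50"],
    }


print(derived_counts(STAGES))
# {'unique_sc': 1179, 'ic50_table_total': 175}
```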
SureChem Chemical Relevance Filtering
• Frequency counts of chemicals within patents
• Additional molecular property filtering and structural alerts
• Structural identification of “Likely Exemplars”
• Natural language processing (NLP)-based indexing of exemplified compounds

Automated indexing of exemplified compounds in text
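The slide lists these filters without describing them. As a minimal sketch of the first bullet only, assuming a per-patent count of how often each structure is mentioned, one could keep structures that are mentioned repeatedly within a patent while dropping those that recur across many patents (the "highly common chemistry" mentioned in the deposition footnote). The thresholds, function name and SMILES-keyed example data are all hypothetical, not SureChem's actual method.

```python
from collections import Counter
from typing import Dict, Set


def relevance_filter(
    mentions: Dict[str, Counter], min_mentions: int = 2, max_docs: int = 2
) -> Dict[str, Set[str]]:
    """Keep structures mentioned at least `min_mentions` times in a patent
    and appearing in no more than `max_docs` patents overall."""
    # Document frequency: number of patents in which each structure appears at all.
    doc_freq: Counter = Counter()
    for counts in mentions.values():
        doc_freq.update(counts.keys())
    return {
        patent: {
            smiles
            for smiles, n in counts.items()
            if n >= min_mentions and doc_freq[smiles] <= max_docs
        }
        for patent, counts in mentions.items()
    }


# Hypothetical mention counts per patent, keyed by SMILES.
# Ethanol ("CCO") appears in every patent, so it is treated as common chemistry.
mentions = {
    "P1": Counter({"CCO": 5, "OC(=O)c1ccccc1": 4, "CC(=O)Nc1ccc(O)cc1": 1}),
    "P2": Counter({"CCO": 3, "c1ccncc1": 2}),
    "P3": Counter({"CCO": 7, "c1ccncc1": 1}),
}
print(relevance_filter(mentions))
# {'P1': {'OC(=O)c1ccccc1'}, 'P2': {'c1ccncc1'}, 'P3': set()}
```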
Conclusions
SureChem deposition into PubChem:

  – Significantly expands the scope of public patent chemistry
  – Contributes unique and timely MedChem-relevant data
  – Enables open drug discovery and chemical biology
  – Moves toward a more open, federated chemical information network
SureChem is a product from Digital Science

Speaker notes

  1. Structures associated with SAR data are traced from the PDF to SureChemOpen.
  2. The structures were exported manually from the PDF.
  3. The structures were then searched in PubChem.
  4. The results show that 96 structures from the IC50 table were already in PubChem. SureChem has all of them, plus an additional 79 structures.