Which Drug Did You Mean?
Resolving the linkage spaghetti between semantic names, structures, bioactivity and mixtures

Christopher Southan
ChrisDS Consulting, Göteborg, Sweden

Prepared for BioIT, Boston, April 2012, Track 14, Tuesday

See also http://cdsouthan.blogspot.se/2012/06/will-real-bosinhib-please-stand-up-take.html


                                                          [1]
History of Drug Names




                                    Approximate timelines

[cpd registration system structure and ID------------------------------------------------------------]
           [patent IUPAC or image--------------------------------------------------------------------]
                    [internal code name(s) externally blinded-------]
                                     [code name(s) > structure declared externally -----]
                 [journal papers -----------------------------------------------------------------------]
                                                   [International Non-proprietary name INN]
                                                          [INN indexed in MeSH-----------------]
                                                           [USAN, BAN, JAN --------------------]
                                                                  [brand name(s)-------------------]
                                                                              [combination brand ]
                                                                                                      [2]
History of Atorvastatin




•   1985: IUPAC name (3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-(propan-2-yl)-1H-pyrrol-1-yl]-3,5-dihydroxyheptanoic acid
•   ~1987: Parke-Davis internal code number CI-981
•   ~1995: Atorvastatin [INN:BAN]; Atorvastatin calcium [USAN]; Atorvastatin calcium
    trihydrate INN (error?); Atorvastatina (Spain)
•   1997: Lipitor (brand name); Faboxim (Argentina), Zurinel (Chile), etc.
•   2004: Caduet (brand name), combining Norvasc (amlodipine besylate) and Lipitor
    (atorvastatin calcium)
•   2012: atorvastatin calcium – generic – Ranbaxy
•   2012: amlodipine besylate and atorvastatin calcium – generic – Ranbaxy


                                                                                           [3]
Causes of Drug Linkage Spaghetti (I)

•   Tautomer/stereo multiplexing and structure interconversion differences (e.g.
    complex antibiotics)

•   Popular structures > 100s of submitters > many vendors > more noise

•   Opaque ecosystem of primary submitters, secondary linkers, declared circularity,
    cryptic circularity, and submitters having independent portals with different rules

•   Older drugs accumulate 100s of synonyms and database x-refs, with errors

•   Accumulated wet assay results depend on which public screening collections
    the drug has been in, and for how long

•   Deprecated structures are not always refreshed between databases globally

•   Pro-drugs, metabolites or tested combinations rarely have explicit x-refs
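Stereo multiplexing can be made visible with standard InChIKeys. A minimal sketch, assuming every record already carries a standard InChIKey: the first 14-character block hashes only the connectivity (skeleton) layer, so stereoisomers and isotopologues of one structure share it while the second block differs. The record IDs and keys below are hypothetical placeholders, not real database entries.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-character block of a standard InChIKey.

    This block hashes only the molecular skeleton (connectivity), so
    stereoisomers and isotopologues of one structure share it.
    """
    block = inchikey.split("-")[0]
    if len(block) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return block


def group_by_skeleton(records: dict[str, str]) -> dict[str, list[str]]:
    """Map each connectivity block to the record IDs that share it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record_id, inchikey in records.items():
        groups[connectivity_block(inchikey)].append(record_id)
    return dict(groups)


# Hypothetical records: three stereo forms of one skeleton, one unrelated compound.
records = {
    "rec-1": "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "rec-2": "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "rec-3": "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "rec-4": "ZZZZZZZZZZZZZZ-UHFFFAOYSA-N",
}
print(group_by_skeleton(records))
```

Grouping on the first block is deliberately coarse: it answers "which records are variants of the same skeleton", not "which record is the drug".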


                                                                                     [4]
Causes of Drug Linkage Spaghetti (II)
•   Literature extractions flowing into drug databases (including MeSH) can have
     – Author errors and paucity of standards in the primary report
     – No quality filtration at the result level
     – Curation errors and different annotation rules
     – No discrimination of independent de novo checking from annotation recycling

•   Large-scale patent extraction feeds into databases bring in
     – Forests of analogues with no data links
     – High redundancy for drugs and leads
     – Structural differences between pipeline outputs
     – Opportunistic permutations of salts and mixtures
     – Opportunistic virtual deuteration of all best-selling drugs

•   Drug discovery operations use many drugs as reference compounds in their
    internal screening collections. This requires
      – Name > structure cross-mapping across internal, public and commercial sources
      – Integration of internal and external data across the same drugs
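Cross-mapping names to structures across internal, public and commercial sources is where conflicts surface. A minimal sketch, assuming each source can export (name, structure key) pairs; the normalisation (case-folding and whitespace collapse), the source labels and the keys are hypothetical illustrations. Any name that resolves to more than one structure key across sources is flagged for manual review.

```python
from collections import defaultdict


def normalise(name: str) -> str:
    """Case-fold and collapse whitespace so trivial name variants compare equal."""
    return " ".join(name.casefold().split())


def find_name_conflicts(sources: dict[str, list[tuple[str, str]]]) -> dict[str, set[str]]:
    """Return normalised names that map to more than one structure key.

    `sources` maps a source label to its exported (name, structure_key) pairs.
    """
    keys_by_name: dict[str, set[str]] = defaultdict(set)
    for pairs in sources.values():
        for name, key in pairs:
            keys_by_name[normalise(name)].add(key)
    return {name: keys for name, keys in keys_by_name.items() if len(keys) > 1}


# Hypothetical exports: the internal registry holds a different structure for "Drug Y".
sources = {
    "internal": [("Drug X", "KEY-1"), ("Drug Y", "KEY-9")],
    "public": [("drug x", "KEY-1"), ("Drug  Y", "KEY-2")],
}
print(find_name_conflicts(sources))
```

The normalisation step is intentionally shallow; anything cleverer (salt stripping, stereo collapsing) should be an explicit, documented choice because it decides which conflicts you get to see.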

                                                                                [5]
Atorvastatin
• The scale of links provides a good cross section of problems

• Relationship cross-mappings and the PubChem tool-box
  facilitate navigation through the links

• Each external submission gets a substance ID (SID); SIDs are
  merged into compound records (CIDs) via chemistry rules (see
  PubChem documentation)

• This drug has accumulated years of submissions from different
  sources, BioAssay entries and pharmacology literature links

• The parent CID 60823 has
   –   99 synonyms
   –   6 stereo forms
   –   70 canonically related structures
   –   449 substance records
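The link counts on this slide can be re-queried programmatically through PubChem's PUG REST interface. A sketch using only the Python standard library; the three URL patterns (synonyms, SIDs, and a `fastidentity` same-connectivity search) follow PUG REST conventions, but the helper names are hypothetical and the JSON keys should be checked against the current PUG REST documentation. Counts returned today will differ from the 2012 figures above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def synonyms_url(cid: int) -> str:
    return f"{PUG_REST}/compound/cid/{cid}/synonyms/JSON"


def sids_url(cid: int) -> str:
    return f"{PUG_REST}/compound/cid/{cid}/sids/JSON"


def same_connectivity_url(cid: int) -> str:
    # Records sharing the parent's connectivity, e.g. its stereo forms.
    query = urlencode({"identity_type": "same_connectivity"})
    return f"{PUG_REST}/compound/fastidentity/cid/{cid}/cids/JSON?{query}"


def fetch_json(url: str) -> dict:
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def link_counts(cid: int) -> dict[str, int]:
    """Count synonyms, substance records and same-connectivity CIDs (live query)."""
    info = fetch_json(synonyms_url(cid))["InformationList"]["Information"][0]
    sid_info = fetch_json(sids_url(cid))["InformationList"]["Information"][0]
    related = fetch_json(same_connectivity_url(cid))["IdentifierList"]["CID"]
    return {
        "synonyms": len(info.get("Synonym", [])),
        "sids": len(sid_info.get("SID", [])),
        "same_connectivity_cids": len(related),
    }
```

Calling `link_counts(60823)` needs network access, and PubChem's usage policy asks clients to keep request rates low.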
                                                                  [6]
What is Atorvastatin? - for Patients




                                        [7]
Atorvastatin - for Informaticians

PubChem CID 60823

Records in other sources: Wikipedia; ChemSpider 54810; DrugBank APRD00055;
CHEMBL1487; CAS 134523-00-5

PubChem submissions include:
   (3R,5R)     CID 60823
   (5R)        CID 51052072
   (3R)        CID 21029434
   (3S,5R)     CID 6093359
   (3S,5S)     CID 62976
   No stereo   CID 2250

Query: Same, Isotopes for PubChem Compound (Select 60823)
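Registry numbers such as the CAS RNs quoted in this deck carry a check digit, so transcription errors are cheap to catch. A sketch of the standard CAS check-digit rule (digits before the check digit weighted 1, 2, 3, ... from the right, sum modulo 10); the function name is illustrative. The three CAS numbers in this deck pass, and the fourth is a deliberately altered copy.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Validate a CAS Registry Number via its check digit."""
    match = CAS_PATTERN.match(cas.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost non-check digit.
    total = sum(weight * int(digit) for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check


for cas in ["134523-00-5", "134523-03-8", "344423-98-9", "134523-00-6"]:
    print(cas, is_valid_cas(cas))
```

A valid check digit only proves the number is well-formed; it cannot tell you whether the RN was attached to the right salt or stereo form.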




                                                              [8]
Name Retrieval Specificity (I)




                                 [9]
Name Retrieval Specificity (II)




"atorvastin" (a misspelling) appears in a DailyMed link, but not in the synonyms
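Misspellings like this defeat exact-match synonym lookup. A sketch of approximate matching with Python's `difflib.get_close_matches`, assuming a synonym list is already in hand (the list below is a small hypothetical example). `difflib` uses a generic string-similarity ratio; this is not how DailyMed or PubChem match names.

```python
from difflib import get_close_matches

# Hypothetical synonym list; a real one would come from the database record.
synonyms = ["atorvastatin", "atorvastatin calcium", "rosuvastatin", "Lipitor", "CI-981"]


def suggest(query: str, names: list[str], cutoff: float = 0.8) -> list[str]:
    """Return up to three names whose similarity ratio to the query is >= cutoff."""
    lowered = {name.lower(): name for name in names}
    hits = get_close_matches(query.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


print(suggest("atorvastin", synonyms))
```

Here "atorvastin" scores about 0.91 against "atorvastatin" and well below the cutoff against the other names. Lowering the cutoff widens recall but starts pulling in other statins, which is exactly the specificity problem this slide illustrates.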




                                              [10]
Drug BioAssay Data: Splitting by Submitted Structure Differences

Mainly uHTS and counterscreens from Scripps & Burnham

AIDs 406848-53 in ChEMBL (antimalarial assay, specified salt)

ChEMBL antimalarial strain assays (also specified salt), in vivo, plus three target links

Mainly qHTS from NCGC, no hits

                                                      [11]
Pharmacological Activity in vivo is ~70% Due to Active Metabolites, i.e. not Atorvastatin

Hazardous Substances Data Bank x-ref in the CID, but no direct links to the metabolites
(yet). Only one in vitro assay result for CID 9808225.

Structures shown: CID 60823, CID 9851106, CID 9808225
                                                 [12]
Salt Confusion (I): Atorvastatin Calcium

CID 656846     Mw 1209   CAS 344423-98-9   hemicalcium trihydrate (FDA package insert label)
CID 60822      Mw 1155   CAS 134523-03-8
CID 11227182   Mw 598

INN = atorvastatin
USAN/BAN = atorvastatin calcium
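The three molecular weights on this slide follow from simple stoichiometry. A worked sketch, assuming the free-acid formula C33H35FN2O5 and conventional atomic weights: the 2:1 calcium salt loses one proton per carboxylate, the trihydrate adds three waters, and a record that simply sums one atorvastatin and one calcium (an assumption about how the Mw 598 record was computed) lands just above 598. The slide's values appear to be truncated to whole numbers.

```python
import re
from collections import Counter

# Conventional standard atomic weights (g/mol), rounded.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "F": 18.998, "Ca": 40.078}


def parse_formula(formula: str) -> Counter:
    """Parse a bracket-free formula such as 'C33H35FN2O5' into element counts."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def combine(*parts: tuple[int, Counter]) -> Counter:
    """Sum (coefficient, counts) pairs; negative coefficients remove atoms."""
    total: Counter = Counter()
    for coefficient, counts in parts:
        for element, n in counts.items():
            total[element] += coefficient * n
    return total


def mol_weight(counts: Counter) -> float:
    return sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())


acid = parse_formula("C33H35FN2O5")  # atorvastatin free acid
proton = parse_formula("H")
calcium = parse_formula("Ca")
water = parse_formula("H2O")

# 2:1 salt: each carboxylic acid loses one proton; one Ca2+ balances two anions.
calcium_salt = combine((2, acid), (-2, proton), (1, calcium))
trihydrate = combine((1, calcium_salt), (3, water))
# Hypothetical reading of a 1:1 record: one atorvastatin plus one calcium, summed.
one_to_one = combine((1, acid), (1, calcium))

for label, counts in [("free acid", acid), ("2:1 calcium salt", calcium_salt),
                      ("2:1 trihydrate", trihydrate), ("1:1 sum", one_to_one)]:
    print(f"{label:17s} {mol_weight(counts):8.2f}")
```

Three records, three Mw values and one INN: the arithmetic shows the numbers are all "right", which is why they are so easy to conflate.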




                                                     [13]
Salt Confusion (II): What gets to Patients

CID 656846




CID 53252956




CID 23665101




 No INNs, USANs or clinical trial entries for these salts

                                                             [14]
Mixtures: Problematic all Round
•   The atorvastatin parent (CID 60823) has 379 mixture SIDs and 147 mixture CIDs,
    permuted from 122 component CIDs
•   Of the 122 components, 58 have a MeSH pharmacology tag, 92 have
    BioAssay results, 70 are in DrugBank, 101 are in ChEMBL, and 47 are below
    Mw 200 (and thus probably salts, not drugs)
•   Of the 147 mixture CIDs, only the two atorvastatin dimers have assay results or
    pharmacology, so none of the drug mixtures has direct data links
•   None of the mixture CIDs are in DrugBank, and only atorvastatin calcium is in ChEMBL
•   138 of the 147 were extracted from patents by Derwent/Thomson and
    are unlikely to get data links
•   The few important drug combinations that do have data and/or
    trial results are difficult to identify
•   Tested drug mixtures rarely get public code names; some get trade names but
    never INNs
•   Chemistry rules may split mixtures and synonyms in databases
•   PubMed "Drug Combinations"[MeSH Term] = 54,186, but no SID or CID links
•   Mixture components can be designated with a space, /, + or "co"
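Explicit separators can be split mechanically, but a space is ambiguous ("atorvastatin calcium" is one component) and "co" names such as co-trimoxazole cannot be decomposed from the name at all. A sketch, assuming only "/", "+" and "and" are treated as separators and using a small hypothetical counterion list to reduce each component to an approximate parent name.

```python
import re

# Only unambiguous separators; spaces and "co-" names are deliberately not split.
SEPARATOR = re.compile(r"\s*(?:/|\+|\band\b)\s*", re.IGNORECASE)

# Hypothetical, incomplete list of counterion words to strip.
SALT_WORDS = {"calcium", "besylate", "hydrochloride", "sodium", "potassium", "maleate"}


def split_combination(name: str) -> list[str]:
    """Split an explicit combination name into component names."""
    return [part for part in SEPARATOR.split(name.strip()) if part]


def parent_name(component: str) -> str:
    """Drop trailing counterion words to approximate the parent drug name."""
    words = component.lower().split()
    while len(words) > 1 and words[-1] in SALT_WORDS:
        words.pop()
    return " ".join(words)


for name in ["amlodipine besylate and atorvastatin calcium",
             "aspirin + atorvastatin", "ezetimibe/simvastatin"]:
    print([parent_name(c) for c in split_combination(name)])
```

A name-based splitter like this can only ever be a first pass; the reliable route is a structure-level mixture record whose components carry their own CIDs.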

                                                                                    [15]
The Famous Polypill: A Fuzzy Term




                                                 CID 44602839 Thomson Pharma

                                                 18 clinicaltrials.gov entries, but
                                                 only partial component links




aspirin 81 mg, enalapril 2.5 mg, atorvastatin 20 mg and hydrochlorothiazide 12.5 mg
(polypill); PMID 21647425; Australian New Zealand Clinical Trials Registry
ACTRN12607000099426

Not found in DrugBank or TTD

                                                                                      [16]
Caduet: an Approved Combination
DrugBank                                         Wikipedia




http://clinicaltrials.gov/ct2/show/NCT01107743




                                                             [17]
Submitter Synonym Noise in PubChem




                                     [18]
A More Recent Combination




     However, QA149 is not found in PubChem, DrugBank or TTD




                                                       [19]
Spaghetti is Resolvable but Errors are Tough:
     Will the Real LX4211 Please Stand Up?
 http://cenblog.org/the-haystack/2012/03/liveblogging-first-time-disclosures-from-acssandiego/




See also:     http://cdsouthan.blogspot.se/2012/03/live-chemical-structure-blogging-but.html
                                                                                                 [20]
Summary
•   You can navigate the linkage spaghetti in name, synonym, structure,
    bioactivity and mixture space, but this needs perspicacity and
    circumspection.
•   The current drug information ecosystem, with its multiple stakeholders, seems
    destined to remain "fuzzy"
•   Beyond the informatics challenges, the consequences, particularly of frank
    errors, could be more serious
•   WHO INNs and naming stems play a key positive role, but:
     –   No open authoritative database, only 7000 PDF entries (!)
     –   No transparent coordination between USAN, FDA, MeSH, national offices, or
         clinical trials registries
     –   Susceptible to commercial flanking tactics
•   Drug combinations have a bright pharmacological future but a difficult
    informatics one
•   The fuzz includes scientific challenges (e.g. complex structures,
    dynamic tautomerism, active metabolites, formulation differences,
    paucity of standardised and comparable activity data).
•   Efforts are being made to improve the situation, including by the
    databases represented in this Workshop session.
                                                                                     [21]
Questions Welcome
ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm
Mobile: +46(0)702-530710, Skype: cdsouthan
Email: cdsouthan@hotmail.com
Twitter: http://twitter.com/#!/cdsouthan
Blog: http://cdsouthan.blogspot.com/
LinkedIN: http://www.linkedin.com/in/cdsouthan
Website: http://www.cdsouthan.info/CDS_prof.htm
Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year
Citations: http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en
Presentations: http://www.slideshare.net/cdsouthan


FYI: a short piece on identifying the names and molecular details of
drugs in clinicaltrials.gov

http://www.samedanltd.com/magazine/13/issue/166/article/3152



                                                                                [22]

Contenu connexe

Tendances

Design, Synthesis, and Characterization of New 1,3,5-Trisubstituted-2-pyrazol...
Design, Synthesis, and Characterization of New 1,3,5-Trisubstituted-2-pyrazol...Design, Synthesis, and Characterization of New 1,3,5-Trisubstituted-2-pyrazol...
Design, Synthesis, and Characterization of New 1,3,5-Trisubstituted-2-pyrazol...
BRNSS Publication Hub
 
Withania somnifera Roots-PPT
Withania somnifera Roots-PPTWithania somnifera Roots-PPT
Withania somnifera Roots-PPT
Ruchi Saharan
 
Stephen 205 (1)
Stephen 205 (1)Stephen 205 (1)
Stephen 205 (1)
farsiya
 

Tendances (20)

Design, Synthesis, and Characterization of New 1,3,5-Trisubstituted-2-pyrazol...
Design, Synthesis, and Characterization of New 1,3,5-Trisubstituted-2-pyrazol...Design, Synthesis, and Characterization of New 1,3,5-Trisubstituted-2-pyrazol...
Design, Synthesis, and Characterization of New 1,3,5-Trisubstituted-2-pyrazol...
 
CorMedix (AMEX: CRMD; Stock Twits: $CRMD) April 2011
CorMedix (AMEX: CRMD; Stock Twits: $CRMD) April 2011CorMedix (AMEX: CRMD; Stock Twits: $CRMD) April 2011
CorMedix (AMEX: CRMD; Stock Twits: $CRMD) April 2011
 
PPT-2 MODIFIED(1)
PPT-2 MODIFIED(1)PPT-2 MODIFIED(1)
PPT-2 MODIFIED(1)
 
IAJPR LAVANYA
IAJPR LAVANYAIAJPR LAVANYA
IAJPR LAVANYA
 
Withania somnifera Roots-PPT
Withania somnifera Roots-PPTWithania somnifera Roots-PPT
Withania somnifera Roots-PPT
 
Ang herb price list
Ang herb price listAng herb price list
Ang herb price list
 
Novel Herbal Drug Delivery Systems: Prospects and Perspectives
Novel Herbal Drug Delivery Systems: Prospects and PerspectivesNovel Herbal Drug Delivery Systems: Prospects and Perspectives
Novel Herbal Drug Delivery Systems: Prospects and Perspectives
 
Paper study (2)
Paper study (2)Paper study (2)
Paper study (2)
 
Polyherbal formulation development for anti asthmatic activity
Polyherbal formulation development for anti asthmatic activityPolyherbal formulation development for anti asthmatic activity
Polyherbal formulation development for anti asthmatic activity
 
Suppository
SuppositorySuppository
Suppository
 
Development and Validation of Simultaneous Equation Estimation Method For Ham...
Development and Validation of Simultaneous Equation Estimation Method For Ham...Development and Validation of Simultaneous Equation Estimation Method For Ham...
Development and Validation of Simultaneous Equation Estimation Method For Ham...
 
Impurity Profile
Impurity ProfileImpurity Profile
Impurity Profile
 
Tiêu chuẩn GMP WHO cho thuốc đông dược
Tiêu chuẩn GMP WHO cho thuốc đông dượcTiêu chuẩn GMP WHO cho thuốc đông dược
Tiêu chuẩn GMP WHO cho thuốc đông dược
 
Drug degradation impurity in excipients
Drug degradation impurity in excipients Drug degradation impurity in excipients
Drug degradation impurity in excipients
 
KFDA Policy on Food Safety Control for Imported Foods
KFDA Policy on Food Safety Control for Imported FoodsKFDA Policy on Food Safety Control for Imported Foods
KFDA Policy on Food Safety Control for Imported Foods
 
Intranasal delivery of drug loaded thiolated co-polymeric microparticles for...
Intranasal delivery of drug loaded thiolated co-polymeric microparticles for...Intranasal delivery of drug loaded thiolated co-polymeric microparticles for...
Intranasal delivery of drug loaded thiolated co-polymeric microparticles for...
 
REMINGTON'S JOURNAL CLUB PRESENTATION
REMINGTON'S JOURNAL CLUB PRESENTATIONREMINGTON'S JOURNAL CLUB PRESENTATION
REMINGTON'S JOURNAL CLUB PRESENTATION
 
Ashwagandha withania somnifera
Ashwagandha withania somniferaAshwagandha withania somnifera
Ashwagandha withania somnifera
 
Stephen 205 (1)
Stephen 205 (1)Stephen 205 (1)
Stephen 205 (1)
 
Licensed Establishments In Human Tissue Sector March 2010
Licensed Establishments In Human Tissue Sector March 2010Licensed Establishments In Human Tissue Sector March 2010
Licensed Establishments In Human Tissue Sector March 2010
 

Similaire à Which Drug Did You Mean ?

Sorting bioactive wheat from database chaff
Sorting bioactive wheat from database chaffSorting bioactive wheat from database chaff
Sorting bioactive wheat from database chaff
Chris Southan
 

Similaire à Which Drug Did You Mean ? (20)

Correct drug structures for pharmacology
Correct drug structures for pharmacologyCorrect drug structures for pharmacology
Correct drug structures for pharmacology
 
GtoPdb teaching slides
GtoPdb teaching slidesGtoPdb teaching slides
GtoPdb teaching slides
 
Digging out Structures for Repurposing: Non-competitive Intelligence ...
Digging out Structures for Repurposing: Non-competitive Intelligence        ...Digging out Structures for Repurposing: Non-competitive Intelligence        ...
Digging out Structures for Repurposing: Non-competitive Intelligence ...
 
CMC Logistics
CMC LogisticsCMC Logistics
CMC Logistics
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdb
 
FDA container closure system & drug stability saurav anand 23 iip
FDA container closure system & drug stability saurav anand 23 iipFDA container closure system & drug stability saurav anand 23 iip
FDA container closure system & drug stability saurav anand 23 iip
 
Avoiding Pitfalls in the Regulatory Path - MaRS Best Practices
Avoiding Pitfalls in the Regulatory Path - MaRS Best PracticesAvoiding Pitfalls in the Regulatory Path - MaRS Best Practices
Avoiding Pitfalls in the Regulatory Path - MaRS Best Practices
 
Curatorial data wrangling for the Guide to PHARMACOLGY
Curatorial data wrangling for the Guide to PHARMACOLGY Curatorial data wrangling for the Guide to PHARMACOLGY
Curatorial data wrangling for the Guide to PHARMACOLGY
 
Regulatory affairs cmc , post approval regulatory affairs
Regulatory affairs   cmc , post approval regulatory affairsRegulatory affairs   cmc , post approval regulatory affairs
Regulatory affairs cmc , post approval regulatory affairs
 
PubChem as a source of systems biology perturbagens
PubChem as a source of  systems biology perturbagensPubChem as a source of  systems biology perturbagens
PubChem as a source of systems biology perturbagens
 
Dorota Matecka_Clin Investig Course 2016_Final.pptx
Dorota Matecka_Clin Investig Course 2016_Final.pptxDorota Matecka_Clin Investig Course 2016_Final.pptx
Dorota Matecka_Clin Investig Course 2016_Final.pptx
 
Drug discovery process.
Drug discovery process.Drug discovery process.
Drug discovery process.
 
Unc slides on computational toxicology
Unc slides on computational toxicologyUnc slides on computational toxicology
Unc slides on computational toxicology
 
ANDA
ANDAANDA
ANDA
 
CDSCO Biologicals - Rules, Regulations, Guidelines and Standards for Regulato...
CDSCO Biologicals - Rules, Regulations, Guidelines and Standards for Regulato...CDSCO Biologicals - Rules, Regulations, Guidelines and Standards for Regulato...
CDSCO Biologicals - Rules, Regulations, Guidelines and Standards for Regulato...
 
Drug Administration GoAP - Shri BL Meena
Drug Administration GoAP - Shri BL MeenaDrug Administration GoAP - Shri BL Meena
Drug Administration GoAP - Shri BL Meena
 
CMC, post approval regulatory affairs, etc
CMC, post approval regulatory affairs, etcCMC, post approval regulatory affairs, etc
CMC, post approval regulatory affairs, etc
 
ETORICOXIB AND PREGABALIN OF METHOD DEVLOPMENT IN RPHPLC BY UPEXA BAVADIYA
ETORICOXIB AND PREGABALIN OF  METHOD DEVLOPMENT IN RPHPLC BY UPEXA BAVADIYAETORICOXIB AND PREGABALIN OF  METHOD DEVLOPMENT IN RPHPLC BY UPEXA BAVADIYA
ETORICOXIB AND PREGABALIN OF METHOD DEVLOPMENT IN RPHPLC BY UPEXA BAVADIYA
 
Radiopharmaceuticals from a regulatory perspective
Radiopharmaceuticals from a regulatory perspectiveRadiopharmaceuticals from a regulatory perspective
Radiopharmaceuticals from a regulatory perspective
 
Sorting bioactive wheat from database chaff
Sorting bioactive wheat from database chaffSorting bioactive wheat from database chaff
Sorting bioactive wheat from database chaff
 

Plus de Chris Southan

Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2
Chris Southan
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug Development
Chris Southan
 

Plus de Chris Southan (20)

FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCP
 
Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivity
 
Peptide tribulations
Peptide tribulationsPeptide tribulations
Peptide tribulations
 
Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2
 
Guide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updaeGuide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updae
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug Development
 
Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?
 
Desperately seeking DARCP
Desperately seeking DARCPDesperately seeking DARCP
Desperately seeking DARCP
 
Seeking glimmers of light in Pharos “Tdark” proteins
Seeking glimmers of light in  Pharos “Tdark” proteinsSeeking glimmers of light in  Pharos “Tdark” proteins
Seeking glimmers of light in Pharos “Tdark” proteins
 
5HT2A modulators update for SAFER
5HT2A modulators update for SAFER5HT2A modulators update for SAFER
5HT2A modulators update for SAFER
 
Quality and noise in big chemistry databases
Quality and noise in big chemistry databasesQuality and noise in big chemistry databases
Quality and noise in big chemistry databases
 
Connecting chemistry-to-biology
Connecting chemistry-to-biology Connecting chemistry-to-biology
Connecting chemistry-to-biology
 
GtoPdb June 2019 poster
GtoPdb June 2019 posterGtoPdb June 2019 poster
GtoPdb June 2019 poster
 
PubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyPubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biology
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand up
 
Peptide Tribulations
Peptide TribulationsPeptide Tribulations
Peptide Tribulations
 
Looking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRLooking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIR
 
Guide to Immunopharmacology update
Guide to Immunopharmacology updateGuide to Immunopharmacology update
Guide to Immunopharmacology update
 
Druggable Proteome sources in UniProt
Druggable Proteome sources in UniProtDruggable Proteome sources in UniProt
Druggable Proteome sources in UniProt
 
Patents in PubChem
Patents in PubChemPatents in PubChem
Patents in PubChem
 

Dernier

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Which Drug Did You Mean ?

  • 1. Which Drug Did You Mean? Resolving the linkage spaghetti between semantic names, structures, bioactivity and mixtures Christopher Southan ChrisDS Consulting, Göteborg, Sweden, Prepared for BioIT, Boston, April 2012, Track 14, Tuesday See also http://cdsouthan.blogspot.se/2012/ 06/will-real-bosinhib-please-stand- up-take.html [1]
  • 2. History of Drug Names Approximate timelines [cpd registration system structure and ID------------------------------------------------------------] [patent IUPAC or image--------------------------------------------------------------------] [internal code name(s) externally blinded-------] [code name(s) > structure declared externally -----] [journal papers -----------------------------------------------------------------------] [International Non-proprietary name INN] [INN indexed in MeSH-----------------] [USAN, BAN, JAN --------------------] [brand name(s)-------------------] [combination brand ] [2]
  • 3. History of Atorvastatin • 1985: (3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-(propan-2-yl)-1H- pyrrol-1-yl]-3,5-dihydroxyheptanoic acid IUPAC • ~ 1987: Park-Davis internal code number CI-981 • ~ 1995: Atorvastatin [INN:BAN] Atorvastatin calcium [USAN], Atorvastatin calcium trihydrate INN (error ?) Atorvastatina (Spain) • 1997 Lipitor (brand name) Faboxim (Argentina) Zurinel (Chile) etc • 2004: Caduet (brand name) Norvasc (amlodipine besylate) and Lipitor(atorvastatin calcium) • 2012: atorvastatin calcium – generic - Ranbaxy • 2012: amlodipine besylate and atorvastatin calcium – generic - Ranbaxy [3]
  • 4. Causes of Drug Linkage Spaghetti (I) • Tautomer/stereo mutiplexing and structure interconversion differences (e.g. complex antibiotics) • Popular structures > 100s of submitters > many vendors > more noise • Opaque ecosystem of primary submitters, secondary linkers, declared circularity, cryptic circularity, and submitters having independent portals with different rules • Older drugs accumulate 100’s of synonyms and database x-refs, with erros • Accumulated wet assay results are dependent on how long the drug has been in which public screening collection • Deprecated structures not always refreshed between databases globally • Pro-drugs, metabolites or tested combinations rarely have explicit x-refs [4]
  • 5. Causes of Drug Linkage Spaghetti (II) • Literature extractions flowing into drug databases (including MeSH) can have – Author errors and paucity of standards in the primary report – No quality filtration at the result level – Curation errors and different annotation rules – No discrimination of independent de-novo checking from annotation recycling • Large-scale patent extraction feeds into databases bring in – Forests of analogues with no data links – High redundency for drugs and leads – Structural differences between pipeline outputs – Opportunistic permutations of salts and mixtures – Opportunistic virtual deuteration of all best-selling drugs • Drug discovery operations use many drugs as reference compounds in their internal screening collections . This means – Name > structure cross-mapping, internal, public and commercial – Integration of internal and external data across the same drugs [5]
  • 6. Atorvastatin • The scale of links provides a good cross-section of problems • Relationship cross-mappings and the PubChem toolbox facilitate navigation through the links • Each external submission gets a substance ID (SID); SIDs are merged into compound records (CIDs) via chemistry rules (see PubChem documentation) • This drug has accumulated years of submissions from different sources, BioAssay entries and pharmacology literature links • The parent CID 60823 has – 99 synonyms – 6 stereo forms – 70 canonically related structures – 449 substance records [6]
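Synonym lists like the one counted above can be retrieved programmatically. The sketch below assumes PubChem's PUG-REST service and its `compound/cid/{cid}/synonyms/JSON` operation, and expects the `InformationList` / `Information` / `Synonym` layout that operation returns. The sample payload is a canned, abbreviated stand-in so the parser can run offline; live counts will differ from the 2012 figures on this slide.

```python
import json
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def synonyms_url(cid: int) -> str:
    """Build the PUG-REST URL listing the synonyms of one CID."""
    return f"{PUG_REST}/compound/cid/{cid}/synonyms/JSON"


def parse_synonyms(payload: dict) -> list[str]:
    """Extract the synonym list from a PUG-REST synonyms response."""
    info = payload["InformationList"]["Information"]
    return info[0].get("Synonym", []) if info else []


def fetch_synonyms(cid: int) -> list[str]:
    """Live call; needs network access to PubChem."""
    with urlopen(synonyms_url(cid), timeout=30) as resp:
        return parse_synonyms(json.load(resp))


if __name__ == "__main__":
    # Canned response in the expected shape; the synonyms are a tiny
    # illustrative subset, not the real record.
    sample = {"InformationList": {"Information": [
        {"CID": 60823, "Synonym": ["atorvastatin", "Lipitor", "CI-981"]}]}}
    print(synonyms_url(60823))
    print(parse_synonyms(sample))
```

With network access, `len(fetch_synonyms(60823))` gives the current synonym count, which can be compared against the 99 reported here to watch synonym accumulation over time.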
  • 7. What is Atorvastatin? - for Patients [7]
  • 8. Atorvastatin - for Informaticians PubChem CID 60823 PubChem submissions include: Wikipedia (3R,5R) CID 60823 (5R) CID 51052072 ChemSpider 54810 (3R) CID 21029434 (3S,5R) CID 6093359 (3S,5S) CID 62976 DrugBank APRD00055 No stereo CID 2250 Query: Same, Isotopes for CHEMBL1487 PubChem Compound (Select 60823) CAS 134523-00-5 [8]
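One way to see why stereo forms such as CIDs 60823, 6093359 and 62976 belong together is the InChIKey layout: the first 14-character block hashes the connectivity layer, while the second block carries stereo and isotope information. The sketch groups keys by that first block, in the spirit of a same-connectivity search (not PubChem's actual implementation). The keys are hypothetical placeholders, not the real atorvastatin InChIKeys; only `UHFFFAOYSA`, the second block InChI produces when no stereo or isotope layer is present, reflects real InChI output.

```python
from collections import defaultdict

# HYPOTHETICAL InChIKeys: a shared first block stands in for one skeleton,
# differing second blocks for different stereo layers.
KEYS = {
    "(3R,5R)":   "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "(3S,5R)":   "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "(3S,5S)":   "AAAAAAAAAAAAAA-DDDDDDDDSA-N",
    "no stereo": "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "unrelated": "ZZZZZZZZZZZZZZ-UHFFFAOYSA-N",
}


def skeleton(inchikey: str) -> str:
    """First InChIKey block: connectivity only, stereo ignored."""
    return inchikey.split("-")[0]


def group_by_skeleton(keys: dict[str, str]) -> dict[str, list[str]]:
    """Collect labels whose keys share a first block."""
    groups = defaultdict(list)
    for label, key in keys.items():
        groups[skeleton(key)].append(label)
    return dict(groups)


if __name__ == "__main__":
    for block, labels in group_by_skeleton(KEYS).items():
        print(block, labels)
```

Grouping on the first block deliberately discards stereo, which is useful for navigation but is also how a record with the wrong or missing stereo can silently join the "right" cluster.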
  • 10. Name Retrieval Specificity (II) • “atorvastin” (misspelling) in DailyMed link, not in synonyms [10]
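A misspelling that reaches a DailyMed page but not the synonym list shows how much retrieval depends on exact strings. A tolerant lookup is one possible mitigation; this sketch uses Python's standard-library `difflib.get_close_matches` against a small hypothetical synonym list, with a 0.8 similarity cutoff chosen arbitrarily. Fuzzy matching cuts both ways for drugs, since look-alike names can map to the wrong compound, so hits are best treated as suggestions for review.

```python
import difflib

# HYPOTHETICAL synonym list; a real one would come from a database record.
SYNONYMS = ["atorvastatin", "atorvastatin calcium", "Lipitor", "CI-981"]


def suggest(query: str, synonyms: list[str], cutoff: float = 0.8) -> list[str]:
    """Return up to three synonyms similar to a possibly misspelt query."""
    # Compare case-insensitively but return the synonym as stored.
    lowered = {s.lower(): s for s in synonyms}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[h] for h in hits]


if __name__ == "__main__":
    print(suggest("atorvastin", SYNONYMS))
    print(suggest("lipitr", SYNONYMS))
```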
  • 11. Drug BioAssay Data: Splitting by Submitted Structure Differences • Mainly uHTS and counterscreens from Scripps & Burnham • AIDs 406848-53 in ChEMBL (antimalarial assay, specified salt) • ChEMBL antimalarial strain assays (also specified salt), in vivo, plus three target links • Mainly qHTS from NCGC, no hits [11]
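When results are split across records by the exact structure submitted (for example, an assay that specified a salt), one common remedy is to roll results up to a parent compound while keeping the tested form attached. This sketch assumes the salt-to-parent mapping implied by the salt slides (CIDs 60822, 656846 and 11227182 to parent CID 60823); the assay rows, their identifiers and outcomes are hypothetical.

```python
from collections import defaultdict

# Salt/hydrate CIDs mapped to the parent CID, following the salt slides.
# In practice this would come from a standardisation step, not a table.
PARENT = {60823: 60823, 60822: 60823, 656846: 60823, 11227182: 60823}

# HYPOTHETICAL (assay_id, tested_cid, outcome) rows.
RESULTS = [
    ("AID_x", 60823, "inactive"),
    ("AID_y", 656846, "active"),
    ("AID_z", 60822, "inactive"),
    ("AID_w", 12345, "active"),  # placeholder CID absent from the salt table
]


def roll_up(results, parent_of):
    """Group (assay, tested CID, outcome) rows under each parent CID.

    CIDs missing from the mapping are kept under their own ID.
    """
    by_parent = defaultdict(list)
    for aid, cid, outcome in results:
        by_parent[parent_of.get(cid, cid)].append((aid, cid, outcome))
    return dict(by_parent)


if __name__ == "__main__":
    for parent, rows in roll_up(RESULTS, PARENT).items():
        print(parent, rows)
```

Keeping the tested CID in each row preserves salt-specific provenance, which matters when the salt form was part of the assay specification.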
  • 12. Pharmacological Activity in vivo is ~70% Active Metabolites, i.e. not Atorvastatin • Hazardous Substances Data Bank x-ref in the CID, but no direct links to the metabolites (yet) • Only one in-vitro assay result, for CID 9808225 • Labelled structures: CID 60823, CID 9851106, CID 9808225 [12]
  • 13. Salt Confusion (I) • Atorvastatin calcium, FDA package insert label: CID 656846, Mw 1209, CAS 344423-98-9 (hemicalcium trihydrate) • CID 60822, Mw 1155, CAS 134523-03-8 • INN = atorvastatin • USAN/BAN = atorvastatin • CID 11227182, Mw 598 (calcium) [13]
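The three molecular weights on this slide can be reconciled with simple arithmetic from the free-acid formula C33H35FN2O5. The sketch uses rounded IUPAC average atomic masses. Reading the Mw 598 record as one atorvastatin plus one calcium atom, with no hydrogen removed, is an interpretation of the number, not something the slide states.

```python
# Rounded IUPAC average atomic masses.
MASS = {"C": 12.011, "H": 1.008, "F": 18.998, "N": 14.007, "O": 15.999, "Ca": 40.078}


def mw(formula: dict[str, int]) -> float:
    """Molecular weight from an element -> atom-count mapping."""
    return sum(MASS[element] * count for element, count in formula.items())


ACID = {"C": 33, "H": 35, "F": 1, "N": 2, "O": 5}  # atorvastatin free acid
WATER = {"H": 2, "O": 1}

acid = mw(ACID)
anion = acid - MASS["H"]                # carboxylate: one H removed
hemicalcium = 2 * anion + MASS["Ca"]    # Ca(C33H34FN2O5)2
trihydrate = hemicalcium + 3 * mw(WATER)  # Ca(C33H34FN2O5)2 . 3 H2O
acid_plus_ca = acid + MASS["Ca"]        # assumed 1:1 record, no H removed

if __name__ == "__main__":
    for label, value in [("free acid", acid),
                         ("hemicalcium", hemicalcium),
                         ("hemicalcium trihydrate", trihydrate),
                         ("acid + Ca (1:1)", acid_plus_ca)]:
        print(f"{label:24s} {value:8.1f}")
```

The hemicalcium and trihydrate values round to the slide's 1155 and 1209, and the 1:1 value falls within one dalton of 598, so three records with the same "atorvastatin calcium" label differ by stoichiometry and hydration rather than by drug.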
  • 14. Salt Confusion (II): What gets to Patients • CID 656846 • CID 53252956 • CID 23665101 • No INNs, USANs or clinical trial entries for these salts [14]
  • 15. Mixtures: Problematic all Round • Atorvastatin parent (CID 60823) has 379 mixture SIDs and 147 mixture CIDs permutated from 122 component CIDs • Of the 122 components, 58 have a MeSH pharmacology tag, 92 have BioAssay results, 70 are in DrugBank, 101 are in ChEMBL, and 47 are below 200 Da MW (and thus probably salts, not drugs) • Of the 147 mixture CIDs, only the 2 atorvastatin dimers have assay results or pharmacology, so none of the drug mixtures have direct data links • None of the mixture CIDs are in DrugBank, and only atorvastatin calcium is in ChEMBL • 138 of the 147 have been extracted from patents by Derwent/Thomson and are unlikely to get data links • The few important drug combinations that do have data and/or trial results are difficult to identify • Tested drug mixtures rarely get public code names; some get trade names but never INNs • Chemistry rules may split mixtures and synonyms in databases • PubMed "Drug Combinations"[MeSH Term] = 54,186 but no SID or CID links • Mixture components can be designated with a space, /, + or “co” [15]
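The separator problem in the last bullet can be illustrated with a small splitter. It treats "/", "+" and "and" as separators but deliberately not a bare space, since salt names such as "atorvastatin calcium" contain spaces too. Reading "co" as the British "co-" prefix convention (as in co-amoxiclav) is an assumption; such names cannot be split lexically, so the sketch only flags them for a lookup table. The function name and return shape are illustrative.

```python
import re

# "/", "+" and the word "and" are treated as component separators.
# A bare space is NOT, because salt names contain spaces.
SEPARATORS = re.compile(r"\s*(?:/|\+|\band\b)\s*", flags=re.IGNORECASE)


def split_combination(name: str) -> tuple[list[str], bool]:
    """Split a combination name into components.

    Returns (components, needs_lookup). Names using the assumed "co-"
    prefix convention are returned whole and flagged for a lookup table.
    """
    cleaned = name.strip()
    if cleaned.lower().startswith("co-"):
        return [cleaned], True
    parts = [p for p in SEPARATORS.split(cleaned) if p]
    return parts, False


if __name__ == "__main__":
    for example in ["amlodipine besylate and atorvastatin calcium",
                    "aspirin/enalapril/atorvastatin/hydrochlorothiazide",
                    "co-amoxiclav"]:
        print(example, "->", split_combination(example))
```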
  • 16. The Famous Polypill: A Fuzzy Term • CID 44602839 • Thomson Pharma • 18 clinicaltrials.gov entries, but only partial component links • aspirin 81 mg, enalapril 2.5 mg, atorvastatin 20 mg and hydrochlorothiazide 12.5 mg (polypill) • PMID: 21647425 • Australian New Zealand Clinical Trials Registry ACTRN12607000099426 • DrugBank and TTD negative [16]
  • 17. Caduet: an Approved Combination • DrugBank • Wikipedia • http://clinicaltrials.gov/ct2/show/NCT01107743 [17]
  • 18. Submitter Synonym Noise in PubChem [18]
  • 19. A more Recent Combination • But QA149 is negative in PubChem, DrugBank and TTD [19]
  • 20. Spaghetti is Resolvable but Errors are Tough: Will the Real LX4211 Please Stand Up? • http://cenblog.org/the-haystack/2012/03/liveblogging-first-time-disclosures-from-acssandiego/ • See also: http://cdsouthan.blogspot.se/2012/03/live-chemical-structure-blogging-but.html [20]
  • 21. Summary • You can navigate the linkage spaghetti in name, synonym, structure, bioactivity and mixture space, but this needs perspicacity and circumspection • The current drug information ecosystem, with its multiple stakeholders, seems destined to remain “fuzzy” • Beyond the informatics challenges, the consequences, particularly of frank errors, could be more serious • WHO INNs and naming stems play a key positive role, but: – No open authoritative database, only 7000 PDF entries (!) – No transparent coordination between USAN, FDA, MeSH, national offices, or clinical trials registries – Susceptible to commercial flanking tactics • Drug combinations have a bright pharmacological future but a difficult informatics one • The fuzz includes scientific challenges (e.g. complex structures, dynamic tautomerism, active metabolites, formulation differences, and a paucity of standardised and comparable activity data) • Efforts are being made to improve the situation, including by the databases represented in this Workshop session. [21]
  • 22. Questions Welcome ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm Mobile: +46(0)702-530710, Skype: cdsouthan Email: cdsouthan@hotmail.com Twitter: http://twitter.com/#!/cdsouthan Blog: http://cdsouthan.blogspot.com/ LinkedIn: http://www.linkedin.com/in/cdsouthan Website: http://www.cdsouthan.info/CDS_prof.htm Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year Citations: http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en Presentations: http://www.slideshare.net/cdsouthan FYI: A short piece on identifying the names and molecular details of drugs in clinicaltrials.gov http://www.samedanltd.com/magazine/13/issue/166/article/3152 [22]