Report on Linked Structured Product Labels (LinkedSPLs) and a study evaluating three different approaches to mapping active ingredients coded in Structured Product Labels to DrugBank.
Extending the "Web of Drug Identity" with knowledge extracted from United States product labels
1. Extending the “Web of Drug
Identity” with Knowledge
Extracted from United States
Product Labels
Oktie Hassanzadeh, IBM Research
Qian Zhu, Mayo Clinic
Robert Freimuth, Mayo Clinic
Richard Boyce*, University of Pittsburgh
1 Biomedical Informatics
Department of Biomedical Informatics
2. Take home message
• Drug product labeling is a vital, unique, and
under-utilized source of claims and evidence
about drugs
– genes, diseases, drugs, drug interactions, special
populations, and adverse reactions
• All American product labeling content is
available in an accessible format
– Structured Product Labeling (SPL)
• LinkedSPLs is a Linked Data version of SPLs
– simplifies access to SPL content
– interoperable with other important drug
terminologies
3. Why is drug product labeling special?
• It complements existing knowledge sources
– 40% of 44 pharmacokinetic drug-drug
interactions affecting 25 drugs were located
exclusively in product labeling [1]
– 24% of clinical efficacy trials for 90 drugs were
discussed in the product label but not the
scientific literature [2]
– 1/5th of the evidence for metabolic pathways for
16 drugs and 19 metabolites was found in
product labeling but not the scientific literature
[3]
1. Boyce RD, Collins C, Clayton M, Kloke J, Horn JR. Inhibitory metabolic drug interactions with newer psychotropic drugs: inclusion in package inserts and
influences of concurrence in drug interaction screening software. Ann Pharmacother. 2012;46(10):1287–1298.
2. Lee K, Bacchetti P, Sim I. Publication of Clinical Trials Supporting Successful New Drug Applications: A Literature Analysis. PLoS Med. 2008;5(9):e191.
3. Boyce R, Collins C, Horn J, Kalet I. Computing with evidence: Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. Journal of
Biomedical Informatics. 2009;42(6):979–989.
4. Why product labeling has information
that is not in the scientific literature
1. Product labels contain a summary of
information reported in detail in a
drug’s New Drug Application
– Often difficult/impossible for a
researcher to access
2. Until recently, there was no
requirement to publish pre-market
drug study results
– This has changed since ~2010
5. Product labeling is under-utilized
by translational researchers
• Only two out of more than 2,300
MEDLINE abstracts discuss product
label NLP [1]
• Several recent informatics projects
did not explicitly include product label
information [2-6]
1. Query done on 11/26: (Natural Language Processing [MeSH Terms] OR Natural Language Processing [Text Word]) AND ((Drug Labeling [MeSH Terms] OR drug
labeling[Text Word]) OR (Product Labeling, Drug [MeSH Terms]) OR ("product labeling" [Text Word]))
2. Segura-Bedmar I, Martinez P, Sanchez-Cisneros D eds. Proceedings of the First Challenge Task: Drug-Drug Interaction Extraction 2011. Huelva, Spain; 2011.
Available at: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-761/. Accessed December 9, 2011.
3. SEMEVAL. Task Description - Extraction of Drug-Drug Interactions from BioMedical Texts. 2012. Available at: http://www.cs.york.ac.uk/semeval-2013/task9/.
Accessed November 20, 2012.
4. Percha B, Garten Y, Altman RB. Discovery and explanation of drug-drug interactions via text mining. Pac Symp Biocomput. 2012:410–421.
5. Tari L, Anwar S, Liang S, Cai J, Baral C. Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism.
Bioinformatics. 2010;26(18):i547–553.
6. Duke JD, Han X, Wang Z, et al. Literature based drug interaction prediction with clinical assessment using electronic medical records: novel myopathy associated
drug interactions. PLoS computational biology. 2012;8(8):e1002614.
6. Doesn’t DrugBank handle this?
• Not really!
– DrugBank includes product label content from
the Physicians’ Desk Reference (PDR) [1]
– However, the PDR is actually a subset of
available product label content
• claims and evidence unique to those drug product
labels not included in the PDR will be missing from
DrugBank
• potential negative effects on informatics experiments
that require complete drug information.
• E.g., possibly missed drug interactions (DrugBank 3.0)
include cimetidine-sertraline, cimetidine-venlafaxine,
cimetidine-citalopram, and venlafaxine-haloperidol. [2-5]
1. Physicians’ Desk Reference, 66th Edition. 2012 Edition. PDR Network; 2011.
2. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=b1de3ed9-1cb8-e419-3f25-5b0aeed5779a. Accessed November 27, 2012.
3. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=cf2d9bee-f8e3-477a-e4b4-f0e82657b7d2. Accessed November 27, 2012.
4. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=4259d9b1-de34-43a4-85a8-41dd214e9177. Accessed November 27, 2012.
5. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=53c3e7ac-1852-4d70-d2b6-4fca819acf26. Accessed November 27, 2012.
7. Second take home point:
• All American product labeling content
is available in an accessible format
– Structured Product Labeling (SPL)
8. Structured Product Labels (SPLs)
• What you would see if you downloaded an
SPL from DailyMed
1. http://www.fda.gov/OHRMS/DOCKETS/98fr/FDA-2005-N-0464-gdl.pdf
2. http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/default.htm
3. http://dailymed.nlm.nih.gov/dailymed/downloadLabels.cfm
11. Third take home point
• LinkedSPLs is a Linked Data version of
SPLs
– simplifies access to SPL content
– interoperable with other important drug
terminologies
12. LinkedSPLs – hypothesis
Hypothesis: A Linked Data knowledge base of
drug product labels with accurate links to other
relevant sources of drug information will provide a
dynamic platform for drug information NLP that
provides real value to translational researchers
13. LinkedSPLs – A research program
14. LinkedSPLs – A research program
Your annotations
would go here!
15. LinkedSPLs – Method
• Currently we are focusing on
linking active ingredients in the
structured portion of SPLs
• Linking the unstructured text is left for
future work
16. Linkage to external sources
• There are many sources of drug information
that are complementary to each other.
– DrugBank: contains drug targets, pathways,
interactions
– RxNorm: provides UMLS mappings
– ChEBI: provides rigorous classification of drugs
17. Example
prodName: Nefazodone Hydrochloride
rxNormProduct: rxcui:1098666
epcClass: SEROTONIN REUPTAKE INHIBITOR
contraindications: CONTRAINDICATIONS Coadministration of terfenadine, astemizole, cisapride, pimozide, or carbamazepine with nefazodone hydrochloride is contraindicated….
18. What we tested
• Three different linking approaches to link
to DrugBank
1. Structure string (InChI)
2. Ontology label matching (ChEBI)
3. Unsupervised linkage point discovery
(Automated) [1]
1. O. Hassanzadeh et al. “Discovering Linkage Points over Web Data”. To appear in PVLDB, Vol. 6, Issue 6, August 2013
19. Linkage to DrugBank – Results
• 1,246 active ingredients could be mapped to
DrugBank by at least one method
• 1,096 unmapped ingredients
• The three approaches complement each other
                   InChI identifier   ChEBI identifier   InChI + ChEBI   Automatic
InChI identifier   424                261                424             395
ChEBI identifier   --                 707                707             650
InChI + ChEBI      --                 --                 831             791
Automatic          --                 --                 --              1162
20. Conclusions
• The automatic approach performs very well
– A greater number of accurate links discovered
with less effort
• A significant number remain unmapped:
– Some salt or racemic forms of mapped ingredients
(e.g., alpha tocopherol acetate D)
– Elements (e.g., gold, iodine), and a variety of natural
organic compounds including pollens (N~200)
• Not all ingredients are included in DrugBank
– other resources may be required to obtain
complete mappings for active ingredients.
21. Want more information?
• LinkedSPLs
– http://purl.org/LinkedSPLs
• Google code project
– code.google.com/p/swat-4-med-safety/
• Publications
– Hassanzadeh, O., Zhu, Q., Freimuth, RR., Boyce, R. Extending the
“Web of Drug Identity” with Knowledge Extracted from United States
Product Labels. Proceedings of the 2013 AMIA Summit on Translational
Bioinformatics. San Francisco, March 2013.
– Boyce, RD., Freimuth, RR., Romagnoli, KM., Pummer, T., Hochheiser,
H., Empey, PE. Toward semantic modeling of pharmacogenomic
knowledge for clinical and translational decision support. Proceedings
of the 2013 AMIA Summit on Translational Bioinformatics. San
Francisco, March 2013.
– Boyce RD, Horn JR, Hassanzadeh O, de Waard A, Schneider J, Luciano
JS, Rastegar-Mojarad M, Liakata M. Dynamic enhancement of drug
product labels to support drug safety, efficacy, and effectiveness. J
Biomed Semantics. 2013 Jan 26;4(1):5. PMID: 23351881.
22. Acknowledgements
• NIH/NIGMS (U19 GM61388; the
Pharmacogenomic Research Network)
• Agency for Healthcare Research and
Quality (K12HS019461).
24. Linkage in LinkedSPLs
An active ingredient from an SPL, shown as an active ingredient resource in LinkedSPLs:
SPL resource -> dailymed:activeMoiety -> “OLANZAPINE”
SPL resource -> dailymed:activeMoietyUNII -> “N7U69T4SZR”
25. Linkage to DrugBank – Approach 1
Starting with UNII: “N7U69T4SZR”
Idea: Using NCI Resolver & InChIKey
1. FDA UNII table provides structure string:
2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE
2. NCI Resolver provides InChIKey:
KVWDHTXUZHCGIO-UHFFFAOYSA-N
3. DrugBank record with the above InChIKey provides
identifier: DB00334
Results:
429 out of 2,264 ingredients are linked, out of which 424 are
valid
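The three-step lookup on this slide can be sketched as a chain of table lookups. This is a minimal illustration only: the dictionaries below are hypothetical in-memory stand-ins for the FDA UNII table, the NCI Resolver, and DrugBank, populated with the single olanzapine example from the slide, and the function name is an assumption rather than code from the LinkedSPLs project.

```python
# Hypothetical stand-ins for the three sources named on the slide.
# Step 1: FDA UNII table, UNII -> structure string.
UNII_TO_STRUCTURE = {
    "N7U69T4SZR": "2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE",
}
# Step 2: resolver output, structure string -> InChIKey.
STRUCTURE_TO_INCHIKEY = {
    UNII_TO_STRUCTURE["N7U69T4SZR"]: "KVWDHTXUZHCGIO-UHFFFAOYSA-N",
}
# Step 3: DrugBank records indexed by InChIKey.
INCHIKEY_TO_DRUGBANK = {"KVWDHTXUZHCGIO-UHFFFAOYSA-N": "DB00334"}


def link_by_inchikey(unii):
    """Follow UNII -> structure -> InChIKey -> DrugBank id; None if any step fails."""
    structure = UNII_TO_STRUCTURE.get(unii)
    if structure is None:
        return None
    inchikey = STRUCTURE_TO_INCHIKEY.get(structure)
    if inchikey is None:
        return None
    return INCHIKEY_TO_DRUGBANK.get(inchikey)
```

Because every step must succeed, an ingredient without a resolvable structure string (for example a pollen extract) drops out of the chain, which is consistent with the low coverage reported for this approach.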
26. Linkage to DrugBank – Approach 2
Starting with name: “OLANZAPINE”
Idea: Using ChEBI identifier & NCBO Portal
1. ChEBI preferred name from NCBO Bioportal:
“OLANZAPINE”
2. ChEBI identifier from NCBO Bioportal:
7735
3. DrugBank record with the above ChEBI identifier provides
identifier: DB00334
Results:
718 out of 2,264 ingredients are linked, out of which 707 are
valid
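The label-matching route on this slide can be sketched in the same way. The dictionaries are hypothetical stand-ins for a ChEBI name index (as retrieved from NCBO BioPortal) and for DrugBank's ChEBI cross-references; the case- and whitespace-insensitive comparison is an assumption made for this sketch, not a documented detail of the study.

```python
# Hypothetical stand-ins, populated only with the example from the slide.
CHEBI_NAME_TO_ID = {"olanzapine": "7735"}      # ChEBI preferred name -> ChEBI id
CHEBI_ID_TO_DRUGBANK = {"7735": "DB00334"}     # DrugBank record carrying that ChEBI id


def normalize(name):
    """Lower-case and collapse whitespace so 'OLANZAPINE' matches 'olanzapine'."""
    return " ".join(name.lower().split())


def link_by_chebi_label(active_moiety):
    """Follow name -> ChEBI id -> DrugBank id; None if either lookup fails."""
    chebi_id = CHEBI_NAME_TO_ID.get(normalize(active_moiety))
    if chebi_id is None:
        return None
    return CHEBI_ID_TO_DRUGBANK.get(chebi_id)
```

Exact label matching of this kind only finds an ingredient when its SPL name coincides with a ChEBI label, so salts and other name variants can be missed.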
27. Linkage to DrugBank – Approach 3
Starting with all data in the FDA UNII table and DrugBank….
Idea: Automatic discovery of linkage points
Example attribute values:
Preferred Substance Name: “OLANZAPINE”
Molecular Formula: “2-METHYL-4….”
UNII: “N7U69T4SZR”
synonym: “ZYPREXA”
1. Index all FDA UNII table and DrugBank XML attributes
2. Search for linkage points and score similarity:
(UNII -> Substance Name, DrugBank -> brands -> brand): 0.94
(UNII -> Preferred Substance Name, DrugBank -> name): 0.91
(UNII -> Substance Name, DrugBank -> synonyms -> synonym): 0.83
…
3. Prune list of linkage points based on cardinality, coverage, and average score
4. Establish links between FDA UNII table and DrugBank using the linkage points
(UNII “OLANZAPINE”, DrugBank “Zyprexa”): 1.0
…
Results: 1,179 out of 2,264 ingredients are linked, out of which 1,169 are valid
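The general shape of steps 1 to 4 can be illustrated with a toy example. This is a deliberately simplified sketch and not the linkage point discovery framework cited on the next slide: it scores each attribute pair by the Jaccard overlap of their normalized value sets (a swapped-in similarity measure), keeps the pairs above a threshold, and links records that share a value on a kept pair. All function names, attribute names, the threshold, and the second DrugBank-style record are hypothetical.

```python
from itertools import product


def normalize(value):
    """Lower-case and collapse whitespace before comparing values."""
    return " ".join(value.lower().split())


def value_set(records, attribute):
    """All normalized values that any record holds for one attribute."""
    return {normalize(v) for r in records for v in r.get(attribute, [])}


def score_linkage_points(source, target, source_attrs, target_attrs):
    """Score every (source attribute, target attribute) pair by Jaccard overlap."""
    scores = {}
    for s_attr, t_attr in product(source_attrs, target_attrs):
        s_vals, t_vals = value_set(source, s_attr), value_set(target, t_attr)
        union = s_vals | t_vals
        scores[(s_attr, t_attr)] = len(s_vals & t_vals) / len(union) if union else 0.0
    return scores


def link_records(source, target, linkage_points):
    """Link two records when they share a value on any kept linkage point."""
    links = set()
    for s, t in product(source, target):
        for s_attr, t_attr in linkage_points:
            s_vals = {normalize(v) for v in s.get(s_attr, [])}
            t_vals = {normalize(v) for v in t.get(t_attr, [])}
            if s_vals & t_vals:
                links.add((s["id"], t["id"]))
    return links


# Toy data: the olanzapine example from the slide plus one made-up record.
SOURCE = [{"id": "N7U69T4SZR", "substance_name": ["OLANZAPINE", "ZYPREXA"]}]
TARGET = [
    {"id": "DB00334", "name": ["Olanzapine"], "brand": ["Zyprexa"]},
    {"id": "DB-HYPOTHETICAL", "name": ["Examplamine"], "brand": []},
]

SCORES = score_linkage_points(SOURCE, TARGET, ["substance_name"], ["name", "brand"])
KEPT = [pair for pair, score in SCORES.items() if score >= 0.3]  # hypothetical threshold
LINKS = link_records(SOURCE, TARGET, KEPT)
```

The pruning step here uses only a score threshold; the slide additionally names cardinality and coverage as pruning criteria, which this sketch does not model.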
28. Linkage Point Discovery Framework
• A generic framework for unsupervised discovery
of linkage points
Details can be found at:
O. Hassanzadeh et al. “Discovering Linkage Points over Web Data”. To appear in PVLDB, Vol. 6, Issue 6, August 2013
Editor's Notes
Discuss the shortcomings of Structured Product Labels published by FDA
Introduce LinkedSPLs and discuss its goals
Discuss why we need linkage to external resources. This can be done using an example use case that relies on the existence of links, which LinkedSPLs makes possible (if not already shown in the discussion of the shortcomings of existing SPLs). Examples from the paper: RxNorm provides normalized names for the drug products and Unified Medical Language System mappings from the drug product and its active ingredients to concepts in numerous other vocabularies. DrugBank contains information on the specific biochemical targets that a drug entity may influence, major enzymatic pathways, and potential drug-drug interactions. While information on the latter two items may be present in the SPLs, it is hidden in the unstructured text. Similarly, ChEBI provides a rigorous classification of drug entities using a formal ontology maintained by members of the OBO. Both resources provide links to other important drug taxonomies (such as the ATC system) as well as resources that provide further information on the genes that encode drug targets, metabolism and transport of the drug, and diseases that the drug may help treat.