Predicting Adverse Drug Reactions Using PubChem Screening Data
1. It’s Back: Predicting Adverse Drug Reactions Using PubChem Screening Data
Yannick Pouliot
(with significant contributions from Annie Chiang)
8/31/2010
2. Motivation
Short-term: Determine feasibility of
predicting specific classes of adverse
drug reactions (ADRs) using machine
learning and compound screening data
Long-term: Use collection of simple
screens to assess likelihood of tissue-specific ADRs
3. Understanding “BioAssay” Notion
• Usually, BioAssay = collection of activity
measurements for compounds screened against
a specific target in a cell type at one or more
concentrations
• However, scope of BioAssay DB goes beyond
compound screening:
▫ Cell-free assays
▫ In vivo assays
4. What’s a SOC?
• SOC = System Organ Class
• A SOC groups “… adverse reaction Preferred
Terms pertaining to the same system-organ”.
• Example: SOC C0236104 - “Resistance
Mechanism Disorders”
5. Knowns
• Drugs frequently show elevated rates of
tissue-specific ADRs beyond generic liver and
kidney damage.
• PubChem BioAssay DB offers a large number of
assays involving a significant number of protein
targets
6. Hypothesis
H1: Drugs with increased frequency of SOC-specific ADRs can be identified from patterns of
reactivity in PubChem BioAssay screens.
H0: Reactivity patterns in PubChem BioAssay do
not distinguish drugs with increased frequency
of tissue-specific ADRs.
7. Data Features
• For a given SOC, matrix of
▫ PRR
▫ drug CUI
▫ BioAssay ID (“AID”)
• Sparse matrix: most compounds have been
screened in a few assays only
▫ limited overlap between CVAR and BioAssay
• Very large data sets (more later)
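A minimal sketch of how such a sparse drug-by-assay matrix might be held in memory and its sparsity measured. The record layout (drug CUI, AID, activity outcome) and the placeholder identifiers are assumptions for illustration, not the schema used in this work.

```python
from collections import defaultdict

# Hypothetical records: (drug CUI, BioAssay AID, activity outcome).
# The identifiers below are placeholders, not real CUIs or AIDs.
records = [
    ("CUI_A", 1, 1),
    ("CUI_A", 2, 0),
    ("CUI_B", 1, 0),
]


def build_sparse(rows):
    """Store only the (drug, assay) cells that were actually screened."""
    matrix = defaultdict(dict)
    for cui, aid, outcome in rows:
        matrix[cui][aid] = outcome
    return dict(matrix)


def density(matrix):
    """Fraction of drug-by-assay cells that hold a measurement."""
    drugs = len(matrix)
    assays = len({aid for row in matrix.values() for aid in row})
    filled = sum(len(row) for row in matrix.values())
    return filled / (drugs * assays) if drugs and assays else 0.0


matrix = build_sparse(records)
print(density(matrix))
```

A dict-of-dicts keeps memory proportional to the number of measured cells, which matters when most compounds appear in only a few assays.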
10. Selected Statistic:
Proportional Reporting Ratio (PRR)
                     Drug of interest    Other drugs
Event of interest           A                 B
Other events                C                 D
• PRR = OBS/EXP = [A / (A+C)] / [B / (B+D)]
• Serious ADR threshold: PRR ≥ 2, with at least
3 cases reported
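The PRR formula and threshold can be sketched as follows. The cell labels are read off the formula itself (A and C are reports for the drug of interest, B and D for all other drugs); the function names and the example counts are hypothetical.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """PRR = [A / (A + C)] / [B / (B + D)].

    a: drug of interest, event of interest
    b: other drugs, event of interest
    c: drug of interest, other events
    d: other drugs, other events
    """
    if a + c == 0 or b + d == 0:
        raise ValueError("each drug group needs at least one report")
    if b == 0:
        raise ValueError("PRR is undefined when no other drug reports the event")
    return (a / (a + c)) / (b / (b + d))


def is_serious_signal(a, b, c, d, min_prr=2.0, min_cases=3) -> bool:
    """Apply the slide's threshold: PRR >= 2 with at least 3 cases."""
    return a >= min_cases and prr(a, b, c, d) >= min_prr


# Hypothetical counts: 6 of 100 reports for the drug involve the event,
# versus 20 of 1,900 reports for all other drugs.
print(prr(a=6, b=20, c=94, d=1880))
print(is_serious_signal(a=6, b=20, c=94, d=1880))
```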
12. Addressing Zero ADR
• Many drugs do not have a SOC-specific PRR
▫ Unclear if this means they are unusually safe
(could be due to e.g. low prescription volume)
▫ Approach: Assign SOC-specific PRR = 0 if at least
10 ADR reports exist overall
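The zero-assignment rule above could be expressed as a small function. This is a sketch only: the function and constant names are hypothetical, and leaving the value undefined (None) when fewer than 10 reports exist is an assumption about what happens to the remaining drugs.

```python
from typing import Optional

MIN_TOTAL_REPORTS = 10  # threshold stated on the slide


def soc_prr_or_zero(soc_prr: Optional[float], total_reports: int) -> Optional[float]:
    """Return the SOC-specific PRR, filling in 0 only for well-reported drugs."""
    if soc_prr is not None:
        return soc_prr  # an observed PRR is kept unchanged
    if total_reports >= MIN_TOTAL_REPORTS:
        return 0.0  # enough reports overall, none in this SOC
    return None  # too few reports to say anything (assumption)


print(soc_prr_or_zero(None, 25))
print(soc_prr_or_zero(None, 4))
```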
14. Properties of CVAR drug ingredients
Ingredient set: Number
• Ingredients with drug reports in CVAR: 2,901
• Ingredients with drug reports in CVAR WITH `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction': 2,746
• Ingredients with drug reports in CVAR with `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction' AND whoart_soc_cui is not null: 2,731
• Ingredients with drug reports in CVAR with `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction' and whoart_soc_cui is not null AND total_number_reports >= 10: 1,550
• Ingredients with drug reports in CVAR with `health_product_role` = 'suspect' and `reaction_type` = 'Adverse Reaction' and whoart_soc_cui is not null and total_number_reports >= 10 AND present in PUBCHEM_BIOASSAY: 485
15. BioAssay Subset Properties
Assays and Drugs in PubChem BioAssay with SOC-identified CVAR drug
ingredients and ADR reports >=10
AssayType            NumberOfAssays    NumberCVARCmpds
confirmatory                    545                664
in vivo_screening                81                341
other                            93                790
screening                       466                629
summary                           6                202
Total:                        1,191              2,626
16. Mapping Results
Set: Number
• All SIDs: 913,742
• CVAR drug ingredients mapped to SIDs: 7,913
• CVAR drug ingredients with SOC-identified ADRs mapped to SIDs: 4,382
• CVAR drug ingredients with SOC-identified ADRs and >= 10 reports mapped to SIDs: 3,136
23. Indications For Drugs Correlated with Model For SOC
C0236104 (“Resistance Mechanism Disorders”)
Antineoplastic Agents
Anti-Bacterial Agents
Anti-inflammatory Agents
Anticholesteremic Agents
Anti-Inflammatory Agents, Non-Steroidal
Anti-Allergic Agents
Analgesics
Anti-Dyskinesia Agents
24. Lessons Learned
• Limitation of relational databases sans partitioning
▫ Queries won’t return if >50M rows
• Sneaky MySQL loader
▫ Can fail to load records w/o reporting error
▫ Problem when one can’t easily verify the expected number of records
from XML files
▫ Solution: Write your own loader (can include data validation)
• BMIR cluster has serious NFS problems
▫ Couldn’t run more than a few parsing jobs at same time
• … and my favorite: The dreaded NCBI surprise!
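The "write your own loader" lesson might look like the sketch below: validate each record, then confirm that the number of rows loaded matches the number parsed. Everything here is illustrative: the `record` element, its `sid` and `name` attributes, and the `substance` table are hypothetical, and sqlite3 stands in for MySQL so the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_with_check(xml_text: str, conn: sqlite3.Connection) -> int:
    """Parse records, validate them, insert them, and verify the row count."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter("record"):  # hypothetical element name
        sid, name = rec.get("sid"), rec.get("name")
        if not sid or not sid.isdigit():
            raise ValueError(f"invalid SID in record: {sid!r}")
        rows.append((int(sid), name))

    conn.execute(
        "CREATE TABLE IF NOT EXISTS substance (sid INTEGER PRIMARY KEY, name TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM substance").fetchone()[0]
    conn.executemany("INSERT INTO substance (sid, name) VALUES (?, ?)", rows)
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM substance").fetchone()[0]

    # Fail loudly instead of silently dropping records.
    if after - before != len(rows):
        raise RuntimeError(f"parsed {len(rows)} records, loaded {after - before}")
    return len(rows)


conn = sqlite3.connect(":memory:")
sample = '<records><record sid="1" name="a"/><record sid="2" name="b"/></records>'
print(load_with_check(sample, conn))
```

Counting parsed records before the insert gives the expected total that is otherwise hard to get from the XML files.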
25. The Case Of The Missing Atorvastatin
• Problem: Why were some statins missing from my
dataset?
▫ E.g.: Atorvastatin
• Answer: It is present, but there is no way to identify it as such
• Example from AID 881: Atorvastatin SID = 29215408 … and no synonyms!
27. Acknowledgements
• Alex and Chirag, for contributing secret R
knowledge
• Atul, for being helpfully skeptical and patient
• Alex S for quickly addressing DB issues
• NCBI, for providing DBs and messing up my life
28. Need To Standardize And Normalize Assay Activity Metrics
Types of activity metrics (substr 1-12)
% Cell Viabi
% cellular A
% CPE Inhibi
% Inhibition
%Activity at
%displacemen
%Efficacy at
%Inhibition
%Response of
Activity at
AF_20uM
AreaNm
AreaoftheNuc
Ave %Efficac
Ave %Inhibit
AverageInteg
AverageInten
AverageSpots
Baseline-Act
Cell-Activit
CellCount
CellsNucInte
Donor-Activi
Fed-Activity
FP-Activity
F_Ratio
GFP-Activity
Mean High
Mean Low
Mean_NC
Mean_PC
MPIPiCm
MPIPiNm
MS % Inhibit
NucleiNucAre
NumberofCell
Parental-Act
PercentagePo
PiNmbyPiCm
Primary % In
Rate-Activit
Ratio-Activi
RatioofSpoti
RFP-Activity
STD Deviatio
Std.Err(Repe
StdDev_NC
StdDev_PC
TIINiNM
TotalCytopla
TotalIntegra
TotalSpotInt
Total_fluore
TSHR-Activit
W460-Activit
W530-Activit
ZScore
ZScore at 10
ZScore at 20
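As a first step toward standardizing, metric labels that differ only in spacing, case, or separators could be grouped together. This is a sketch under that assumption; the function names are hypothetical, and it only groups label names, it does not put the underlying values on a common scale.

```python
import re
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Lowercase the label and drop whitespace, underscores, and hyphens."""
    return re.sub(r"[\s_\-]+", "", label.lower())


def group_labels(labels):
    """Map each normalized key to the raw labels that collapse onto it."""
    groups = defaultdict(list)
    for label in labels:
        groups[normalize_label(label)].append(label)
    return dict(groups)


# A few labels taken from the list above.
labels = ["% Inhibition", "%Inhibition", "ZScore", "Mean_NC"]
for key, raw in group_labels(labels).items():
    print(key, raw)
```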
Editor's Notes
SELECT a.PRR, a.NumbCasesCompA, a.active_ingredient_name, a.whoart_soc
FROM v_m_prr_suspect_overall1 a
WHERE a.active_ingredient_name LIKE '%statin%'
ORDER BY a.active_ingredient_name, a.PRR DESC