MedChemica
What have we done? What could we do with Advanced Analytics in the Chemistry Industry?
Ed Griffen
MedChemica Ltd
Big Data – Focus on Benefits not Features
From the Gartner IT Glossary: What is Big Data?
“Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”
• Features: high-volume, high-velocity and/or high-variety information assets
• Benefits: enhanced insight, decision making, and process automation
Where is Big Data proving most Successful?
• Customer analysis
• Targeted advertising
• Language translation
What do these have in common?
• Underlying theoretical model insufficiently accurate or unknown
• Very, very large data sets
• Straightforward statistical methods
• Most users are unskilled and not interested in mechanics
What are the classes of chemical problem?
‘Potency’
• Lead finding
• Potency improvement
Properties
• Pharmacokinetics
• Solubility
• Off target toxicity
Production
• First successful route
• ‘Best’ route
Patents
• Freedom to Operate
Product Size, Duration of action, Safety margin, Speed, Cost, Commercial Position
Common to all
• Pharmaceuticals
• Agrochemicals
• Flavors and Fragrances
• Consumer products
• Materials science
• Underlying theoretical model insufficiently accurate or unknown
• Very, very large data sets
• Straightforward statistical methods
• Most users are unskilled and not interested in mechanics
‘Big Data’ analysis for Chemistry
Making and testing compounds is expensive!
• No new compounds to make
• No new testing to do
• Exploit the compounds and data you’ve already paid for
• Accelerate all new projects
• Augment the skills and experience of your chemists
• Mythbusting…
All very cost effective
Help the HiPPOs – or they’ll crush you
“Companies often make most of their important decisions by relying on ‘HiPPO’: the highest-paid person’s opinion.” [1]
Chemistry HiPPs:
• experts in pattern recognition
• judged on their ability to make the best decisions with partial data
• highly trained
• time poor
• delivery focused
• gatekeepers to the adoption of new approaches
[1] McAfee & Brynjolfsson, “Big Data: The Management Revolution”, Harvard Business Review, October 2012
Making a real textbook of Medicinal Chemistry
[Diagram: Multiple Pharma ADMET data → MMPA (three MMPA boxes) → Combine and Extract Rules → >437000 rules → Better project decisions and increased medicinal chemistry learning]
Kramer, Robb, Ting, Zheng, Griffen, et al., J. Med. Chem. 2017
http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00935
Making the complicated simple: HOT-Fit
Learning from the development of clinical decision support software
[Diagram: HOT-Fit framework]
• Technology: Algorithms, Data, Speed
• Human: System Use, User Satisfaction
• Organization: Structure, Environment
• Benefits
E. Kilsdonk, L. W. Peute, M. W. M. Jaspers, Factors Influencing Implementation Success of Guideline-based Clinical Decision Support Systems: a systematic review and gaps analysis, International Journal of Medical Informatics
http://dx.doi.org/10.1016/j.ijmedinf.2016.12.001
Chemistry Knowledge extraction methods
Remember: your HiPPO needs to understand
[Chart: Descriptors against Method, with shading labels Dark and Black]
• Descriptors: substructures; physical chemistry descriptors (Hansch, Taft, Fujita, Abraham); atomic, pair, triplet descriptors; indices
• Method: counts & descriptive statistics; MMPA; (M)LR Free Wilson; PLS; Trees / Forests; SVM; Bayesian NN; Deep Learning
It’s a summit – but what else is out there?
Advanced MMPA with MCPairs
• Matched Molecular Pairs – Molecules that differ only by a particular, well-defined structural transformation
• Transformation with environment capture – MMPs can be recorded as transformations from A → B, together with the Δ Data A-B
[Figure: structures A and B with atoms numbered 1 to 4 around the point of change]
Griffen, E. et al. J. Med. Chem. 2011, 54(22), pp. 7739-7750.
Environment is key - must be captured in the chemical encoding
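As an illustration of recording a pair as a transformation A → B with its Δ data and environment, here is a minimal Python sketch. It assumes the molecules have already been cut into a constant part and a variable part; the `Fragmentation` class, the string encodings and the property values are all hypothetical and are not the MCPairs implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Fragmentation:
    """One cut of a molecule (hypothetical encoding, for illustration only)."""
    mol_id: str
    context: str      # constant part shared by both members of a pair
    environment: str  # atoms close to the attachment point
    variable: str     # the part that changes between A and B
    value: float      # measured property, e.g. log(solubility)


def find_pairs(frags):
    """Index fragmentations by context, then pair molecules that share one."""
    index = defaultdict(list)
    for frag in frags:
        index[frag.context].append(frag)

    pairs = []
    for group in index.values():
        for a, b in combinations(group, 2):
            if a.variable == b.variable:
                continue  # same substituent: not a transformation
            pairs.append({
                "transformation": f"{a.variable}>>{b.variable}",
                "environment": a.environment,
                "a": a.mol_id,
                "b": b.mol_id,
                "delta": round(b.value - a.value, 3),  # Δ data, B minus A
            })
    return pairs


# Made-up example data: two molecules share a context, one does not.
frags = [
    Fragmentation("mol1", "ctx_X", "aromatic C", "[H]", 1.0),
    Fragmentation("mol2", "ctx_X", "aromatic C", "C", 0.4),
    Fragmentation("mol3", "ctx_Y", "amide N", "[H]", 2.0),
]
pairs = find_pairs(frags)
print(pairs)
```

Keeping the environment on each record is what allows the same transformation (for example H → Me) to be analysed separately in each chemical environment rather than pooled into one average.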
MMPA: Environment really matters
H→Me:
• Median Δlog(Solubility)
• 225 different environments
H→Me:
• Median Δlog(Clint), human microsomal clearance
• 278 different environments
[Plot annotations: 2.5 log, 1.5 log]
Matched Molecular Pair methods matter
If you don’t use both you’ll miss 12-56% of the pairs
2 Methods:
• Maximum Common SubStructure (MCSS): Warner, Sheridan
• Fragment and Index (FI): Hussain & Rea
Strengths:
• Ring replacement, linker and core swaps
• Macrocycle ring pairs
[Figure: fF+I against fMCSS, axes from 0.1 to 0.9, panels EGF, D1 and Cav3.2]
Leach et al., J. Chem. Inf. Model. 2017, http://dx.doi.org/10.1021/acs.jcim.7b00335
Rule selection
• Identify and group matching SMIRKS
• Calculate statistical parameters for each unique SMIRKS (n, median, sd, se, n_up/n_down)
• Is n ≥ 6? No → not enough data: ignore transformation
• Is the |median| ≤ 0.05 and the intercentile range (10-90%) ≤ 0.3? Yes → transformation is classified as ‘neutral’
• Otherwise, perform a two-tailed binomial test on the transformation to determine the significance of the up/down frequency
• Pass → transformation classified as ‘increase’ or ‘decrease’, depending on which direction the property is changing
• Fail → transformation classified as ‘NED’ (No Effect Determined)
[Figure: median data difference axis (-ve, 0, +ve) with regions labelled Decrease, Neutral, Increase and NED]
• No assumption of normal distribution
• Manage ‘censored’ = qualified / out-of-range data
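The decision flow above can be sketched in Python as follows. This is a sketch under stated assumptions, not the published implementation: the 0.05 significance level (`alpha`), the linear-interpolation percentile and the function names are choices made for this example, and the handling of censored data is left out.

```python
from math import comb
from statistics import median


def percentile(sorted_vals, q):
    """Percentile by linear interpolation; q is between 0 and 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def binomial_two_tailed(n_up, n_down):
    """Exact two-tailed binomial p-value against a 50:50 up/down split."""
    n = n_up + n_down
    if n == 0:
        return 1.0
    k = min(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def classify(deltas, alpha=0.05):
    """Classify one transformation from its list of property differences.

    alpha is an assumed significance level; the slide does not state one.
    """
    if len(deltas) < 6:
        return "ignored"  # not enough data
    vals = sorted(deltas)
    spread = percentile(vals, 0.9) - percentile(vals, 0.1)
    if abs(median(vals)) <= 0.05 and spread <= 0.3:
        return "neutral"
    n_up = sum(d > 0 for d in vals)
    n_down = sum(d < 0 for d in vals)
    if binomial_two_tailed(n_up, n_down) < alpha:
        return "increase" if n_up > n_down else "decrease"
    return "NED"  # No Effect Determined


print(classify([0.5, 0.6, 0.4, 0.7, 0.3, 0.5, 0.6, 0.8]))
print(classify([0.5, -0.5, 0.6, -0.6, 0.4, -0.4]))
```

Using the median, an intercentile range and a sign-based binomial test means none of the steps relies on the differences being normally distributed, which matches the first bullet above.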
Making the complicated simple: HOT-Fit
• Technology: Algorithms, Data, Speed
• Human: System Use, User Satisfaction
• Organization: Structure, Environment
• Benefits
Where to get data?
• Public data is unrepresentative
• Censored by publication bias
• Pharma data – can’t share structures due to IP.
• Use chemical transformations to encode knowledge from matched molecular pair (MMP) analysis → now sharable
Novartis: Kramer, C.; Kalliokoski, T. et al., The Experimental Uncertainty of Heterogeneous Public Ki Data, J. Med. Chem. 2012, 55, 5165
If project data really looked like that, there would be no problem in the Pharma industry.
Data Sources
[Diagram: data flow into and out of the Grand Rule Database]
• Behind each individual company firewall (AZ, Roche, Genentech): company data → company database → MMP finder
• MedChemica Aggregation: >500 million pairs → Grand Rule Database, 0.5 million rules
• The Grand Rule Database is returned to each company: AZ Exploitation, Roche Exploitation, Genentech Exploitation
Merging knowledge – GRDv1
• Pharma 1: 100k rules
• Pharma 2: 92k rules
• Pharma 3: 37k rules
• 5.8k rules in common (pre-merge), ~2%
• Merge → new rules: 88k, ~26% of total
• Gains: 300 - 900%
Combining data yields brand new rules
Knowledge Extracted
Numbers of statistically valid transforms, Grand Rule Database v3
Grouped dataset: number of rules
• logD7.4: 153449
• Merged solubility: 46655
• In vitro microsomal clearance (human, rat, mouse, cyno, dog): 88423
• In vitro hepatocyte clearance (human, rat, mouse, cyno, dog): 26627
• MDCK permeability A-B / B-A efflux: 1852
• Cytochrome P450 inhibition (2C9, 2D6, 3A4, 2C19, 1A2): 40605
• Cardiac ion channels (NaV 1.5, hERG ion channel inhibition): 15636
• Glutathione stability: 116
• Plasma protein or albumin binding (human, rat, mouse, cyno, dog): 64622
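As a quick consistency check, the per-dataset counts listed for this slide can be summed and compared with the ">437000 rules" figure quoted earlier in the deck. The dictionary keys below are shortened labels chosen for this example; the numbers are taken from the slide.

```python
# Rule counts per grouped dataset, as listed on the slide.
rule_counts = {
    "logD7.4": 153449,
    "merged solubility": 46655,
    "microsomal clearance": 88423,
    "hepatocyte clearance": 26627,
    "permeability / efflux": 1852,
    "CYP450 inhibition": 40605,
    "cardiac ion channels": 15636,
    "glutathione stability": 116,
    "plasma protein binding": 64622,
}

total = sum(rule_counts.values())
print(total)           # 437985
print(total > 437000)  # True
```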
Single company vs merged
Comparison between Roche-only and GRD rules for human microsomal clearance. Overall R² is 0.76 and RMSE 0.11.
Chemists use logD as a benchmark:
• Standard to use lipophilicity as a design surrogate
• Provides a context for changes
• Key multi-objective design issues are centered round conflicting logD correlations:
• Solubility & metabolic stability vs potency & permeability
• Particularly useful to look at chemical transformations that ‘break the dogma’ of logD correlation
Solubility : logD – trends & exceptions
≥20 examples per rule, n = 13,453
R² = 0.66, slope = -0.57, intercept = 0.
Magenta line: slope -1, intercept 0. Dark blue line: linear best fit. The pale blue density ellipse contains 99% and the mid blue ellipse contains 50% of the transformations.
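Exceptional Solubility transformations
Transformation | median ΔlogD ± std (nPairs) | median ΔlogSol ± std (nPairs) | Comment
(structure image) | 0.00 ± 0.67 (91) | 0.73 ± 0.72 (87) | ΔlogD ==, Solubility ↑
(structure image) | -0.10 ± 0.83 (83) | 0.65 ± 0.96 (69) | ΔlogD ==, Solubility ↑
(structure image) | 0.07 ± 0.50 (108) | 0.52 ± 0.77 (80) | ΔlogD ==, Solubility ↑
(structure image) | -0.10 ± 0.54 (208) | 0.40 ± 0.78 (115) | ΔlogD ==, Solubility ↑
(structure image) | -0.59 ± 0.49 (82) | 0.03 ± 0.72 (98) | ΔlogD ↓, Solubility ==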
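One way to see why these transformations count as exceptions is to compare each median ΔlogSol with the value the overall trend would predict from ΔlogD. The sketch below uses the best-fit slope of -0.57 (intercept 0) reported for the solubility : logD trend and the five rows of medians from this slide; the `residual` helper is an assumption made for illustration, not part of the original analysis.

```python
TREND_SLOPE = -0.57  # best-fit slope of ΔlogSol against ΔlogD, intercept 0

# (median ΔlogD, median ΔlogSol) for the five transformations on the slide
rows = [
    (0.00, 0.73),
    (-0.10, 0.65),
    (0.07, 0.52),
    (-0.10, 0.40),
    (-0.59, 0.03),
]


def residual(dlogd, dlogsol, slope=TREND_SLOPE):
    """Observed ΔlogSol minus the value predicted from the logD trend."""
    return dlogsol - slope * dlogd


residuals = [round(residual(d, s), 3) for d, s in rows]
for (d, s), r in zip(rows, residuals):
    print(f"ΔlogD {d:+.2f}  ΔlogSol {s:+.2f}  residual {r:+.3f}")
```

The first four rows gain more solubility than their change in logD would predict, while the last row lowers logD without the solubility gain the trend suggests.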
Clearance : logD – trends & exceptions
≥20 examples per rule, n = 11,572
R² = 0.40, slope = 0.23, intercept = 0.
Magenta line: slope 1, intercept 0. Dark blue line: linear best fit. The pale blue density ellipse contains 99% and the mid blue ellipse contains 50% of the transformations.
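Exceptional HLM transformations
Transformation | median ΔlogD ± std (nPairs) | HLM median Δlog(Clint) ± std (nPairs) | Comment
(structure image) | 0.35 ± 0.45 (15) | -0.34 ± 0.71 (13) | ΔlogD ↑, Clint ↓
(structure image) | 0.70 ± 0.74 (117) | -0.32 ± 0.51 (53) | ΔlogD ↑, Clint ↓
(structure image) | 0.73 ± 0.61 (26) | -0.23 ± 0.36 (18) | ΔlogD ↑, Clint ↓
(structure image) | 0.00 ± 0.11 (19) | -0.59 ± 0.38 (14) | ΔlogD ==, Clint ↓
(structure image) | -0.69 ± 0.42 (8) | 0.76 ± 0.59 (7) | ΔlogD ↓, Clint ↑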
Making the complicated simple: HOT-Fit
• Technology: Algorithms, Data, Speed
• Human: System Use, User Satisfaction
• Organization: Structure, Environment
• Benefits
MMPA: Engineering challenges
• Quick to implement on a small scale
• Always becomes an n² problem…
• ‘Challenging’ at enterprise scales 100,000+
- Cheminformatics ‘gotchas’
• Tautomers, charge states
• Unusual aromatic systems
• Highly symmetric molecules
• Capturing and coding environments accurately
- Structure and data integrity
- Assay ontologies
- Database schema optimized for cluster I/O
Speed at scale essential – time poor users
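To make the n² point concrete, the number of candidate pairs among n compounds is n(n-1)/2 if every compound is compared with every other. The short Python sketch below evaluates this for a few collection sizes, including the 100,000 figure mentioned above; the other sizes are arbitrary examples.

```python
def pair_count(n):
    """Number of unordered compound pairs among n compounds."""
    return n * (n - 1) // 2


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} compounds -> {pair_count(n):>13,} candidate pairs")
```

At 100,000 compounds this is roughly five billion comparisons per property, which is why a naive all-against-all search stops being practical at enterprise scale.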
Interface Design depends on the User
• > 2 × 10¹² searches / year
• Totally unskilled users
• Simple consistent interface
• Rocket scientists?
Meet your HiPPO where they’re skilled
• Intuitive (= fast & familiar)
• Summary data + option to drill into the detail
• Web browsers
• Excel
Exploiting Knowledge for Compound Optimization
[Diagram, labelled MCPairs: Measured Data → rule finder → Rule Database; Problem molecule → Compounds from Rules → New molecule suggestions]
“..it’s like asking 150 of your peers for ideas in just a few seconds” – AZ Principal Scientist
Exploiting Knowledge for Compound Optimization
https://www.youtube.com/watch?v=nQxXddJDTfc
More examples of Success
Thompson, M. J. et al., J. Med. Chem. 2015, 58 (23), pp 9309–9333
DOI: 10.1021/acs.jmedchem.5b01312
“Me-Betters” on a Massive scale
[Diagram: 1162 Marketed Drugs + Grand Rule Database v3 → Enumerator System → Wealth of Follow-on opportunities]
• ~425 improvement suggestions / drug
• Improve solubility & metabolism = lower dose = uid from bid/tid
• Safer, better compliance
‘Instant’ SAR exploration
https://www.youtube.com/watch?v=_FGSnD6PG3I
There is so much more…
• MMP based clustering
• QSAR from MMPA
• Matched molecular series
• Interface design is key
What can we do with Advanced Analytics?
Accelerate Chemistry by using:
• the right algorithms that our users understand
• as much data as possible
• fast, “user appropriate” interfaces
to deliver better products into development faster.
Collaborators and Users - experience
SCI What can Big Data do for Chemistry 2017 MedChemica

  • 1. MedChemica What have we done? What could we do with Advanced Analytics in the Chemistry Industry? Ed Griffen MedChemica Ltd
  • 2. MedChemica Big Data – Focus on Benefits not Features From the Gartner IT Glossary: What is Big Data? Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. 2 Features Benefits
  • 3. MedChemica Where is Big Data proving most Successful? • Customer analysis • Targeted advertising • Language translation 3 • What do these have in common? • Underlying theoretical model insufficiently accurate or unknown • Very, very large data sets • Straightforward statistical methods • Most users are unskilled and not interested in mechanics
  • 4. MedChemica What are the classes of chemical problem? 4 ‘Potency’ Properties Production Patents • Lead finding • Potency improvement • Pharmacokinetics • Solubility • Off target toxicity • First successful route • ‘Best” route • Freedom to Operate Product Size Duration of action Safety margin Speed Cost Commercial Position Common to all • Pharmaceuticals • Agrochemicals • Flavors and Fragrances • Consumer products • Materials science • Underlying theoretical model insufficiently accurate or unknown • Very, very large data sets • Straightforward statistical methods • Most users are unskilled and not interested in mechanics
  • 5. MedChemica ‘Big Data’ analysis for Chemistry Making and testing compounds is expensive! • No new compounds to make • No new testing to do • Exploit the compounds and data you’ve already paid for • Accelerate all new projects • Augment the skills and experience of your chemists • Mythbusting… All very cost effective
  • 6. MedChemica Help the HiPPOs – or they’ll crush you 6 1. McAfee & Brynjolfsson “Big Data: The Management Revolution”, Harvard Business Review October 2012 “Companies often make most of their important decisions by relying on “HiPPO”—the highest- paid person’s opinion.”1 Chemistry HiPPs: • experts in pattern recognition • judged on their ability to make the best decisions with partial data • highly trained • time poor • delivery focused • gatekeepers to the adoption of new approaches
  • 7. MedChemica Making a real textbook of Medicinal Chemistry MMPA MMPA MMPA Combine and Extract Rules Multiple Pharma ADMET data >437000 rules Better Project decisions Increased Medicinal Chemistry learning Kramer, Robb, Ting, Zheng, Griffen, et al: J. Med. Chem 2017 http://pubs.acs.org/doi/10.1021/acs.jmedchem.7b00935 ‘Potency’ Properties Production Patents • Lead finding • Potency improvement • Pharmacokinetics • Solubility • Off target toxicity • First successful route • ‘Best” route • Freedom to Operate
  • 8. MedChemica Making the complicated simple: HOT-Fit Learning from the development of clinical decision support software Algorithms Technology Data Speed Benefits Human System Use User Satisfaction Organization Structure Environment E.Kilsdonk, L.W.Peute, M.W.M.Jaspers, Factors Influencing Implementation Success of Guideline-based Clinical Decision Support Systems: a systematic review and gaps analysis, International Journal of Medical Informatics http://dx.doi.org/10.1016/j.ijmedinf.2016.12.001
  • 9. MedChemica Chemistry Knowledge extraction methods Remember: your HiPPO needs to understand 9 substructures Physical chemistry descriptors(Hansch, Taft, Fujita, Abraham) Atomic, pair, triplet descriptors Indices Counts & descriptive statistics MMPA (M)LR Free Wilson PLS Trees / Forests SVM Bayesian NN Deep Learning Dark Black Descriptors Method It’s a summit – but what else is out there?
  • 10. MedChemica • Matched Molecular Pairs – Molecules that differ only by a particular, well-defined structural transformation Griffen, E. et al. J. Med. Chem. 2011, 54(22), pp.7739-7750. Advanced MMPA with MCPairs • Transformation with environment capture – MMPs can be recorded as transformations from A B Δ Data A- B 1 2 2 3 3 3 4 4 4 12 23 3 34 4 4 A B Environment is key - must be captured in the chemical encoding
  • 11. MMPA: Environment really matters HMe: • Median Dlog(Solubility) • 225 different environments 2.5log 1.5log HMe: • Median Dlog(Clint) Human microsomal clearance • 278 different environments MedChemica
  • 12. MedChemica Matched Molecular Pair methods matter If you don’t use both you’ll miss 12-56% of the pairs 2 Methods: Maximum Common SubStructure(MCSS) Fragment and Index(FI) Warner, Sheridan Hussein & Rea Strengths: Ring replacement linker and core swaps Macrocycle ring pairs 12 EGF D1 Cav3.2 fF+I fMCSS 0.1 0.9 0.1 0.9 0.1 0.9 0.1 0.9 0.1 0.9 0.1 0.9 Leach et al J.Chem. Inf. Model. 2017 http://dx.doi.org/10.1021/acs.jcim.7b00335
  • 13. MedChemica Identify and group matching SMIRKS Calc ulate statistical parameters for eac h unique SMIRKS(n, median, sd, se, n_up/ n_down) Is n ≥ 6? Not enough data: ignore transformation Is the | median| ≤ 0.05 and the interc entile range (10-90%) ≤ 0.3? Perform two-tailed binomial test on the transformation to determine the signific anc e of the up/ down frequenc y transformation is c lassified as ‘neutral’ Transformation c lassified as ‘NED’ (No Effec t Determined) Transformation c lassified as ‘increase’ or ‘ decrease’ depending on whic h direc tion the property is c hanging passfail yesno yesno Rule selection 0 +ve-ve Median data difference Neutral IncreaseDecrease NED • No assumption of normal distribution • Manage ‘censored’ = qualified / out-of-range data
  • 14. MedChemica Making the complicated simple: HOT-Fit Algorithms Technology Human Organization Data Speed System Use User Satisfaction Structure Environment Benefits
  • 15. MedChemica Where to get data? • Public data is unrepresentative • Censored by publication bias • Pharma data – can’t share structures due to IP. • Use chemical transformations to encode knowledge from matched molecular pair (MMP) analysis  now sharable Novartis: Kramer, C.; Kalliokoski, T. et al The Experimental Uncertainty of Heterogeneous Public Ki Data J. Med. Chem 2012, 55, 5165 If project data really looked like that, there would be no problem in the Pharma industry.
  • 16. MedChemica Data Sources Roche Database AZ Data MMP finder AZ Database MMP finder MMP finder Roche Data Genentech Data Grand Rule Database Grand Rule Database Grand Rule Database Grand Rule Database AZ Exploitation Roche Exploitation Genentech Exploitation >500 million pairs MedChemica Aggregation Individual company firewall Genentech Database 0.5 million rules
  • 17. MedChemica Merging knowledge: GRDv1. Merge of Pharma 1 (100k rules), Pharma 2 (92k rules) and Pharma 3 (37k rules). Rules in common pre-merge: 5.8k (~2%). New rules: 88k (~26% of total). Combining data yields brand new rules. Gains: 300-900%.
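One plausible way a merge can yield brand new rules is that a transformation with too few pairs at each company separately clears the n ≥ 6 cut-off once the pairs are pooled. The sketch below illustrates only that mechanism. It is an assumption about how new rules arise, not a description of the actual aggregation pipeline, and the transformation labels and pair counts are made up for illustration.

```python
from collections import Counter

MIN_PAIRS = 6  # minimum pair count from the rule-selection slide (n >= 6)


def valid_rules(pair_counts, min_pairs=MIN_PAIRS):
    """Transformations with enough pairs to be considered for a rule."""
    return {t for t, n in pair_counts.items() if n >= min_pairs}


def merge_counts(*company_counts):
    """Pool pair counts per transformation across companies."""
    merged = Counter()
    for counts in company_counts:
        merged.update(counts)  # Counter.update adds counts rather than replacing
    return merged


def new_rules_after_merge(*company_counts, min_pairs=MIN_PAIRS):
    """Transformations that only reach the cut-off once the data are pooled."""
    before = set().union(*(valid_rules(c, min_pairs) for c in company_counts))
    after = valid_rules(merge_counts(*company_counts), min_pairs)
    return after - before


# Hypothetical pair counts per transformation, one dict per company
pharma_1 = {"H>>F": 10, "H>>OMe": 3}
pharma_2 = {"H>>OMe": 4, "H>>Cl": 2}
pharma_3 = {"H>>Cl": 2}

print(new_rules_after_merge(pharma_1, pharma_2, pharma_3))
```

Only transformation identifiers and counts are pooled in this sketch, which is in keeping with the point on the earlier slide that transformations can be shared where structures cannot.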
  • 18. MedChemica Knowledge Extracted: numbers of statistically valid transforms, Grand Rule Database v3. Grouped dataset: number of rules. logD7.4: 153,449; merged solubility: 46,655; in vitro microsomal clearance (human, rat, mouse, cyno, dog): 88,423; in vitro hepatocyte clearance (human, rat, mouse, cyno, dog): 26,627; MDCK permeability A-B / B-A efflux: 1,852; cytochrome P450 inhibition (2C9, 2D6, 3A4, 2C19, 1A2): 40,605; cardiac ion channels (NaV 1.5, hERG ion channel inhibition): 15,636; glutathione stability: 116; plasma protein or albumin binding (human, rat, mouse, cyno, dog): 64,622.
  • 19. MedChemica Single company vs merged: comparison between Roche-only and GRD rules for human microsomal clearance. Overall R² is 0.76 and RMSE is 0.11.
  • 20. MedChemica Chemists use logD as a benchmark: • Standard to use lipophilicity as a design surrogate • Provides a context for changes • Key multi-objective design issues are centered around conflicting logD correlations: • Solubility & metabolic stability versus potency & permeability • Particularly useful to look at chemical transformations that ‘break the dogma’ of logD correlation
  • 21. MedChemica Solubility : logD, trends & exceptions. ≥20 examples per rule, n = 13,453. R² = 0.66, slope = -0.57, intercept = 0. Magenta line: slope -1, intercept 0. Dark blue line: linear best fit. The pale blue density ellipse contains 99% of the transformations and the mid-blue ellipse contains 50%.
  • 22. MedChemica Exceptional Solubility transformations (the transformation structures were images and are not reproduced). Columns: median ΔlogD ± std (nPairs); median ΔlogSol ± std (nPairs); comment. Row 1: 0.00 ± 0.67 (91); 0.73 ± 0.72 (87); ΔlogD unchanged, solubility up. Row 2: -0.10 ± 0.83 (83); 0.65 ± 0.96 (69). Row 3: 0.07 ± 0.50 (108); 0.52 ± 0.77 (80). Row 4: -0.10 ± 0.54 (208); 0.40 ± 0.78 (115). Row 5: -0.59 ± 0.49 (82); 0.03 ± 0.72 (98); ΔlogD down, solubility unchanged.
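One way to make ‘exceptional’ concrete is to measure how far each transformation sits from the logD trend line. The sketch below assumes the best-fit line quoted on the previous slide (slope -0.57, intercept 0) and uses the five pairs of median values from the table above. Using the residual as the measure of exception, and the function name, are assumptions for illustration and not necessarily how the slide's examples were chosen.

```python
SLOPE = -0.57     # best-fit slope quoted for ΔlogSol against ΔlogD
INTERCEPT = 0.0   # best-fit intercept quoted on the same slide

# (median ΔlogD, median ΔlogSol) for the five transformations in the table
exceptions = [
    (0.00, 0.73),
    (-0.10, 0.65),
    (0.07, 0.52),
    (-0.10, 0.40),
    (-0.59, 0.03),
]


def solubility_residual(delta_logd, delta_logsol, slope=SLOPE, intercept=INTERCEPT):
    """Observed ΔlogSol minus the ΔlogSol expected from the logD trend."""
    expected = slope * delta_logd + intercept
    return delta_logsol - expected


residuals = [round(solubility_residual(d, s), 2) for d, s in exceptions]
for (d, s), r in zip(exceptions, residuals):
    print(f"dlogD={d:+.2f}  dlogSol={s:+.2f}  residual={r:+.2f}")
```

A positive residual means the transformation gains more solubility than its logD change would predict. The last row has a negative residual: logD drops but solubility barely moves.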
  • 23. MedChemica Clearance : logD, trends & exceptions. ≥20 examples per rule, n = 11,572. R² = 0.40, slope = 0.23, intercept = 0. Magenta line: slope 1, intercept 0. Dark blue line: linear best fit. The pale blue density ellipse contains 99% of the transformations and the mid-blue ellipse contains 50%.
  • 24. MedChemica Exceptional HLM transformations (the transformation structures were images and are not reproduced). Columns: median ΔlogD ± std (nPairs); HLM median Δlog(Clint) ± std (nPairs); comment. Row 1: 0.35 ± 0.45 (15); -0.34 ± 0.71 (13); ΔlogD up, Clint down. Row 2: 0.70 ± 0.74 (117); -0.32 ± 0.51 (53). Row 3: 0.73 ± 0.61 (26); -0.23 ± 0.36 (18). Row 4: 0.00 ± 0.11 (19); -0.59 ± 0.38 (14); ΔlogD unchanged, Clint down. Row 5: -0.69 ± 0.42 (8); 0.76 ± 0.59 (7); ΔlogD down, Clint up.
  • 25. MedChemica Making the complicated simple: HOT-Fit. [Diagram: Technology (Algorithms, Data, Speed); Human (System Use, User Satisfaction); Organization (Structure, Environment); Benefits.]
  • 26. MedChemica MMPA: Engineering challenges • Quick to implement on a small scale • Always becomes an n² problem • ‘Challenging’ at enterprise scales (100,000+) - Cheminformatics ‘gotchas’ • Tautomers, charge states • Unusual aromatic systems • Highly symmetric molecules • Capturing and coding environments accurately - Structure and data integrity - Assay ontologies - Database schema optimized for cluster I/O. Speed at scale is essential: users are time-poor.
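The n² cost comes from comparing every compound with every other. The Fragment and Index idea named on the earlier methods slide avoids the all-against-all comparison by indexing each compound under the part of the molecule it keeps constant, so that candidate pairs are only formed inside a bucket. The sketch below shows just that indexing step on hypothetical, pre-computed fragmentations; a real implementation would generate the `(context, fragment)` records with a cheminformatics toolkit, and the identifiers and SMILES-like strings here are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def index_pairs(fragmentations):
    """Find matched pairs by bucketing compounds on their shared context.

    fragmentations: iterable of (compound_id, context, fragment) records.
    Only compounds that share a context are compared with each other.
    """
    by_context = defaultdict(list)
    for compound_id, context, fragment in fragmentations:
        by_context[context].append((compound_id, fragment))

    pairs = []
    for context, members in by_context.items():
        for (id_a, frag_a), (id_b, frag_b) in combinations(members, 2):
            if frag_a != frag_b:  # same context, different variable part
                pairs.append((id_a, id_b, context, frag_a, frag_b))
    return pairs


# Hypothetical fragmentation records: (compound id, constant context, variable fragment)
example = [
    ("cpd1", "[*]c1ccccc1", "[*]H"),
    ("cpd2", "[*]c1ccccc1", "[*]F"),
    ("cpd3", "[*]C1CCCCC1", "[*]F"),
    ("cpd4", "[*]c1ccccc1", "[*]Cl"),
]

for pair in index_pairs(example):
    print(pair)
```

The work inside a single bucket is still quadratic in the bucket size, so very common contexts remain expensive, which is one reason the problem stays ‘challenging’ at enterprise scale.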
  • 27. MedChemica Interface design depends on the user • > 2 × 10¹² searches / year • Totally unskilled users • Simple, consistent interface • Rocket scientists? Meet your HiPPO where they’re skilled • Intuitive (= fast & familiar) • Summary data + option to drill into the detail • Web browsers • Excel
  • 28. MedChemica Exploiting Knowledge for Compound Optimization. [Diagram: measured data pass through a rule finder into the rule database; a problem molecule is combined with the rule database to give compounds from rules, i.e. new molecule suggestions.] MCPairs = “...it’s like asking 150 of your peers for ideas in just a few seconds” (AZ Principal Scientist)
  • 29. MedChemica Exploiting Knowledge for Compound Optimization https://www.youtube.com/watch?v=nQxXddJDTfc
  • 30. MedChemica More examples of success. Thompson, M. J. et al. J. Med. Chem. 2015, 58 (23), pp 9309–9333. DOI: 10.1021/acs.jmedchem.5b01312
  • 31. MedChemica “Me-Betters” on a massive scale. [Diagram: 1162 marketed drugs and Grand Rule Database v3 feed the Enumerator System, giving a wealth of follow-on opportunities.] Improve solubility & metabolism = lower dose = uid from bid/tid: safer, better compliance. ~425 improvement suggestions / drug
  • 33. MedChemica There is so much more… • MMP-based clustering • QSAR from MMPA • Matched molecular series • Interface design is key
  • 34. MedChemica What can we do with Advanced Analytics? Accelerate chemistry by using: • the right algorithms, ones that our users understand • as much data as possible • fast, “user appropriate” interfaces, to deliver better products into development faster.

Editor's notes

  1. Lots of people come forward with ideas to ‘revolutionise drug discovery’, but being more data-driven is surprisingly cheap compared to most of them, e.g. ‘new modalities’ like therapeutic RNAs or chimeric antigen receptors, or even large-ring macrocycles.
  2. We may be at the summit, but who can tell? And what is around us? Alternatively, we may want a completely clear view of potential cliffs and valleys, but by the time you get there so much has been published that compounds are probably in the clinic, if not on the market. Of course, there may still be opportunities.