The document discusses how AI and machine learning can help address challenges in healthcare by analyzing complex medical data. It provides examples of how AI can help with tasks like analyzing medical images to assist radiologists, predicting drug response from scans, and using electronic health records to better understand diseases and patient heterogeneity. The document also acknowledges challenges like the need for large labeled datasets and ensuring interpretability and avoidance of bias.
2. Disclosure
• Does not reflect official AZ thought or projects
• No conflicts of interest
3. About me
• Have been a:
health informatician, data scientist, bioinformatician, database
administrator, epi-informaticist, software dev, data manager,
consultant, molecular geneticist, evolutionary scientist,
biochemist, immunologist, programmer …
• At:
• Oncology R&D RWE / ML&AI @AZ
• Data Science Institute @ICL
• Centre for Infection @HPA(UK)
• Universities, industry, government …
5. Healthcare, health, disease, human biology are vast and complicated
The human body
1 million different types of molecules
About 50 trillion cells
Of about 200 different types
Each cell has 23 pairs of chromosomes
These make up 6.4 billion basepairs (positions)
Organised into about 18,000 genes
(Or maybe more like 40,000 genes)
Genetic material elsewhere in the cell
6. Disease is an interaction of multiple biological compartments, age, lifestyle, history, exposure, environment, previous treatment and chance
17 November 2020
7. The data is complicated & diverse
Labs, genomics, clinical exams, images, physical measurements, chemical, health records, other ‘omics, observations, medications …
8. What are our healthcare problems?
Gathering information
• More and better data, monitoring patients, new molecular technologies, imaging, devices, integration of different modalities, EHR records
Understanding disease
• What is a disease, pathophysiological mechanisms, biomarkers, patient subtypes
Developing interventions
• Finding possible targets, candidate molecules, running trials, analysing trials
Delivering healthcare
• Diagnosing patients, predicting outcomes, targeted therapy, resource allocation & optimization
10. But what is AI / Machine Learning / Data Science?
… a continuum of approaches
• Statistical modelling: clear assumptions, explicit models, clean & controlled data
• Machine Learning / AI: few assumptions, no model, messy data, trained from data
Other than things we talk about a lot …
11. ML/AI is well suited for healthcare & therapy development
• Complex multi-modal data
• Often poor idea of underlying mechanism or model
• Messy problems with messy data
• Lots of available data (caveat)
• Many healthcare questions are classical data questions (classify, optimize, predict)
• Healthcare should be data-driven
• Great success in other complex domains
12. But what are the pitfalls?
Need more (labelled) data
And healthcare data needs to be handled carefully
May require specialised computation & skills
Some problems difficult to adapt to ML
Bias & interpretability – data never lies, but what is it telling us?
14. Radiology & imaging widely used in healthcare
• X-rays, CT, MRI, PET, sonograms …
• But interpretation is laborious
• Scope for human error
– 71% of detected lung cancers were retrospectively found on previous scans
– 5-9% disagreement between experts
– 23% when no clinical information supplied
• Not enough radiologists
• Not enough time
https://www.rsna.org/en/news/2019/May/uk-radiology-shortage
15. AI is good at recognising things in images
• Lots of prior art
• Lots of data to train models from
• “AI radiologist”
– would be more consistent
– faster
– could double-check or triage
• But there’s more …
16. Radiomic analysis of medical images
Figure: baseline scan, sequential scans
Specific scientific questions to address:
• Can we predict response to specific drugs from the baseline scan? i.e. duration of PFS or OS
• Can we define novel efficacy endpoints? i.e. identify quantitative changes in the image that predict overall survival more robustly than conventional endpoints (e.g. RECIST)
• Can we get insight into toxicity? i.e. improved prediction, diagnosis or understanding of AEs such as ILD
• Can the scans provide other insights? e.g. tumour genetics, e.g. therapy resistance, e.g. POM biomarkers?
• Can we effectively combine radiomic insights with other clinical data in order to accelerate and improve patient stratification algorithms?
Radiomics is the science of extracting quantitative features from medical images to measure shape, intensity, density, texture, etc. The analysis of these ‘radiomic features’ can reveal disease characteristics that are not readily appreciated by the naked eye.
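As a rough illustration of what a "radiomic feature" can look like, the sketch below computes three first-order intensity features (mean, variance, histogram entropy) for a region of interest. The function name, the bin count and the pixel values are all made up for illustration; real radiomics pipelines extract many more features (shape, texture) from segmented 3D volumes.

```python
import math
from collections import Counter

def first_order_features(pixels, n_bins=4):
    """Compute simple first-order intensity features for a region of interest.

    `pixels` is a flat list of intensity values inside the region
    (hypothetical input; a real pipeline would start from a segmented scan).
    """
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    lo, hi = min(pixels), max(pixels)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a flat region
    # Histogram the intensities, clamping the maximum value into the last bin.
    bins = Counter(min(int((p - lo) / width), n_bins - 1) for p in pixels)
    # Shannon entropy of the histogram: higher means more varied intensities.
    entropy = -sum((c / n) * math.log2(c / n) for c in bins.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}

# Two toy 3x3 regions flattened to lists (made-up intensity values).
uniform = first_order_features([10] * 9)
textured = first_order_features([0, 50, 100, 150, 200, 250, 0, 250, 125])
```

The uniform region has zero variance and zero entropy, while the textured one scores higher on both. That difference is the sort of quantitative signal that is hard to judge by eye.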
17. AI for PD-L1 scoring in Urothelial Carcinoma
Deep learning can automatically score PD-L1 expression in tumour cells and immune cells
Figure panels: slide stained for PD-L1 expression; cells that were automatically detected using AI
18. The tough maths of drug development
• It costs ~ $1-2B and 10 years to develop & launch a drug
• Each patient in a clinical trial costs $1-10K
• The “valley of death”: most candidate drugs will fail
• Post-approval adds to the costs
• Eroom’s Law
ePharmacology.hubpages.com
19. AstraZeneca generates and has access to more data than ever before.
Pipeline stages: Target ID, Target Validation, Discovery, Pre-Clinical, Clinical, Commercial, Post Marketing Surveillance
Data sources: Genetic & Genomic Data, Patient-Centric Data, Sensors & Smart Devices, Interactive Media, Healthcare Information network, Market Data
20. “AI will not replace drug hunters, but drug hunters who don’t use AI will be replaced by those who do.”
- Andrew Hopkins, CEO Exscientia
21. AI for drug candidate selection & prioritization
https://www.biopharma-excellence.com/news/2019/6/30/artificial-intelligence-a-revolution-in-biopharmaceutical-development
22. Patients are heterogeneous
• Similar patient presentation can mask vastly different molecular machinery
• Even within a “homogeneous” condition, patients will have different outcomes
• What are the treatment effects for individual patients?
Understanding these leads to:
• More effective trials
• More effective treatment
• Insights on pathophysiology
Figure: Heterogeneity in lesion change in colorectal cancer, Nikodemiou et al. (2020)
23. AI-enabled mining of electronic health records to better understand diseases
Figure panels: COPD, T2D
▪ Transform patients into sequences of diagnosis codes
▪ Look for over-represented temporal pairs of codes
▪ Collapse pairs into trajectories of diagnoses
▪ Combine similar trajectories with graph similarity
Brunak et al. Nature Comms. 2016
Topology-based patient-patient network to identify distinct subtypes of T2D
Dudley et al. Sci. Transl. Med. 2015
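The first two steps of the trajectory approach above (patients as code sequences, then over-represented temporal pairs) can be sketched roughly as follows. This is a simplified illustration, not the published method: it assumes a plain patient-count threshold and a direction check in place of a proper statistical test, and the patient records and function names are invented.

```python
from collections import Counter
from itertools import combinations

def count_temporal_pairs(patients):
    """Count, per ordered code pair (a, b), the patients in whom a precedes b."""
    counts = Counter()
    for codes in patients.values():
        seen = set()
        # combinations() preserves input order, so a occurs before b.
        for a, b in combinations(codes, 2):
            if a != b:
                seen.add((a, b))
        counts.update(seen)  # each patient contributes at most once per pair
    return counts

def directional_pairs(counts, min_patients=2):
    """Keep pairs seen in enough patients and more often than their reverse.

    Assumption: a count threshold stands in for a real over-representation test.
    """
    return {
        pair: n
        for pair, n in counts.items()
        if n >= min_patients and n > counts.get((pair[1], pair[0]), 0)
    }

# Toy data: each patient is a time-ordered list of diagnosis codes.
patients = {
    "p1": ["E11", "I10", "N18"],
    "p2": ["E11", "N18"],
    "p3": ["I10", "E11", "N18"],
    "p4": ["J44", "I10"],
}
pairs = directional_pairs(count_temporal_pairs(patients))
```

In this toy data only E11 followed by N18 (three patients) and I10 followed by N18 (two patients) survive the filter. Chaining such pairs that share a code would be the starting point for the "collapse pairs into trajectories" step.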
24. Data driven KOL identification and site selection
Panels: Network Analysis; Federated EHRs; Real Time I/E analysis of Trial protocol
Figure: Patient referral network of oncologists & surgeons treating NSCLC based on claims data. Color represents physician grouping. Size of bubble represents physician PageRank.
• Claims data is used to map physician networks based on patient referrals
• Network analytics such as the PageRank algorithm are used to determine which physicians are most important in the network
• Network connections are used to map existing relationships between oncologists & surgeons
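To make the PageRank idea concrete, here is a minimal power-iteration sketch on a toy referral graph. The physician names and referral edges are invented, and the damping factor of 0.85 is the conventional default rather than anything stated in the slides; a production analysis would more likely use a graph library on real claims data.

```python
def pagerank(referrals, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph {source: [targets]}."""
    nodes = set(referrals) | {t for ts in referrals.values() for t in ts}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = referrals.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (refers to nobody): spread its rank evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical referral edges: an edge A -> B means A refers patients to B.
referrals = {
    "gp_a": ["oncologist_x"],
    "gp_b": ["oncologist_x"],
    "gp_c": ["oncologist_y"],
    "oncologist_x": ["surgeon_z"],
    "oncologist_y": ["surgeon_z"],
}
ranks = pagerank(referrals)
top = max(ranks, key=ranks.get)
```

Here `surgeon_z` ends up with the highest rank because both oncologists refer to them, and `oncologist_x` outranks `oncologist_y` because two GPs refer to them instead of one.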
25. Building an external control arm from Real World Data
• Single-arm trial: patients with unmet medical need → inclusion / exclusion criteria → access to new medicine
• External control: patients from historical trials / RWE data → inclusion / exclusion criteria → apply Propensity Score Matching
• Matched patients on standard of care can be compared to new treatment
Matching requires Deep data, not just Big Data
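A minimal sketch of propensity score matching, under several assumptions not taken from the slides: the propensity model is a logistic regression fitted by plain gradient descent, matching is greedy 1:1 nearest-neighbour with a caliper of 0.2, and the two covariates and all patient values are invented. Real external-control analyses use richer covariates and check balance after matching.

```python
import math

def fit_propensity(X, treated, lr=0.1, epochs=2000):
    """Fit a logistic regression P(treated | x) by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0

    def predict(x):
        return 1.0 / (1.0 + math.exp(-(b + sum(wi * xi for wi, xi in zip(w, x)))))

    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in zip(X, treated):
            error = predict(x) - t
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return predict

def match(scores, treated, caliper=0.2):
    """Greedy 1:1 nearest-neighbour matching on the propensity score."""
    controls = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i, t in enumerate(treated):
        if t == 1 and controls:
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            if abs(scores[j] - scores[i]) <= caliper:
                pairs.append((i, j))
                controls.remove(j)  # each control is used at most once
    return pairs

# Hypothetical covariates: [scaled age, poor performance status flag].
X = [[0.9, 1.0], [0.7, 1.0], [0.2, 0.0], [0.8, 1.0],
     [0.6, 1.0], [0.3, 0.0], [0.1, 0.0], [0.4, 0.0]]
treated = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = single-arm trial, 0 = RWE patient

propensity = fit_propensity(X, treated)
scores = [propensity(x) for x in X]
pairs = match(scores, treated)
```

Each pair links a trial patient to the real-world patient with the closest estimated probability of being in the trial. The "Deep data" point applies here: the match is only as good as the covariates available to the propensity model.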
26. A lot of knowledge is associative or relational – FOAF
Knowledge graphs can help us capture and explore these
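A knowledge graph can be sketched as a set of (subject, predicate, object) triples that are then traversed to find chains of relations. The entities, relation names and helper functions below are hypothetical placeholders, and real knowledge graphs are typically held in a graph database or RDF store, not in Python lists.

```python
from collections import defaultdict

# Hypothetical triples: (subject, predicate, object).
triples = [
    ("drug_a", "inhibits", "protein_p"),
    ("protein_p", "involved_in", "pathway_q"),
    ("pathway_q", "implicated_in", "disease_d"),
    ("drug_b", "inhibits", "protein_r"),
]

def build_index(triples):
    """Index outgoing edges by subject for quick traversal."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index

def paths(index, start, goal, max_hops=3):
    """Return all relation paths from start to goal of at most max_hops edges."""
    found = []
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if node == goal and path:
            found.append(path)
            continue
        if len(path) < max_hops:
            for predicate, obj in index.get(node, []):
                stack.append((obj, path + [(node, predicate, obj)]))
    return found

index = build_index(triples)
drug_a_paths = paths(index, "drug_a", "disease_d")
drug_b_paths = paths(index, "drug_b", "disease_d")
```

In this toy graph `drug_a` reaches `disease_d` through a three-step chain, while `drug_b` has no path. This is the "friend of a friend" style of question that relational data makes easy to ask.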
27. A lot of healthcare surrounds logistics, supply & demand
AI can solve this
https://www.digitalcommerce360.com/2019/09/06/use-artificial-intelligence-to-transform-the-hospital-supply-chain/
29. Therapy development costs continue to increase
• Eroom’s Law: the cost of developing a new drug roughly doubles every nine years
• Acceleration of biomedical research not reflected in drug development
• Recent uptick in approvals does not reflect decreasing costs
Pharmacelera (2014)
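The "doubles every nine years" statement of Eroom’s Law implies simple exponential growth, which the sketch below works through. The base cost of 1.0 is an arbitrary unit chosen for illustration, not a figure from the slides, and the function names are hypothetical.

```python
def eroom_cost(base_cost, years, doubling_time=9.0):
    """Projected cost after `years`, assuming it doubles every `doubling_time` years."""
    return base_cost * 2 ** (years / doubling_time)

def annual_growth_rate(doubling_time=9.0):
    """Equivalent constant yearly growth rate for a given doubling time."""
    return 2 ** (1.0 / doubling_time) - 1.0

cost_after_27_years = eroom_cost(1.0, 27)  # three doublings
yearly_rate = annual_growth_rate()
```

Three doublings over 27 years give an eightfold increase, and a nine-year doubling time corresponds to roughly 8% growth per year.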
30. How do we know what a system is doing?
• Interpretability is non-negotiable
– Biased data can give rise to biased models
– And a model may not be doing what we think it is
– AI models can only be built for data that you have
– Validation is critical
• And they need a lot of data
31. Where does the data come from?
• Labelled data is the new oil
• Unfortunately
– Data coverage is sparse
– Data is weird
– And also WEIRD (Western, Educated, Industrialized, Rich, Democratic)
• Diverse data (more and unexploited information)
• Governance & privacy issues
• More data from:
– Real-time and intimate integration with EHRs
– Devices
– Federated networks
• Collaborate with national centres, long-term funding & broad collaborations
Reddy (2020)
32. Final thoughts
• Data Science & AI have the potential to transform the way we identify and develop medicines
• Life Science companies have made large investments in building DS&AI capabilities
• If you are driven by science and passionate about improving lives, then I’d strongly recommend you seek an opportunity in R&D (AstraZeneca maybe …)
Example jobs at AstraZeneca – please visit our careers website
• Principal Data Scientist - https://careers.astrazeneca.com/job/gaithersburg/principal-data-scientist/7684/14833674
• Associate Director Imaging & AI - Imaging & Data Analytics - https://careers.astrazeneca.com/job/gothenburg/associate-director-imaging-and-ai-imaging-and-data-analytics/7684/14469379
• Data Sciences & AI Graduate Programme – UK - https://careers.astrazeneca.com/data-sciences-and-ai-graduate-programme
33. Confidentiality Notice
This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove
it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the
contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus,
Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com