Translational Data Science at
Merck
Chris L. Waller, Ph.D.
Executive Director and Head, Scientific
Modeling Platforms…
Forward-Looking Statement
This presentation includes “forward-looking statements” within the meaning of the safe harbor provisions of the United
States Private Securities Litigation Reform Act of 1995. Such statements may include, but are not limited to, statements
about the benefits of the merger between Merck and Schering-Plough, including future financial and operating results, the
combined company’s plans, objectives, expectations and intentions and other statements that are not historical facts.
Such statements are based upon the current beliefs and expectations of Merck’s management and are subject to
significant risks and uncertainties. Actual results may differ from those set forth in the forward-looking statements.
The following factors, among others, could cause actual results to differ from those set forth in the forward-looking
statements: the possibility that all of the expected synergies from the merger of Merck and Schering-Plough will not be
realized, or will not be realized within the expected time period; the impact of pharmaceutical industry regulation and
health care legislation in the United States and internationally; Merck’s ability to accurately predict future market
conditions; dependence on the effectiveness of Merck’s patents and other protections for innovative products; and the
exposure to litigation and/or regulatory actions.
Merck undertakes no obligation to publicly update any forward-looking statement, whether as a result of new information,
future events or otherwise. Additional factors that could cause results to differ materially from those described in the
forward-looking statements can be found in Merck’s 2011 Annual Report on Form 10-K and the company’s other filings
with the Securities and Exchange Commission (SEC) available at the SEC’s Internet site (www.sec.gov).
Outline
• Merck & Co. (MSD) Introduction
• Function and Form: R&D (Merck Research Labs) and R&D IT (MRL
IT)
• Translational Data Science, Informatics, and Analytics: Vision and
Technology
• Real World Evidence: Opportunities to Use Outcomes to Influence
Research and Development
• Discussion
But first, the news…
Cost to Develop and Win Marketing Approval
for a New Drug Is Increasing!
BOSTON – Nov. 18, 2014 – Developing a new prescription medicine that gains marketing approval, a process often lasting longer than a decade, is estimated to cost $2,558 million, according to a new study
by the Tufts Center for the Study of Drug Development.
The $2,558 million figure per approved compound is based on estimated:
Average out-of-pocket cost of $1,395 million
Time costs (expected returns that investors forego while a drug is in development) of $1,163 million
Estimated average cost of post-approval R&D—studies to test new indications, new formulations, new dosage strengths and regimens, and to monitor safety and long-term side effects in patients required by
the U.S. Food and Drug Administration as a condition of approval—of $312 million boosts the full product lifecycle cost per approved drug to $2,870 million. All figures are expressed in 2013 dollars.
The new analysis, which updates similar Tufts CSDD analyses, was developed from information provided by 10 pharmaceutical companies on 106 randomly selected drugs that were first tested in human
subjects anywhere in the world from 1995 to 2007.
“Drug development remains a costly undertaking despite ongoing efforts across the full spectrum of pharmaceutical and biotech companies to rein in growing R&D costs,” said Joseph A. DiMasi, director of
economic analysis at Tufts CSDD and principal investigator for the study.
He added, “Because the R&D process is marked by substantial technical risks, with expenditures incurred for many development projects that fail to result in a marketed product, our estimate links the costs of
unsuccessful projects to those that are successful in obtaining marketing approval from regulatory authorities.”
In a study published in 2003, Tufts CSDD estimated the cost per approved new drug to be $802 million (in 2000 dollars) for drugs first tested in human subjects from 1983 to 1994, based on average out-of-
pocket costs of $403 million and capital costs of $401 million.
The $802 million, equal to $1,044 million in 2013 dollars, indicates that the cost to develop and win marketing approval for a new drug has increased by 145% between the two study periods, or at a
compound annual growth rate of 8.5%.
According to DiMasi, rising drug development costs have been driven mainly by increases in out-of-pocket costs for individual drugs and higher failure rates for drugs tested in human subjects.
Factors that likely have boosted out-of-pocket clinical costs include increased clinical trial complexity, larger clinical trial sizes, higher cost of inputs from the medical sector used for development, greater focus
on targeting chronic and degenerative diseases, changes in protocol design to include efforts to gather health technology assessment information, and testing on comparator drugs to accommodate payer
demands for comparative effectiveness data.
Lengthening development and approval times were not responsible for driving up development costs, according to DiMasi.
“In fact,” DiMasi said, “changes in the overall time profile for development and regulatory approval phases had a modest moderating effect on the increase in R&D costs. As a result, the time cost share of total
cost declined from approximately 50% in previous studies to 45% for this study.”
The study was authored by DiMasi, Henry G. Grabowski of the Duke University Department of Economics, and Ronald W. Hansen at the Simon Business School at the University of Rochester.
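The figures quoted in the release above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the release; the variable names are illustrative, and the "implied years" value is derived here from the quoted growth rate rather than stated by Tufts CSDD.

```python
import math

# Figures quoted in the release, in millions of 2013 dollars.
OUT_OF_POCKET = 1395
TIME_COSTS = 1163
POST_APPROVAL = 312
PRIOR_STUDY_2013_DOLLARS = 1044  # the $802 million (2000 dollars) estimate, restated
QUOTED_CAGR = 0.085

# Cost per approved compound and full product lifecycle cost.
pre_approval = OUT_OF_POCKET + TIME_COSTS
lifecycle = pre_approval + POST_APPROVAL

# Share of the pre-approval cost attributable to time costs.
time_cost_share = TIME_COSTS / pre_approval

# Relative increase between the two study periods.
increase = pre_approval / PRIOR_STUDY_2013_DOLLARS - 1

# Number of years over which the quoted compound growth rate
# would produce that increase (a derived value, not a quoted one).
implied_years = math.log(1 + increase) / math.log(1 + QUOTED_CAGR)

print(f"Pre-approval cost: ${pre_approval:,} million")
print(f"Lifecycle cost:    ${lifecycle:,} million")
print(f"Time cost share:   {time_cost_share:.1%}")
print(f"Increase:          {increase:.0%}")
print(f"Implied years:     {implied_years:.1f}")
```

The totals reproduce the $2,558 million and $2,870 million figures, the time cost share lands near the quoted 45%, and the 145% increase is consistent with an 8.5% annual rate compounded over roughly eleven years.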
Progressive, Unsustainable Decline in Productivity
Reported by Matthew Herper, Forbes 5/22/2014 “Who’s the best in drug research…”
http://www.forbes.com/sites/matthewherper/2014/05/22/new-report-ranks-22-drug-companies-based-on-rd/
2014 New Drug Approvals Hit 18-Year High
2014 was a good year for pharmaceutical
innovation – the best, in fact, since the
industry’s all-time record of 1996. FDA
approved a total of 44 drugs –
The productivity crisis in pharmaceutical R&D
Fabio Pammolli, Laura Magazzini & Massimo Riccaboni
Nature Reviews Drug Discovery 10, 428-438 (June 2011)
28,000 compounds from Pharmaceutical Industry Database
We are unable to predict success.
Failure Rates Increasing at all Stages of R&D
Merck & Co. (MSD)
Key Company Facts
• WHO WE ARE: We are known as Merck & Co. We are known as MSD outside of the United States and Canada.
• RICH HISTORY: Operating since 1851
• HEADQUARTERS: Kenilworth, New Jersey, U.S.A.
• BUSINESSES: Pharmaceuticals, Vaccines, Biologics and Animal Health
• 2014 REVENUES: $42.2 billion; 61% of sales come from outside the United States
• 2014 R&D EXPENSE: $6.5 billion; 25 drug candidates in late-stage development; key areas: oncology, CV, diabetes, respiratory & immunology, neurology, infectious disease and vaccines
• EMPLOYEES: Approximately 70,000 worldwide (as of 12/31/14)
Premier Research-Driven BioPharmaceutical Company
Merck Research Labs
Form and Function
Functional domains: Translational Medicine; Preclinical Development; Clinical, Regulatory, & Safety; Outcomes Research
Platforms and teams:
• Scientific Modeling Platform (Cross-functional Analytics & Predictive Modeling)
• Scientific Information Management Platform (Cross-functional Information Access & Interoperability)
• Enterprise and Laboratory Platforms (Cross-functional Information Creation and Collection)
• Applied Math and Modeling Team (Cross-functional Analytics & Predictive Modeling)
Business Outcomes
• Decrease SDV / GCD Cost
• Decrease Time to Market
• Increase in Analysis of Real World Data
• Ensure 100% Compliance
• Increase Analytics Based Decision Making
• Increase Biologics contribution to 40%
• Increase use of modeling for trials and submissions
• Scientists can find Information they need
• Improve POC Success to 60%
Translational Data Science, Informatics, and
Analytics
Data Science
Data Science involves combining strong analytical skills with an exploratory mindset and
business domain expertise. Data scientists, or data science teams, can identify the right
questions, help get the right data, integrate, explore, visualize, interpret, find patterns, select the
right analytics approaches, and deliver business insights and impact. They generally operate on
the top half of the information pyramid, i.e., they depend on (lots of) available, interoperable
data.
Informatics
Informatics is the activity of solving problems using data & information assets,
methodologies, and technologies. It also means navigating whatever parts of the
data-information-knowledge ecosystem are necessary to solve a problem. This
activity could require one or many different informatics-related disciplines, e.g.,
information management, software engineering, information system design,
bioinformatics, computational biology, mathematics, modeling, imaging, genomics,
network analysis, text mining, information flow modeling, scientific computing, health
informatics, statistics, cheminformatics, and it often requires a multidisciplinary team.
Analytics Continuum at Merck & Co.
JM Johnson, DRAFT 6/5/2014
Based on a similar slide from Booz Allen Hamilton
Axis: analytical complexity/depth, from Descriptive Analytics (hindsight) through Predictive Analytics (insight) to Prescriptive Analytics (foresight)
Levels, from least to most complex:
• Standard Reports and Dashboards: What happened?
• Ad hoc and Custom Reports: How did it happen?
• Enquiry Analytics (Data Exploration & Mining; Analysis / Visualization / Query / Drill down / Alerts; Hypothesis generation): What is the problem? Is there a pattern? What is a good question to ask? When is action needed?
• Statistical and Mathematical Analysis: Is my hypothesis correct? What is the cause?
• Predictive Modeling / Simulation / Optimization: What will happen if ..? What’s the best choice? What are the alternatives? What should we do?
The “best” approach may be any of the above.
It depends on the problem and the context.
Merck’s Global Network
Press Release v1 (Merck BHAG Realized)
Merck’s revolutionary model-driven approach to drug development leads to breakthrough therapies in Oncology and Neuroscience.
Boston, MA, November 4, 2024
In the last 12 months Merck has released breakthrough treatments for cancer and mental health in record time by using its revolutionary modeling platform for human drug response.
By working with regulatory authorities worldwide and leveraging public-private partnerships, Merck has been able to develop deep models of human disease, allowing it to go straight to human trials. This has allowed the company to greatly reduce the traditional timeline for drug development and bypass controversial and expensive animal trials.
Head of modeling Dr. Smith said that the approach was made possible by developing deep and accurate models of each individual in a clinical trial. “We actively recruited patient populations and made use of sophisticated bio-sensors, nanotechnologies and real-time analysis to develop comprehensive predictive models of their genetics, metabolism and disease.” Over a period of several years Merck modelers received constant streams of data from these volunteers, giving them unprecedented understanding of their disease. They combined this with large publicly funded datasets and with crowdsourced and internal modeling methods.
“We are moving to a new paradigm in drug discovery where we enroll patients before we start therapeutic development,” said Smith.
Merck believes that its modeling platform and methodology can be used to rapidly develop cures for other diseases and is actively seeking patients to donate their health information as well as development partners to license this platform in new disease areas.
Note: This is completely fake and does not represent any forward-looking statements on behalf of Merck.
Press Release v2 (Merck BHAG Realized)
Merck’s “Virtual Pipeline™” Powers Decision Making
Boston, MA, November 4, 2024
Merck released details today on a revolutionary platform that it created to support all aspects of the drug discovery and development process.
This 10-year journey began in 2014 with the acknowledgement that the pharmaceutical industry must transform in order to survive the mounting financial and regulatory pressures.
In collaboration with regulatory agencies worldwide, Merck created the Virtual Pipeline™ by adopting a Product Lifecycle Management (PLM) mentality, and in doing so completely and permanently altered the pharmaceutical research and development landscape.
“The existence of the Virtual Pipeline™ and the ability to fully simulate the entire lifecycles of therapeutic agents allowed our business development team to make an informed decision to acquire Iliad Pharmaceuticals’ entire portfolio with the intent to launch a drug that will see Merck re-enter the infectious disease therapeutic area. It is our expectation that Merck will enter the market with First and Best-in-Class agents grossing in excess of $10BN per annum,” reported Dr. Hootie N.D. Blowfish, Head of Strategic Acquisitions.
While it is too early to verify, Merck projects that the Virtual Pipeline™ will enable its research scientists to reduce the time from target identification to product launch by as much as 40%, with associated cost savings nearing 50%.
Note: This is completely fake and does not represent any forward-looking statements on behalf of Merck.
Questions, questions, questions…
Domains: Research, Development, Commercial, Medical
Entities: Drug, Protein Target, Response, Pathway, System, Individuals, Populations
• What entity should I make?
• How active is my entity?
• What other activities does my entity possess?
• How can I make it?
• Do I have the starting materials?
• What dose is required?
• Is it likely to be metabolized?
• Is clearance going to be a problem?
• What is the most effective formulation?
• How can I make it in bulk?
• What disease should I target?
• What targets are involved?
• What mechanisms are involved?
• How are my competitors doing?
• Is my compound more effective than comparators?
• How much can I charge for this?
• Can I patent this?
Answers, answers, answers…
Domains: Research, Development, Commercial, Medical
Entities: Drug, Protein Target, Response, Pathway, System, Individuals, Populations
Steps: Access, Aggregate, Transform, Deliver
Inputs:
• Data (Internal and External, Structured and Unstructured)
• Models and Simulations (Data)
• Workflows (Best Practices)
The Promise of Predictive Modeling, Simulation, and Optimization
Entities: Drug, Protein Target, Response, Pathway, System, Individuals, Populations
Arrow labels connecting the entities: interacts with; and elicits a; distributes to site of action through a; in; in a; within; that respond to
Each arrow represents an opportunity to develop and utilize a predictive model in lieu of more resource- and time-consuming experimentation!
Initial Efforts Focused on Intra-domain Optimization
Learning Loops (DMAIC Cycles) within the functional domains of Pharma R&D Support:
• Adaptive Research Operating Plans
• Adaptive Clinical Trials
• Behavioral Modification…
Learning loop, repeated for each of the four cycles shown: Design → Measure → Analyze → Improve → Control
Model Usage is Growing…
Compounds registered as ‘GENERAL_SCREENING’ excluded from analysis
Resulting in Higher Quality Compounds!
Descriptor                    Function        X1    X2    X3    X4
QSAR_CLint_rat_hepatocyte     Decreasing      45    100
QSAR_CLint_human_hepatocyte   Decreasing      25    60
QSAR_Clearance_rat            Decreasing      15    35
ClogD_pH_7.4                  Hump Function   1.5   23    3     3.5
Polar_Surface                 Hump Function   65    75    125   140
Molecular_Weight              Hump Function   420   475   530   580
Courtesy: Kerim Babaoglu
Multiparameter Optimization (MPO) Analysis Drives Design of More Desirable Compounds
More Desirable Compounds Display Lower (Better) Human Dose Calculations
(Scaled from Experimental Rat PK Data)
[Chart: Desirability Score versus Design/Synthesis Cycle. Legend: Green = Good Dose; Yellow = Moderate Dose; Red = Poor Dose]
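One way to picture how breakpoints like those in the descriptor table could feed a desirability score is sketched below. This is an illustration under assumptions, not the method used at Merck: the piecewise-linear shapes of the "Decreasing" and "Hump" functions, the geometric-mean aggregation, and all function and variable names are hypothetical. Only the breakpoint values for the three descriptors used come from the table.

```python
from math import prod

def decreasing(x, x1, x2):
    """Assumed shape: 1.0 at or below x1, falling linearly to 0.0 at or above x2."""
    if x <= x1:
        return 1.0
    if x >= x2:
        return 0.0
    return (x2 - x) / (x2 - x1)

def hump(x, x1, x2, x3, x4):
    """Assumed shape: 0.0 outside (x1, x4), 1.0 on [x2, x3], linear ramps between."""
    if x <= x1 or x >= x4:
        return 0.0
    if x < x2:
        return (x - x1) / (x2 - x1)
    if x <= x3:
        return 1.0
    return (x4 - x) / (x4 - x3)

# Breakpoints taken from three rows of the descriptor table.
PROFILE = {
    "QSAR_CLint_human_hepatocyte": (decreasing, (25, 60)),
    "Polar_Surface": (hump, (65, 75, 125, 140)),
    "Molecular_Weight": (hump, (420, 475, 530, 580)),
}

def desirability_score(compound):
    """Geometric mean of per-descriptor desirabilities (an assumed aggregation)."""
    scores = [fn(compound[name], *pts) for name, (fn, pts) in PROFILE.items()]
    return prod(scores) ** (1.0 / len(scores))

# Hypothetical compounds with made-up descriptor values.
ideal = {"QSAR_CLint_human_hepatocyte": 20, "Polar_Surface": 100, "Molecular_Weight": 500}
borderline = {"QSAR_CLint_human_hepatocyte": 42.5, "Polar_Surface": 70, "Molecular_Weight": 500}

print(desirability_score(ideal))
print(desirability_score(borderline))
```

A geometric mean is used here so that a compound scoring zero on any one descriptor scores zero overall; a weighted sum would be an equally plausible choice.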
Connecting the Domains with Models
Cross-domain DMAIC Loops…
Leads to Decreased Lead Optimization Cycle Times
Closing the Loop
Can we construct pan-R&D workflows that incorporate existing data, predictive models, and best practices
to drive design, predict full product lifecycle, and increase probability of success?
Real World Evidence and Outcomes Research
A Trillion Points of Data
The Hadoop Initiative:
Supporting Today’s
Data Access and
Preparing for the
Emergence of Big Data
Abstract
Currently, Merck’s observational research activities rely heavily upon electronic
medical record (EMR) and electronic administrative insurance claim (AIC) data,
which are purchased from vendors and often stored in-house on the current
Oracle® Exadata platform. This platform provides efficient storage and access to
these types of electronic databases, which usually are organized as traditional
structured relational database tables that may approach billions of observations
in size.
However, as the rapid acceleration of worldwide electronic data generation
continues, new sources of nontraditional data are expected to become
increasingly relevant to pharmaceutical research. These new sources,
collectively termed “Big Data,” are characterized as potentially massive and
arriving in various formats, often unstructured – features that will render them
increasingly less compatible with traditional computer architecture.
To prepare for future data demands while supporting our current data
requirements, Merck’s Center for Observational and Real world Evidence
(CORE) is evaluating Hadoop, an architecture and methodology designed to
efficiently and inexpensively meet the storage, retrieval, and analysis
requirements of Big Data’s immense and variably structured data. Currently
under way is an assessment of Hadoop’s capability to meet Merck’s current EMR
and AIC data processing requirements. Initial testing is proving favorable, and if
the final phase of testing is also positive, Hadoop will offer a platform that will
both meet today’s data requirements and offer a compatible, scalable
architecture for accommodating tomorrow’s Big Data.
This poster summarizes the preliminary performance findings to date, highlights
the benefits of Hadoop, describes the current production data platform vs the
proposed Hadoop platform, highlights what is meant by Big Data, and touches
upon the challenges ahead as we face the inevitable prospect of managing and
analyzing Big Data.
Michael Senderak¹; David Tabacco²; Robert Lubwama³; David O’Connell⁴; Matt Majer⁵; Bryan Mallitz⁶
Departments of ¹Statistical Programming for CORE†, North Wales, PA; ²Applied Technology, Branchburg, NJ; ³CORE† Data Sciences & Insights, North Wales, PA; ⁴Market Research & Analytics, North Wales, PA; ⁵IT Client Services Leader, CORE†, Rahway, NJ; ⁶CORE† Data Sciences & Insights, North Wales, PA, Merck & Co., Inc., Kenilworth, NJ, USA
Key Contributors: CORE† Data Sciences & Insights, PharmacoEpidemiology
& Database Research Unit, Global Human Health, Market Research
Analytics, CORE† Information Technology, Prague Global Innovation
Network, Applied Technology
†Center for Observational and Real world Evidence
PO-13
Copyright © 2016 Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc. All rights reserved.
Objective 1: Hadoop vs Oracle Exadata (continued)
Table 2. Preliminary performance test results

Test                    Hadoop/SAS LASR/HPA    Exadata (Current Merck Data Platform)
Data Extraction†        6.15 sec               23.63 sec
                        4.0 sec                14.96 sec
                        4.9 min                4.1 min
                        11.5 min               30 min
                        4.8 min                2 min
SAS Code Processing‡    55 sec                 5 hr, 52 min, 21 sec
                        16 sec                 6 min, 25 sec
                        15 sec                 5 min, 47 sec
                        55 sec                 8 hr, 52 min
                        16 sec                 11 min
                        58 sec                 9 hr, 27 min
†To date, results range from ~300% extraction improvement
using Hadoop to ~140% performance decline. Runs for
additional test cases are in progress. Note that data were
extracted to an SAS Institute server proximally located to the
Hadoop hardware. The next phase of testing will involve
extraction to the current, geographically distant Merck HP Unix
server to determine exact performance metrics. Note also that
the extraction advantage of Hadoop depends on extraction
query construction, which will be further addressed in the next
testing phase.
‡The power of parallel processing shows dramatic results, as
the model is developed in memory across multiple nodes, as
opposed to a single thread on disk.
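To make the paired timings in Table 2 easier to compare, the following sketch converts each pair to seconds and computes the ratio of Exadata time to Hadoop time. The timings are copied from the table; the helper names are illustrative, and the ratio is just one reasonable way to express the comparison.

```python
def seconds(hours=0, minutes=0, secs=0):
    """Convert a duration to seconds."""
    return hours * 3600 + minutes * 60 + secs

# (Hadoop, Exadata) pairs, in table order.
EXTRACTION = [
    (seconds(secs=6.15), seconds(secs=23.63)),
    (seconds(secs=4.0), seconds(secs=14.96)),
    (seconds(minutes=4.9), seconds(minutes=4.1)),
    (seconds(minutes=11.5), seconds(minutes=30)),
    (seconds(minutes=4.8), seconds(minutes=2)),
]
SAS_PROCESSING = [
    (seconds(secs=55), seconds(5, 52, 21)),
    (seconds(secs=16), seconds(minutes=6, secs=25)),
    (seconds(secs=15), seconds(minutes=5, secs=47)),
    (seconds(secs=55), seconds(8, 52)),
    (seconds(secs=16), seconds(minutes=11)),
    (seconds(secs=58), seconds(9, 27)),
]

def speedups(pairs):
    """Exadata time divided by Hadoop time; values above 1 favor Hadoop."""
    return [exadata / hadoop for hadoop, exadata in pairs]

extraction_speedups = speedups(EXTRACTION)
processing_speedups = speedups(SAS_PROCESSING)

for label, values in (("Extraction", extraction_speedups),
                      ("SAS processing", processing_speedups)):
    print(label, [round(v, 2) for v in values])
```

On these numbers the extraction ratios range from about 3.8 in Hadoop's favor down to about 0.42, where Hadoop took 2.4 times as long. That matches the spread described in the first footnote. Every SAS processing ratio exceeds 20.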
Figure 1. Current Oracle Exadata/HP Unix platform
• Exadata is used primarily for storage
• Data are extracted to Unix as SAS datasets for processing
• EMR and IC vendor data (compressed ASCII files received from vendors):
  − Electronic medical records (EMR): GE Centricity, CPRD, Cerner, THIN
  − Administrative insurance claims (IC): Marketscan Medicare, Marketscan CCAE and MDCR, OptumInsight, HUMANA
• Oracle Exadata: data files, accessed by SQL query
• SAS environment on HP Unix: pool, subset, extract; analysis datasets; analysis cohorts; SAS/R/SQL; SAS/R analysis
• Users: statistical programming; analysis/informatics users

Figure 2. Proposed Hadoop/HP Unix platform
Hadoop:
• Stores EMR and IC data for extraction to Unix
• Both stores and processes Big Data
• Data files residing on or off site: EMR and IC vendor data (same vendors as in Figure 1; compressed ASCII files received from vendors), plus Big Data (genomic data, streaming data, etc.)
• SAS environment on HP Unix: pool, subset, extract; analysis datasets; analysis cohorts; SAS/R/SQL; SAS/R analysis
• Users: statistical programming; analysis/informatics users; Big Data analytics
In subsequent releases, SAS processing currently with the HP Unix platform may be migrated into the Hadoop architecture.
Product life cycle phases: Discovery phase; Preclinical phase; Clinical phase; Product launch and marketplace; Marketplace monitoring
The exponential explosion of data generated daily is just in its early stages (Philadelphia Big Data Conference, 2015).
• During 2012 alone, nearly 500 times as much data were generated as in all prior history since the dawn of mankind
• From 2013 through 2020, nearly 17,000 times as much data will have been generated as in all prior history since the dawn of mankind
• By 2020, information will double every 73 days
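The 73-day doubling claim can be restated as an annual growth factor. The short calculation below uses only the doubling period given above and a 365-day year; the variable names are illustrative.

```python
DOUBLING_PERIOD_DAYS = 73
DAYS_PER_YEAR = 365

# How many doublings fit in one year, and the resulting growth multiple.
doublings_per_year = DAYS_PER_YEAR / DOUBLING_PERIOD_DAYS
annual_growth_factor = 2 ** doublings_per_year

print(doublings_per_year)    # 5.0
print(annual_growth_factor)  # 32.0
```

In other words, doubling every 73 days means five doublings per year, or a 32-fold increase in the volume of information annually.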
Objective 2: Preparing for Big Data
Big Data defined
Table 3. Generally agreed-upon definition of Big Data

Volume        Data too large for standard database management tools
Velocity      Delivered at incredibly fast rates, often real time, not always predictable timing
Variability   Arrives in various formats, often unstructured (as opposed to relational or standard row-column format)
Versatility   Various sources and types of data
Table 4. The data lake
Big Data is:
• Potentially massive
• Without an assumed single structure
• Not data tables, but a fluid data lake of possibly dissimilar data sources
Traditional storage and retrieval systems are not designed for Big Data. Instead, with Big Data:
• Data prep, cleansing, linking happen just-in-time
• No up-front extraction, transformation, and loading into a structured environment
• Linking across sources and entities can happen flexibly and incrementally
• Linkages across disparate data sources are customized to the business need at hand
• Linkage solutions are developed only as needed, minimizing resource needs
Hadoop’s distributed processing across multiple parallel computing paths efficiently manages the massive storage and computational requirements of unstructured data sources
• Example: Full prostate tumor genetic sequences (exomes) (Roche, 2014)
  − 15 seconds to search 4,002,926,334 rows of exome variants and join with 14,787,223 rows of expression data
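The join in the exome example can be pictured as a map, shuffle, and reduce sequence, which is the general pattern Hadoop distributes across nodes. The single-process sketch below is a toy under assumptions: the records, field names, and function names are hypothetical, and nothing here reflects the actual Roche or Merck implementation.

```python
from collections import defaultdict

# Hypothetical toy records standing in for two dissimilar sources.
variants = [
    {"gene": "GENE_A", "variant": "v1"},
    {"gene": "GENE_B", "variant": "v2"},
    {"gene": "GENE_A", "variant": "v3"},
]
expression = [
    {"gene": "GENE_A", "level": 12.5},
    {"gene": "GENE_C", "level": 48.0},
]

def map_phase(records, tag):
    """Emit (join key, tagged record) pairs; each source is mapped independently."""
    for rec in records:
        yield rec["gene"], (tag, rec)

def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each key, pair every variant with every expression record."""
    joined = []
    for gene, tagged in groups.items():
        left = [rec for tag, rec in tagged if tag == "variant"]
        right = [rec for tag, rec in tagged if tag == "expression"]
        for v in left:
            for e in right:
                joined.append({"gene": gene, "variant": v["variant"], "level": e["level"]})
    return joined

pairs = list(map_phase(variants, "variant")) + list(map_phase(expression, "expression"))
result = reduce_phase(shuffle(pairs))
print(result)
```

The linkage is defined at query time by choosing the key in `map_phase`, which mirrors the just-in-time linking described above: no shared schema is imposed on the sources before they are joined.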
Big Data and the pharmaceutical industry
Table 4. Sources of Big Data for pharma
Regional, National, and Worldwide Databases
• Patient histories
• Patient registries
• Electronic medical records
• Medical insurance claims
• Prescription claims
• Lab data
• Imaging data
• Genomic data
• Physician office data and freehand notes
• Wearable medical device data
• Government records
• Market/sales data
Web-based data:
• News feeds
• Social media streams: blogs, patient experience forums, etc.
Figure 3. Big Data synergies across the product life cycle
Adapted from: Defay T. and Mehta V. 2014.
Phases: Prediscovery (patient targeting); Preclinical development phase; Clinical development, distribution, and postmarketing
Big Data sources:
• Genetic and genomic: gene sequencing, immunology databases, etc.
• Patient-centered: government, academic, commercial data, clinical trials, biometrics, etc.
• Smart devices and sensors: telephonic, wireless, etc.; patient monitoring
• Interactive media: patient self-help and sharing forums, doctor forums, blogs, etc.
• Healthcare information networks: patient/physician resources, guidelines, policies, etc.
• Market data: physician office records and insurance claims for rx, dx, labs, etc.
Predictive and Economic Modeling
• Global Burden of Disease
• Budget Impact
• Launch Optimization
Consortia and Other Considerations
• TransCelerate: working now on an eSource program focused on harmonizing the direct capture of clinical study data from EHR/EMR, wireless/remote patient data, and virtual trials.
• FDA Sentinel Initiative: patient safety data collected by entities contracted by FDA. Does access to near real-time real-world data change the safety landscape in any way?
We are able to predict success.
The Vision: Failure Rates Decreasing at All Stages of R&D
Thank you!

More Related Content

What's hot

Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Data Mining and Big Data Analytics in Pharma
Data Mining and Big Data Analytics in Pharma Ankur Khanna
 
Big Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use CasesBig Data in Pharma - Overview and Use Cases
Big Data in Pharma - Overview and Use CasesJosef Scheiber
 
DataPharmaNovember2016
DataPharmaNovember2016DataPharmaNovember2016
DataPharmaNovember2016Pfizer
 
Disruptive Strategies for Removing Drug Discovery Bottlenecks
Disruptive Strategies for Removing Drug Discovery Bottlenecks Disruptive Strategies for Removing Drug Discovery Bottlenecks
Disruptive Strategies for Removing Drug Discovery Bottlenecks Sean Ekins
 
Pistoia alliance debates analytics 15-09-2015 16.00
Pistoia alliance debates   analytics 15-09-2015 16.00Pistoia alliance debates   analytics 15-09-2015 16.00
Pistoia alliance debates analytics 15-09-2015 16.00Pistoia Alliance
 
Clinical Research Informatics World 2015
Clinical Research Informatics World 2015Clinical Research Informatics World 2015
Clinical Research Informatics World 2015Jaime Hodges
 
New Disruptive Technology Helps CROs and Pharma Accelerate Oncology-Focused C...
New Disruptive Technology Helps CROs and Pharma Accelerate Oncology-Focused C...New Disruptive Technology Helps CROs and Pharma Accelerate Oncology-Focused C...
New Disruptive Technology Helps CROs and Pharma Accelerate Oncology-Focused C...Rafael Casiano
 
Creating a roadmap to clinical trial efficiency
Creating a roadmap to clinical trial efficiencyCreating a roadmap to clinical trial efficiency
Creating a roadmap to clinical trial efficiencySubhash Chandra
 
Advanced Analytics for Clinical Data Full Event Guide
Advanced Analytics for Clinical Data Full Event GuideAdvanced Analytics for Clinical Data Full Event Guide
Advanced Analytics for Clinical Data Full Event GuidePfizer
 
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu...Saama
 
Technology Considerations to Enable the Risk-Based Monitoring Methodology
Technology Considerations to Enable the Risk-Based Monitoring MethodologyTechnology Considerations to Enable the Risk-Based Monitoring Methodology
Technology Considerations to Enable the Risk-Based Monitoring Methodologywww.datatrak.com
 
Shedding Some Light on the Insights Lurking in the PMA Database
Shedding Some Light on the Insights Lurking in the PMA DatabaseShedding Some Light on the Insights Lurking in the PMA Database
Shedding Some Light on the Insights Lurking in the PMA DatabaseRevital (Tali) Hirsch
 
Future of RWE - Big Data and Analytics for Pharma 2017 presentation
Future of RWE - Big Data and Analytics for Pharma 2017 presentationFuture of RWE - Big Data and Analytics for Pharma 2017 presentation
Future of RWE - Big Data and Analytics for Pharma 2017 presentationSaama
 
Pressure BioScience Presentation January 2017
Pressure BioScience Presentation January 2017Pressure BioScience Presentation January 2017
Pressure BioScience Presentation January 2017RedChip Companies, Inc.
 

Translational Data Science_clean

  • 1. Translational Data Science at Merck Chris L. Waller, Ph.D. Executive Director and Head, Scientific Modeling Platforms…
  • 2. Forward-Looking Statement This presentation includes “forward-looking statements” within the meaning of the safe harbor provisions of the United States Private Securities Litigation Reform Act of 1995. Such statements may include, but are not limited to, statements about the benefits of the merger between Merck and Schering-Plough, including future financial and operating results, the combined company’s plans, objectives, expectations and intentions and other statements that are not historical facts. Such statements are based upon the current beliefs and expectations of Merck’s management and are subject to significant risks and uncertainties. Actual results may differ from those set forth in the forward-looking statements. The following factors, among others, could cause actual results to differ from those set forth in the forward-looking statements: the possibility that all of the expected synergies from the merger of Merck and Schering-Plough will not be realized, or will not be realized within the expected time period; the impact of pharmaceutical industry regulation and health care legislation in the United States and internationally; Merck’s ability to accurately predict future market conditions; dependence on the effectiveness of Merck’s patents and other protections for innovative products; and the exposure to litigation and/or regulatory actions. Merck undertakes no obligation to publicly update any forward-looking statement, whether as a result of new information, future events or otherwise. Additional factors that could cause results to differ materially from those described in the forward-looking statements can be found in Merck’s 2011 Annual Report on Form 10-K and the company’s other filings with the Securities and Exchange Commission (SEC) available at the SEC’s Internet site (www.sec.gov).
  • 3. Outline • Merck & Co. (MSD) Introduction • Function and Form: R&D (Merck Research Labs) and R&D IT (MRL IT) • Translational Data Science, Informatics, and Analytics: Vision and Technology • Real World Evidence: Opportunities to Use Outcomes to Influence Research and Development • Discussion
  • 4. But first, the news…
  • 5. Cost to Develop and Win Marketing Approval for a New Drug Is Increasing! BOSTON – Nov. 18, 2014 – Developing a new prescription medicine that gains marketing approval, a process often lasting longer than a decade, is estimated to cost $2,558 million, according to a new study by the Tufts Center for the Study of Drug Development. The $2,558 million figure per approved compound is based on estimated: Average out-of-pocket cost of $1,395 million Time costs (expected returns that investors forego while a drug is in development) of $1,163 million Estimated average cost of post-approval R&D—studies to test new indications, new formulations, new dosage strengths and regimens, and to monitor safety and long-term side effects in patients required by the U.S. Food and Drug Administration as a condition of approval—of $312 million boosts the full product lifecycle cost per approved drug to $2,870 million. All figures are expressed in 2013 dollars. The new analysis, which updates similar Tufts CSDD analyses, was developed from information provided by 10 pharmaceutical companies on 106 randomly selected drugs that were first tested in human subjects anywhere in the world from 1995 to 2007. “Drug development remains a costly undertaking despite ongoing efforts across the full spectrum of pharmaceutical and biotech companies to rein in growing R&D costs,” said Joseph A. DiMasi, director of economic analysis at Tufts CSDD and principal investigator for the study. 
He added, “Because the R&D process is marked by substantial technical risks, with expenditures incurred for many development projects that fail to result in a marketed product, our estimate links the costs of unsuccessful projects to those that are successful in obtaining marketing approval from regulatory authorities.” In a study published in 2003, Tufts CSDD estimated the cost per approved new drug to be $802 million (in 2000 dollars) for drugs first tested in human subjects from 1983 to 1994, based on average out-of-pocket costs of $403 million and capital costs of $401 million. The $802 million, equal to $1,044 million in 2013 dollars, indicates that the cost to develop and win marketing approval for a new drug has increased by 145% between the two study periods, or at a compound annual growth rate of 8.5%. According to DiMasi, rising drug development costs have been driven mainly by increases in out-of-pocket costs for individual drugs and higher failure rates for drugs tested in human subjects. Factors that likely have boosted out-of-pocket clinical costs include increased clinical trial complexity, larger clinical trial sizes, higher cost of inputs from the medical sector used for development, greater focus on targeting chronic and degenerative diseases, changes in protocol design to include efforts to gather health technology assessment information, and testing on comparator drugs to accommodate payer demands for comparative effectiveness data. Lengthening development and approval times were not responsible for driving up development costs, according to DiMasi. “In fact,” DiMasi said, “changes in the overall time profile for development and regulatory approval phases had a modest moderating effect on the increase in R&D costs. As a result, the time cost share of total cost declined from approximately 50% in previous studies to 45% for this study.” The study was authored by DiMasi, Henry G. Grabowski of the Duke University Department of Economics, and Ronald W. Hansen at the Simon Business School at the University of Rochester.
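The headline figures in the Tufts CSDD release on slide 5 can be cross-checked with a few lines of arithmetic. The sketch below uses only the dollar amounts quoted on the slide (millions of 2013 dollars); the variable names are illustrative, and the implied number of years is a derived quantity that the release itself does not state.

```python
import math

# Figures quoted on the slide, in millions of 2013 dollars.
out_of_pocket = 1395          # average out-of-pocket cost
time_costs = 1163             # returns investors forgo during development
post_approval = 312           # post-approval R&D
prior_estimate = 1044         # the 2003 study's $802M restated in 2013 dollars
quoted_cagr = 0.085           # compound annual growth rate quoted on the slide

# Pre-approval cost per approved compound and full lifecycle cost.
pre_approval_total = out_of_pocket + time_costs
lifecycle_total = pre_approval_total + post_approval

# Percentage increase between the two study periods.
pct_increase = (pre_approval_total / prior_estimate - 1) * 100

# Share of the pre-approval total attributable to time costs.
time_cost_share = time_costs / pre_approval_total * 100

# Derived, not stated in the release: the span of years over which an
# 8.5% annual growth rate would produce the observed increase.
implied_years = math.log(pre_approval_total / prior_estimate) / math.log(1 + quoted_cagr)

print(f"Pre-approval total: ${pre_approval_total:,}M")
print(f"Lifecycle total:    ${lifecycle_total:,}M")
print(f"Increase vs. prior study: {pct_increase:.0f}%")
print(f"Time cost share: {time_cost_share:.0f}%")
print(f"Implied span at 8.5% CAGR: {implied_years:.1f} years")
```

The totals, the 145% increase, and the 45% time cost share all reproduce from the quoted inputs.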
  • 6. Progressive, Unsustainable Decline in Productivity Reported by Matthew Herper, Forbes 5/22/2014 “Who’s the best in drug research…” http://www.forbes.com/sites/matthewherper/2014/05/22/new-report-ranks-22-drug-companies-based-on-rd/ 2014 New Drug Approvals Hit 18-Year High 2014 was a good year for pharmaceutical innovation – the best, in fact, since the industry’s all-time record of 1996. FDA approved a total of 44 drugs –
  • 7. The productivity crisis in pharmaceutical R&D Fabio Pammolli, Laura Magazzini & Massimo Riccaboni Nature Reviews Drug Discovery 10, 428-438 (June 2011) 28,000 compounds from Pharmaceutical Industry Database We are unable to predict success. Failure Rates Increasing at all Stages of R&D
  • 8. Merck & Co. (MSD)
  • 9. Key Company Facts WHO WE ARE: We are known as Merck & Co. We are known as MSD outside of the United States and Canada. RICH HISTORY: Operating since 1851. HEADQUARTERS: Kenilworth, New Jersey, U.S.A. BUSINESSES: Pharmaceuticals, Vaccines, Biologics and Animal Health. 2014 REVENUES: $42.2 billion; 61% of sales come from outside the United States. 2014 R&D EXPENSE: $6.5 billion; 25 drug candidates in late-stage development; key areas: oncology, CV, diabetes, respiratory & immunology, neurology, infectious disease and vaccines. EMPLOYEES: Approximately 70,000 worldwide (as of 12/31/14).
  • 12. Form and Function. Functional areas: Translational Medicine; Preclinical Development; Clinical, Regulatory, & Safety; Outcomes Research. Platforms: Scientific Modeling Platform (Cross-functional Analytics & Predictive Modeling); Scientific Information Management Platform (Cross-functional Information Access & Interoperability); Enterprise and Laboratory Platforms (Cross-functional Information Creation and Collection); Applied Math and Modeling Team (Cross-functional Analytics & Predictive Modeling). Business Outcomes: Decrease SDV / GCD Cost; Decrease Time to Market; Increase in Analysis of Real World Data; Ensure 100% Compliance; Increase Analytics Based Decision Making; Increase Biologics contribution to 40%; Increase use of modeling for trials and submissions; Scientists can find Information they need; Improve POC Success to 60%.
  • 13. Translational Data Science, Informatics, and Analytics
  • 14. Data Science Data Science involves combining strong analytical skills with an exploratory mindset and business domain expertise. Data scientists, or data science teams, can identify the right questions, help get the right data, integrate, explore, visualize, interpret, find patterns, select the right analytics approaches, and deliver business insights and impact. They generally operate on the top half of the information pyramid, e.g. they depend on (lots of) available, interoperable, data.
  • 15. Informatics Informatics is the activity of solving problems using data & information assets, methodologies, and technologies. It also means navigating whatever parts of the data-information-knowledge ecosystem are necessary to solve a problem. This activity could require one or many different informatics-related disciplines, e.g., information management, software engineering, information system design, bioinformatics, computational biology, mathematics, modeling, imaging, genomics, network analysis, text mining, information flow modeling, scientific computing, health informatics, statistics, cheminformatics, and it often requires a multidisciplinary team.
  • 16. Analytics Continuum at Merck & Co. (JM Johnson, DRAFT 6/5/2014; based on a similar slide from Booz Allen Hamilton). Axis: analytical complexity/depth. Categories: Descriptive Analytics (hindsight); Enquiry Analytics; Predictive Analytics (insight); Prescriptive Analytics (foresight). Levels, in the order shown on the slide: Predictive Modeling / Simulation / Optimization (What will happen if ..? What’s the best choice? What are the alternatives? What should we do?); Statistical and Mathematical Analysis (Is my hypothesis correct? What is the cause?); Data Exploration & Mining, Analysis / Visualization / Query / Drill down / Alerts, Hypothesis generation (What is the problem? Is there a pattern? What is a good question to ask? When is action needed?); Ad hoc and Custom Reports (How did it happen?); Standard Reports and Dashboards (What happened?). The “best” approach may be any of the above. It depends on the problem and the context.
  • 18. Press Release v1 (Merck BHAG Realized) Merck’s revolutionary model-driven approach to drug development leads to breakthrough therapies in Oncology and Neuroscience. Boston, MA, November 4, 2024 In the last 12 months Merck has released breakthrough treatments for cancer and mental health in record time by using its revolutionary modeling platform for human drug response. By working with regulatory authorities worldwide and leveraging public-private partnerships, Merck has been able to develop deep models of human disease, allowing them to go straight to human trials. This has allowed them to greatly reduce the traditional timeline for drug development and bypass controversial and expensive animal trials. Head of modeling Dr. Smith said that the approach was made possible by developing deep and accurate models of each individual in a clinical trial. “We actively recruited patient populations and made use of sophisticated bio-sensors, nanotechnologies and real-time analysis to develop comprehensive predictive models of their genetics, metabolism and disease.” Over a period of several years Merck modelers received constant streams of data from these volunteers, giving them unprecedented understanding of their disease. They combined this with large publicly funded datasets and crowdsourced and internal modeling methods. “We are moving to a new paradigm in drug discovery where we enroll patients before we start therapeutic development,” said Smith. Merck believes that its modeling platform and methodology can be used to rapidly develop cures for other diseases and is actively seeking patients to donate their health information as well as development partners to license this platform in new disease areas. Note: This is completely fake and does not represent any forward looking statements on behalf of Merck.
  • 19. Press Release v2 (Merck BHAG Realized) Merck’s “Virtual Pipeline™” Powers Decision Making Boston, MA, November 4, 2024 Merck released details today on a revolutionary platform that it created to support all aspects of the drug discovery and development process. This 10-year journey began in 2014 with the acknowledgement that the pharmaceutical industry must transform in order to survive the mounting financial and regulatory pressures. In collaboration with regulatory agencies worldwide, Merck created the Virtual Pipeline™ by adopting a Product Lifecycle Management (PLM) mentality and completely and permanently altered the pharmaceutical research and development landscape. “The existence of the Virtual Pipeline™ and the ability to fully simulate the entire lifecycles of therapeutic agents allowed our business development team to make an informed decision to acquire Iliad Pharmaceuticals’ entire portfolio with the intent to launch a drug that will see Merck re-enter the infectious disease therapeutic area. It is our expectation that Merck will enter the market with First and Best-in-Class agents grossing in excess of $10BN per annum,” reported Dr. Hootie N.D. Blowfish, Head of Strategic Acquisitions. While too early to verify, Merck projects that the Virtual Pipeline™ will enable their research scientists to reduce the time from target identification to product launch by as much as 40% with associated cost savings nearing 50%. Note: This is completely fake and does not represent any forward looking statements on behalf of Merck.
  • 20. Questions, questions, questions… Research Development Commercial Medical Drug Protein Target Response System Individuals Populations Pathway What entity should I make? How active is my entity? What other activities does my entity possess? How can I make it? Do I have the starting materials? What dose is required? Is it likely to be metabolized? Is clearance going to be a problem? What is the most effective formulation? How can I make it in bulk? What disease should I target? What targets are involved? What mechanisms are involved? How are my competitors doing? Is my compound more effective than comparators? How much can I charge for this? Can I patent this?
  • 21. Answers, answers, answers… Transform Deliver Aggregate Access Drug Protein Target Response System Individuals Populations Pathway Research Development Commercial Medical Data (Internal and External, Structured and Unstructured) Models and Simulations (Data) Workflows (Best Practices)
  • 22. The Promise of Predictive Modeling, Simulation, and Optimization [Diagram: nodes Drug, Protein Target, Response, Pathway, System, Individuals, and Populations, joined by arrows labeled “interacts with”, “and elicits a”, “distributes to site of action through a”, “in a”, “within”, “in”, and “that respond to”] Each arrow represents an opportunity to develop and utilize a predictive model in lieu of more resource and time-consuming experimentation!
  • 23. Initial Efforts Focused on Intra-domain Optimization Drug Protein Target Response System Individuals Populations Pathway Research Development Commercial Medical Data (Internal and External, Structured and Unstructured) Models and Simulations (Data) Workflows (Best Practices) Learning Loops (DMAIC Cycles) within the functional domains of Pharma R&D Support: • Adaptive Research Operating Plans • Adaptive Clinical Trials • Behavioral Modification… [Figure: four loops, each labeled Design, Measure, Analyze, Improve, Control]
  • 24. Model Usage is Growing… Compounds registered as ‘GENERAL_SCREENING’ excluded from analysis
  • 25. Resulting in Higher Quality Compounds! Multiparameter Optimization (MPO) Analysis Drives Design of More Desirable Compounds. Desirability function parameters (Descriptor | Function | values listed under X1 to X4): QSAR_CLint_rat_hepatocyte | Decreasing | 45, 100; QSAR_CLint_human_hepatocyte | Decreasing | 25, 60; QSAR_Clearance_rat | Decreasing | 15, 35; ClogD_pH_7.4 | Hump Function | 1.5, 23, 3, 3.5; Polar_Surface | Hump Function | 65, 75, 125, 140; Molecular_Weight | Hump Function | 420, 475, 530, 580. Courtesy: Kerim Babaoglu. More Desirable Compounds Display Lower (Better) Human Dose Calculations (Scaled from Experimental Rat PK Data). [Figure: Desirability Score vs. Design/Synthesis Cycle. Legend: Green = Good Dose, Yellow = Moderate Dose, Red = Poor Dose]
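One common way to turn "Decreasing" and "Hump Function" parameters like those on slide 25 into a desirability score is with piecewise-linear functions. The sketch below is an illustration under stated assumptions, not the scoring actually used at Merck: it assumes the two Decreasing values are the points where desirability starts to fall and where it reaches zero, that the four Hump values define a trapezoid, and that per-descriptor scores are combined with a plain average. The example compound is hypothetical.

```python
def decreasing(x, lo, hi):
    """Desirability 1.0 at or below lo, 0.0 at or above hi, linear in between.
    (Assumed reading of the slide's 'Decreasing' function.)"""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def hump(x, x1, x2, x3, x4):
    """Trapezoid: 0 outside (x1, x4), 1.0 on [x2, x3], linear ramps between.
    (Assumed reading of the slide's 'Hump Function'.)"""
    if x <= x1 or x >= x4:
        return 0.0
    if x < x2:
        return (x - x1) / (x2 - x1)
    if x <= x3:
        return 1.0
    return (x4 - x) / (x4 - x3)


# Thresholds copied from the slide for three of the descriptors.
def score_compound(mol_weight, polar_surface, clint_human):
    parts = [
        hump(mol_weight, 420, 475, 530, 580),
        hump(polar_surface, 65, 75, 125, 140),
        decreasing(clint_human, 25, 60),
    ]
    # Assumption: unweighted arithmetic mean of the component scores.
    return sum(parts) / len(parts)


# Hypothetical compound, for illustration only.
score = score_compound(mol_weight=450, polar_surface=100, clint_human=40)
print(f"Desirability score: {score:.3f}")
```

A geometric mean is another frequent choice for combining components; it penalizes any single poor property more heavily than the arithmetic mean used here.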
  • 26. Connecting the Domains with Models Drug Protein Target Response System Individuals Populations Pathway Research Development Commercial Medical Data (Internal and External, Structured and Unstructured) Models and Simulations (Data) Workflows (Best Practices) Cross-domain DMAIC Loops…
  • 27. Leads to Decreased Lead Optimization Cycle Times
  • 28. Closing the Loop Drug Protein Target Response System Individuals Populations Pathway Research Development Commercial Medical Data (Internal and External, Structured and Unstructured) Models and Simulations (Data) Workflows (Best Practices) Can we construct pan-R&D workflows that incorporate existing data, predictive models, and best practices to drive design, predict full product lifecycle, and increase probability of success?
  • 29. Real World Evidence and Outcomes Research
  • 30. A Trillion Points of Data
  • 31. 31 The Hadoop Initiative: Supporting Today’s Data Access and Preparing for the Emergence of Big Data Abstract Currently, Merck’s observational research activities rely heavily upon electronic medical record (EMR) and electronic administrative insurance claim (AIC) data, which are purchased from vendors and often stored in-house on the current Oracle® Exadata platform. This platform provides efficient storage and access to these types of electronic databases, which usually are organized as traditional structured relational database tables that may approach billions of observations in size. However, as the rapid acceleration of worldwide electronic data generation continues, new sources of nontraditional data are expected to become increasingly relevant to pharmaceutical research. These new sources, collectively termed “Big Data,” are characterized as potentially massive and arriving in various formats, often unstructured – features that will render them increasingly less compatible with traditional computer architecture. To prepare for future data demands while supporting our current data requirements, Merck’s Center for Observational and Real world Evidence (CORE) is evaluating Hadoop, an architecture and methodology designed to efficiently and inexpensively meet the storage, retrieval, and analysis requirements of Big Data’s immense and variably structured data. Currently under way is an assessment of Hadoop’s capability to meet Merck’s current EMR and AIC data processing requirements. Initial testing is proving favorable, and if the final phase of testing is also positive, Hadoop will offer a platform that will both meet today’s data requirements and offer a compatible, scalable architecture for accommodating tomorrow’s Big Data. 
This poster summarizes the preliminary performance findings to date, highlights the benefits of Hadoop, describes the current production data platform vs the proposed Hadoop platform, highlights what is meant by Big Data, and touches upon the challenges ahead as we face the inevitable prospect of managing and analyzing Big Data. Michael Senderak1; David Tabacco2; Robert Lubwama3; David O’Connell4; Matt Majer5; Bryan Mallitz6. Departments of 1Statistical Programming for CORE†, North Wales, PA; 2Applied Technology, Branchburg, NJ; 3CORE† Data Sciences & Insights, North Wales, PA; 4Market Research & Analytics, North Wales, PA; 5IT Client Services Leader, CORE†, Rahway, NJ; 6CORE† Data Sciences & Insights, North Wales, PA, Merck & Co., Inc., Kenilworth, NJ, USA. Key Contributors: CORE† Data Sciences & Insights, PharmacoEpidemiology & Database Research Unit, Global Human Health, Market Research Analytics, CORE† Information Technology, Prague Global Innovation Network, Applied Technology. †Center for Observational and Real world Evidence. PO-13. Copyright © 2016 Merck Sharp & Dohme Corp., a subsidiary of Merck & Co., Inc. All rights reserved.
  • 32. Objective 1: Hadoop vs Oracle Exadata (continued) Table 2. Preliminary performance test results for administrative insurance claims (IC) and electronic medical records (EMR) data, Hadoop/SAS LASR/HPA vs Exadata (Current Merck Data Platform). Data Extraction†: 6.15 sec vs 23.63 sec; 4.0 sec vs 14.96 sec; 4.9 min vs 4.1 min; 11.5 min vs 30 min; 4.8 min vs 2 min. SAS Code Processing‡: 55 sec vs 5 hr, 52 min, 21 sec; 16 sec vs 6 min, 25 sec; 15 sec vs 5 min, 47 sec; 55 sec vs 8 hr, 52 min; 16 sec vs 11 min; 58 sec vs 9 hr, 27 min. †To date, results range from ~300% extraction improvement using Hadoop to ~140% performance decline. Runs for additional test cases are in progress. Note that data were extracted to an SAS Institute server proximally located to the Hadoop hardware. The next phase of testing will involve extraction to the current, geographically distant Merck HP Unix server to determine exact performance metrics. Note also that the extraction advantage of Hadoop depends on extraction query construction, which will be further addressed in the next testing phase. ‡The power of parallel processing shows dramatic results, as the model is developed in memory across multiple nodes, as opposed to a single thread on disk. Figure 1. Current Oracle Exadata/HP Unix platform. [Diagram: compressed ASCII files received from vendors (EMR: GE Centricity, CPRD, Cerner, THIN; IC: Marketscan Medicare, Marketscan CCAE and MDCR, OptumInsight, HUMANA) are stored on Oracle Exadata; statistical programming and analysis/informatics users run SQL queries and SAS/R analyses in the SAS environment on HP Unix to produce pool subset extracts, analysis cohorts, and analysis datasets.] • Exadata is used primarily for storage • Data are extracted to Unix as SAS datasets for processing. Figure 2. Proposed Hadoop/HP Unix platform. [Diagram: the same EMR and IC vendor data, residing on or off site, plus Big Data (genomic data, streaming data, etc) are held in Hadoop, which feeds the SAS environment on HP Unix and Big Data analytics.] Hadoop: • Stores EMR and IC data for extraction to Unix • Both stores and processes Big Data. In subsequent releases, SAS processing currently with the HP Unix platform may be migrated into the Hadoop architecture.
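The Table 2 timings on slide 32 are easier to compare as ratios. The sketch below converts each timing to seconds and computes the Exadata-to-Hadoop ratio for every test case. It assumes the Hadoop and Exadata values pair up in the order listed on the slide; which dataset each test case belongs to is not recoverable from the slide text, so the cases are simply numbered.

```python
def hms(hours=0, minutes=0, seconds=0):
    """Convert a duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# (Hadoop seconds, Exadata seconds), paired in the order listed on the slide.
extraction = [
    (6.15, 23.63),
    (4.0, 14.96),
    (hms(minutes=4.9), hms(minutes=4.1)),
    (hms(minutes=11.5), hms(minutes=30)),
    (hms(minutes=4.8), hms(minutes=2)),
]
processing = [
    (55, hms(5, 52, 21)),
    (16, hms(0, 6, 25)),
    (15, hms(0, 5, 47)),
    (55, hms(8, 52, 0)),
    (16, hms(0, 11, 0)),
    (58, hms(9, 27, 0)),
]

# A ratio above 1 means Hadoop was faster; below 1 means Exadata was faster.
extraction_speedups = [exadata / hadoop for hadoop, exadata in extraction]
processing_speedups = [exadata / hadoop for hadoop, exadata in processing]

for label, ratios in (("Extraction", extraction_speedups),
                      ("SAS processing", processing_speedups)):
    for case, ratio in enumerate(ratios, start=1):
        print(f"{label} case {case}: {ratio:.1f}x")
```

Under this pairing, extraction ranges from Hadoop being several times faster to Exadata being more than twice as fast, which matches the mixed picture described in the table's first footnote, while every SAS processing case favors Hadoop by a wide margin.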
  • 33. Objective 2: Preparing for Big Data
    The exponential explosion of data generated daily is just in its early stages (Philadelphia Big Data Conference, 2015).
    • In the single year of 2012, nearly 500 times as much data were generated as in all prior years since the dawn of mankind
    • From 2013 through 2020, nearly 17,000 times as much data will have been generated as in all prior years since the dawn of mankind
    • By 2020, information will double every 73 days
    Big Data defined
    Table 3. Generally agreed-upon definition of Big Data
    • Volume: Data too large for standard database management tools
    • Velocity: Delivered at incredibly fast rates, often in real time, not always with predictable timing
    • Variability: Arrives in various formats, often unstructured (as opposed to relational or standard row-column format)
    • Versatility: Various sources and types of data
    Table 4. The data lake
    Big Data is:
    • Potentially massive
    • Without an assumed single structure
    • Not data tables, but a fluid data lake of possibly dissimilar data sources
    Traditional storage and retrieval systems are not designed for Big Data. Instead, with Big Data:
    • Data prep, cleansing, and linking happen just in time
    • No up-front extraction, transformation, and loading into a structured environment
    • Linking across sources and entities can happen flexibly and incrementally
    • Linkages across disparate data sources are customized to the business need at hand
    • Linkage solutions are developed only as needed, minimizing resource needs
    Hadoop’s distributed processing across multiple parallel computing paths efficiently manages the massive storage and computational requirements of unstructured data sources.
    • Example: Full prostate tumor genetic sequences (exomes) (Roche, 2014): 15 seconds to search 4,002,926,334 rows of exome variants and join with 14,787,223 rows of expression data
    Big Data and the pharmaceutical industry
    Table 4. Sources of Big Data for pharma
    Regional, national, and worldwide databases:
    • Patient histories
    • Patient registries
    • Electronic medical records
    • Medical insurance claims
    • Prescription claims
    • Lab data
    • Imaging data
    • Genomic data
    • Physician office data and freehand notes
    • Wearable medical device data
    • Government records
    • Market/sales data
    • Web-based data: news feeds; social media streams (blogs, patient experience forums, etc.)
    Figure 3. Big Data synergies across the product life cycle. Adapted from: Defay T. and Mehta V. 2014.
    Product life cycle phases shown: Prediscovery (patient targeting); Discovery phase; Preclinical phase; Clinical phase; Product launch and marketplace; Marketplace monitoring.
    Other labels in the figure: Patient monitoring; Preclinical development phase; Clinical development, distribution, and postmarketing.
    Big Data sources shown:
    • Genetic and genomic: Gene sequencing, immunology databases, etc.
    • Patient-centered: Government, academic, commercial data, clinical trials, biometrics, etc.
    • Smart devices and sensors: Telephonic, wireless, etc.
    • Interactive media: Patient self-help and sharing forums, doctor forums, blogs, etc.
    • Healthcare information networks: Patient/physician resources, guidelines, policies, etc.
    • Market data: Physician office records and insurance claims for rx, dx, labs, etc.
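The just-in-time linking idea from the data lake table (no up-front loading into a fixed structure, linkage built only when a question requires it) can be sketched on a single machine as follows. This is an illustration under stated assumptions, not Hadoop code and not the poster authors' implementation. The function `link_on_demand`, the source names, and the `patient_id` key are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def link_on_demand(
    sources: Dict[str, Iterable[Record]], key: str
) -> Dict[Any, Dict[str, List[Record]]]:
    """Group raw records from dissimilar sources by a shared key at query time.

    Records are left in their original shape; nothing is loaded into a
    common schema beforehand. Records that lack the key are skipped.
    """
    linked: Dict[Any, Dict[str, List[Record]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for source_name, records in sources.items():
        for record in records:
            if key in record:
                linked[record[key]][source_name].append(record)
    # Convert nested defaultdicts to plain dicts so missing keys raise KeyError.
    return {k: dict(v) for k, v in linked.items()}


if __name__ == "__main__":
    # Hypothetical data lake: each source keeps its own field names.
    data_lake = {
        "claims": [{"patient_id": "p1", "rx": "drug_a"}],
        "wearable": [{"patient_id": "p1", "steps": 8200}],
        "notes": [{"text": "freehand note with no identifier"}],
    }
    print(link_on_demand(data_lake, "patient_id"))
```

Because the linkage is computed per request, a different business question can pass a different key or a different set of sources without any rebuild of a shared schema.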
  • 34.
  • 35. Predictive and Economic Modeling • Global Burden of Disease • Budget Impact • Launch Optimization
  • 36. Consortia and Other Considerations • TransCelerate: working now on an eSource program focused on harmonizing the direct capture of clinical study data from EHR/EMR, wireless/remote patient data, and virtual trials. • FDA Sentinel Initiative: patient safety data collected by entities contracted by FDA. Does access to near real-time real-world data change the safety landscape in any way?
  • 37. We are able to predict success. The Vision: Failure Rates Decreasing at All Stages of R&D [Figure: five charts on a 0 to 100 scale; plotted values not recoverable]