COMMENTARIES
Improving Therapeutic Effectiveness and Safety Through Big Healthcare Data
S Schneeweiss1
Big healthcare data—electronically recorded longitudinal data generated during the provision and administration of healthcare for millions of patients—have become essential for understanding the effectiveness and safety of therapeutics. They are most effectively used in concert with experimental and laboratory research throughout the life cycle of a drug. Applications range from providing phenotype and health outcomes information in genome-wide association studies to postmarketing studies that assure prescribers of the safety of approved drugs (Figure 1).
USES OF BIG HEALTHCARE DATA
Big healthcare data are characterized by a large number of patients covered, a reflection of the variations in routine care practices, a lack of researcher-designed data capture (yielding inaccurate or missing information), and a lack of uniform data standards.1 These characteristics have different implications depending on the analytic goals of the study and how the data are utilized.
Use for population description and pattern exploration
Large insurance claims and electronic medical record databases are very useful in understanding disease burden and medical need, as well as the underuse and guideline-recommended use of therapeutics, because they reflect care outside tightly controlled research environments. Conclusions drawn from electronic medical record-based research may trigger care interventions to optimize the use of drugs (e.g., improve adherence to chronic-use medications) and may be used to monitor the success of such interventions. Hypothesis-free pattern exploration with powerful visualizations may identify populations with particular utilization and outcome patterns, stimulating new lines of inquiry. Data generated outside the professional healthcare system (e.g., through blogs, smartphone apps, or patient support groups) provide a different type of health information but are less straightforward to interpret for population-level insights, as they lack meaningful denominators and are subject to selective participation. Linkage between these novel data sources and structured healthcare information would create a very useful data asset.
Use for association studies
Big genomics data are increasingly linked to big healthcare data. The latter include phenotypic data and temporality-preserving drug use and health outcomes data, allowing large-scale genome-wide association studies and genome-drug interaction studies. Even if they do not imply causal relationships, such association studies using big healthcare data can be useful when interpreted cautiously.
Use for prediction
Particularly in integrated healthcare systems, it is now possible to program prediction algorithms for treatment effectiveness vs. failure and feed the suggestion back to the provider. Because these are individual-level probabilistic predictions without any implication of causality, such prediction algorithms inform the provider at the point of care; however, they will not culminate in automated prescribing unless their performance improves substantially.2 Improvements are more likely to come from richer data than from new algorithms. For example, predicting the lack of adherence to a medication regimen is an area in which dynamic analyses of big data (including claims and electronic medical records, in addition to consumer applications and electronic devices measuring behavioral factors) hold the promise of meaningful improvements in health care.3
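As a minimal sketch of the kind of individual-level probabilistic prediction described above: a score computed from claims-derived features, fed back to the provider as a probability rather than a causal claim. All feature names and weights here are hypothetical, not a validated clinical model.

```python
import math

# Hypothetical logistic score for the probability that a patient will
# fail to adhere to a chronic-use medication. Features and weights are
# illustrative only.
WEIGHTS = {
    "intercept": -1.5,
    "prior_gap_days_per_30": 0.4,   # refill gaps in the past year
    "num_pharmacies": 0.3,          # pharmacy switching
    "daily_pill_burden": 0.1,       # total pills per day
}

def predict_nonadherence(prior_gap_days: int, num_pharmacies: int,
                         daily_pill_burden: int) -> float:
    """Return a probability (0-1) of nonadherence; no causal implication."""
    z = (WEIGHTS["intercept"]
         + WEIGHTS["prior_gap_days_per_30"] * (prior_gap_days / 30)
         + WEIGHTS["num_pharmacies"] * num_pharmacies
         + WEIGHTS["daily_pill_burden"] * daily_pill_burden)
    return 1 / (1 + math.exp(-z))

# A patient with long refill gaps and many pharmacies scores higher
# than a patient with a simple, uninterrupted regimen.
high_risk = predict_nonadherence(prior_gap_days=90, num_pharmacies=3,
                                 daily_pill_burden=6)
low_risk = predict_nonadherence(prior_gap_days=0, num_pharmacies=1,
                                daily_pill_burden=1)
```

The output is a probability to inform the prescriber at the point of care; as noted above, richer inputs (e.g., device-measured behavioral factors) are more likely to improve such scores than new algorithms are.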
Use for understanding causal relationships
Ultimately, providers and drug developers need to understand causal relationships between drugs and health outcomes.
1Division of Pharmacoepidemiology, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA. Correspondence: S Schneeweiss (schneeweiss@post.harvard.edu)
doi:10.1002/cpt.316
262 VOLUME 99 NUMBER 3 | MARCH 2016 | www.wileyonlinelibrary/cpt
PERSPECTIVES
Understanding causality is arguably even more critical in medicine than in other data-rich fields, because healthcare professionals and regulators are responsible for making decisions about the well-being of patients. Big healthcare data have proven useful for assessing the safety of medications, drug-drug interactions (including the risk of unintended clinical events), and, increasingly, the comparative effectiveness of different drugs on health outcomes.4 Studies that conduct baseline randomization and follow subjects using secondary healthcare data are of particular interest.5 However, in order to be useful for patient care, evidence needs to cross a quality threshold that allows interpreting associations as causal relationships. When analyses lead to causal interpretations of the effectiveness of therapeutics, they become subject to more scientific scrutiny in terms of transparency, auditability, reproducibility, and replicability (Figure 2).
WHAT MAKES HEALTHCARE DATA DIFFERENT?
Despite the potential of big healthcare data, there is concern about the analysis and interpretation of findings. Most issues arise from a fundamental misunderstanding: that secondary healthcare data can be interpreted as research-grade medical information. In reality, big healthcare data are usually filtered through the sociology of healthcare systems and recording practices shaped by economic interests and system constraints.1 When analyzing such data, one often works with surrogates for true medical constructs. For example, an elderly patient who received a discharge code for hypertension, as one of five allowed diagnosis fields from a tertiary care hospital, is likely less sick, because those five fields were not used up by more severe diagnoses that would have increased revenues. Conversely, the use of an oxygen canister, which is well measured in secondary data, is a good proxy for advanced disease approaching the end of life.6
When we rely too closely on secondary data, we are frequently faced with a mass of numbers that defy epidemiological interpretation: a population denominator is not clearly defined; incidence and prevalence values are mixed; and temporality between patient baseline characteristics and future health outcomes is reversed. These numbers are often cloaked in colorful visualizations that mean little in terms of true insights. Attempts to quantify causal relationships are unfortunately often compromised by adjustment for causal intermediates, reverse causation, immortal time bias, residual confounding, or neglect of informative censoring, among other biases.7
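The distinction between incidence and prevalence mentioned above is simple arithmetic, and mixing the two misleads precisely because they have different denominators. A toy example with an entirely hypothetical five-person cohort:

```python
# Toy cohort illustrating why incidence and prevalence must not be mixed
# (all numbers are hypothetical). Each record is
# (had_disease_at_baseline, developed_disease_during_year, person_years_at_risk).
cohort = [
    (True,  False, 0.0),   # prevalent case: contributes no at-risk time
    (False, True,  0.4),   # incident case partway through the year
    (False, False, 1.0),
    (False, False, 1.0),
    (False, False, 1.0),
]

# Point prevalence at baseline: existing cases / whole population.
prevalence = sum(1 for p in cohort if p[0]) / len(cohort)

# Incidence rate: new cases / person-years among those initially at risk.
new_cases = sum(1 for p in cohort if p[1])
person_years = sum(p[2] for p in cohort if not p[0])
incidence_rate = new_cases / person_years  # cases per person-year
```

The two quantities answer different questions (how common is the disease now vs. how fast do new cases arise), so a visualization that pools them has no coherent epidemiological meaning.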
The issues above are well described in the literature, and we know how to minimize and avoid them. However, the application of statistics to context-free numbers leads us in the wrong direction. We need investigators who seek to understand data sources, supported by software that implements principled analyses at scale and in rapid cycles.
SUCCESS FACTORS FOR BIG HEALTHCARE DATA ANALYTICS
Several factors make evidence produced by big healthcare data analyses more likely to succeed in improving the understanding of therapeutic effectiveness and safety and, ultimately, in influencing healthcare decision-making.
Meaningful evidence
In order for any big healthcare data analysis to be meaningful, the appropriate information needs to be available. This may include information on drug exposure, outcomes that matter to patients and providers, measurement of important confounders or proxies thereof, and (increasingly) biomarker information to identify the right patients for highly targeted therapeutics. Because investigators cannot change information already collected through the routine healthcare system, this requires searching for the most appropriate data source, sometimes worldwide, and often necessitates linking several sources. This type of flexibility means working with data that vary in quality, content, and coding.

Figure 1 The uses of big healthcare data and their analysis throughout the life cycle of prescription drugs.
For example, if the goal is a full characterization of medications, their effectiveness should be established in increasingly finely stratified populations. Net benefits are established by ascribing preference or quality weights to absolute treatment effect measures (i.e., difference rather than ratio measures) of intended and unintended effects.
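The net-benefit calculation described above can be sketched in a few lines. The risks and preference weights below are hypothetical; the point is that only absolute (difference) measures can be weighted and summed this way, because ratio measures carry no event counts to trade off.

```python
# Sketch of a net-benefit calculation from absolute (difference) effect
# measures; risks and weights are hypothetical.
# Risk differences (treated minus comparator) per 100 patients:
intended_effect = -4.0     # e.g., 4 fewer primary outcome events per 100
unintended_effect = +1.5   # e.g., 1.5 more adverse events per 100

# Preference weights: how bad one harm event is judged relative to one
# prevented outcome event (elicited from patients or decision makers).
weight_benefit = 1.0
weight_harm = 0.5          # one harm judged half as bad as one outcome

# Net benefit > 0 favors treatment: prevented events minus weighted harms,
# per 100 patients treated.
net_benefit = (-intended_effect) * weight_benefit - unintended_effect * weight_harm
```

Here the treatment prevents 4 events while causing 1.5 weighted-down harms, for a positive net benefit per 100 patients; with a harm weight above about 2.7, the same data would favor the comparator.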
Valid evidence
In order to reach (and surpass) the evidence threshold (Figure 2), it is of course critical to produce valid findings. This may require a variety of different methodological approaches to the same question (e.g., combining randomized studies with secondary data and observational analyses). Historical-controlled studies and time-trend analyses will become even more important in evaluating highly targeted therapies that have well-characterized molecular mechanisms and will be quickly adopted by the provider community. Observational studies based on big healthcare data will generally benefit from data-driven approaches to confounding adjustment in order to minimize bias.8
Expedited evidence
Particularly for newly marketed medications
(but also for most other applications of big
data analyses) it is important that evidence
on effectiveness and safety be generated in
rapid cycles.9
Even if the precision of safety
estimates is somewhat limited for newly
marketed medications, providing further
feedback as early as possible will either serve
as reassurance or put regulators on notice.
Regularly updating trends in treatment
effect estimates through frequently refreshed
big data will make treatment recommenda-
tions or regulatory decisions less binary and
more iterative, as knowledge regarding the
effectiveness and safety of a medication
evolves throughout its life cycle.
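The rapid-cycle idea above can be made concrete: refresh a crude safety estimate each quarter on all data accrued so far, so the estimate sharpens iteratively instead of arriving as one binary verdict. All counts below are hypothetical.

```python
# Rapid-cycle sketch (hypothetical counts). Each quarterly tuple:
# (exposed_events, exposed_person_years,
#  comparator_events, comparator_person_years).
quarterly_data = [
    (2, 400.0, 3, 800.0),
    (3, 550.0, 4, 950.0),
    (4, 600.0, 5, 1000.0),
]

cumulative = [0, 0.0, 0, 0.0]
rate_ratios = []
for quarter in quarterly_data:
    # Accumulate events and person-time across all quarters so far.
    cumulative = [c + q for c, q in zip(cumulative, quarter)]
    e_events, e_py, c_events, c_py = cumulative
    # Crude incidence-rate ratio on the data accrued to date
    # (a real analysis would also adjust for confounding).
    rate_ratios.append((e_events / e_py) / (c_events / c_py))
```

Each refreshed ratio is imprecise early on, but the trend across refreshes is what lets recommendations evolve with the medication's life cycle rather than wait for a single definitive study.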
Transparent and reproducible evidence
Ultimately, big data analyses should help inform decision makers, who usually are not the ones generating the evidence. Because of the lack of standardization in secondary data analytics, complete transparency in the reporting of analytic approaches and all coding details is critically important. This will allow reproduction of analyses, replication of findings using different data sources, and ultimately greater confidence in such analyses, possibly approaching the trust we place in highly controlled clinical trials.
What does big data in healthcare mean, other than a lot of data? The three Vs often used to characterize big data also apply to healthcare data: volume of data, variety of data types, and velocity of data access.10 The analysis of such data requires three more Vs in order to be impactful: validity of the analytic approach, visibility of methods and results, and the ability to vouch for patient privacy and data security. Computational bottlenecks have largely disappeared because of the dramatically decreased cost of computing capacity and its on-demand availability through cloud computing.
Overall, big healthcare data analytics to improve the therapeutic effectiveness and safety of medications continues to broaden its scope of use and to gain strength in its impact on population-based evidence generation and population management. The field has matured to develop a clearer understanding of the challenges it faces, ways to improve the meaningfulness of inferences made from massive amounts of data, and approaches to interpreting results for decision-making. However, there is still a strong need for data scientists with thorough training in the conduct of principled analyses that minimize bias. Although a new generation of software products will support this effort, a deep understanding of the source data and how they were generated will remain critical to the success of big healthcare data analytics.

Figure 2 Most decisions in healthcare require insights that are above a certain evidence quality threshold in order to support causal interpretations of associations between drug use and health outcomes.
CONFLICT OF INTEREST
Dr. Schneeweiss is a consultant to WHISCON, LLC, and to Aetion, a software company in which he also owns equity. He is principal investigator of investigator-initiated grants to the Brigham and Women’s Hospital from Novartis, Genentech, and Boehringer Ingelheim, unrelated to the topic of this study.

© 2015 ASCPT
1. Schneeweiss, S. & Avorn, J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J. Clin. Epidemiol. 58, 323–337 (2005).
2. Pencina, M.J. & D’Agostino, R.B. Sr. Evaluating discrimination of risk prediction models: the C statistic. JAMA 314, 1063–1064 (2015).
3. Shrank, W.H. A case for why health systems should partner with pharmacies. Harvard Business Review, 14 October 2015.
4. Schneeweiss, S. Developments in post-marketing comparative effectiveness research. Clin. Pharmacol. Ther. 82, 143–156 (2007).
5. Tunis, S.R., Stryer, D.B. & Clancy, C.M. Practical clinical trials: increasing the value of clinical research for decision making in clinical and health policy. JAMA 290, 1624–1632 (2003).
6. Schneeweiss, S., Rassen, J.A., Glynn, R.J., Avorn, J., Mogun, H. & Brookhart, M.A. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology 20, 512–522 (2009).
7. Suissa, S. Immortal time bias in pharmacoepidemiology. Am. J. Epidemiol. 167, 492–499 (2008).
8. van der Laan, M.J. & Rose, S. Targeted Learning: Causal Inference for Observational and Experimental Data (Springer, New York, NY, 2011).
9. Psaty, B.M. & Breckenridge, A.M. Mini-Sentinel and regulatory science—big data rendered fit and functional. N. Engl. J. Med. 370, 2165–2167 (2014).
10. Douglas, I. The Importance of “Big Data”: a Definition (Gartner, Stamford, CT, 2012).
The FDA’s Sentinel Initiative—A Comprehensive Approach to Medical Product Surveillance
R Ball1, M Robb1, SA Anderson2 and G Dal Pan1
In May 2008, the Department of Health and Human Services announced the launch of the Sentinel Initiative by the US Food and Drug Administration (FDA) to create the Sentinel System, a national electronic system for medical product safety surveillance.1,2 This system complements existing FDA surveillance capabilities that track adverse events reported after the use of FDA-regulated products by allowing the FDA to proactively assess the safety of these products.
The Sentinel System includes the Active Postmarket Risk Identification and Analysis (ARIA) system mandated by Congress in the Food and Drug Administration Amendments Act (FDAAA) of 2007. In addition, the Sentinel Initiative created focused surveillance efforts around vaccine safety using the Postmarket Rapid Immunization Safety Monitoring (PRISM) system,3 and supports regulatory review of blood and blood products with its Blood Surveillance Continuous Active Network (BloodSCAN).
One of the first stages of the development of the Sentinel System was Mini-Sentinel, a pilot program launched in 2009 to test the feasibility of, and develop the scientific approaches needed for, creating such a national system.2 In 2014, the FDA began transitioning from the Mini-Sentinel pilot to the fully operational Sentinel System. The Sentinel System will build upon the successes of the Mini-Sentinel pilot4 and leverage the Sentinel Infrastructure, a distributed database with a Common Data Model that enables analytic programs to be run remotely in each participating data partner’s secure data environment. The FDA is also seeking to develop the use of the Sentinel Infrastructure for questions outside of safety surveillance but of importance to the FDA in the protection and promotion of public health. All these elements are defined in Table 1.
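The distributed-database idea above can be sketched in miniature: each data partner holds patient-level records locally in a shared schema, runs the same analytic program against them, and returns only aggregate counts to the coordinating center. The table and field names below are illustrative only, not the actual Sentinel Common Data Model.

```python
# Hypothetical sketch of a distributed query under a common data model:
# patient-level data never leave the partner; only aggregates are pooled.

def count_exposed_with_event(records, drug_code, event_code):
    """The shared analytic program each data partner runs locally."""
    exposed = [r for r in records if drug_code in r["dispensings"]]
    with_event = [r for r in exposed if event_code in r["diagnoses"]]
    return {"exposed": len(exposed), "events": len(with_event)}

# Two simulated data partners, each with its own local records
# conforming to the same (illustrative) schema.
partner_a = [
    {"dispensings": {"drugX"}, "diagnoses": {"eventY"}},
    {"dispensings": {"drugX"}, "diagnoses": set()},
    {"dispensings": set(), "diagnoses": {"eventY"}},
]
partner_b = [
    {"dispensings": {"drugX"}, "diagnoses": set()},
    {"dispensings": set(), "diagnoses": set()},
]

# The coordinating center sees and pools only the site-level aggregates.
site_results = [count_exposed_with_event(site, "drugX", "eventY")
                for site in (partner_a, partner_b)]
total = {k: sum(r[k] for r in site_results) for k in ("exposed", "events")}
```

Because only counts cross the network, this design preserves each partner's control over its patient-level data while still supporting system-wide surveillance queries.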
Assessment of the Sentinel System’s current capabilities
The Sentinel Program Interim Assessment mandated by the Prescription Drug User Fee Act (PDUFA) V concluded that “In the implementation and execution of Mini-Sentinel, FDA has met or exceeded the requirements of FDAAA and ...PDUFA.”5 The report highlights several additional accomplishments: (1) the establishment of the Mini-Sentinel Operations Center; (2) creation of a common data model and distributed-data approach; (3) successful development of processes for turning safety concerns into queries of the Mini-Sentinel data; and (4) making good progress toward building a mature data analytics system.5 Other major accomplishments included exceeding the FDAAA 2007 milestones
1Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA; 2Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland, USA. Correspondence: R Ball (Robert.Ball@fda.hhs.gov)
doi:10.1002/cpt.320