Comparison of Privacy-Protecting
Analytic and Data-sharing Methods:
a Simulation Study
Kazuki Yoshida*, Susan Gruber,
Bruce Fireman, Darren Toh
* Departments of Epidemiology and Biostatistics
Harvard T.H. Chan School of Public Health
SCS Meeting on June 21, 2017
1 / 51
Acknowledgment
This study was funded through a Patient-Centered
Outcomes Research Institute (PCORI) Award
(ME-1403-11305; PI: Darren Toh).
All statements in this document, including its findings
and conclusions, are solely those of the authors and do
not necessarily represent the views of PCORI or PCORI’s
Board of Governors or Methodology Committee.
2 / 51
Introduction
3 / 51
Background
Distributed data networks such as PCORnet [1] and Sentinel [2] are becoming a platform of choice for rapid synthesis of evidence.
Here we focus on distributed data networks with
horizontally partitioned patient data [3].
4 / 51
Data partners
Data partners are entities that routinely collect patient
care data as part of their daily operation.
For example, Sentinel's data partners are listed in [4].
5 / 51
Structure of distributed data network
Patient data are stored at each site in a common data model [5] to facilitate sharing.
Analyses are conducted at a coordinating center by aggregating data from individual data partners.
6 / 51
Challenges in distributed data network
Sharing of data from data partners should be minimized
to protect patient privacy as well as each data partner’s
proprietary interest.
Several privacy-protecting analytic and data-sharing
methods [6] have been proposed.
These methods have not been systematically compared.
7 / 51
Aims
To provide a framework for classifying previously proposed privacy-protecting methods.
To assess the relative performance of various
privacy-protecting methods in the setting of a simulated
distributed data network.
Specifically, to examine how different levels of data
sharing can affect analysis performance.
8 / 51
Methods
9 / 51
Classification of methods
We considered the following "axes" in classification.
Levels of data sharing
Types of confounder summary scores
Confounding adjustment methods
Score  Matching          Stratification    Weighting
PS     Individual data   Individual data   Individual data
       Risk sets         Risk sets         Risk sets
       Summary tables    Summary tables    -
       Effect estimates  Effect estimates  Effect estimates
DRS    Individual data   Individual data   -
       Risk sets         Risk sets         -
       Summary tables    Summary tables    -
       Effect estimates  Effect estimates  -
10 / 51
Levels of data sharing
Individual-level data [7]
Individual-level exposure, outcome, event time, and summary
score data are shared.
Risk-set data [8]
Aggregated risk sets at event times, that is, the number of individuals experiencing events at each time point and the number of individuals still being followed, are shared.
Summary-table data
Aggregated event counts and total person-time (how many people were exposed to the drug for how long) are shared.
Site-specific effect estimate data
The entire analysis is conducted within each site, and only the analysis results are shared across sites.
11 / 51
Example of individual-level data
site  A  time  event  PS         Matched
1     1  251   1      0.5402941  1
1     1  277   1      0.4949680  1
1     0  366   0      0.4921805  1
1     0  261   1      0.5128428  1
1     1  52    0      0.5801256  1
1     0  366   0      0.5334244  1
1     0  223   1      0.5267744  1
1     0  28    1      0.5135982  1
1     1  100   0      0.5506620  1
1     0  311   0      0.5361661  1
1     0  260   0      0.4979951  1
1     1  254   0      0.5665530  1
The individual-level data contain the time-to-event status of each
individual. Depending on the analysis method, the summary score itself,
derived weights, or matched cohort status need to be shared for each
individual.
12 / 51
Example of risk set data
site  method    eval_time  events_A0  events_A1  riskset_A0  riskset_A1
1     PS Match  0          0          0          457         457
1     PS Match  1          2          2          457         457
1     PS Match  2          5          0          454         455
1     PS Match  3          1          0          449         455
1     PS Match  4          0          3          447         455
1     PS Match  5          3          1          446         448
1     PS Match  6          0          2          442         444
1     PS Match  7          1          4          440         439
1     PS Match  8          0          2          439         434
1     PS Match  9          1          1          436         432
1     PS Match  10         0          1          434         429
1     PS Match  11         0          1          431         425
1     PS Match  12         0          2          431         422
1     PS Match  13         0          1          429         419
1     PS Match  14         2          1          429         418
1     PS Match  15         2          1          427         416
1     PS Match  16         1          0          423         412
Risk-set data are constructed at each time point at which an event of interest occurred, separately for the treated (A = 1) and untreated (A = 0). The numbers of events and of individuals at risk at each time point are shared (the time scale itself can be converted to an ordinal variable), but no individual-level data are required.
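Risk-set counts are enough for a Cox-type analysis of a binary treatment: under the Breslow handling of tied event times, each event time contributes d1*beta - d*log(n0 + n1*exp(beta)) to the partial log-likelihood, where d1 and d are the treated and total events and n0 and n1 are the numbers at risk in each arm. The Python sketch below is a minimal illustration under that assumption: it reduces individual rows to such counts and fits the log HR by Newton-Raphson. The function names are hypothetical, and the study's own risk-set analysis may use a different estimator.

```python
import math


def build_risksets(rows):
    """Reduce individual-level rows (A, time, event) to risk-set counts.

    Returns one tuple (time, events_A0, events_A1, riskset_A0, riskset_A1)
    per distinct time at which at least one event occurred.
    """
    event_times = sorted({t for _, t, e in rows if e == 1})
    table = []
    for t in event_times:
        events = [0, 0]   # events at time t, indexed by treatment A
        at_risk = [0, 0]  # individuals still followed at time t (time >= t)
        for a, time, event in rows:
            if time >= t:
                at_risk[a] += 1
                if time == t and event == 1:
                    events[a] += 1
        table.append((t, events[0], events[1], at_risk[0], at_risk[1]))
    return table


def fit_cox_from_risksets(table, tol=1e-10, max_iter=50):
    """Breslow partial-likelihood log HR (A=1 vs A=0) from counts only.

    Returns (log_hr, se), maximizing
    sum_t [d1_t * beta - d_t * log(n0_t + n1_t * exp(beta))] by Newton-Raphson.
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for _, d0, d1, n0, n1 in table:
            d = d0 + d1
            if d == 0:
                continue  # time points without events carry no information
            r = n1 * math.exp(beta)
            denom = n0 + r
            score += d1 - d * r / denom
            info += d * n0 * r / denom ** 2
        if info == 0:
            raise ValueError("risk sets carry no information about A")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)
```

For example, a single event time with counts (events_A0, events_A1, riskset_A0, riskset_A1) = (1, 3, 10, 10) yields an estimated log HR of log 3, since the score is zero when 3 = 4 * 10e^beta / (10 + 10e^beta).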
13 / 51
Example of summary table data
site  method    A  events  person-time
1     PS Match  0  185     75453
1     PS Match  1  196     69686
2     PS Match  0  404     144741
2     PS Match  1  410     137917
3     PS Match  0  645     224931
3     PS Match  1  652     223105
Summary-table data are tables from each site, created separately for the treated (A = 1) and untreated (A = 0). The number of events and the total person-time in each group are shared, but no individual-level data are required.
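Summary-table data support rate-based analyses. As one simple possibility, the Python sketch below combines the example rows into a site-stratified Mantel-Haenszel incidence rate ratio, sum_i(e1_i * pt0_i / PT_i) / sum_i(e0_i * pt1_i / PT_i), where PT_i is the total person-time at site i. This estimator is a swapped-in illustration; the study may have used a different events/person-time model (for example, Poisson regression), and the function name is hypothetical.

```python
import math

# Rows copied from the summary-table example: (site, A, events, person_time).
summary = [
    (1, 0, 185, 75453), (1, 1, 196, 69686),
    (2, 0, 404, 144741), (2, 1, 410, 137917),
    (3, 0, 645, 224931), (3, 1, 652, 223105),
]


def mantel_haenszel_rate_ratio(rows):
    """Site-stratified Mantel-Haenszel incidence rate ratio (A=1 vs A=0)."""
    by_site = {}
    for site, a, events, pt in rows:
        by_site.setdefault(site, {})[a] = (events, pt)
    num = den = 0.0
    for arms in by_site.values():
        (e0, pt0), (e1, pt1) = arms[0], arms[1]
        total_pt = pt0 + pt1
        num += e1 * pt0 / total_pt  # treated events weighted by untreated time
        den += e0 * pt1 / total_pt  # untreated events weighted by treated time
    return num / den


irr = mantel_haenszel_rate_ratio(summary)
print(f"MH rate ratio {irr:.3f}, log rate ratio {math.log(irr):.3f}")
```

Only event counts and person-time enter the calculation, which is why this level of sharing needs no individual-level data.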
14 / 51
Example of site-specific effect estimate data
site  method    log HR      Var(log HR)
1     PS Match  0.13490472  0.010514751
2     PS Match  0.06062382  0.004917153
3     PS Match  0.01847491  0.003084460
Site-specific effect estimate data contain only the effect estimate (here, the log hazard ratio) and its variance. Each site contributes only two numbers, and no individual patient-level data are shared.
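Two numbers per site are enough for a meta-analysis at the coordinating center. A minimal Python sketch of standard inverse-variance (fixed-effect) pooling of the example estimates follows, with weights w_i = 1/Var(log HR_i), pooled log HR = sum(w_i * logHR_i) / sum(w_i), and SE = 1/sqrt(sum(w_i)). That the study pooled site estimates exactly this way is an assumption, and the function name is hypothetical.

```python
import math

# (site, log HR, Var(log HR)) from the site-specific effect estimate example.
site_estimates = [
    (1, 0.13490472, 0.010514751),
    (2, 0.06062382, 0.004917153),
    (3, 0.01847491, 0.003084460),
]


def fixed_effect_meta(estimates):
    """Inverse-variance weighted (fixed-effect) pooled log HR and its SE."""
    weights = [1.0 / var for _, _, var in estimates]
    pooled = sum(w * est for w, (_, est, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se


pooled, se = fixed_effect_meta(site_estimates)
print(f"pooled log HR {pooled:.4f} (SE {se:.4f}), HR {math.exp(pooled):.3f}")
```

Larger sites have smaller variances and therefore dominate the pooled estimate.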
15 / 51
Types of confounder summary scores
Two types of confounder summary scores are commonly used
in pharmacoepidemiology.
Propensity score (PS) [9]
Predicted probability of receiving treatment of interest
Disease risk score (DRS) [10]
Binary: Predicted probability of outcome of interest
under no treatment
Survival: Relative log hazard ratio (linear predictor from
Cox regression) under no treatment
Both scores summarize multiple covariates, simplify analyses, and reduce the amount of data that must be shared.
16 / 51
Confounder summary scores
When patient characteristics determine both treatment assignment
and outcome of interest, correlation arises between treatment and
outcome even without a true effect of treatment on the outcome
(confounding) [11]. Statistical assessment of the true effect of
treatment requires accounting for these confounders.
17 / 51
Types of confounding adjustment
Matching [12]
Create pairs of individuals with similar scores, thereby creating
a cohort of similar individuals except for the exposure status.
Stratification [13]
Create subgroups of individuals with similar scores, for example, deciles (10 subgroups, each containing 1/10 of the cohort), and compare treated and untreated individuals within each stratum.
Weighting with PS (IPTW [14], matching weights [15])
Balance the distribution of the score across treatment groups by re-weighting individuals, that is, making some individuals contribute more or less to the analysis depending on a function of their PS.
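The weighting and stratification approaches act on the score in simple ways. The Python sketch below shows the usual IPTW weight, 1/PS for the treated and 1/(1 - PS) for the untreated, and the matching weight, min(PS, 1 - PS) divided by the probability of the treatment actually received, together with assignment of equal-sized quantile strata. The function names are hypothetical, the study's implementation may handle ties or extreme scores differently, and pair matching itself is omitted for brevity.

```python
def iptw_weight(a, ps):
    """Inverse probability of treatment weight for treatment a (0 or 1)."""
    return 1.0 / ps if a == 1 else 1.0 / (1.0 - ps)


def matching_weight(a, ps):
    """Matching weight: min(PS, 1 - PS) over the probability of the observed treatment."""
    p_observed = ps if a == 1 else 1.0 - ps
    return min(ps, 1.0 - ps) / p_observed


def quantile_strata(scores, n_strata=10):
    """Assign each score to one of n_strata equal-sized groups (deciles by default)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata
```

Matching weights never exceed 1, so unlike IPTW they do not let individuals with extreme scores dominate the weighted analysis.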
18 / 51
Simulation
19 / 51
Rationale for a simulation study
Essential component of method comparison research. [16]
Allows assessment of performance of different methods in
a controlled environment.
Patient data are artificially generated so that we know the
truth each method should find.
Data generation, analysis, and performance assessment
are repeated many times for accuracy.
20 / 51
Simulation: Base scenario
4 sites with size 100K, 20K, 20K, and 5K patients
7 covariates (1 continuous, 6 binary)
X → A association OR 0.3 - 3.0
X → Y association HR 0.6 - 1.6
Treatment prevalence 50%
No treatment effect
5% one-year observed incidence of survival outcome
21 / 51
Scenario overview
Scenario  Explanation                   Incidence  % Treated  Effect
1         Base scenario                 5%         50%        Null
2         10% treated                   5%         10%        Null
3         1% outcome incidence          1%         50%        Null
4         0.1% outcome incidence        0.1%       50%        Null
5         0.01% outcome incidence       0.01%      50%        Null
6         Varying outcome incidence     0.01%-5%   50%        Null
7         Protective treatment effect   5%         50%        Protective
8         8 sites                       5%         50%        Null
9         Varying confounder counts     5%         50%        Null
10        Small sites                   1%         50%        Null
Null treatment effect is a conditional log hazard ratio of 0
(conditional hazard ratio of 1.0). Protective treatment effect
is a conditional log hazard ratio of -0.22 (conditional hazard
ratio of 0.8).
22 / 51
Scenario overview
Scenario  # Sites (sizes)                  # Confounders
1         4 (100K, 20K × 2, 5K)            7
2         4 (100K, 20K × 2, 5K)            7
3         4 (100K, 20K × 2, 5K)            7
4         4 (100K, 20K × 2, 5K)            7
5         4 (100K, 20K × 2, 5K)            7
6         4 (20K × 4)                      7
7         4 (100K, 20K × 2, 5K)            7
8         8 (100K × 2, 20K × 4, 5K × 2)    7
9         4 (20K × 4)                      5, 10, 20, 40
10        4 (5K × 4)                       7
23 / 51
Simulated data partners
Four sites of different data sizes are simulated in the base
scenario.
Each site is generated as a separate dataset to emulate
the distributed data network setting in which data reside
behind the firewall of each data partner.
24 / 51
Data generation
Covariates X1, ..., X7 were generated first. Treatment assignment was then determined by the covariates. Finally, the covariates and treatment (when the treatment effect was non-null) determined the outcome.
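The generation order above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the coefficient values, the logistic treatment model, the 365-day follow-up, and the baseline hazard are hypothetical placeholders chosen to fall within the OR 0.3-3.0 and HR 0.6-1.6 ranges and the roughly 5% one-year incidence of the base scenario. Event times follow an exponential model (constant hazard), as stated under Limitations.

```python
import math
import random


def simulate_site(n, beta_treatment=0.0, seed=1):
    """Generate one site's rows (A, time, event): covariates -> treatment -> outcome.

    beta_treatment is the conditional log HR (0 for null scenarios).
    All coefficients are hypothetical placeholders, not the study's values.
    """
    rng = random.Random(seed)
    alpha = [0.0, 0.5, -0.5, 1.0, -1.0, 0.3, -0.3]  # X -> A log ORs (hypothetical)
    gamma = [0.2, 0.3, -0.3, 0.4, -0.4, 0.1, -0.1]  # X -> Y log HRs (hypothetical)
    baseline_hazard = 0.05 / 365                     # per day, about 5%/year (assumption)
    rows = []
    for _ in range(n):
        # 1 continuous and 6 binary covariates.
        x = [rng.gauss(0, 1)] + [rng.random() < 0.5 for _ in range(6)]
        lin_a = sum(coef * xi for coef, xi in zip(alpha, x))
        a = 1 if rng.random() < 1 / (1 + math.exp(-lin_a)) else 0
        lin_y = sum(coef * xi for coef, xi in zip(gamma, x)) + beta_treatment * a
        t = rng.expovariate(baseline_hazard * math.exp(lin_y))  # exponential event time
        event = 1 if t <= 365 else 0                             # censor at one year
        rows.append((a, min(t, 365.0), event))
    return rows
```

Calling the function once per site with a different seed and size (for example, 100K, 20K, 20K, and 5K rows) mimics separate datasets held behind each data partner's firewall.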
25 / 51
Data preparation
Each site prepares data to be shared across sites.
Summary score estimation
Propensity score (PS)
Disease risk score (DRS)
Adjustment for confounding
Matching (PS & DRS)
Stratification (PS & DRS)
Weighting (PS only)
Data reduction
Individual-level data
Risk-set data
Summary-table data
Site-specific effect estimate data
26 / 51
Data sharing
The prepared, less identifiable data are then sent from each site to the coordinating center, where they are aggregated for the final analysis.
27 / 51
Comparison of interest
Within each confounding adjustment method (cell), the other levels of data sharing were compared with individual-level data sharing.
Score  Matching          Stratification    Weighting
PS     Individual data   Individual data   Individual data
       Risk sets         Risk sets         Risk sets
       Summary tables    Summary tables    -
       Effect estimates  Effect estimates  Effect estimates
DRS    Individual data   Individual data   -
       Risk sets         Risk sets         -
       Summary tables    Summary tables    -
       Effect estimates  Effect estimates  -
28 / 51
Assessment metrics
Bias metric
Average of point estimates (should be close to truth)
Precision metrics
Variability of point estimates (should be small)
Standard error estimates (should reflect true variability)
Computation metric
Proportion of iterations that failed to produce results
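These metrics reduce to a few summaries of the simulation output for each method. The Python sketch below is one reasonable way to compute them, assuming that bias is measured against the true conditional log HR, that variability is the empirical SD of the point estimates across iterations, and that SE accuracy is the ratio of each estimated SE to that empirical SD (the quantity labeled EstimatedSE(logHR)/simulationSE(logHR) in the results). The function and field names are hypothetical.

```python
import statistics


def performance(estimates, se_estimates, truth, n_attempted):
    """Summarize one method's output across simulation iterations.

    estimates, se_estimates: log HR and estimated SE from successful iterations.
    truth: the true conditional log HR used to generate the data.
    n_attempted: all iterations, including those that produced no result.
    """
    empirical_sd = statistics.stdev(estimates)  # variability of point estimates
    return {
        "bias": statistics.mean(estimates) - truth,
        "empirical_sd": empirical_sd,
        # Close to 1 when the estimated SEs reflect the true variability.
        "se_ratios": [se / empirical_sd for se in se_estimates],
        "failure_rate": 1 - len(estimates) / n_attempted,
    }
```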
29 / 51
Implementation of simulation
The simulation suite was implemented in R, an open-source statistical language, except for a small part written in SAS, which was called from R.
Package ‘distributed’
June 19, 2017
Type Package
Title Examine Privacy Preserving Data Analysis Methods in Simulated
Distributed Data Network
Version 0.1.0
Date 2017-01-27
Author Kazuki Yoshida
Maintainer Kazuki Yoshida <kazukiyoshida@mail.harvard.edu>
Description Simulate a distributed data network and examine performance of various privacy preserving data analysis methods. See the package vignette for instructions.
License GPL-2
Imports magrittr, dplyr, tidyr, assertthat, doRNG, foreach, geepack,
tableone, Matching, survival, sandwich, survey, gnm, pryr
Suggests testthat, rmarkdown, MatchIt
URL
VignetteBuilder rmarkdown
RoxygenNote 6.0.1
NeedsCompilation no
R topics documented:
distributed-package
Analyze
AnalyzeSiteDataset
AnalyzeSiteDatasetBin
AnalyzeSiteDatasetSurv
AnalyzeSiteRegression
AnalyzeSiteRegressionHelper
AnalyzeSiteRisksets
AnalyzeSiteSummary
AnalyzeSiteSummaryBin
AnalyzeSiteSummarySurv
AnalyzeSiteTruth
AnalyzeSiteTruthBin
AnalyzeSiteTruthSurv
AssignCovariates
30 / 51
Computing environment
Harvard University’s Odyssey high-performance computing cluster was used for the simulation because it had to be repeated many times.
31 / 51
Results
32 / 51
Scenario 1 (base scenario)
4 sites with size 100k, 20k, 20k, and 5k patients
7 covariates (1 continuous, 6 binary)
50% treatment prevalence
No treatment effect
5% one-year incidence of survival outcome
33 / 51
[Figure. Panels: PS Match., PS Strat., PS IPTW, PS MW, DRS Match., DRS Strat.; data-sharing levels: dataset, risksets, summary, meta; axis: log HR]
Survival analysis log HR. Scenario 1
34 / 51
[Figure. Panels: PS Match., PS Strat., PS IPTW, PS MW, DRS Match., DRS Strat.; data-sharing levels: dataset, risksets, summary, meta; axis: estimated SE(log HR) / simulation SE(log HR)]
Survival analysis SE estimate accuracy. Scenario 1
35 / 51
Scenario 2 (infrequent treatment)
4 sites with size 100k, 20k, 20k, and 5k patients
7 covariates (1 continuous, 6 binary)
10% treatment prevalence
No treatment effect
5% one-year incidence of survival outcome
36 / 51
[Figure. Panels: PS Match., PS Strat., PS IPTW, PS MW, DRS Match., DRS Strat.; data-sharing levels: dataset, risksets, summary, meta; axis: log HR]
Survival analysis log HR. Scenario 2
37 / 51
[Figure. Panels: PS Match., PS Strat., PS IPTW, PS MW, DRS Match., DRS Strat.; data-sharing levels: dataset, risksets, summary, meta; axis: estimated SE(log HR) / simulation SE(log HR)]
Survival analysis SE estimate accuracy. Scenario 2
38 / 51
Scenario 5 (infrequent outcome)
4 sites with size 100k, 20k, 20k, and 5k patients
7 covariates (1 continuous, 6 binary)
50% treatment prevalence
No treatment effect
0.01% one-year incidence of survival outcome
39 / 51
[Figure. Panels: PS Match., PS Strat., PS IPTW, PS MW, DRS Match., DRS Strat.; data-sharing levels: dataset, risksets, summary, meta; axis: log HR]
Survival analysis log HR. Scenario 5
40 / 51
[Figure. Panels: PS Match., PS Strat., PS IPTW, PS MW, DRS Match., DRS Strat.; data-sharing levels: dataset, risksets, summary, meta; axis: estimated SE(log HR) / simulation SE(log HR)]
Survival analysis SE estimate accuracy. Scenario 5
41 / 51
[Figure. Panels: PS Match., PS Strat., PS IPTW, PS MW, DRS Match., DRS Strat.; data-sharing levels: dataset, risksets, summary, meta; axis: % successful iterations]
Survival analysis successful iterations (%). Scenario 5
42 / 51
Discussion
43 / 51
Summary
We examined various privacy-protecting analytic and
data-sharing methods through a simulation study to
assess whether restricting the level of data sharing could
affect the performance of analytic methods compared to
the pooled individual-level data analysis.
Overall, levels of data sharing had little impact on bias
and precision of log HR estimates within each
confounding adjustment method in most simulated
scenarios.
44 / 51
Implications
This implies that in the setting where each data partner
provides similar site-specific results, that is, when it
makes sense to pool information across sites to form an
overall effect estimate, a meta-analysis of site-specific
effect estimates may be the most attractive option.
Pooling of site-specific analysis results has the added benefit of requiring investigators at the coordinating center to examine the homogeneity or heterogeneity of site-specific results, thereby preventing inappropriate pooling when heterogeneity is prominent.
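One way to carry out this homogeneity check is to compute Cochran's Q and the I² statistic from the site-specific estimates. The Python sketch below does so for the three example estimates from the site-specific effect estimate slide; the choice of these two statistics and the function name are assumptions, not necessarily what the study used.

```python
# Site-specific log HRs and variances from the effect-estimate example.
log_hr = [0.13490472, 0.06062382, 0.01847491]
var = [0.010514751, 0.004917153, 0.003084460]


def heterogeneity(estimates, variances):
    """Cochran's Q and I^2 around the inverse-variance pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    return q, i2


q, i2 = heterogeneity(log_hr, var)
print(f"Q = {q:.2f} on {len(log_hr) - 1} df, I^2 = {i2:.0%}")
```

With these example values, Q comes out at about 1.0 on 2 degrees of freedom, so I² is 0 and the check would not argue against pooling.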
45 / 51
Limitations
The true underlying treatment effects were kept identical
across sites. This was necessary to ensure valid
comparison of methods.
We generated survival data from an exponential model (constant hazard over time). Departures from this assumption may make the summary-table-based events/person-time analysis and Cox regression less comparable.
Risk-set data analysis using the PS-weighted dataset was implemented experimentally. Although the point estimates were correct, the SE estimates were inaccurate when the treatment groups were of different sizes.
46 / 51
Conclusion
Privacy-protecting methods, regardless of the confounding adjustment method employed, performed similarly to the patient-level data analysis in the simulation scenarios we examined.
Meta-analysis of site-level analysis results appears to be a reasonable approach, provided that data partners are similar in patient characteristics and the outcome is not so rare that some sites become non-informative.
47 / 51
Appendix
48 / 51
Bibliography I
[1] Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, and Brown JS.
Launching PCORnet, a national patient-centered clinical research network.
Journal of the American Medical Informatics Association: JAMIA. 2014;
21(4):578–582.
[2] Platt R, Carnahan RM, Brown JS, Chrischilles E, Curtis LH, Hennessy S, Nelson
JC, Racoosin JA, Robb M, Schneeweiss S, Toh S, and Weiner MG.
The U.S. Food and Drug Administration’s Mini-Sentinel program: status and
direction.
Pharmacoepidemiology and Drug Safety. 2012;21:1–8.
[3] Bohn J, Eddings W, and Schneeweiss S.
Conducting Privacy-Preserving Multivariable Propensity Score Analysis When
Patient Covariate Information Is Stored in Separate Locations.
American Journal of Epidemiology. 2017;185(6):501–510.
[4] Data Partners | Sentinel System.
[5] Distributed Database and Common Data Model | Sentinel System.
[6] Toh S, Shetterly S, Powers JD, and Arterburn D.
Privacy-preserving analytic methods for multisite comparative effectiveness and
patient-centered outcomes research.
Medical Care. 2014;52(7):664–668.
49 / 51
Bibliography II
[7] Rassen JA, Avorn J, and Schneeweiss S.
Multivariate-adjusted pharmacoepidemiologic analyses of confidential information
pooled from multiple health care utilization databases.
Pharmacoepidemiology and Drug Safety. 2010;19(8):848–857.
[8] Fireman B, Lee J, Lewis N, Bembom O, van der Laan M, and Baxter R.
Influenza vaccination and mortality: differentiating vaccine effects from bias.
American Journal of Epidemiology. 2009;170(5):650–656.
[9] Rosenbaum PR and Rubin DB.
The central role of the propensity score in observational studies for causal
effects.
Biometrika. 1983;70(1):41–55.
[10] Hansen BB.
The prognostic analogue of the propensity score.
Biometrika. 2008;95(2):481–488.
[11] Hernán MA and Robins JM.
Causal Inference.
Chapman & Hall/CRC. 2016.
50 / 51
Bibliography III
[12] Rosenbaum PR and Rubin DB.
Constructing a Control Group Using Multivariate Matched Sampling Methods
That Incorporate the Propensity Score.
The American Statistician. 1985;39(1):33–38.
[13] Rosenbaum PR and Rubin DB.
Reducing Bias in Observational Studies Using Subclassification on the Propensity
Score.
Journal of the American Statistical Association. 1984;79(387):516.
[14] Robins JM, Hernán MA, and Brumback B.
Marginal structural models and causal inference in epidemiology.
Epidemiology (Cambridge, Mass). 2000;11(5):550–560.
[15] Li L and Greene T.
A weighting analogue to pair matching in propensity score analysis.
The International Journal of Biostatistics. 2013;9(2):215–234.
[16] Burton A, Altman DG, Royston P, and Holder RL.
The design of simulation studies in medical statistics.
Statistics in Medicine. 2006;25(24):4279–4292.
51 / 51

Multivariate sample similarity measure for feature selection with a resemblan...
 
The odds ratio
The odds ratioThe odds ratio
The odds ratio
 
Basic survival analysis
Basic survival analysisBasic survival analysis
Basic survival analysis
 
Chronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine LearningChronic Kidney Disease Prediction Using Machine Learning
Chronic Kidney Disease Prediction Using Machine Learning
 

Plus de Kazuki Yoshida

Graphical explanation of causal mediation analysis
Graphical explanation of causal mediation analysisGraphical explanation of causal mediation analysis
Graphical explanation of causal mediation analysisKazuki Yoshida
 
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCTPharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCTKazuki Yoshida
 
What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?Kazuki Yoshida
 
Visual Explanation of Ridge Regression and LASSO
Visual Explanation of Ridge Regression and LASSOVisual Explanation of Ridge Regression and LASSO
Visual Explanation of Ridge Regression and LASSOKazuki Yoshida
 
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...Kazuki Yoshida
 
Spacemacs: emacs user's first impression
Spacemacs: emacs user's first impressionSpacemacs: emacs user's first impression
Spacemacs: emacs user's first impressionKazuki Yoshida
 
Multiple Imputation: Joint and Conditional Modeling of Missing Data
Multiple Imputation: Joint and Conditional Modeling of Missing DataMultiple Imputation: Joint and Conditional Modeling of Missing Data
Multiple Imputation: Joint and Conditional Modeling of Missing DataKazuki Yoshida
 
20130222 Data structures and manipulation in R
20130222 Data structures and manipulation in R20130222 Data structures and manipulation in R
20130222 Data structures and manipulation in RKazuki Yoshida
 
20130215 Reading data into R
20130215 Reading data into R20130215 Reading data into R
20130215 Reading data into RKazuki Yoshida
 
Linear regression with R 2
Linear regression with R 2Linear regression with R 2
Linear regression with R 2Kazuki Yoshida
 
Linear regression with R 1
Linear regression with R 1Linear regression with R 1
Linear regression with R 1Kazuki Yoshida
 
(Very) Basic graphing with R
(Very) Basic graphing with R(Very) Basic graphing with R
(Very) Basic graphing with RKazuki Yoshida
 
Introduction to Deducer
Introduction to DeducerIntroduction to Deducer
Introduction to DeducerKazuki Yoshida
 
Groupwise comparison of continuous data
Groupwise comparison of continuous dataGroupwise comparison of continuous data
Groupwise comparison of continuous dataKazuki Yoshida
 
Categorical data with R
Categorical data with RCategorical data with R
Categorical data with RKazuki Yoshida
 
Install and Configure R and RStudio
Install and Configure R and RStudioInstall and Configure R and RStudio
Install and Configure R and RStudioKazuki Yoshida
 
Reading Data into R REVISED
Reading Data into R REVISEDReading Data into R REVISED
Reading Data into R REVISEDKazuki Yoshida
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with RKazuki Yoshida
 

Plus de Kazuki Yoshida (20)

Graphical explanation of causal mediation analysis
Graphical explanation of causal mediation analysisGraphical explanation of causal mediation analysis
Graphical explanation of causal mediation analysis
 
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCTPharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
Pharmacoepidemiology Lecture: Designing Observational CER to Emulate an RCT
 
What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?What is the Expectation Maximization (EM) Algorithm?
What is the Expectation Maximization (EM) Algorithm?
 
Emacs Key Bindings
Emacs Key BindingsEmacs Key Bindings
Emacs Key Bindings
 
Visual Explanation of Ridge Regression and LASSO
Visual Explanation of Ridge Regression and LASSOVisual Explanation of Ridge Regression and LASSO
Visual Explanation of Ridge Regression and LASSO
 
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
Search and Replacement Techniques in Emacs: avy, swiper, multiple-cursor, ag,...
 
Spacemacs: emacs user's first impression
Spacemacs: emacs user's first impressionSpacemacs: emacs user's first impression
Spacemacs: emacs user's first impression
 
Multiple Imputation: Joint and Conditional Modeling of Missing Data
Multiple Imputation: Joint and Conditional Modeling of Missing DataMultiple Imputation: Joint and Conditional Modeling of Missing Data
Multiple Imputation: Joint and Conditional Modeling of Missing Data
 
20130222 Data structures and manipulation in R
20130222 Data structures and manipulation in R20130222 Data structures and manipulation in R
20130222 Data structures and manipulation in R
 
20130215 Reading data into R
20130215 Reading data into R20130215 Reading data into R
20130215 Reading data into R
 
Linear regression with R 2
Linear regression with R 2Linear regression with R 2
Linear regression with R 2
 
Linear regression with R 1
Linear regression with R 1Linear regression with R 1
Linear regression with R 1
 
(Very) Basic graphing with R
(Very) Basic graphing with R(Very) Basic graphing with R
(Very) Basic graphing with R
 
Introduction to Deducer
Introduction to DeducerIntroduction to Deducer
Introduction to Deducer
 
Groupwise comparison of continuous data
Groupwise comparison of continuous dataGroupwise comparison of continuous data
Groupwise comparison of continuous data
 
Categorical data with R
Categorical data with RCategorical data with R
Categorical data with R
 
Install and Configure R and RStudio
Install and Configure R and RStudioInstall and Configure R and RStudio
Install and Configure R and RStudio
 
Reading Data into R REVISED
Reading Data into R REVISEDReading Data into R REVISED
Reading Data into R REVISED
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with R
 
Reading Data into R
Reading Data into RReading Data into R
Reading Data into R
 

Dernier

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 

Dernier (20)

From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 

Comparison of Privacy-Protecting Methods for Distributed Data Analysis

  • 1. Comparison of Privacy-Protecting Analytic and Data-sharing Methods: a Simulation Study Kazuki Yoshida*, Susan Gruber, Bruce Fireman, Darren Toh * Departments of Epidemiology and Biostatistics Harvard T.H. Chan School of Public Health SCS Meeting on June 21, 2017 1 / 51
  • 2. Acknowledgment This study was funded through a Patient-Centered Outcomes Research Institute (PCORI) Award (ME-1403-11305; PI: Darren Toh). All statements in this document, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of PCORI or PCORI’s Board of Governors or Methodology Committee. 2 / 51
  • 4. Background Distributed data networks such as [1] and [2] are becoming a platform of choice for the rapid synthesis of evidence. Here we focus on distributed data networks with horizontally partitioned patient data [3], in which each site holds the same variables for a different set of patients. 4 / 51
  • 5. Data partners Data partners are entities that routinely collect patient care data as part of their daily operation. For example, Sentinel has the following data partners. [4] 5 / 51
  • 6. Structure of distributed data network Patient data are stored at each site in a common data model [5] so that they can be shared. Analyses are conducted at a coordinating center by aggregating data from the individual data partners. 6 / 51
  • 7. Challenges in distributed data network Sharing of data from data partners should be minimized to protect patient privacy as well as each data partner’s proprietary interest. Several privacy-protecting analytic and data-sharing methods [6] have been proposed. These methods have not been systematically compared. 7 / 51
  • 8. Aims To provide a framework for classifying previously suggested privacy-protecting methods. To assess the relative performance of various privacy-protecting methods in the setting of a simulated distributed data network. Specifically, to examine how different levels of data sharing affect analysis performance. 8 / 51
  • 10. Classification of methods We classified the methods along three "axes": levels of data sharing, types of confounder summary scores, and confounding adjustment methods. The combinations considered are shown below (a dash marks a combination that was not included).
    Score  Data shared        Matching  Stratification  Weighting
    PS     Individual data    yes       yes             yes
    PS     Risk sets          yes       yes             yes
    PS     Summary tables     yes       yes             -
    PS     Effect estimates   yes       yes             yes
    DRS    Individual data    yes       yes             -
    DRS    Risk sets          yes       yes             -
    DRS    Summary tables     yes       yes             -
    DRS    Effect estimates   yes       yes             -
    10 / 51
  • 11. Levels of data sharing
    - Individual-level data [7]: individual-level exposure, outcome, event time, and summary score data are shared.
    - Risk-set data [8]: aggregated risk sets at event times are shared, that is, the number of individuals experiencing the event at each time point and the number of individuals still being followed.
    - Summary-table data: aggregated event counts and total person-time (how many people were exposed to the drug and for how long) are shared.
    - Site-specific effect estimate data: the entire analysis is conducted within each site, and only the analysis results are shared across sites.
    11 / 51
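Here is a minimal Python sketch of the summary-table level (the study's own code was written in R). The function collapses individual records into per-arm event counts and total person-time. The (treatment A, follow-up time, event indicator) tuple layout and the name `to_summary_table` are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict


def to_summary_table(records):
    """Collapse individual (a, time, event) records into per-arm totals.

    Returns {a: (events, person_time)}, where person-time is the sum of
    follow-up times in that arm, in whatever unit `time` is measured.
    """
    events = defaultdict(int)
    person_time = defaultdict(float)
    for a, time, event in records:
        events[a] += event
        person_time[a] += time
    return {a: (events[a], person_time[a]) for a in sorted(events)}


# (A, time, event) for the first five rows of the individual-level example that follows.
records = [(1, 251, 1), (1, 277, 1), (0, 366, 0), (0, 261, 1), (1, 52, 0)]
print(to_summary_table(records))  # {0: (1, 627.0), 1: (2, 580.0)}
```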
  • 12. Example of individual-level data
    site  A  time  event  PS         Matched
    1     1  251   1      0.5402941  1
    1     1  277   1      0.4949680  1
    1     0  366   0      0.4921805  1
    1     0  261   1      0.5128428  1
    1     1  52    0      0.5801256  1
    1     0  366   0      0.5334244  1
    1     0  223   1      0.5267744  1
    1     0  28    1      0.5135982  1
    1     1  100   0      0.5506620  1
    1     0  311   0      0.5361661  1
    1     0  260   0      0.4979951  1
    1     1  254   0      0.5665530  1
    The individual-level data contain the follow-up time and event status of each individual. Depending on the analysis method, the summary score itself, the derived weights, or the matched-cohort status must also be shared for each individual.
    12 / 51
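A sketch of how individual records like these could be reduced to risk-set counts, assuming (A, time, event) tuples. It is written in Python for illustration and is not the study's R implementation. The risk set at time t counts everyone in an arm whose follow-up time is at least t. Unlike the risk-set example on the next slide, which also lists evaluation times without events, this version keeps only distinct event times. Rows without events contribute nothing to a Cox partial likelihood.

```python
from collections import Counter


def to_risk_sets(records):
    """Reduce individual (a, time, event) records to risk-set counts.

    Returns one row (t, events_A0, events_A1, riskset_A0, riskset_A1) per
    distinct event time t, in ascending order. The risk set at t counts
    everyone in the arm whose follow-up time is >= t. This is O(n * times),
    which is fine for a sketch.
    """
    event_counts = Counter((time, a) for a, time, event in records if event == 1)
    event_times = sorted({time for time, _ in event_counts})
    rows = []
    for t in event_times:
        at_risk = Counter(a for a, time, _ in records if time >= t)
        rows.append((t, event_counts[(t, 0)], event_counts[(t, 1)],
                     at_risk[0], at_risk[1]))
    return rows


# (A, time, event) for the twelve rows in the table above.
records = [(1, 251, 1), (1, 277, 1), (0, 366, 0), (0, 261, 1), (1, 52, 0),
           (0, 366, 0), (0, 223, 1), (0, 28, 1), (1, 100, 0), (0, 311, 0),
           (0, 260, 0), (1, 254, 0)]
for row in to_risk_sets(records):
    print(row)
```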
  • 13. Example of risk-set data
    site  method    eval_time  events_A0  events_A1  riskset_A0  riskset_A1
    1     PS Match  0          0          0          457         457
    1     PS Match  1          2          2          457         457
    1     PS Match  2          5          0          454         455
    1     PS Match  3          1          0          449         455
    1     PS Match  4          0          3          447         455
    1     PS Match  5          3          1          446         448
    1     PS Match  6          0          2          442         444
    1     PS Match  7          1          4          440         439
    1     PS Match  8          0          2          439         434
    1     PS Match  9          1          1          436         432
    1     PS Match  10         0          1          434         429
    1     PS Match  11         0          1          431         425
    1     PS Match  12         0          2          431         422
    1     PS Match  13         0          1          429         419
    1     PS Match  14         2          1          429         418
    1     PS Match  15         2          1          427         416
    1     PS Match  16         1          0          423         412
    Risk-set data are constructed at each time point at which the event of interest occurred, separately for the treated (A = 1) and untreated (A = 0). The number of events and the number of individuals at risk at each time point are shared (the time scale itself can be converted to an ordinal variable), but no individual-level data are required.
    13 / 51
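The slides do not state which estimator was applied to risk-set data. One standard option is to maximize the Cox partial likelihood directly from the counts, and the Python sketch below does this under that assumption. With a binary treatment and the Breslow approximation for ties, a time point with d0 and d1 events and n0 and n1 people at risk contributes d1·β - (d0 + d1)·log(n0 + n1·e^β) to the log partial likelihood, so the counts alone are sufficient. The function names and the Newton-Raphson settings are illustrative.

```python
import math


def _score_and_information(beta, risk_sets):
    """Score and observed information of the Breslow partial likelihood."""
    score, info = 0.0, 0.0
    for d0, d1, n0, n1 in risk_sets:
        d = d0 + d1
        if d == 0:
            continue  # time points without events contribute nothing
        # Probability that a given event comes from the treated arm.
        p1 = n1 * math.exp(beta) / (n0 + n1 * math.exp(beta))
        score += d1 - d * p1
        info += d * p1 * (1.0 - p1)
    return score, info


def cox_from_risk_sets(risk_sets, tol=1e-10, max_iter=50):
    """Log HR for a binary treatment from (d0, d1, n0, n1) risk-set rows.

    Rows from several sites can simply be concatenated: each row compares
    people within one site's risk set only, so the fit is a site-stratified
    Cox model with a common log HR. Returns (log_hr, variance).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, risk_sets)
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, risk_sets)  # information at the final estimate
    return beta, 1.0 / info


# (events_A0, events_A1, riskset_A0, riskset_A1) for eval_time 1-5 in the table above.
rows = [(2, 2, 457, 457), (5, 0, 454, 455), (1, 0, 449, 455),
        (0, 3, 447, 455), (3, 1, 446, 448)]
log_hr, var = cox_from_risk_sets(rows)
```

The fit fails if every event falls in one arm, because the estimate then diverges. A production version would need to guard against that.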
  • 14. Example of summary-table data
    site  method    A  events  person-time
    1     PS Match  0  185     75453
    1     PS Match  1  196     69686
    2     PS Match  0  404     144741
    2     PS Match  1  410     137917
    3     PS Match  0  645     224931
    3     PS Match  1  652     223105
    Summary-table data are tables created at each site separately for the treated (A = 1) and untreated (A = 0). Only the number of events and the total person-time in each group are shared; no individual-level data are required.
    14 / 51
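Summary tables of events and person-time support rate-based comparisons. The slides do not say which model the authors fitted to these tables. As one possibility, a site-stratified incidence rate ratio can be pooled with the Mantel-Haenszel estimator, using the Greenland-Robins variance for its logarithm. This Python sketch makes that substitution explicit. Note that it targets a rate ratio rather than a hazard ratio, and the two coincide only under constant hazards.

```python
import math


def mh_rate_ratio(tables):
    """Mantel-Haenszel incidence rate ratio (treated vs untreated) across sites.

    `tables` holds one (events_A0, pt_A0, events_A1, pt_A1) tuple per site.
    Returns (log_rr, variance) using the Greenland-Robins variance estimator.
    """
    num = den = var_num = 0.0
    for d0, t0, d1, t1 in tables:
        t = t0 + t1
        num += d1 * t0 / t
        den += d0 * t1 / t
        var_num += (d0 + d1) * t1 * t0 / t ** 2
    return math.log(num / den), var_num / (num * den)


# Rows from the summary-table example above: (events, person-time) per arm, per site.
tables = [(185, 75453, 196, 69686),
          (404, 144741, 410, 137917),
          (645, 224931, 652, 223105)]
log_rr, var = mh_rate_ratio(tables)
print(math.exp(log_rr), math.sqrt(var))
```

With a single site, the variance reduces to the familiar 1/d0 + 1/d1.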
  • 15. Example of site-specific effect estimate data
    site  method    log HR      Var(log HR)
    1     PS Match  0.13490472  0.010514751
    2     PS Match  0.06062382  0.004917153
    3     PS Match  0.01847491  0.003084460
    Site-specific effect estimate data contain only the effect estimate (here, the log hazard ratio) and its variance. Each site contributes only two numbers, and no individual patient-level data are involved.
    15 / 51
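Site-specific estimates are typically combined by meta-analysis. This Python sketch assumes inverse-variance fixed-effect pooling, since the slides do not specify fixed or random effects. Applied to the three estimates in the table above, it gives a pooled log HR of about 0.050.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (log_hr, variance) pairs."""
    weights = [1.0 / v for _, v in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)


site_estimates = [(0.13490472, 0.010514751),
                  (0.06062382, 0.004917153),
                  (0.01847491, 0.003084460)]
pooled, var = fixed_effect_meta(site_estimates)
se = math.sqrt(var)
ci = (pooled - 1.96 * se, pooled + 1.96 * se)  # 95% Wald interval on the log scale
print(pooled, ci)  # pooled log HR is about 0.050
```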
  • 16. Types of confounder summary scores Two types of confounder summary scores are commonly used in pharmacoepidemiology.
    - Propensity score (PS) [9]: the predicted probability of receiving the treatment of interest.
    - Disease risk score (DRS) [10]:
      - Binary outcome: the predicted probability of the outcome of interest under no treatment.
      - Survival outcome: the relative log hazard (the linear predictor from a Cox regression) under no treatment.
    Both scores summarize multiple covariates, simplify analyses, and reduce the amount of data that must be shared.
    16 / 51
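To make the PS definition concrete, here is a minimal Python sketch that fits a logistic regression of treatment on a single covariate by Newton-Raphson and returns the fitted probabilities. It is a one-covariate simplification, whereas the study used seven covariates. A DRS would be built analogously from an outcome model evaluated under no treatment, which is not shown. The function name and toy data are illustrative.

```python
import math


def fit_propensity_score(x, a, n_iter=25):
    """Logistic regression of treatment `a` on one covariate `x`.

    Newton-Raphson on the log-likelihood with design [1, x]; returns the
    fitted propensity scores P(A = 1 | x) for each individual.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(ai - pi for ai, pi in zip(a, p))
        g1 = sum((ai - pi) * xi for ai, pi, xi in zip(a, p, x))
        # Information matrix [[h00, h01], [h01, h11]].
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ gradient (closed-form 2x2 inverse).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]


x = [0, 0, 0, 0, 1, 1, 1, 1]
a = [0, 0, 0, 1, 0, 1, 1, 1]
ps = fit_propensity_score(x, a)  # 0.25 where x = 0, 0.75 where x = 1
```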
  • 17. Confounder summary scores When patient characteristics determine both treatment assignment and the outcome of interest, an association arises between treatment and outcome even when the treatment has no true effect on the outcome (confounding) [11]. Estimating the true effect of treatment therefore requires accounting for these confounders. 17 / 51
  • 18. Types of confounding adjustment
    - Matching [12]: create pairs of individuals with similar scores, producing a cohort whose members are similar except for their exposure status.
    - Stratification [13]: create subgroups of individuals with similar scores, for example deciles (10 subgroups, each containing 1/10 of the cohort), and compare treated and untreated individuals within each stratum.
    - Weighting with the PS (IPTW [14], matching weights [15]): balance the score distribution across treatment groups by re-weighting individuals, that is, making some individuals contribute more or less to the analysis depending on a function of their PS.
    18 / 51
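The three adjustment approaches above can be sketched in Python as follows. The IPTW and matching-weight formulas are the standard ones: 1/PS or 1/(1 - PS) for IPTW, and min(PS, 1 - PS) divided by the probability of the treatment actually received for matching weights. The slides do not specify the matching algorithm, so 1:1 greedy nearest-neighbour matching without replacement is an assumption. The caliper on the PS scale is a hypothetical parameter.

```python
def iptw_weights(ps, a):
    """Inverse probability of treatment weights: 1/PS if treated, 1/(1-PS) if not."""
    return [1.0 / p if ai == 1 else 1.0 / (1.0 - p) for p, ai in zip(ps, a)]


def matching_weights(ps, a):
    """Matching weights: min(PS, 1-PS) / probability of the treatment received."""
    return [min(p, 1.0 - p) / (p if ai == 1 else 1.0 - p) for p, ai in zip(ps, a)]


def ps_strata(ps, n_strata=10):
    """Assign each subject a 0-based quantile stratum of the PS (deciles by default)."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    strata = [0] * len(ps)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(ps)
    return strata


def greedy_match(ps, a, caliper=0.05):
    """1:1 greedy nearest-neighbour PS matching without replacement.

    Treated subjects are processed in input order; each takes the closest
    unused control if it lies within `caliper`. Returns (treated, control)
    index pairs.
    """
    controls = [i for i, ai in enumerate(a) if ai == 0]
    pairs = []
    for t in (i for i, ai in enumerate(a) if ai == 1):
        if not controls:
            break
        c = min(controls, key=lambda j: abs(ps[j] - ps[t]))
        if abs(ps[c] - ps[t]) <= caliper:
            pairs.append((t, c))
            controls.remove(c)
    return pairs


ps = [0.2, 0.5, 0.8, 0.3, 0.6, 0.75]
a = [1, 1, 1, 0, 0, 0]
print(greedy_match(ps, a, caliper=0.15))  # [(0, 3), (1, 4), (2, 5)]
```

Greedy matching is order-dependent. Optimal matching avoids that at higher computational cost.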
  • 20. Rationale for a simulation study Essential component of method comparison research. [16] Allows assessment of performance of different methods in a controlled environment. Patient data are artificially generated so that we know the truth each method should find. Data generation, analysis, and performance assessment are repeated many times for accuracy. 20 / 51
  • 21. Simulation: Base scenario
    - 4 sites with 100K, 20K, 20K, and 5K patients
    - 7 covariates (1 continuous, 6 binary)
    - X → A association: OR 0.3 to 3.0
    - X → Y association: HR 0.6 to 1.6
    - Treatment prevalence: 50%
    - No treatment effect
    - 5% one-year observed incidence of the survival outcome
    21 / 51
  • 22. Scenario overview
    Scenario  Description                   Incidence     % Treated  Effect
    1         Base scenario                 5%            50%        Null
    2         10% treated                   5%            10%        Null
    3         1% outcome incidence          1%            50%        Null
    4         0.1% outcome incidence        0.1%          50%        Null
    5         0.01% outcome incidence       0.01%         50%        Null
    6         Varying outcome incidence     0.01% to 5%   50%        Null
    7         Protective treatment effect   5%            50%        Protective
    8         8 sites                       5%            50%        Null
    9         Varying confounder counts     5%            50%        Null
    10        Small sites                   1%            50%        Null
    The null treatment effect is a conditional log hazard ratio of 0 (conditional hazard ratio of 1.0). The protective treatment effect is a conditional log hazard ratio of -0.22 (conditional hazard ratio of 0.8).
    22 / 51
  • 23. Scenario overview
    Scenario  # Sites (sizes)                # Confounders
    1         4 (100K, 20K × 2, 5K)          7
    2         4 (100K, 20K × 2, 5K)          7
    3         4 (100K, 20K × 2, 5K)          7
    4         4 (100K, 20K × 2, 5K)          7
    5         4 (100K, 20K × 2, 5K)          7
    6         4 (20K × 4)                    7
    7         4 (100K, 20K × 2, 5K)          7
    8         8 (100K × 2, 20K × 4, 5K × 2)  7
    9         4 (20K × 4)                    5, 10, 20, 40
    10        4 (5K × 4)                     7
    23 / 51
  • 24. Simulated data partners Four sites of different data sizes are simulated in the base scenario. Each site is generated as a separate dataset to emulate the distributed data network setting in which data reside behind the firewall of each data partner. 24 / 51
  • 25. Data generation Covariates X1, ..., X7 were generated first. Treatment assignment was then determined by the covariates. Finally, the outcome was determined by the covariates and, in scenarios with a non-null treatment effect, by the treatment. 25 / 51
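The three-step generation order above can be sketched for one site in Python. Every coefficient here is a hypothetical placeholder chosen inside the ranges on the base-scenario slide, not a value from the study's code. Event times are assumed exponential. The baseline hazard is set so that about 5% of patients at the reference covariate level have an event within one year. Treatment prevalence therefore lands near, but not exactly at, 50%.

```python
import math
import random


def simulate_site(n, seed, log_hr_treatment=0.0):
    """Simulate one site's cohort: covariates -> treatment -> survival outcome.

    All coefficients are hypothetical placeholders inside the base-scenario
    ranges (X -> A odds ratios 0.3-3.0, X -> Y hazard ratios 0.6-1.6).
    Returns (a, time, event) tuples, administratively censored at 1 year.
    """
    rng = random.Random(seed)  # a per-site seed keeps each site reproducible
    or_a = [0.3, 0.5, 0.8, 1.2, 2.0, 3.0]  # X2..X7 -> A odds ratios (hypothetical)
    hr_y = [0.6, 0.8, 0.9, 1.1, 1.3, 1.6]  # X2..X7 -> Y hazard ratios (hypothetical)
    base_hazard = -math.log(0.95)  # 5% one-year risk when every linear predictor is 0
    rows = []
    for _ in range(n):
        # Step 1: covariates (X1 continuous, X2..X7 binary).
        x1 = rng.gauss(0.0, 1.0)
        xb = [1 if rng.random() < 0.5 else 0 for _ in range(6)]
        # Step 2: treatment from a logistic model in the covariates.
        lp_a = math.log(1.5) * x1 + sum(math.log(o) * x for o, x in zip(or_a, xb))
        a = 1 if rng.random() < 1.0 / (1.0 + math.exp(-lp_a)) else 0
        # Step 3: exponential event time from a proportional hazards model.
        lp_y = (math.log(1.2) * x1
                + sum(math.log(h) * x for h, x in zip(hr_y, xb))
                + log_hr_treatment * a)
        t = rng.expovariate(base_hazard * math.exp(lp_y))
        rows.append((a, min(t, 1.0), 1 if t <= 1.0 else 0))
    return rows


site_a = simulate_site(20_000, seed=2)                         # null effect
site_b = simulate_site(5_000, seed=3, log_hr_treatment=-0.22)  # protective effect (HR of about 0.8)
```

Generating each site with its own call mirrors the slide's point that every data partner is a separate dataset.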
  • 26. Data preparation Each site prepares the data to be shared across sites in three steps.
    1. Summary score estimation: propensity score (PS); disease risk score (DRS)
    2. Adjustment for confounding: matching (PS & DRS); stratification (PS & DRS); weighting (PS only)
    3. Data reduction: individual-level data; risk-set data; summary-table data; site-specific effect estimate data
    26 / 51
  • 27. Data sharing The prepared, less identifiable data are then sent from each site to the coordinating center, where they are aggregated for the final analysis. 27 / 51
  • 28. Comparison of interest Within each confounding adjustment method (each cell of the classification table), the reduced levels of data sharing were compared with individual-level data sharing.
    Score  Data shared        Matching  Stratification  Weighting
    PS     Individual data    yes       yes             yes
    PS     Risk sets          yes       yes             yes
    PS     Summary tables     yes       yes             -
    PS     Effect estimates   yes       yes             yes
    DRS    Individual data    yes       yes             -
    DRS    Risk sets          yes       yes             -
    DRS    Summary tables     yes       yes             -
    DRS    Effect estimates   yes       yes             -
    28 / 51
  • 29. Assessment metrics
    - Bias metric: average of the point estimates (should be close to the truth)
    - Precision metrics: variability of the point estimates (should be small); standard error estimates (should reflect the true variability)
    - Computation metric: proportion of replicates in which the method failed to produce results
    29 / 51
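The metrics above can be computed from per-replicate output as in this Python sketch. The ratio of each estimated SE to the empirical SD of the point estimates corresponds to the "estimated SE / simulation SE" quantity in the results figures. The input layout, with None marking a failed replicate, is an assumption.

```python
import statistics


def performance_metrics(estimates, std_errors, truth):
    """Summarize one method's results across simulation replicates.

    `estimates` and `std_errors` hold one entry per replicate, with None
    where the method failed to produce a result. At least two successful
    replicates are needed for the empirical standard deviation.
    """
    ok = [(b, se) for b, se in zip(estimates, std_errors)
          if b is not None and se is not None]
    est = [b for b, _ in ok]
    empirical_sd = statistics.stdev(est)  # "true" variability of the estimates
    return {
        "bias": statistics.mean(est) - truth,
        "empirical_sd": empirical_sd,
        # Estimated SE relative to the simulation SE; 1.0 is ideal.
        "se_ratios": [se / empirical_sd for _, se in ok],
        "failure_rate": 1.0 - len(ok) / len(estimates),
    }


m = performance_metrics([0.1, -0.1, 0.0, None], [0.1, 0.1, 0.1, None], truth=0.0)
print(m["failure_rate"])  # 0.25
```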
  • 30. Implementation of simulation The simulation suite was implemented in R, an open-source statistical language, except for a small part written in SAS, which was called from R. The slide shows the first page of the reference manual for the R package 'distributed' (June 19, 2017):
    - Title: Examine Privacy Preserving Data Analysis Methods in Simulated Distributed Data Network
    - Version: 0.1.0 (Date: 2017-01-27)
    - Author and maintainer: Kazuki Yoshida <kazukiyoshida@mail.harvard.edu>
    - Description: Simulate a distributed data network and examine performance of various privacy preserving data analysis methods. See the package vignette for instructions.
    - License: GPL-2
    - Imports: magrittr, dplyr, tidyr, assertthat, doRNG, foreach, geepack, tableone, Matching, survival, sandwich, survey, gnm, pryr
    - Suggests: testthat, rmarkdown, MatchIt
    - Other fields: Type: Package; VignetteBuilder: rmarkdown; RoxygenNote: 6.0.1; NeedsCompilation: no
    - Documented topics (first page): distributed-package, Analyze, AnalyzeSiteDataset, AnalyzeSiteDatasetBin, AnalyzeSiteDatasetSurv, AnalyzeSiteRegression, AnalyzeSiteRegressionHelper, AnalyzeSiteRisksets, AnalyzeSiteSummary, AnalyzeSiteSummaryBin, AnalyzeSiteSummarySurv, AnalyzeSiteTruth, AnalyzeSiteTruthBin, AnalyzeSiteTruthSurv, AssignCovariates
    30 / 51
  • 31. Computing environment Harvard University’s Odyssey high-performance computing cluster was used because the simulation had to be repeated many times. 31 / 51
  • 33. Scenario 1 (base scenario) 4 sites with size 100k, 20k, 20k, and 5k patients 7 covariates (1 continuous, 6 binary) 50% treatment prevalence No treatment effect 5% incidence of binary outcome 33 / 51
  • 34. [Figure: Survival analysis log HR, Scenario 1. Panels: PS MW (matching weights), DRS Match., DRS Strat., PS Match., PS Strat., PS IPTW; rows in each panel: meta, summary, risksets, dataset; x-axis: log HR (-0.05 to 0.10).] 34 / 51
  • 35. [Figure: Survival analysis SE estimate accuracy, Scenario 1. Panels: PS MW (matching weights), DRS Match., DRS Strat., PS Match., PS Strat., PS IPTW; rows in each panel: meta, summary, risksets, dataset; x-axis: estimated SE(log HR) / simulation SE(log HR) (0.50 to 2.00).] 35 / 51
  • 36. Scenario 2 (infrequent treatment) 4 sites with size 100k, 20k, 20k, and 5k patients 7 covariates (1 continuous, 6 binary) 10% treatment prevalence No treatment effect 5% one-year incidence of survival outcome 36 / 51
  • 37. [Figure: Survival analysis log HR, Scenario 2. Panels: PS MW (matching weights), DRS Match., DRS Strat., PS Match., PS Strat., PS IPTW; rows in each panel: meta, summary, risksets, dataset; x-axis: log HR (-0.1 to 0.2).] 37 / 51
  • 38. [Figure] Survival analysis SE estimate accuracy, Scenario 2. Panels: PS MW, DRS Match., DRS Strat., PS Match., PS Strat., PS IPTW; within each panel, data-sharing levels: meta, summary, risksets, dataset. x-axis: estimated SE(log HR) / simulation SE(log HR), from 0.50 to 2.00.
  • 39. Scenario 5 (infrequent outcome): 4 sites with 100k, 20k, 20k, and 5k patients; 7 covariates (1 continuous, 6 binary); 50% treatment prevalence; no treatment effect; 0.01% one-year incidence of survival outcome.
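A quick calculation shows why the smallest sites carry little information in Scenario 5. This sketch uses a Poisson approximation, assumes equal risk in both arms (null effect, confounding ignored), and treats a site as able to yield a finite site-specific log HR only if both arms have at least one event; these are simplifying assumptions, not the study's rules, and `site_information` is a hypothetical helper.

```python
import math

SITE_SIZES = [100_000, 20_000, 20_000, 5_000]
INCIDENCE = 0.0001    # 0.01% one-year incidence (Scenario 5)
TREATED_SHARE = 0.5   # 50% treatment prevalence


def site_information(n, incidence=INCIDENCE, treated=TREATED_SHARE):
    """Poisson approximation with equal risk in both arms. Returns expected
    events, P(no events at all), and P(at least one event in each arm)."""
    expected = n * incidence
    e_treated = expected * treated
    e_control = expected * (1.0 - treated)
    p_none = math.exp(-expected)
    p_both_arms = (1.0 - math.exp(-e_treated)) * (1.0 - math.exp(-e_control))
    return expected, p_none, p_both_arms


for n in SITE_SIZES:
    expected, p_none, p_both = site_information(n)
    print(f"{n:>7}: expected events {expected:.1f}, "
          f"P(no events) {p_none:.3f}, P(events in both arms) {p_both:.3f}")
```

For the 5k-patient site this gives 0.5 expected events, about a 61% chance of no events at all, and under a 5% chance of at least one event in each arm; even the 20k-patient sites have events in both arms only about 40% of the time.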
  • 40. [Figure] Survival analysis log HR, Scenario 5. Panels: PS MW, DRS Match., DRS Strat., PS Match., PS Strat., PS IPTW; within each panel, data-sharing levels: meta, summary, risksets, dataset. x-axis: log HR.
  • 41. [Figure] Survival analysis SE estimate accuracy, Scenario 5. Panels: PS MW, DRS Match., DRS Strat., PS Match., PS Strat., PS IPTW; within each panel, data-sharing levels: meta, summary, risksets, dataset. x-axis: estimated SE(log HR) / simulation SE(log HR), from 0.50 to 2.00.
  • 42. [Figure] Survival analysis successful iterations (%), Scenario 5. Panels: PS MW, DRS Match., DRS Strat., PS Match., PS Strat., PS IPTW; within each panel, data-sharing levels: meta, summary, risksets, dataset. x-axis: % successful iterations (0 to 100).
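The success plot reports the share of iterations that produced an estimate. A sketch of that tally, assuming a hypothetical rule that an iteration succeeds when the log HR and its SE are both finite and the SE is positive; the study's exact criterion (for example, model convergence) may differ. The input tuples are illustrative, not study results.

```python
import math
from collections import defaultdict


def success_rates(results):
    """results: iterable of (method, data_level, log_hr, se) tuples, one per
    iteration. Returns % successful iterations per (method, data_level),
    using the hypothetical rule: finite log HR, finite and positive SE."""
    total = defaultdict(int)
    success = defaultdict(int)
    for method, level, log_hr, se in results:
        key = (method, level)
        total[key] += 1
        if (log_hr is not None and se is not None
                and math.isfinite(log_hr) and math.isfinite(se) and se > 0):
            success[key] += 1
    return {key: 100.0 * success[key] / total[key] for key in total}


# Illustrative input only: two iterations for each of two data-sharing levels
results = [
    ("PS Match.", "meta", 0.10, 0.20),
    ("PS Match.", "meta", float("nan"), float("nan")),  # e.g. a failed site model
    ("PS Match.", "dataset", 0.05, 0.10),
    ("PS Match.", "dataset", -0.02, 0.10),
]
print(success_rates(results))
```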
  • 44. Summary: We examined various privacy-protecting analytic and data-sharing methods in a simulation study to assess whether restricting the level of data sharing affects the performance of analytic methods relative to an analysis of pooled individual-level data. Overall, the level of data sharing had little impact on the bias and precision of log HR estimates within each confounding adjustment method in most simulated scenarios.
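Bias and precision can be summarized with standard simulation performance measures of the kind discussed by Burton et al. [16]. A minimal sketch, assuming the true log HR is 0 (no treatment effect, as in all scenarios shown) and that non-finite estimates are dropped; `bias_and_precision` is a hypothetical helper, not the study's code.

```python
import numpy as np


def bias_and_precision(estimates, true_value=0.0):
    """Return (bias, empirical SE, root mean squared error) of the finite
    estimates relative to the true value."""
    est = np.asarray(estimates, dtype=float)
    est = est[np.isfinite(est)]
    bias = est.mean() - true_value
    empirical_se = est.std(ddof=1)
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return bias, empirical_se, rmse


# Illustrative estimates from four iterations; the NaN is dropped
print(bias_and_precision([0.1, -0.1, 0.3, float("nan")]))
```

Computing these per confounding adjustment method and per data-sharing level, then comparing each level with the pooled-dataset result, mirrors the comparison the summary describes.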
  • 45. Implications: This implies that when data partners provide similar site-specific results, that is, when it makes sense to pool information across sites into an overall effect estimate, a meta-analysis of site-specific effect estimates may be the most attractive option. Pooling site-specific analysis results has the added benefit of requiring investigators at the coordinating center to examine the homogeneity or heterogeneity of site-specific results, thereby preventing inappropriate pooling when heterogeneity is prominent.
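A minimal sketch of meta-analysis at the coordinating center, assuming fixed-effect inverse-variance pooling of site-specific log HRs; Cochran's Q and I² are shown as one simple way to check the homogeneity the slide mentions. The study may have used a different pooling model, `inverse_variance_meta` is a hypothetical name, and the site estimates below are made up.

```python
import math


def inverse_variance_meta(log_hrs, ses):
    """Fixed-effect inverse-variance pooling of site-specific log HRs, with
    Cochran's Q and I^2 as simple heterogeneity checks."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Made-up site-specific log HRs and SEs for four sites (largest site first)
site_log_hr = [0.02, -0.05, 0.08, 0.30]
site_se = [0.05, 0.11, 0.11, 0.25]
pooled, pooled_se, q, i2 = inverse_variance_meta(site_log_hr, site_se)
print(f"pooled log HR {pooled:.3f} (SE {pooled_se:.3f}), Q = {q:.2f}, I^2 = {i2:.0%}")
```

Only one log HR and one SE per site leave each data partner, which is what makes this the least revealing of the data-sharing levels. A large Q relative to its degrees of freedom (high I²) is a signal to reconsider pooling rather than report a single estimate.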
  • 46. Limitations: The true underlying treatment effects were kept identical across sites; this was necessary to ensure a valid comparison of methods. We generated survival data from an exponential model (time-constant hazard). Departures from this assumption may make the summary table-based events/person-time analysis and Cox regression less comparable. Risk-set data analysis using the PS-weighted dataset was implemented as an experimental attempt. Although its point estimates were correct, its SE estimates were not accurate when the treatment groups were of different sizes.
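With a constant hazard, the hazard ratio equals the incidence rate ratio, which is what lets events/person-time summary tables approximate Cox regression. A sketch of one standard estimator for such tables, the Mantel-Haenszel rate ratio with the Greenland-Robins variance for person-time data, assuming each stratum is a site (or site-by-PS-stratum) table; it is not necessarily the estimator used in the study, and the tables below are made up.

```python
import math


def mh_rate_ratio(strata):
    """Mantel-Haenszel incidence rate ratio across summary tables.
    Each stratum is (events_treated, persontime_treated,
    events_control, persontime_control). Returns (IRR, SE of log IRR)
    using the Greenland-Robins variance for person-time data."""
    r = s = v_num = 0.0
    for d1, t1, d0, t0 in strata:
        t = t1 + t0
        r += d1 * t0 / t
        s += d0 * t1 / t
        v_num += (d1 + d0) * t1 * t0 / t ** 2
    irr = r / s
    se_log = math.sqrt(v_num / (r * s))
    return irr, se_log


# Made-up tables: (events_treated, person-years_treated, events_control, person-years_control)
strata = [(30, 49_000, 28, 48_500), (6, 9_800, 5, 9_900),
          (5, 9_700, 7, 9_900), (2, 2_400, 1, 2_450)]
irr, se_log = mh_rate_ratio(strata)
lower, upper = irr * math.exp(-1.96 * se_log), irr * math.exp(1.96 * se_log)
print(f"IRR {irr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

If the true hazard changes over follow-up, this rate ratio and the Cox HR estimate different quantities, which is the comparability concern raised in the limitations.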
  • 47. Conclusion: Privacy-protecting methods, regardless of the confounding adjustment method employed, performed similarly to the patient-level data analysis in the simulation scenarios we examined. Meta-analysis of site-level analysis results seems to be a reasonable approach, provided that data partners are similar in patient characteristics and the outcome is not so rare that some sites become non-informative.
  • 49. Bibliography I
    [1] Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, and Brown JS. Launching PCORnet, a national patient-centered clinical research network. Journal of the American Medical Informatics Association: JAMIA. 2014;21(4):578–582.
    [2] Platt R, Carnahan RM, Brown JS, Chrischilles E, Curtis LH, Hennessy S, Nelson JC, Racoosin JA, Robb M, Schneeweiss S, Toh S, and Weiner MG. The U.S. Food and Drug Administration’s Mini-Sentinel program: status and direction. Pharmacoepidemiology and Drug Safety. 2012;21:1–8.
    [3] Bohn J, Eddings W, and Schneeweiss S. Conducting Privacy-Preserving Multivariable Propensity Score Analysis When Patient Covariate Information Is Stored in Separate Locations. American Journal of Epidemiology. 2017;185(6):501–510.
    [4] Data Partners | Sentinel System.
    [5] Distributed Database and Common Data Model | Sentinel System.
    [6] Toh S, Shetterly S, Powers JD, and Arterburn D. Privacy-preserving analytic methods for multisite comparative effectiveness and patient-centered outcomes research. Medical Care. 2014;52(7):664–668.
  • 50. Bibliography II
    [7] Rassen JA, Avorn J, and Schneeweiss S. Multivariate-adjusted pharmacoepidemiologic analyses of confidential information pooled from multiple health care utilization databases. Pharmacoepidemiology and Drug Safety. 2010;19(8):848–857.
    [8] Fireman B, Lee J, Lewis N, Bembom O, van der Laan M, and Baxter R. Influenza vaccination and mortality: differentiating vaccine effects from bias. American Journal of Epidemiology. 2009;170(5):650–656.
    [9] Rosenbaum PR and Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
    [10] Hansen BB. The prognostic analogue of the propensity score. Biometrika. 2008;95(2):481–488.
    [11] Hernán MA and Robins JM. Causal Inference. Chapman & Hall/CRC. 2016.
  • 51. Bibliography III
    [12] Rosenbaum PR and Rubin DB. Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. The American Statistician. 1985;39(1):33–38.
    [13] Rosenbaum PR and Rubin DB. Reducing Bias in Observational Studies Using Subclassification on the Propensity Score. Journal of the American Statistical Association. 1984;79(387):516–524.
    [14] Robins JM, Hernán MA, and Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology (Cambridge, Mass). 2000;11(5):550–560.
    [15] Li L and Greene T. A weighting analogue to pair matching in propensity score analysis. The International Journal of Biostatistics. 2013;9(2):215–234.
    [16] Burton A, Altman DG, Royston P, and Holder RL. The design of simulation studies in medical statistics. Statistics in Medicine. 2006;25(24):4279–4292.