BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Computation

Distributed data analysis in the face of privacy concerns
Kassaye Yitbarek Yigzaw
Postdoctoral Fellow
Norwegian Centre for E-health Research

Outline
• Motivation
• Secure multi-party computation
• Challenges
• Proposed solutions
• Discussion
05.03.2018

Health data
• Health data are collected by data custodians
• Distributed health data: EHR data, registry data, insurance claims

Opportunities
• Reusing health data has great potential for purposes such as research and public health
• Increases the rate of new scientific discoveries
• Answers research questions that could not be answered otherwise

Distributed data
• Generalizability and reproducibility of analysis results
• Analyses often require data from multiple data sources
 ▪ Large sample size that provides sufficient statistical power
 ▪ Heterogeneity
• An individual’s data are partitioned between multiple data sources

Horizontally partitioned data
• Data sources collect the same attributes for disjoint sets of individuals
• Data source 1, Data source 2, …, Data source N
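
A minimal sketch of what horizontal partitioning means in practice. The attribute names and all values are made up for illustration: every source stores the same columns, and no individual appears in more than one source.

```python
# Hypothetical toy data: same attributes everywhere, disjoint individuals.
ATTRIBUTES = ("id", "age", "height", "weight")

source_1 = [(1, 34, 172, 70), (2, 51, 165, 82)]
source_2 = [(3, 29, 180, 77)]
source_n = [(4, 62, 158, 60), (5, 45, 175, 90)]

sources = [source_1, source_2, source_n]


def is_horizontally_partitioned(sources, n_attributes):
    """Check that all rows share one schema and that ids never repeat."""
    seen_ids = set()
    for rows in sources:
        for row in rows:
            if len(row) != n_attributes:
                return False  # a row with a different set of attributes
            if row[0] in seen_ids:
                return False  # the same individual in two places
            seen_ids.add(row[0])
    return True
```

In a real deployment no single party could run this check on raw data; it only illustrates the data layout.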

Challenges
• Data reuse raises privacy concerns
 ▪ Mental and physical harm to patients
 ▪ Evaluate their performance
 ▪ Damage doctor–patient relationship
 ▪ Reveal confidential business information
• Limit data sharing and secondary use

Objective
• Enabling health data reuse (research, public health) while protecting privacy

Common approaches
[Figure: data custodians D1, D2, and D3 send their distributed data to a third party]
• Data are shared as de-identified data or as patient identifying data
• Issues: re-identification risk, data utility, bias, time, cost

Secure multi-party computation (SMC)
[Figure: data custodians D1, D2, and D3, each holding its own data, and a third party]
• Computing on distributed data without revealing sensitive information apart from results
• Secure multi-party computation emulates the third party

Challenges
• A generic solution for computing any function exists
• Efficiency and scalability are the main challenges
• Efficiency: communication and computation overhead
• Scalability: number of data custodians and records

Adversarial model
• In the semi-honest adversarial model, participating parties:
 ▪ Follow the protocol specification
 ▪ May try to learn private information from the messages exchanged during the protocol execution
• Assuming this model makes it possible to develop protocols that are more efficient and scalable

Dataset creation
[Figure: the user sends a query to the coordinator; the query is passed on to data custodians D1, D2, and D3, whose data form a virtual dataset]

Secure statistical computation
[Figure: the user sends a query to the coordinator; secure protocols run over the virtual dataset held by D1, D2, and D3 and produce an aggregate result]

Secure summation protocol
[Figure: data custodians D1, D2, D3, …, DN, each holding its own data]
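
The slide shows only a diagram, so the following is a sketch of the textbook ring-based secure summation and may differ from the variant presented in the talk. The initiating party hides its value behind a random mask, every other party adds its own value to the running total it receives, and the initiator removes the mask at the end. The function name, the modulus, and the single-process simulation are assumptions made for illustration.

```python
import secrets

# Assumed public modulus; it must exceed any possible true total.
MODULUS = 2**64


def secure_sum(private_values, modulus=MODULUS):
    """Simulate ring-based secure summation of non-negative integers."""
    # The initiator (first party) draws a random mask known only to itself.
    mask = secrets.randbelow(modulus)
    running = (mask + private_values[0]) % modulus
    # Each following party sees only the masked running total and adds
    # its own private value before passing it on.
    for value in private_values[1:]:
        running = (running + value) % modulus
    # The total returns to the initiator, which removes the mask.
    return (running - mask) % modulus
```

For example, `secure_sum([10, 20, 30])` returns the total of the three private values while no intermediate message equals a partial sum in the clear. In this basic form, the two neighbours of a party can collude to recover that party's value.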

Secure sum protocol (2)
• An extension to the secure sum protocol is proposed
• The protocol makes collusion difficult by:
 ▪ Forming a ring topology at runtime and
 ▪ Revealing only partial information about the ring topology to each party
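
The slide does not say how the ring is formed or what each party learns. One possible reading, sketched here purely as an assumption, is that a coordinator draws a fresh random order for every run and tells each party only who its successor is. `build_ring` and `view_of` are hypothetical names.

```python
import random


def build_ring(party_ids, rng=None):
    """Return {party: successor} for a freshly shuffled ring."""
    rng = rng or random.SystemRandom()
    order = list(party_ids)
    rng.shuffle(order)  # a new random order at every protocol run
    return {order[i]: order[(i + 1) % len(order)] for i in range(len(order))}


def view_of(party, ring):
    """What one party is told: its successor, not the full ring."""
    return {"next": ring[party]}
```

Under this assumption a party does not know who precedes it, so it cannot easily pick a partner to collude with against a specific neighbour.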

k-Secure summation protocol
[Figure: the coordinator, privacy peers, and the secure summation protocol]

Other statistical problems
• A large number of statistical problems can be decomposed into sub-functions in summation form
• Descriptive statistics (e.g., average, standard deviation, covariance, Pearson’s r, minimum, maximum, and median)
• Linear regression
• Clustering (k-means)
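
To illustrate the summation-form idea, here is a sketch for Pearson's r. Each custodian computes six local sums over its own (x, y) pairs, the sums are added across custodians, and r is computed from the totals alone. The function names are hypothetical, and the plain addition in `combine` is a stand-in for one secure summation run per component.

```python
import math


def local_sums(rows):
    """Summation-form statistics one custodian computes on its own rows."""
    return (
        len(rows),
        sum(x for x, _ in rows),
        sum(y for _, y in rows),
        sum(x * x for x, _ in rows),
        sum(y * y for _, y in rows),
        sum(x * y for x, y in rows),
    )


def combine(all_local):
    """Element-wise totals; stands in for a secure sum per component."""
    return tuple(sum(parts) for parts in zip(*all_local))


def pearson_r(n, sx, sy, sxx, syy, sxy):
    """Pearson's r computed from global sums only."""
    cov = sxy / n - (sx / n) * (sy / n)
    var_x = sxx / n - (sx / n) ** 2
    var_y = syy / n - (sy / n) ** 2
    return cov / math.sqrt(var_x * var_y)
```

The same totals also give the mean, variance, and covariance, which is why a single secure summation building block covers many descriptive statistics.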

Secure computation of average
[Figure: data custodian D1 holds a table with columns id, age, height, and weight]
1: Local computation
2: k-secure sum protocol
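
The two steps can be sketched as follows. Each custodian first computes a local sum and count for the attribute, and the average is then derived from the aggregated totals. The helper names and toy records are hypothetical, and the plain additions in `average_from_locals` stand in for the k-secure sum protocol, whose details are not given on the slide.

```python
def local_sum_and_count(records, attribute):
    """Step 1: computed by each custodian on its own records."""
    values = [r[attribute] for r in records if r.get(attribute) is not None]
    return sum(values), len(values)


def average_from_locals(pairs):
    """Step 2: totals would be obtained with a secure sum, not in the clear."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return total / count


# Hypothetical records held by two custodians.
d1 = [{"id": 1, "age": 30}, {"id": 2, "age": 50}]
d2 = [{"id": 3, "age": 40}]
mean_age = average_from_locals(
    [local_sum_and_count(d1, "age"), local_sum_and_count(d2, "age")]
)
```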

Secure mth-ranked element protocol
[Figure: data custodians D1, D2, D3, …, DN, each holding its own data]

Secure mth-ranked element protocol
• Computing the minimum ($m = 1$) and maximum ($m = n$)
• Computing the $p$th percentile: $m = \frac{p}{100} \times n$
• First quartile, median, third quartile
• Box plot
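
The slides do not describe the protocol itself. One well-known way to find the mth-ranked element over distributed data, shown here as an assumption rather than as the method of the talk, is a binary search over a public integer value range. In each round every custodian counts locally how many of its values are at or below the candidate, and only the total count is needed. The rounding rule in `rank_for_percentile` is also an assumption.

```python
import math


def rank_for_percentile(p, n):
    """m = p/100 * n, rounded up and kept within 1..n (assumed rule)."""
    return min(n, max(1, math.ceil(p / 100 * n)))


def mth_ranked(local_datasets, m, low, high):
    """Smallest integer v in [low, high] with at least m values <= v."""
    while low < high:
        mid = (low + high) // 2
        # Local counts; in practice they would be added with a secure sum.
        count = sum(sum(1 for x in data if x <= mid) for data in local_datasets)
        if count >= m:
            high = mid
        else:
            low = mid + 1
    return low
```

With `m = 1` this yields the minimum, with `m = n` the maximum, and with `rank_for_percentile(50, n)` a median, which together with the quartiles is enough to draw a box plot.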

Discussion
• The proposed solution can be used for a wide variety of applications
• Antibiotics prescription monitoring and benchmarking
• Infrastructure for research on primary care data
• The frameworks can be applied to domains other than health

Discussion
• Gives data custodians physical control over their data
• Efficient and scalable to a very large number of data custodians and records

Discussion
• Develop secure protocols for more statistical
functions
• Vertically partitioned data
• Disclosure control

Publications
• Yigzaw KY, Hailemichael MA, Skrøvseth SO, Bellika JG. Secure and Scalable Protocol for Computing mth-Ranked Element on Distributed Data. In: AMIA Annual Symposium Proceedings. 2018 (under revision).
• Yigzaw KY. Towards Practical Privacy-Preserving Distributed Statistical Computation of Health Data. PhD thesis. UiT The Arctic University of Norway; 2016.
• Hailemichael MA, Yigzaw KY, Bellika JG. Emnet: a tool for privacy-preserving statistical computing on distributed health data. In: Proceedings from the 13th Scandinavian Conference on Health Informatics. 2015.
• Andersen A, Yigzaw KY, Karlsen R. Privacy preserving health data processing. In: IEEE 16th International Conference on E-Health Networking, Applications and Services (Healthcom). IEEE; 2014:225-230.
• Yigzaw KY, Bellika JG, Andersen A, Hartvigsen G, Fernandez-Llatas C. Towards Privacy-preserving Computing on Distributed Electronic Health Record Data. In: Proceedings of the 2013 Middleware Doctoral Symposium. MDS ’13. New York, NY, USA: ACM; 2013:4:1–4:6.

Thank you for your attention!
Privacy-preserving collection and analyses of
citizens-generated data
Kassaye Yitbarek Yigzaw
kassaye.yitbarek.yigzaw@ehealthresearch.no
