Distributed data analysis in the face of
privacy concerns
Kassaye Yitbarek Yigzaw
Postdoctoral Fellow
Norwegian Centre for E-health Research
Outline
• Motivation
• Secure multi-party computation
• Challenges
• Proposed solutions
• Discussion
05.03.2018
Distributed data analysis in the face of privacy
concerns
Health data
[Figure: distributed health data collected by data custodians, including EHR data, registry data, and insurance claims]
Opportunities
• Huge potential for a variety of purposes, such as
research and public health
• Increases the rate of new scientific discoveries
• Answers research questions that may not be answerable otherwise
Distributed data
• Generalizability and reproducibility of analysis results
• Analyses often require data from multiple data sources
  ◦ Large sample size that provides sufficient statistical power
  ◦ Heterogeneity
• An individual’s data are partitioned between multiple data sources
Horizontally partitioned data
• Data sources collect the same attributes for disjoint sets of individuals
[Figure: Data source 1, Data source 2, …, Data source N]
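To make the definition concrete, here is a small Python sketch. The attribute names, values, and variable names are invented for illustration and are not taken from the talk. The only point is that every source stores the same columns while no individual appears at more than one source.

```python
# Hypothetical records: every data source stores the same attributes
# (id, age, weight) but for different individuals.
SCHEMA = ("id", "age", "weight")

data_source_1 = [{"id": 1, "age": 34, "weight": 70.5},
                 {"id": 2, "age": 58, "weight": 82.0}]
data_source_2 = [{"id": 3, "age": 41, "weight": 65.2}]
data_source_n = [{"id": 4, "age": 29, "weight": 90.1},
                 {"id": 5, "age": 73, "weight": 77.3}]


def is_horizontally_partitioned(sources, schema=SCHEMA):
    """Check same attributes everywhere and disjoint sets of individuals."""
    seen_ids = set()
    for records in sources:
        for record in records:
            if tuple(record) != schema:
                return False          # a source uses different attributes
            if record["id"] in seen_ids:
                return False          # an individual appears at two sources
            seen_ids.add(record["id"])
    return True


sources = [data_source_1, data_source_2, data_source_n]
partitioned = is_horizontally_partitioned(sources)
```

Conceptually, the pooled dataset is the row-wise union of the sources, which is why many statistics on it can be computed from per-source totals.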
Challenges
• Data reuse raises privacy concerns
• Privacy concerns limit data sharing and secondary use
[Figure labels: mental and physical harm to patients; evaluate their performance; damage doctor–patient relationship; reveal confidential business information]
Objective
[Figure labels: privacy; enabling health data reuse; research; public health]
Common approaches
[Figure: data custodians D1, D2, and D3 (each holding data) and a third party. Labels: de-identified data; patient identifying data; re-identification risk; data utility; bias; time; distributed data; cost]
Secure multi-party computation (SMC)
[Figure: data custodians D1, D2, and D3 (each holding data) and a third party]
• Secure multi-party computation emulates the third party
• Computing on distributed data without revealing sensitive information apart from the results
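The idea of emulating a trusted third party can be illustrated with additive secret sharing, a standard SMC building block. This is a minimal sketch under stated assumptions, not necessarily the technique used in the talk: the modulus, the function names, and the example counts are all hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible total


def make_shares(value, n_parties, modulus=MODULUS):
    """Split value into n random shares that add up to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def shared_sum(private_values, modulus=MODULUS):
    """Sum private values; each party only ever sees one share per input."""
    n = len(private_values)
    # Row i holds the shares that custodian i hands out, one per party.
    distributed = [make_shares(v, n, modulus) for v in private_values]
    # Party j adds up the j-th share of every custodian (a column sum).
    partial_sums = [sum(row[j] for row in distributed) % modulus
                    for j in range(n)]
    # Only the partial sums are published; together they reveal the total.
    return sum(partial_sums) % modulus


counts = {"D1": 120, "D2": 75, "D3": 310}   # made-up local values
total = shared_sum(list(counts.values()))
```

Each individual share is a uniformly random number, so a single share says nothing about the value it came from. Only the final total is revealed, which is what a trusted third party would have returned.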
Challenges
• Generic solutions for computing any function exist
• Efficiency and scalability are the main challenges
• Efficiency: communication and computation
overhead
• Scalability: number of data custodians and records
Adversarial model
• In the semi-honest adversarial model, participating parties:
  ◦ Follow the protocol specification
  ◦ May try to learn private information from the messages exchanged during protocol execution
• Assuming this model enables the development of protocols that are more efficient and scalable
Dataset creation
[Figure labels: user; query; coordinator; data custodians D1, D2, and D3 with their data; virtual dataset]
Secure statistical computation
[Figure labels: user; query; coordinator; data custodians D1, D2, and D3 with their data; virtual dataset; secure protocols; aggregate result]
Secure summation protocol
[Figure: data custodians D1, D2, D3, …, DN, each holding data]
Secure sum protocol (2)
• We proposed an extension to the secure sum protocol
• The extended protocol makes collusion difficult by:
  ◦ Forming a ring topology at runtime, and
  ◦ Revealing only partial information about the ring topology to each party
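The slides do not spell out the protocol steps, so the following is a sketch of the classic textbook ring-based secure sum, combined with a ring that is shuffled at runtime. The function names, the modulus, and the way successors are handed out are assumptions made for illustration, not the author's implementation.

```python
import random
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible total


def build_ring(party_ids, rng=random):
    """Form a random ring at runtime; return the initiator and a successor map."""
    order = list(party_ids)
    rng.shuffle(order)
    successor = {order[i]: order[(i + 1) % len(order)]
                 for i in range(len(order))}
    return order[0], successor


def ring_secure_sum(private_values, modulus=MODULUS):
    """Classic ring-based secure sum over {party_id: private_value}."""
    initiator, successor = build_ring(private_values)
    mask = secrets.randbelow(modulus)            # known to the initiator only
    running = (mask + private_values[initiator]) % modulus
    party = successor[initiator]
    while party != initiator:
        # Each party sees only a masked running total and its next hop.
        running = (running + private_values[party]) % modulus
        party = successor[party]
    return (running - mask) % modulus            # initiator removes the mask


ring_total = ring_secure_sum({"D1": 12, "D2": 30, "D3": 7, "D4": 1})
```

With a fixed, publicly known ring, the two neighbours of a party can collude and recover its value by subtracting the message it received from the message it sent. Shuffling the ring at runtime and telling each party only its successor is one way to make it harder for colluders to find each other.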
k-Secure summation protocol
[Figure labels: coordinator; privacy peer; secure summation protocol]
Other statistical problems
• A large number of statistical problems can be
decomposed into sub-functions of summation
forms
• Descriptive statistics (e.g., average, standard deviation, covariance, Pearson’s r, minimum, maximum, and median)
• Linear regression
• Clustering (k-means)
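As an illustration of the summation-form idea, the sketch below derives the mean, standard deviation, covariance, and Pearson's r from six per-site totals. The function names and data are hypothetical, and `combine` uses plain addition as a stand-in for whichever secure sum protocol would be used in practice.

```python
import math


def local_sums(xs, ys):
    """Summation-form statistics one custodian computes on its own records."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "syy": sum(y * y for y in ys),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def combine(per_site):
    """Add the local sums; a secure sum protocol would replace this step."""
    return {key: sum(site[key] for site in per_site) for key in per_site[0]}


def describe(t):
    """Derive global statistics from the aggregated sums only."""
    n = t["n"]
    mean_x, mean_y = t["sx"] / n, t["sy"] / n
    var_x = (t["sxx"] - n * mean_x ** 2) / (n - 1)   # sample variance
    var_y = (t["syy"] - n * mean_y ** 2) / (n - 1)
    cov_xy = (t["sxy"] - n * mean_x * mean_y) / (n - 1)
    return {
        "mean_x": mean_x,
        "std_x": math.sqrt(var_x),
        "cov_xy": cov_xy,
        "pearson_r": cov_xy / math.sqrt(var_x * var_y),
    }


site_a = local_sums([1.0, 2.0], [2.0, 4.0])
site_b = local_sums([3.0, 4.0, 5.0], [6.0, 8.0, 10.0])
stats = describe(combine([site_a, site_b]))
```

No individual record leaves a site. Only the six totals are aggregated, and every statistic is computed from those totals.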
Secure computation of average
[Figure: data custodian D1 holds a table with columns id, age, height, weight]
1: Local computation
2: k-secure sum protocol
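The two steps can be sketched as follows. The attribute names come from the table on the slide, while the records and function names are made up. `secure_sum` is only a placeholder: the slides do not give the details of the k-secure sum protocol, so plain addition is used here so that the sketch runs.

```python
ATTRIBUTES = ("age", "height", "weight")


def local_computation(records):
    """Step 1: a custodian computes per-attribute sums and its record count."""
    sums = {a: sum(r[a] for r in records) for a in ATTRIBUTES}
    return sums, len(records)


def secure_sum(values):
    """Step 2 placeholder: stands in for the k-secure sum protocol.

    Plain addition is used here only so the sketch runs; it offers no privacy.
    """
    return sum(values)


def secure_average(custodians):
    """Combine local sums and counts into a global average per attribute."""
    local = [local_computation(records) for records in custodians]
    total_n = secure_sum([n for _, n in local])
    return {a: secure_sum([sums[a] for sums, _ in local]) / total_n
            for a in ATTRIBUTES}


d1 = [{"id": 1, "age": 30, "height": 170, "weight": 60},
      {"id": 2, "age": 50, "height": 180, "weight": 80}]
d2 = [{"id": 3, "age": 40, "height": 160, "weight": 70}]
averages = secure_average([d1, d2])
```

Note that the record count is itself aggregated with the sum protocol, so no custodian has to disclose how many individuals it holds.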
Secure mth-ranked element protocol
[Figure: data custodians D1, D2, D3, …, DN, each holding data]
• Computing minimum ($m = 1$) and maximum ($m = n$)
• Computing the $p$th percentile: $m = \frac{p}{100} \times n$
• First quartile, median, third quartile
• Box plot
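The slides do not describe how the protocol finds the mth-ranked element. One well-known approach, sketched here as an assumption rather than as the author's protocol, is a binary search over the value range in which each step only needs a global count. That count is a sum of local counts, so it could be computed with a secure sum. The function names, the integer value range, and the rounding of $m$ up to a whole rank are all illustrative choices.

```python
def rank_for_percentile(p, n):
    """m = p/100 * n for integer p, rounded up to a whole rank (an assumption)."""
    return max(1, -(-(p * n) // 100))


def count_at_most(custodians, threshold):
    """Each custodian counts locally; a secure sum would add the counts."""
    return sum(sum(1 for v in values if v <= threshold)
               for values in custodians)


def mth_ranked(custodians, m, low, high):
    """Smallest integer v in [low, high] with at least m values <= v."""
    while low < high:
        mid = (low + high) // 2
        if count_at_most(custodians, mid) >= m:
            high = mid          # the answer is mid or smaller
        else:
            low = mid + 1       # fewer than m values are <= mid
    return low


ages = [[23, 35, 41], [29, 52], [18, 47, 60, 33]]   # made-up local data
n = sum(len(values) for values in ages)              # n is itself a sum
minimum = mth_ranked(ages, 1, 0, 120)
maximum = mth_ranked(ages, n, 0, 120)
median = mth_ranked(ages, rank_for_percentile(50, n), 0, 120)
```

The search needs about log2 of the value range rounds, independent of the number of records, and each round reveals only whether the global count reached $m$.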
Discussion
• The proposed solution can be used for a wide variety of applications
• Antibiotics prescription monitoring and benchmarking
• Infrastructure for research on primary care data
• The frameworks can be applied to domains other than health
Discussion
• Gives physical control to the data custodians
• Efficient and scalable to a very large number of
data custodians and records
Discussion
• Develop secure protocols for more statistical
functions
• Vertically partitioned data
• Disclosure control
Publications
• Yigzaw KY, Hailemichael MA, Skrøvseth SO, Bellika JG. Secure and Scalable Protocol for Computing mth-Ranked Element on Distributed Data. In: AMIA Annual Symposium Proceedings. 2018 (under revision).
• Yigzaw KY. Towards Practical Privacy-Preserving Distributed Statistical Computation of Health Data. PhD thesis. UiT The Arctic University of Norway; 2016.
• Hailemichael MA, Yigzaw KY, Bellika JG. Emnet: a tool for privacy-preserving statistical computing on distributed health data. Proceedings from The 13th Scandinavian Conference on Health Informatics. 2015.
• Andersen A, Yigzaw KY, Karlsen R. Privacy preserving health data processing. In: IEEE 16th International Conference on E-Health Networking, Applications and Services (Healthcom). IEEE; 2014:225-230.
• Yigzaw KY, Bellika JG, Andersen A, Hartvigsen G, Fernandez-Llatas C. Towards Privacy-preserving Computing on Distributed Electronic Health Record Data. In: Proceedings of the 2013 Middleware Doctoral Symposium. MDS ’13. New York, NY, USA: ACM; 2013:4:1–4:6.
http://www.panoramio.com/photo/10889343
Thank you for your attention!
Privacy-preserving collection and analyses of
citizens-generated data
Kassaye Yitbarek Yigzaw
kassaye.yitbarek.yigzaw@ehealthresearch.no

BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Computation

  • 1. Distributed data analysis in the face of privacy concerns Kassaye Yitbarek Yigzaw Postdoctoral Fellow Norwegian Centre for E-health Research
  • 2. Outline • Motivation • Secure multi-party computation • Challenges • Proposed solutions • Discussion 05.03.2018 Distributed data analysis in the face of privacy concerns 2
  • 3. Health data • Distributed health data • EHR data • Registry data • Insurance claims • Health data collected by data custodians
  • 4. Opportunities • Huge potential for a variety of purposes, such as research and public health • Increases the rate of new scientific discoveries • Answers research questions that might not be answerable otherwise
  • 5. Distributed data • Generalizability and reproducibility of analysis results • Often require data from multiple data sources ◦ Large sample size that provides sufficient statistical power ◦ Heterogeneity • An individual’s data are partitioned between multiple data sources
  • 6. Horizontally partitioned data • Data sources collect the same attributes for disjoint sets of individuals • Data source 1, Data source 2, …, Data source N
  • 7. Challenges • Data reuse raises privacy concerns • Limit data sharing and secondary use • Mental and physical harm to patients • Evaluate their performance • Damage doctor–patient relationship • Reveal confidential business information
  • 8. Objective • Privacy • Enabling health data reuse • Research • Public health
  • 9. Common approaches • Diagram labels: D1, D2, D3, Data, Third party • Patient identifying data • De-identified data • Re-identification risk • Data utility • Bias • Time • Cost • Distributed data
  • 10. Secure multi-party computation (SMC) • Diagram labels: D1, D2, D3, Data, Third party • Secure multi-party computation emulates the third party • Computing on distributed data without revealing sensitive information apart from results
  • 11. Challenges • A generic solution for computing any function exists • Efficiency and scalability are the main challenges • Efficiency: communication and computation overhead • Scalability: number of data custodians and records
  • 12. Adversarial model • In the semi-honest adversarial model, participating parties: ◦ Follow the protocol specification ◦ May try to learn private information from the messages exchanged in the protocol execution • Enables the development of protocols that are more efficient and scalable
  • 13. Dataset creation • Diagram labels: User, Query, Coordinator, D1, D2, D3, Data, Virtual dataset
  • 14. Secure statistical computation • Diagram labels: User, Query, Coordinator, Secure protocols, Virtual dataset, D1, D2, D3, Data, Aggregate result
  • 15. Secure summation protocol • Diagram labels: D1, D2, D3, DN, Data
  • 16. Secure sum protocol (2) • Proposed an extension to the secure sum protocol • The protocol makes collusion difficult: ◦ Forming a ring topology at runtime and ◦ Revealing only partial information about the ring topology to each party
  • 17. k-Secure summation protocol • Diagram labels: Coordinator, Privacy peer, Secure summation protocol
  • 18. Other statistical problems • A large number of statistical problems can be decomposed into sub-functions of summation forms • Descriptive statistics (e.g., average, standard deviation, covariance, Pearson’s r, minimum, maximum, and median) • Linear regression • Clustering (k-means)
  • 19. Secure computation of average
  • 20. Secure computation of average • Diagram labels: D1, Data, table columns id, age, height, weight • 1: Local computation • 2: k-secure sum protocol
  • 21. Secure mth-ranked element protocol • Diagram labels: D1, D2, D3, DN, Data
  • 22. Secure mth-ranked element protocol • Computing minimum (m = 1) and maximum (m = n) • Computing the pth percentile: m = (p/100) × n • First quartile, median, third quartile • Box plot
  • 23. Discussion • The proposed solution can be used for a wide variety of applications • Antibiotics prescription monitoring and benchmarking • Infrastructure for research on primary care data • The frameworks can be applied to domains other than health
  • 24. Discussion • Gives physical control to the data custodians • Efficient and scalable to a very large number of data custodians and records
  • 25. Discussion • Develop secure protocols for more statistical functions • Vertically partitioned data • Disclosure control
  • 26. Publications • Yigzaw KY, Hailemichael MA, Skrøvseth SO, Bellika JG. Secure and Scalable Protocol for Computing mth-Ranked Element on Distributed Data. In: AMIA Annual Symposium Proceedings. 2018 (under revision). • Yigzaw KY. Towards Practical Privacy-Preserving Distributed Statistical Computation of Health Data. PhD Thesis. UiT The Arctic University of Norway; 2016. • Hailemichael MA, Yigzaw KY, Bellika JG. Emnet: a tool for privacy-preserving statistical computing on distributed health data. In: Proceedings from The 13th Scandinavian Conference on Health Informatics. 2015. • Andersen A, Yigzaw KY, Karlsen R. Privacy preserving health data processing. In: IEEE 16th International Conference on E-Health Networking, Applications and Services (Healthcom). IEEE; 2014:225-230. • Yigzaw KY, Bellika JG, Andersen A, Hartvigsen G, Fernandez-Llatas C. Towards Privacy-preserving Computing on Distributed Electronic Health Record Data. In: Proceedings of the 2013 Middleware Doctoral Symposium. MDS ’13. New York, NY, USA: ACM; 2013:4:1–4:6.
  • 27. Thank you for your attention! • March 05, 2018 • Privacy-preserving collection and analyses of citizens-generated data • Kassaye Yitbarek Yigzaw, kassaye.yitbarek.yigzaw@ehealthresearch.no • http://www.panoramio.com/photo/10889343
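
Slide 22 maps a percentile to a rank with m = (p/100) × n. The sketch below illustrates only that mapping. It assumes the rank is rounded up to the next integer (the slides do not state a rounding rule), and the function names are hypothetical. The values are pooled in the clear purely to show which element the secure mth-ranked element protocol is meant to return; this is a non-private reference, not the protocol itself.

```python
import math


def percentile_rank(p, n):
    """Rank m of the p-th percentile among n values: m = ceil(p/100 * n).

    Assumption: round up and clamp to the valid rank range [1, n].
    """
    m = math.ceil(p * n / 100)
    return min(max(m, 1), n)


def mth_ranked(values, m):
    """Non-private reference: the m-th smallest value (ranks start at 1)."""
    return sorted(values)[m - 1]


# Hypothetical pooled ages, used only for illustration.
ages = [34, 51, 29, 62, 45, 38, 70, 55]
n = len(ages)

minimum = mth_ranked(ages, 1)   # m = 1
maximum = mth_ranked(ages, n)   # m = n
# Ranks for the first quartile, median, and third quartile (box plot inputs).
quartile_ranks = [percentile_rank(p, n) for p in (25, 50, 75)]
quartiles = [mth_ranked(ages, m) for m in quartile_ranks]
```

Together with the minimum and maximum, the three quartile values are the five numbers a box plot needs.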

Editor's notes

  1. My name is Kassaye Yitbarek Yigzaw. I am a postdoctoral fellow at the Norwegian Centre for E-health Research. The title of the talk is “Distributed data analysis in the face of privacy concerns”. Part of the work presented here was done when I was a PhD student at UiT The Arctic University of Norway.
  2. I will talk a bit about the motivation for health data reuse and the privacy challenge. Then I will give a brief introduction to the secure multi-party computation (SMC) paradigm. After presenting the main challenges of SMC, I will present the solutions we proposed to address them. I will finish the presentation with a discussion.
  3. The increased adoption of electronic health record (EHR) systems, as well as a wide variety of other electronic data sources (e.g., insurance claims and registry data), has led to the collection of large amounts of detailed health information about individuals.
  4. Reuse of health data has huge potential for a variety of purposes, such as research and public health. Data reuse increases the rate of new scientific discoveries and answers research questions that might not be answerable otherwise.
  5. Generalizability and reproducibility of analysis results often require data from multiple data sources. This is because the data from a single institution may not have a sample size large enough to provide sufficient statistical power, or the heterogeneity needed to represent the population of interest. Alternatively, the required data about an individual can be partitioned between multiple data sources.
  6. In this presentation, I will focus on horizontally partitioned data, where each data source collects the same attributes for disjoint sets of individuals. The union of all data sources’ datasets forms the overall dataset.
  7. Secondary use of data raises privacy concerns for individuals. Inappropriate disclosure of sensitive information may lead to mental and physical harm to patients. Even when individuals’ privacy concerns are addressed, clinicians and healthcare providers are also concerned that data sharing may damage the doctor–patient relationship, that the data could be used to evaluate their performance, or, in some contexts, that it could reveal confidential business information. Therefore, privacy concerns limit willingness to share data.
  8. Both protecting privacy and improving healthcare through data reuse are important social goods that should be maintained. Therefore, we need data analysis techniques that address the privacy concerns of both patients and data custodians.
  9. Traditionally, distributed data analysis is performed by centrally collecting the data at a trusted third party who analyses the data. The third party can be an institution like SSB (Statistics Norway) or a researcher. The data collected at the third party can be patient identifying data, which often requires consent. When there are systematic differences between individuals who consent and those who do not, this leads to bias. In addition, consent collection is expensive and takes a long time. In rare cases, there are exemptions from consent. The other alternative is sharing de-identified data, which often does not require consent. The challenge for de-identified data sharing is striking a balance between re-identification risk and data utility. The problem becomes even more challenging in the context of distributed data. A simple approach for de-identifying distributed data is for each institution to de-identify its data locally before sharing. However, the union of the locally de-identified datasets does not give the same result as de-identifying the centrally collected data.
  10. There is an area of research called secure multi-party computation. SMC deals with the problem of computing on distributed data without revealing anything apart from the result. In other words, it aims to emulate the third party. The research in SMC is not limited to computing statistical functions. It is also used for privacy-preserving record linkage. In this talk, I will focus on computing statistical functions.
  11. SMC was introduced in the 1980s, and a generic solution for computing any function exists. However, the generic solution is not efficient and scalable enough for practical use. Efficiency is the ability to compute with good performance, usually expressed in terms of communication and computation complexity. Scalability is the ability to compute efficiently when the number of data custodians and records increases.
  12. The most commonly considered adversarial model is semi-honest (honest-but-curious). In this model, a party that participates in a protocol is assumed to follow the protocol’s steps, but it may try to learn private information from the messages exchanged during those steps. The model is popular because it allows the development of more efficient and scalable protocols while providing sufficient security for several use cases. In this presentation, I present protocols that are secure against a semi-honest adversary.
  13. Before going into the secure computation, let me give you a general overview. We have a set of data custodians. Let us assume a third party, denoted the coordinator. The coordinator is not trusted to collect any private information; it is only expected to be semi-honest. The coordinator receives the user’s dataset criteria and broadcasts them to the data custodians. Each data custodian executes the query against its database and stores the result locally. The query results across the data custodians collectively make up the overall dataset. We refer to these datasets as a virtual dataset, since they are physically distributed.
  14. Data cleaning and other pre-processing tasks may be needed at this stage. But let’s go ahead and run a statistical query on the virtual dataset. The coordinator receives a user query and initiates the SMC protocols appropriate for that query. Then, the SMC protocols are run on the virtual dataset and aggregated results are returned to the user. In the following slides, I will describe some SMC protocols.
  15. Secure summation protocols add the private values of a set of data custodians without revealing the private value of any data custodian. It is the most widely studied problem, and different secure sum protocols have been proposed. We consider the following secure summation protocol for its simplicity and efficiency. The simplified description of the protocol is as follows: First, the data custodians form a ring topology. The first data custodian, D1, selects a random value and sends the sum of the random value and its private value, s_1, to D2. D2 adds its private value v_2 to s_1 and sends the result, s_2, to D3. The other parties in turn do the same. Finally, D1 calculates the total sum by subtracting the random number from s_N. However, if parties Di+1 and Di-1 collude, the private value of party Di will be revealed, since it is the difference between the sum Di sends and the sum Di receives.
  16. We proposed an extension to the protocol that makes collusion between two parties difficult by forming a ring topology at runtime and revealing to an input party only partial knowledge about the ring topology.
  17. To be able to scale the secure summation to a large number of data custodians, we proposed a further extension to the protocol, denoted the k-secure summation protocol. This protocol is based on dividing the data custodians into groups of k data custodians. Each group of k data custodians is denoted a privacy peer (PP). Each PP jointly runs a secure summation protocol. Then, the results of the privacy peers are centrally aggregated at the coordinator. Because of the parallel computation, the protocol can scale to a very large number of data custodians.
  18. Researchers have exploited the fact that a large number of statistical problems can be decomposed into sub-functions of summation forms. Based on this concept, our group and other researchers created secure protocols for computing different statistics. Some of the existing secure protocols include protocols for descriptive statistics, linear regression, and clustering.
  19. I will illustrate the decomposition into sub-functions of summation form with a simple example. The average is described as … and the sub-computations are summation and count.
  20. Let’s say we want to compute the average age. For each sub-computation, each data custodian computes locally on its data, and the k-secure sum protocol is used to aggregate the local results. Finally, the coordinator computes the average based on the sub-functions’ results.
  21. The next example protocol I want to tell you a bit about is a protocol for computing the mth-ranked element. Let us say each data custodian has the ages of a set of individuals. The protocol finds the mth-ranked age value.
  22. One use case for the protocol is computing the minimum and maximum ages in the dataset. The other use case is computing pth percentiles; for example, we can compute the 25th, 50th, and 75th percentiles to generate a box plot.
  23. The proposed solution can be used for a wide variety of applications. Some of the applications we are currently working on are an antibiotics prescription monitoring and benchmarking solution for GPs. The framework is going to be used in a national infrastructure for research on primary care data.
  24. More evaluation of the frameworks and the development of secure protocols for more statistical functions need further study. The frameworks can be extended for stronger adversarial models. The framework for distributed EHR data can be extended to vertically partitioned data. Extending the privacy-preserving distributed statistical computation framework for questionnaire data to other sources of PGHD is also future work.
  25. Thank you for your attention!
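
Notes 15, 17, and 20 describe the ring-based secure summation, its k-secure extension, and the secure average built on top of it. The following is a minimal single-process sketch of those three steps, not the authors' implementation. It rests on assumptions the notes do not state: private values are non-negative integers, all sums are taken modulo a public value larger than any true total, and the function names, the modulus, and the example data are hypothetical. It omits the networking, the runtime ring formation from note 16, and any protection against collusion.

```python
import secrets

# Assumption: sums are computed modulo a public value larger than any true total.
MODULUS = 2**64


def ring_secure_sum(private_values, modulus=MODULUS):
    """Simulate the ring protocol of note 15 for parties D1..DN."""
    r = secrets.randbelow(modulus)                # D1 picks a random mask
    running = (r + private_values[0]) % modulus   # s_1, sent to D2
    for v in private_values[1:]:                  # D2, ..., DN in ring order
        running = (running + v) % modulus         # s_i = s_(i-1) + v_i
    return (running - r) % modulus                # D1 computes s_N - r


def k_secure_sum(private_values, k, modulus=MODULUS):
    """Note 17: run the ring protocol inside each privacy peer of k
    custodians, then let the coordinator add the per-group totals.

    The groups run one after another here; in the described protocol they
    run in parallel. A final group smaller than k is kept for simplicity,
    which a real deployment would need to avoid or merge.
    """
    groups = [private_values[i:i + k] for i in range(0, len(private_values), k)]
    group_totals = [ring_secure_sum(group, modulus) for group in groups]
    return sum(group_totals) % modulus


def secure_average(local_datasets, k):
    """Note 20: each custodian computes a local sum and a local count,
    both are aggregated with the k-secure sum, and the coordinator divides."""
    local_sums = [sum(data) for data in local_datasets]
    local_counts = [len(data) for data in local_datasets]
    total = k_secure_sum(local_sums, k)
    count = k_secure_sum(local_counts, k)
    return total / count


# Hypothetical ages held by four data custodians.
ages_by_custodian = [[34, 51], [29, 62, 45], [38], [70, 55]]
average_age = secure_average(ages_by_custodian, k=2)
```

Each intermediate sum that leaves D1 is masked by the random value, so a single semi-honest party sees only a value that looks uniformly random. As note 15 points out, two colluding neighbours can still recover the value of the party between them, which is what the extension in note 16 is meant to make difficult.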