AN EFFICIENT APPROXIMATE PROTOCOL
FOR
PRIVACY PRESERVING ASSOCIATION
RULE MINING
MURAT KANTARCIOGLU, ROBERT NIX AND JAIDEEP VAIDYA (2009)
- BY
PUSHPALANKA JAYAWARDHANA
158217G
CONTENTS
• Introduction
• Two Types of Techniques
• Problem
• Proposing Solution
• Related Work
• Bloom Filters
• Redefine the Problem
• Approximate Threshold Dot Product Algorithm
• Security
• Computational and Communicational Cost
• Experimental Results
• Accuracy
• Efficiency
• Conclusion
INTRODUCTION
• Association Rule Mining
• An important data mining model studied extensively by the database and
data mining community
• A method for discovering interesting relations between variables in large
databases
• "Beer and diaper" story
• Buying Diapers ==> Buying Beer
• Privacy Preserving in Association Rule Mining
• Multiple parties are interested in learning the global association rules
• None is willing to reveal the data held at its individual site
TWO TYPES OF TECHNIQUES
• Perturbation Based methods
• Locally perturb data before delivering to the data miner
• Special techniques are used to reconstruct the original distribution
• Mining algorithm needs to be modified to consider that data is perturbed
• Have security concerns
• Secure Multiparty Computation Techniques
• Each party builds decision tree
• Only the final decision tree is shared, not any other data
• Use cryptographic techniques for security
• Computationally intensive
PROBLEM?
How can we mine data in an
efficient and provably secure
way?
PROPOSING SOLUTION
An approximate protocol for
computing the dot product of two
vectors owned by two different
parties
RELATED WORK
• A similar approximation protocol based on sampling techniques
has already been proposed
• A solution using Bloom filters exists
• Rule mining is done centrally
• Goethals's encryption mechanism
• Simple and secure
• Calculates the exact dot product
• Run time O(n)
BLOOM FILTERS
• A probabilistic data structure
• Used to test membership in
a set
• False positives are possible
• No false negatives
• Can be used to approximate
the intersection size between
two sets
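To make the membership behaviour above concrete, here is a minimal Bloom filter sketch. The class name `BloomFilter`, the salted SHA-256 hashing, and the parameter values are illustrative assumptions, not taken from the slides or the paper.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: m bits, k hash functions (illustrative only)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Derive k positions by salting SHA-256 with the hash index.
        # This construction is an assumption made for the sketch.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
for word in ["beer", "diapers", "milk"]:
    bf.add(word)

print("beer" in bf)   # an inserted item is always reported as present
print("bread" in bf)  # usually absent, but a false positive is possible
```

Because bits are only ever set and never cleared, an inserted item can never be reported as missing, which is why there are no false negatives.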
REDEFINE PROBLEM
• Compute the scalar product
• Check whether the scalar product of two distributed vectors is at least some threshold t
X1 · X2 = |S1 ∩ S2| ≥ t
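A small worked example of the identity above, computed in the clear with no privacy protection. The two binary vectors and the threshold are hypothetical data; reading each position as a transaction and each vector as one party's item column is an assumption made for illustration.

```python
# Each party holds one binary vector over the same positions
# (hypothetical data: 1 means the item appears in that transaction).
x1 = [1, 0, 1, 1, 0, 1, 0, 1]  # party 1, e.g. "diapers"
x2 = [1, 1, 1, 0, 0, 1, 0, 0]  # party 2, e.g. "beer"

# The same information viewed as sets of positions holding a 1.
s1 = {i for i, bit in enumerate(x1) if bit}
s2 = {i for i, bit in enumerate(x2) if bit}

# For 0/1 vectors the dot product counts positions where both are 1,
# which is exactly the size of the set intersection.
dot = sum(a * b for a, b in zip(x1, x2))
assert dot == len(s1 & s2)

t = 3  # hypothetical threshold
is_frequent = dot >= t
print(dot, is_frequent)  # prints: 3 True
```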
APPROXIMATE THRESHOLD DOT PRODUCT ALGORITHM
• Each party creates its own Bloom filter, using common parameters:
• size of the Bloom filter - m
• hash functions - h1, h2, ..., hk
• The parties run the secure dot product protocol on their private Bloom
filters and obtain random shares of the dot product result
• The parties run a secure multiplication protocol on their private
dot product results to obtain random shares of the multiplication result
• Finally, the parties run a secure comparison protocol to
approximate the final result.
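The steps above can be sketched in the clear as follows. This is a non-private simulation: the secure dot product, multiplication, and comparison sub-protocols are replaced by plain arithmetic. The estimator is the standard zero-count relation from the Bloom filter literature, (1/m)(1 - 1/m)^(-k|S1 ∩ S2|) ≈ (Z1 + Z2 - Z12) / (Z1 Z2); that the paper uses exactly this form is an assumption, as are the function names and the salted SHA-256 hashing.

```python
import hashlib
import math


def bloom_bits(items, m, k):
    """Return an m-bit Bloom filter (list of 0/1) for `items` using k salted hashes."""
    bits = [0] * m
    for item in items:
        for i in range(k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def estimate_intersection(b1, b2, k):
    """Estimate |S1 ∩ S2| from two Bloom filters built with the same m and k."""
    m = len(b1)
    dot = sum(x * y for x, y in zip(b1, b2))  # computed securely in the real protocol
    z1 = m - sum(b1)                          # zeros in party 1's filter
    z2 = m - sum(b2)                          # zeros in party 2's filter
    z12 = m - dot                             # zeros in the bitwise AND of the filters
    ratio = m * (z1 + z2 - z12) / (z1 * z2)   # approx (1 - 1/m) ** (-k * |S1 ∩ S2|)
    return math.log(ratio) / (-k * math.log(1 - 1 / m))


def exceeds_threshold(b1, b2, k, t):
    """Approximate test of |S1 ∩ S2| >= t using one product and one comparison."""
    m = len(b1)
    dot = sum(x * y for x, y in zip(b1, b2))
    z1, z2 = m - sum(b1), m - sum(b2)
    z_both = z1 + z2 - (m - dot)              # positions that are zero in both filters
    # ratio >= (1 - 1/m) ** (-k*t)  <=>  m * z_both * (1 - 1/m) ** (k*t) >= z1 * z2
    return m * z_both * (1 - 1 / m) ** (k * t) >= z1 * z2


s1 = {f"tx{i}" for i in range(0, 200)}
s2 = {f"tx{i}" for i in range(150, 350)}  # true intersection size: 50
m, k = 8192, 3
b1, b2 = bloom_bits(s1, m, k), bloom_bits(s2, m, k)
print(round(estimate_intersection(b1, b2, k), 1))
print(exceeds_threshold(b1, b2, k, t=25))
```

The threshold test needs only the dot product, the product Z1 · Z2, and one comparison, which lines up with the three secure sub-protocols listed on the slide. The sketch does not handle saturated filters, where a zero count reaches 0.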
SECURITY
• Preserved under the following assumptions
• Parties are semi-honest
• Dot product, multiplication and comparison protocols are
secure
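The "random shares" produced by the sub-protocols can be illustrated with additive secret sharing. The slides do not say which sharing scheme is used, so additive sharing modulo a public prime, the modulus `P`, and the helper names `share` and `reconstruct` are all assumptions for this sketch.

```python
import secrets

P = 2**61 - 1  # public prime modulus (hypothetical choice)


def share(value: int) -> tuple[int, int]:
    """Split `value` into two shares that sum to it modulo P."""
    r = secrets.randbelow(P)      # uniformly random, so it reveals nothing alone
    return r, (value - r) % P


def reconstruct(a: int, b: int) -> int:
    """Combine both shares to recover the secret."""
    return (a + b) % P


dot_product = 42                  # hypothetical secret result
share_1, share_2 = share(dot_product)
assert reconstruct(share_1, share_2) == dot_product

# Shares of a sum can be computed locally, without any communication.
a1, a2 = share(10)
b1, b2 = share(32)
assert reconstruct((a1 + b1) % P, (a2 + b2) % P) == 42
```

Each share on its own is uniformly distributed, so a semi-honest party that follows the protocol learns nothing about the intermediate value from its own share.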
COMPUTATIONAL AND COMMUNICATIONAL COST
• O(nk) for hashing for bloom filters, rest is O(1)
• Hashing cost is negligible compared to public key operations
• m << n --> faster
• Flexible: a better secure dot product protocol can be plugged in if one is found in
the future
• Communication cost proportional to m --> low cost
EXPERIMENTAL RESULTS
• Consider the effect of
• vector length (l)
• vector density (d)
• the actual intersection of the two vectors (i)
• the Bloom filter parameters
• m (length of filter)
• k (number of hash functions)
on the performance of the algorithm.
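One way to explore these parameters yourself is a small simulation harness. The sketch below is not the authors' experimental setup: it runs in the clear, uses the standard zero-count estimator from the Bloom filter literature, and all function names, the salted SHA-256 hashing, and the parameter values are assumptions.

```python
import hashlib
import math
import random


def make_sets(l, d, i, seed=0):
    """Two index sets over range(l), each of size round(d*l), sharing exactly i indices."""
    size = round(d * l)
    rng = random.Random(seed)
    pool = rng.sample(range(l), 2 * size - i)  # distinct indices, needs 2*size - i <= l
    common, rest = pool[:i], pool[i:]
    s1 = set(common) | set(rest[: size - i])
    s2 = set(common) | set(rest[size - i:])
    return s1, s2


def bloom_bits(items, m, k):
    """m-bit Bloom filter (list of 0/1) built with k salted SHA-256 hashes."""
    bits = [0] * m
    for item in items:
        for j in range(k):
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def estimate(b1, b2, k):
    """Zero-count estimate of the intersection size (plain arithmetic, no privacy)."""
    m = len(b1)
    z1, z2 = m - sum(b1), m - sum(b2)
    z_both = sum(1 for x, y in zip(b1, b2) if x == 0 and y == 0)
    return math.log(m * z_both / (z1 * z2)) / (-k * math.log(1 - 1 / m))


def relative_error(l, d, i, m, k, seed=0):
    """Relative error of the estimate for one randomly generated pair of vectors."""
    s1, s2 = make_sets(l, d, i, seed)
    est = estimate(bloom_bits(s1, m, k), bloom_bits(s2, m, k), k)
    return abs(est - i) / i


# Sweep the filter length m while holding the other parameters fixed.
for m in (1024, 4096, 16384):
    print(m, round(relative_error(l=20000, d=0.02, i=100, m=m, k=2), 3))
```

Sweeping `k`, `d`, or `l` in the same way gives a rough feel for the trends reported on the accuracy slides, though the numbers will not match the paper's figures.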
ACCURACY -1
Increase k --> more distortion --> lower
accuracy
(when the filter length is small)
ACCURACY -2
Increase filter length --> higher
accuracy
(less distortion and fewer collisions)
ACCURACY -3
Even for a large vector, the same
accuracy can be achieved with a
sub-linear increase in filter length
ACCURACY -4
• At 0 density, no error
• Error increases drastically at high
densities
• Good for sparse vectors
ACCURACY -5
Error stays below 1% in all cases
EFFICIENCY
Run time compared to the exact
version:
27m 57s vs. 4m 04s
CONCLUSIONS
• Proposes an efficient and secure protocol to approximately compute the scalar
product in a privacy-preserving manner.
• Efficiency is gained by allowing an approximation rather than an exact answer
• Extending the protocol to more than 2 parties is future work
Q & A
