Implementation of Bitmap based Incognito and Performance Evaluation
1. Implementation of Bitmap based Incognito and Performance Evaluation
Hyunho Kang, Jaemyung Kim,
Gapjoo Na, and Sangwon Lee
Sungkyunkwan University
2. Table of Contents
Introduction
Existing Solutions
− Binary Search
− Incognito
Bitmap based Incognito
Optimization Techniques
Performance Evaluation
Conclusion
3. Introduction
Privacy Problem and Solution (Sweeney)
− Released microdata → Join attack (Re-identification)
− Solution: k-anonymization
K-anonymization Algorithm
− Full-domain binary search
− Incognito: one of the most efficient algorithms (Kristen LeFevre et al.)
Problem of Existing Incognito Algorithm
− Requires many repeated sorts over large volumes of data
− Solution: use a bitmap index structure
Completely eliminates the expensive sorts
4. Joining Attack
Example - Joining Attack
Voter Registration List
Name    DOB       Sex      Zipcode
Andre   1/21/76   Male     53715
Beth    1/10/81   Female   55410
Carol   10/1/44   Female   90210
Dan     2/21/84   Male     02174
Ellen   4/19/72   Female   02237
Hospital Patients
DOB       Sex      Zipcode   Disease
1/21/76   Male     53715     Flu
1/21/76   Male     53703     Broken Arm
2/28/76   Male     53703     Bronchitis
4/13/86   Female   53715     Hepatitis
4/13/86   Female   53706     Sprained Ankle
2/28/86   Female   53706     Hang Nail
Join result
Name    DOB       Sex      Zipcode   Disease
Andre   1/21/76   Male     53715     Flu
5. Joining Attack
Voter Registration List
Name    DOB       Sex      Zipcode
Andre   1/21/76   Male     53715
Beth    1/10/81   Female   55410
Carol   10/1/44   Female   90210
Dan     2/21/84   Male     02174
Ellen   4/19/72   Female   02237
Hospital Patients (Zipcode generalized)
DOB       Sex      Zipcode   Disease
1/21/76   Male     537**     Flu
1/21/76   Male     537**     Broken Arm
2/28/76   Male     537**     Bronchitis
4/13/86   Female   537**     Hepatitis
4/13/86   Female   537**     Sprained Ankle
2/28/86   Female   537**     Hang Nail
Join result
Name    DOB       Sex      Zipcode   Disease
Andre   1/21/76   Male     537**     Flu OR Broken Arm
6. Basic Definitions (1/3)
Quasi-Identifier Attribute Set (Q)
− minimal set of attributes in table T that can be joined with
external information to re-identify individual records
− e.g. {Birthdate, Sex, Zipcode}
Frequency Set
− a mapping from each unique combination of values of Q in T
to the total number of tuples in T with these values of Q (the
counts)
7. Basic Definitions (2/3)
K-anonymity (K-anonymous)
− Table T satisfies the k-anonymity property (is k-anonymous) with
respect to attribute set Q if every count in the frequency set
of T with respect to Q is greater than or equal to k.
− In SQL, table T is k-anonymous if each
SELECT MIN(COUNT(*))
FROM T
GROUP BY (Subset of Quasi-Identifier)
is ≥ k
− e.g.
SELECT MIN(COUNT(*))
FROM "Hospital Patients"
GROUP BY DOB, Sex, Zipcode
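The frequency-set definition and the GROUP BY check above can be sketched in a few lines of Python. This is only an illustration: the row values are copied from the Hospital Patients table used on the Basic Incognito Example slides, and the names `ROWS`, `COLUMNS`, `frequency_set`, and `is_k_anonymous` are hypothetical helpers, not part of the presented system.

```python
from collections import Counter

# Hospital Patients rows as shown on the example slides: (DOB, Sex, Zipcode, Disease)
ROWS = [
    ("1/21/76", "Male", "53715", "Flu"),
    ("1/21/76", "Male", "53703", "Broken Arm"),
    ("2/28/76", "Male", "53703", "Bronchitis"),
    ("4/13/86", "Female", "53715", "Hepatitis"),
    ("4/13/86", "Female", "53706", "Sprained Ankle"),
    ("2/28/76", "Female", "53706", "Hang Nail"),
]
COLUMNS = {"DOB": 0, "Sex": 1, "Zipcode": 2}  # assumed column positions


def frequency_set(rows, attrs):
    """Map each distinct combination of `attrs` values to its tuple count."""
    idx = [COLUMNS[a] for a in attrs]
    return Counter(tuple(row[i] for i in idx) for row in rows)


def is_k_anonymous(rows, attrs, k):
    """Equivalent of SELECT MIN(COUNT(*)) ... GROUP BY attrs being >= k."""
    return min(frequency_set(rows, attrs).values()) >= k


print(frequency_set(ROWS, ["Zipcode"]))
print(is_k_anonymous(ROWS, ["DOB", "Sex", "Zipcode"], 2))
```

With these rows the full quasi-identifier is not 2-anonymous, because (1/21/76, Male, 53715) occurs only once, while each single attribute on its own is.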
8. Basic Definitions (3/3)
Generalization
− Defined by a generalization function (user-defined function)
− Notation <D: Di <D Dj means that domain Dj is a generalization of domain Di
9. Example of Generalization (1/3)
Domain and Value Generalization
Zipcode: Z0 = {53715, 53710, 53706, 53703} → Z1 = {5371*, 5370*} → Z2 = {537**}
− 5371* = f(53715)
− 537** = f(5371*)
Birth: B0 = {1/21/76, 2/28/76, 4/13/86} → B1 = {*}
Sex: S0 = {Male, Female} → S1 = {Person}
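A minimal sketch of the value generalization functions in this example, assuming the hierarchies are exactly the ones drawn on the slide (suffix masking for Zipcode, a single top value for Birth and Sex). The function names are illustrative only.

```python
def generalize_zip(zipcode, level):
    """Mask the last `level` characters with '*': level 0 = Z0, 1 = Z1, 2 = Z2."""
    return zipcode[:len(zipcode) - level] + "*" * level


def generalize_birth(dob, level):
    """B0 keeps the date, B1 suppresses it entirely."""
    return dob if level == 0 else "*"


def generalize_sex(sex, level):
    """S0 keeps Male/Female, S1 maps both to Person."""
    return sex if level == 0 else "Person"


print(generalize_zip("53715", 1), generalize_zip("53715", 2))
```

Applying `generalize_zip` with level 1 to 53715 and then masking one more character gives the same value as level 2 directly, which mirrors the chain 53715 → 5371* → 537** on the slide.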
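10. Example of Generalization (2/3)
Generalization Lattice for Two Attributes (one line per height, most general first)
Lattice for <Birth, Sex>
<B1, S1>
<B1, S0> <B0, S1>
<B0, S0>
Lattice for <Sex, Zipcode>
<S1, Z2>
<S1, Z1> <S0, Z2>
<S1, Z0> <S0, Z1>
<S0, Z0>
Lattice for <Birth, Zipcode>
<B1, Z2>
<B1, Z1> <B0, Z2>
<B1, Z0> <B0, Z1>
<B0, Z0>
Values at <S0, Z2>
Sex      Zipcode
Male     537**
Female   537**
Values at <S0, Z1>
Sex      Zipcode
Male     5370*
Male     5371*
Female   5370*
Female   5371*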
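The lattices above can be enumerated mechanically: a node is one generalization level per attribute, and its height is the sum of those levels. The sketch below assumes that representation; `lattice_by_height` and its argument format are hypothetical.

```python
from itertools import product


def lattice_by_height(max_levels):
    """Group lattice nodes by height.

    max_levels maps an attribute letter to its highest generalization level,
    e.g. {"S": 1, "Z": 2} for Sex (S0..S1) and Zipcode (Z0..Z2).
    """
    names = list(max_levels)
    layers = {}
    for levels in product(*(range(max_levels[n] + 1) for n in names)):
        node = tuple(f"{n}{lv}" for n, lv in zip(names, levels))
        layers.setdefault(sum(levels), []).append(node)
    return layers


for height, nodes in sorted(lattice_by_height({"S": 1, "Z": 2}).items(), reverse=True):
    print(height, nodes)
```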
12. Full-Domain Generalization Algorithm
Binary search of the lattice finds a solution of minimum height
- if no generalization of height h satisfies k-anonymity, then
no generalization of height h' < h will satisfy k-anonymity.
h : maximum height in the generalization lattice
1) Check generalizations at height ⌊h/2⌋
2) If this height satisfies k-anonymity
2-1) check generalizations at height ⌊h/4⌋
3) Else
3-1) check generalizations at height ⌊3h/4⌋
4) And so on…
Example lattice for <Sex, Zipcode>
<S1, Z2>
<S1, Z1> <S0, Z2>
<S1, Z0> <S0, Z1>
<S0, Z0>
This algorithm is proven to find a single minimal full-domain
k-anonymization
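The binary search over lattice heights can be sketched as follows. This is a simplified illustration on the <Sex, Zipcode> lattice, using only the Sex and Zipcode columns of the example table; it writes the generalized Sex value as "*" and treats a height as satisfying k-anonymity when at least one node at that height does. All names are hypothetical.

```python
from collections import Counter
from itertools import product

# (Sex, Zipcode) columns of the example Hospital Patients table
ROWS = [("Male", "53715"), ("Male", "53703"), ("Male", "53703"),
        ("Female", "53715"), ("Female", "53706"), ("Female", "53706")]
MAX_LEVEL = (1, 2)  # Sex: S0..S1, Zipcode: Z0..Z2


def generalize(row, node):
    sex, zipcode = row
    s_lv, z_lv = node
    return ("*" if s_lv else sex, zipcode[:len(zipcode) - z_lv] + "*" * z_lv)


def is_k_anonymous(node, k):
    counts = Counter(generalize(row, node) for row in ROWS)
    return min(counts.values()) >= k


def nodes_at_height(height):
    ranges = (range(m + 1) for m in MAX_LEVEL)
    return [node for node in product(*ranges) if sum(node) == height]


def min_height_search(k):
    """Binary search on height; returns (height, satisfying nodes) or None."""
    low, high = 0, sum(MAX_LEVEL)
    if not any(is_k_anonymous(n, k) for n in nodes_at_height(high)):
        return None  # even the most general node fails
    while low < high:
        mid = (low + high) // 2
        if any(is_k_anonymous(n, k) for n in nodes_at_height(mid)):
            high = mid      # a solution exists at mid, look lower
        else:
            low = mid + 1   # nothing at mid, so nothing below it either
    return low, [n for n in nodes_at_height(low) if is_k_anonymous(n, k)]


print(min_height_search(2))
```

For k = 2 the search settles on height 1, where the node with levels (1, 0), that is <S1, Z0>, is the only one that passes.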
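13. Key Properties of Incognito
Generalization Property: <Z0> → <Z1>
− If T is k-anonymous with respect to <Z0>, it is also k-anonymous with respect to <Z1>
Rollup Property
− The frequency set for a generalized node can be computed from the frequency set of a less generalized node, without rescanning T
Subset Property: <S1,Z0,D1> → <S1,Z0>, <S1,D1>, <Z0,D1>
− If T is k-anonymous with respect to an attribute set, it is also k-anonymous with respect to every subset of it
Hospital Patients at <B0, S0, Z0, D0>
B0        S0       Z0      D0
1/21/76   Male     53715   Flu
1/21/76   Male     53703   Broken Arm
2/28/76   Male     53703   Bronchitis
4/13/86   Female   53715   Hepatitis
4/13/86   Female   53706   Sprained Ankle
2/28/86   Female   53706   Hang Nail
Hospital Patients at <B0, S0, Z1, D0>
B0        S0       Z1      D0
1/21/76   Male     5371*   Flu
1/21/76   Male     5370*   Broken Arm
2/28/76   Male     5370*   Bronchitis
4/13/86   Female   5371*   Hepatitis
4/13/86   Female   5370*   Sprained Ankle
2/28/86   Female   5370*   Hang Nail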
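As a small illustration of the rollup property, the frequency set for Z1 can be derived from the frequency set for Z0 by summing the counts of values that share a parent, with no second pass over the table. The helper name `rollup` and the lambda used as the generalization function are assumptions made for this sketch.

```python
from collections import Counter


def rollup(freq, generalize):
    """Derive a parent frequency set by summing counts of values that share a parent."""
    rolled = Counter()
    for value, count in freq.items():
        rolled[generalize(value)] += count
    return rolled


# Zipcode frequency set at Z0, as counted in the Basic Incognito Example
z0 = Counter({"53715": 2, "53703": 2, "53706": 2})
z1 = rollup(z0, lambda z: z[:-1] + "*")
z2 = rollup(z1, lambda z: z[:-2] + "**")
print(z1, z2)
```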
14. Basic Incognito Example (1/3)
Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
Search 1-subsets
Lattice for DOB: B0 → B1
Hospital Patients
DOB       Sex      Zipcode   Disease
1/21/76   Male     53715     Flu
1/21/76   Male     53703     Broken Arm
2/28/76   Male     53703     Bronchitis
4/13/86   Female   53715     Hepatitis
4/13/86   Female   53706     Sprained Ankle
2/28/76   Female   53706     Hang Nail
SELECT COUNT(*) GROUP BY DOB
DOB       Count
1/21/76   2
4/13/86   2
2/28/76   2
15. Basic Incognito Example (1/3)
Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
Search 1-subsets
Lattice for Sex: S0 → S1
Hospital Patients: same table as on slide 14
SELECT COUNT(*) GROUP BY Sex
Sex      Count
Male     3
Female   3
16. Basic Incognito Example (1/3)
Consider quasi-identifier (DOB, Sex, Zipcode) and k = 2
Search 1-subsets
Lattice for Zipcode: Z0 → Z1 → Z2
Hospital Patients: same table as on slide 14
SELECT COUNT(*) GROUP BY Zipcode
Zipcode   Count
53715     2
53703     2
53706     2
17. Basic Incognito Example (2/3)
Search all 2-subsets
Lattice for <Sex, Zipcode>
<S1, Z2>
<S1, Z1> <S0, Z2>
<S1, Z0> <S0, Z1>
<S0, Z0>
Hospital Patients: same table as on slide 14
SELECT COUNT(*) GROUP BY Sex, Zipcode
Sex      Zipcode   Count
Male     53715     1
Female   53715     1
Male     53703     2
Female   53706     2
18. Basic Incognito Example (2/3)
Search all 2-subsets
Lattice for <Sex, Zipcode>
<S1, Z2>
<S1, Z1> <S0, Z2>
<S1, Z0> <S0, Z1>
Hospital Patients: same table as on slide 14
SELECT COUNT(*) GROUP BY S1, Zipcode
S1   Zipcode   Count
*    53715     2
*    53703     2
*    53706     2
19. Basic Incognito Example (2/3)
Search all 2-subsets
Lattice for <Sex, Zipcode>
<S1, Z2>
<S1, Z1> <S0, Z2>
<S1, Z0> <S0, Z1>
Hospital Patients: same table as on slide 14
SELECT COUNT(*) GROUP BY Sex, Z1
Sex      Z1      Count
Male     5371*   1
Female   5371*   1
Male     5370*   2
Female   5370*   2
20. Basic Incognito Example (2/3)
Search all 2-subsets
Lattice for <Sex, Zipcode>
<S1, Z2>
<S1, Z1> <S0, Z2>
<S1, Z0>
Hospital Patients: same table as on slide 14
SELECT COUNT(*) GROUP BY Sex, Z2
Sex      Z2      Count
Male     537**   3
Female   537**   3
21. Basic Incognito Example (3/3)
Search 3-subsets
Lattice for <DOB, Sex, Zipcode>
<B1, S1, Z2>
<B1, S1, Z1> <B1, S0, Z2> <B0, S1, Z2>
<B1, S1, Z0>
Hospital Patients: same table as on slide 14
24. What Is the Problem?
Incognito is a very nice algorithm
− but…
Checking k-anonymity for each node is still expensive!
− SELECT MIN(COUNT(*))
FROM T
GROUP BY (QI Attr. Set)
25. Bitmap based Incognito
Generalization
− bitwise OR operation
Combination
− bitwise AND operation
Checking k-anonymity
− bit-counting operation
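A minimal sketch of the first and third ideas, assuming one bitmap per distinct attribute value with one bit per table row. Python integers stand in for the bit vectors, bit i (least significant first) is row i, and the helper names are made up for this illustration.

```python
# Zipcode column of the example Hospital Patients table, in row order
ZIPCODES = ["53715", "53703", "53703", "53715", "53706", "53706"]


def build_bitmaps(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def generalize(bitmaps, parent_of):
    """Generalization = bitwise OR of the bitmaps of all children of a parent value."""
    out = {}
    for value, bits in bitmaps.items():
        parent = parent_of(value)
        out[parent] = out.get(parent, 0) | bits
    return out


def popcount(bits):
    """Bit counting = number of rows carrying the value."""
    return bin(bits).count("1")


z0 = build_bitmaps(ZIPCODES)
z1 = generalize(z0, lambda z: z[:-1] + "*")
print({value: popcount(bits) for value, bits in z1.items()})
```

No sort or GROUP BY is involved: the count for 5370* is just the number of set bits in the OR of the 53703 and 53706 bitmaps.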
26. Generalize 1-subset (single attr.)
Hospital Patients: same table as on slide 14
Zipcode hierarchy
537** ← 5371*, 5370*
5371* ← 53715, 53710
5370* ← 53706, 53703
Bitmaps (one bit per row, rows 1 to 6)
53706                      0 0 0 0 1 1
53703                      0 1 1 0 0 0
5370* = 53706 OR 53703 =   0 1 1 0 1 1
35. Check k-anonymity
Lattice for <Sex, Zipcode>
<S1, Z2>
<S1, Z1> <S0, Z2>
<S1, Z0> <S0, Z1>
<S0, Z0>
Bitmaps at <S0, Z0> (one bit per row, rows 1 to 6)
<Male, 53703>     0 1 1 0 0 0
<Male, 53706>     0 0 0 0 0 0
<Male, 53715>     1 0 0 0 0 0
<Female, 53703>   0 0 0 0 0 0
<Female, 53706>   0 0 0 0 1 1
<Female, 53715>   0 0 0 1 0 0
Bitmaps at <S1, Z0>
<*, 53703>   0 1 1 0 0 0   COUNT 2
<*, 53706>   0 0 0 0 1 1   COUNT 2
<*, 53715>   1 0 0 1 0 0   COUNT 2
☞ Satisfies k(2)-anonymity
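The check on this slide can be sketched by combining single-attribute bitmaps with bitwise AND and counting bits. As before, Python integers stand in for bit vectors (bit i is row i), empty bitmaps are ignored because they correspond to value combinations that do not occur in the table, and all function names are hypothetical.

```python
SEX = ["Male", "Male", "Male", "Female", "Female", "Female"]
ZIP = ["53715", "53703", "53703", "53715", "53706", "53706"]


def build_bitmaps(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def popcount(bits):
    return bin(bits).count("1")


def combine(left, right):
    """Combination = bitwise AND of every pair of single-attribute bitmaps."""
    return {(a, b): la & rb for a, la in left.items() for b, rb in right.items()}


def is_k_anonymous(bitmaps, k):
    """Every non-empty combination must cover at least k rows."""
    return all(popcount(bits) >= k for bits in bitmaps.values() if bits)


sex0 = build_bitmaps(SEX)
zip0 = build_bitmaps(ZIP)
sex1 = {"*": sex0["Male"] | sex0["Female"]}   # S1 via bitwise OR

s0z0 = combine(sex0, zip0)   # node <S0, Z0>
s1z0 = combine(sex1, zip0)   # node <S1, Z0>
print(is_k_anonymous(s0z0, 2), is_k_anonymous(s1z0, 2))
```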
37. Optimization Techniques
1-Level Optimization
− Keep only 1-subset bitmaps for generating k-subset bitmaps
Reusing Optimization
− Reuse intermediate (k-?)-subset bitmaps for generating k-
subset bitmaps
Pruning Optimization
− Stop counting operation if specific bitmap does not satisfy ‘k’
− And then check more generalized node
Single Instruction Multiple Data
− Parallelize bitwise AND/OR operation using SIMD instruction
38. 1-level Optimization
Generalization hierarchies (only these 1-subset bitmaps are kept)
a0 → a1 → a2
g0 → g1 → g2
e0 → e1 → e2 → e3
<a2, g2, e1> = a2 ∧ g2 ∧ e1
Reduce Memory and Disk Space for Bitmap!
39. Reusing Optimization
To generate <a2, g2, e1>
− a2 ∧ g2 ∧ e1
− <a2, g2> ∧ e1
− <a2, e1> ^ g2
− <g2, e1> ^ a2
2-subset bitmaps are already created at the previous step
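The difference between building <a2, g2, e1> from 1-subset bitmaps only and reusing a stored 2-subset bitmap can be illustrated by counting AND operations. The bitmaps below are invented 4-row examples, not data from the evaluation, and the `stats` counter is just an instrumentation device for this sketch.

```python
def combine(left, right, stats):
    """AND every pair of bitmaps, counting the AND operations performed."""
    out = {}
    for lkey, lbits in left.items():
        for rkey, rbits in right.items():
            stats["and_ops"] += 1
            out[lkey + rkey] = lbits & rbits
    return out


# Hypothetical single-attribute bitmaps over a 4-row table
a2 = {("a*",): 0b1111}
g2 = {("gX",): 0b0011, ("gY",): 0b1100}
e1 = {("eP",): 0b0101, ("eQ",): 0b1010}

# 1-level: keep only 1-subset bitmaps and recompute a2 AND g2 AND e1 from scratch
one_level_stats = {"and_ops": 0}
one_level = combine(combine(a2, g2, one_level_stats), e1, one_level_stats)

# Reusing: <a2, g2> was already produced while processing 2-subsets
previous_step_stats = {"and_ops": 0}
cache = {("a2", "g2"): combine(a2, g2, previous_step_stats)}
reuse_stats = {"and_ops": 0}
reused = combine(cache[("a2", "g2")], e1, reuse_stats)

print(one_level == reused, one_level_stats, reuse_stats)
```

Both routes give identical bitmaps; reusing trades the memory held by the cached 2-subset bitmaps for fewer AND operations at the 3-subset step.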
40. Pruning Optimization
Bit count of a bitmap = 1 => does not satisfy k
− Can skip the remaining bitmaps of the node, <Male, 53710>, … , <Female, 53715>, and check a more generalized node
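A small sketch of the pruning idea, assuming a node is represented as a list of integer bitmaps and that empty bitmaps (value combinations absent from the table) are skipped. The function name and the returned counter are illustrative only.

```python
def popcount(bits):
    return bin(bits).count("1")


def check_node(bitmaps, k):
    """Return (is_k_anonymous, bitmaps_counted), stopping at the first violation."""
    counted = 0
    for bits in bitmaps:
        if bits == 0:
            continue            # combination does not occur in the table
        counted += 1
        if popcount(bits) < k:
            return False, counted   # prune: the rest of the node need not be counted
    return True, counted


# Node <S0, Z0> from the example: <Male, 53703>, <Male, 53706>, <Male, 53715>,
# <Female, 53703>, <Female, 53706>, <Female, 53715> (bit i = row i)
s0z0 = [0b000110, 0b000000, 0b000001, 0b000000, 0b110000, 0b001000]
# Node <S1, Z0>: <*, 53703>, <*, 53706>, <*, 53715>
s1z0 = [0b000110, 0b110000, 0b001001]

print(check_node(s0z0, 2), check_node(s1z0, 2))
```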
41. Single Instruction Multiple Data
Using SIMD Instruction
− Bitwise AND/OR and bit-counting operations can be parallelized
We implemented using
− Intel Pentium 4 Streaming SIMD Extensions (SSE) technology
43. Performance Evaluation
Dataset
− Small(5MB) and big(60MB) census data
− QI attributes set (four columns)
Generalization level: 3, 3, 2, 4 respectively
− Index size: 2MB(40%) and 16MB(27%)
− Bitmap size: 200KB(4%) and 2MB(3%)
Environment
− Pentium IV 2.0 GHz
− 1GB memory, 7200rpm hard disk
− Oracle 10g R1 & Intel C++ Compiler 9.0
44. Performance Evaluation
Small Data
[Chart: series 1-Level, Pruning, Reusing, Traditional; x-axis ticks 4000, 2000, 1000, 500, 100; y-axis 0.000 to 25.000; axis labels not recoverable]
45. Performance Evaluation
Small Data (zoom in)
[Chart: series 1-Level, Pruning, Reusing; x-axis ticks 4000, 2000, 1000, 500, 100; y-axis 0.000 to 1.400; axis labels not recoverable]
46. Performance Evaluation
Big Data
[Chart: series 1-Level, Pruning, Reusing, Traditional; x-axis ticks 4000, 2000, 1000, 500, 100; y-axis 0.000 to 1400.000; axis labels not recoverable]
47. Performance Evaluation
Big Data (zoom in)
[Chart: series 1-Level, Pruning, Reusing; x-axis ticks 4000, 2000, 1000, 500, 100; y-axis 0.000 to 4.000; axis labels not recoverable]
49. Conclusion
Incognito = very innovative k-anonymity algorithm
− Still inefficient in checking k-anonymity for each node
− Expensive external sort or hash for counting (e.g. GROUP BY)
Using Bitmap (Bitwise AND/OR)
− Additional optimization opportunities
Reusing Optimization
Pruning Optimization
Single Instruction Multiple Data
− Space/time trade-off
1-level / Reusing Optimization