Presented at the PrivateNLP workshop at WSDM 2020
https://sites.google.com/view/wsdm-privatenlp-2020/home/
Amazon prides itself on being the most customer-centric company on earth. That means maintaining the highest possible standards of both security and privacy when dealing with customer data. This month, at the ACM Web Search and Data Mining (WSDM) Conference, my colleagues and I will describe a way to protect privacy during large-scale analyses of textual data supplied by customers. Our method works by, essentially, rephrasing the customer-supplied text and basing analysis on the new phrasing rather than on the customers’ own language.
Preserving Privacy and Utility in Text Data Analysis
1. Preserving Privacy and Utility in Text Data Analysis
Tom Diethe, Oluwaseyi Feyisetan, Thomas Drake, Borja Balle
{sey,tdiethe,draket}@amazon.com
borja.balle@gmail.com
PrivateNLP Workshop, WSDM
February 7 2020
2. Outline
1 Alexa AI
2 Algorithmic Privacy
3 Privacy for Text
4 Differential Privacy in Euclidean Spaces
5 Differential Privacy in Hyperbolic Spaces
6 Optimizing the Privacy Utility Trade-off
7 Summary
Diethe, Feyisetan, Drake, Balle (Amazon) Privacy and Utility in Text Data Analysis February 7 2020 1 / 41
3. Outline: 1 Alexa AI
4. Alexa AI
What is Alexa?
A cloud-based voice service that can help
you with tasks, entertainment, general
information, shopping, and more
The more you talk to Alexa, the more
Alexa adapts to your speech patterns,
vocabulary, and personal preferences
How do we ...
create robust and efficient AI systems?
maintain the privacy of customer data?
6. Failure Modes
Unintentional failures: ML system produces a formally correct but completely unsafe
outcome
Outliers/anomalies
Dataset shift
Limited memory
Intentional failures: failure is caused by an active adversary attempting to subvert the
system to attain her goals, such as to:
misclassify the result
infer private training data
steal the underlying algorithm
7. Outline: 2 Algorithmic Privacy
8. A first attempt: Can’t I just anonymize my data?
k-anonymity: the information for each person in the release cannot be distinguished from that of at least k − 1
other individuals whose information also appears in the release
Suppose a company is audited for salary discrimination
The auditor can see salaries by gender, age and nationality for each department and office
If the auditor has a friend, an ex, or a date working for the company, she will learn that
person’s salary
Reducing data granularity reduces the risk, but also reduces accuracy (fidelity, in this case)
Original record:
Office | Dept. | Salary | D.O.B. | Nationality | Gender
London | IT | £##### | May 1985 | Portuguese | Female
After generalization:
Office | Dept. | Salary | D.O.B. | Nationality | Gender
UK | IT | £##### | 1980-1985 | - | Female
This still presents a risk of re-identification! If there are 10 women born between 1980 and 1985 in the
company’s entire UK IT department, 9 of them could conspire to learn the salary of the 10th
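To make the k-anonymity definition concrete, here is a minimal Python sketch that computes the k a release actually achieves: the size of the smallest group of records sharing the same quasi-identifier values. The records, the field names and the `generalize` rule are hypothetical toy choices mirroring the table above, not data from the talk.

```python
from collections import Counter

def achieved_k(records, quasi_identifiers):
    """Return the smallest equivalence-class size over the quasi-identifiers.

    A release is k-anonymous exactly when this value is at least k.
    """
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

def generalize(record):
    """Hypothetical generalization: coarsen office and birth date, suppress nationality."""
    out = dict(record)
    out["office"] = "UK"
    year = int(record["dob"][-4:])
    out["dob"] = "1980-1985" if 1980 <= year <= 1985 else "other"
    out["nationality"] = "-"
    return out

# Toy release (hypothetical values).
raw = [
    {"office": "London", "dept": "IT", "dob": "May 1985", "nationality": "Portuguese", "gender": "Female"},
    {"office": "Leeds", "dept": "IT", "dob": "Jan 1982", "nationality": "British", "gender": "Female"},
    {"office": "London", "dept": "IT", "dob": "Mar 1981", "nationality": "French", "gender": "Female"},
]
qi = ["office", "dept", "dob", "nationality", "gender"]

print(achieved_k(raw, qi))                           # 1: every record is unique
print(achieved_k([generalize(r) for r in raw], qi))  # 3: all records now look alike
```

Even at k = 3, colleagues who know their own salaries can pool that knowledge to infer the remaining one, which is the collusion risk described on the slide.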
11. Anonymized Data Isn’t
Example 1: In the mid-1990s, the Massachusetts Group Insurance Commission (GIC) released
“anonymized” data on state employees that showed every hospital visit
Goal was to help researchers. Removed all obvious identifiers such as name, address, and
social security number
MIT PhD student Latanya Sweeney decided to attempt to reverse the anonymization and
requested a copy of the data
Re-identification
William Weld, then Governor of Massachusetts, assured the public that GIC had protected
patient privacy by deleting identifiers. Sweeney started hunting for the Governor’s hospital
records in the GIC data. She knew that Governor Weld resided in Cambridge, Massachusetts,
population 54,000 and 7 ZIP codes. For $20, she purchased the complete voter rolls from the
city of Cambridge, containing the name, address, ZIP code, birth date, and gender of every
voter. Crossing this with the GIC records, Sweeney found Governor Weld with ease: Only 6
people shared his birth date, only 3 of them men, and of them, only he lived in his ZIP code.
Sweeney sent the Governor’s health records (including diagnoses and prescriptions) to his office.
13. Anonymized Data Isn’t
Example 2: In 2006, Netflix released data on how 500,000 of its users rated
movies over a six-year period
Netflix “anonymized” the data before releasing it by removing usernames, but assigned
unique identification numbers to users to allow continuous tracking of user
ratings and trends
Re-identification
Researchers used this information to uniquely identify individual Netflix users by cross-referencing the
data with the public IMDb database. According to the study, anyone who knows when and how a user
rated six movies can identify 99% of people in the Netflix database.
15. Differential Privacy
A randomised mechanism $M : X \to Y$ is $\varepsilon$-differentially private if for all neighbouring inputs
$x \simeq x'$ (i.e. $\|x - x'\|_1 = 1$) and for all sets of outputs $E \subseteq Y$ we have
$\Pr[M(x) \in E] \le e^{\varepsilon} \Pr[M(x') \in E]$
[Figure: output distributions of M(D) and M(D'); their ratio is bounded by $e^{\varepsilon}$]
Mechanisms:
Randomised response → plausible deniability
Laplace mechanism: e.g. $\tilde{\mu} = \mu + \xi$, $\xi \sim \mathrm{Lap}\left(\frac{1}{\varepsilon n}\right)$
Output perturbation
...
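The Laplace mechanism bullet can be illustrated with a private mean. This is a minimal Python sketch, assuming each value is clipped to [0, 1], so that changing one of the n values moves the mean by at most 1/n and the noise scale is 1/(εn). The function name `private_mean` is illustrative, not from the talk.

```python
import numpy as np

def private_mean(values, epsilon, rng=None):
    """epsilon-DP estimate of the mean of values assumed to lie in [0, 1].

    The mean has sensitivity 1/n when one record changes, so Laplace
    noise with scale 1/(epsilon * n) gives epsilon-differential privacy.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(values, dtype=float), 0.0, 1.0)  # enforce the [0, 1] assumption
    n = x.size
    scale = 1.0 / (epsilon * n)
    return x.mean() + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
data = rng.uniform(size=10_000)
print(data.mean(), private_mean(data, epsilon=0.5, rng=rng))
```

With many records the noise (standard deviation $\sqrt{2}/(\varepsilon n)$) becomes small relative to the statistic, which is why aggregate queries tolerate DP well.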
17. Randomized Response [Warner ’65]
Say you want to release a bit x ∈ {Yes, No}. Do the following:
1 flip a coin
2 if tails, respond truthfully with x
3 if heads, flip a second coin and respond “Yes” if heads; respond “No” if tails
Claim: the above algorithm satisfies $(\log 3)$-differential privacy:
$\frac{\Pr[\text{Response} = \text{Yes} \mid x = \text{Yes}]}{\Pr[\text{Response} = \text{Yes} \mid x = \text{No}]} = \frac{\frac{1}{2} \times 1 + \frac{1}{2} \times \frac{1}{2}}{\frac{1}{2} \times 0 + \frac{1}{2} \times \frac{1}{2}} = \frac{3/4}{1/4} = 3 \implies e^{\varepsilon} = 3$
The same holds for $\frac{\Pr[\text{Response} = \text{No} \mid x = \text{No}]}{\Pr[\text{Response} = \text{No} \mid x = \text{Yes}]}$.
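A short Python simulation, offered as an illustration rather than anything from the talk, checks the (log 3) claim exactly and shows how an analyst can still recover the population rate from the noisy answers: if a fraction p of respondents hold "Yes", then Pr[Response = Yes] = p/2 + 1/4. The function names are hypothetical.

```python
import math
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Warner's two-coin randomized response, as described on the slide."""
    if rng.random() < 0.5:          # first coin lands tails: answer truthfully
        return truth
    return rng.random() < 0.5       # heads: answer with a second fair coin

def prob_yes(truth: bool) -> float:
    """Exact Pr[Response = Yes | x = truth]."""
    return 0.5 * (1.0 if truth else 0.0) + 0.5 * 0.5

def epsilon_of_mechanism() -> float:
    """Worst-case log-ratio over both possible responses."""
    ratio_yes = prob_yes(True) / prob_yes(False)
    ratio_no = (1 - prob_yes(False)) / (1 - prob_yes(True))
    return math.log(max(ratio_yes, ratio_no))

def estimate_true_fraction(responses) -> float:
    """Debias: observed Yes rate = 0.5 * p + 0.25, so p = 2 * rate - 0.5."""
    rate = sum(responses) / len(responses)
    return 2 * rate - 0.5

rng = random.Random(42)
population = [rng.random() < 0.3 for _ in range(100_000)]  # about 30% true "Yes"
answers = [randomized_response(x, rng) for x in population]
print(epsilon_of_mechanism())            # log 3, about 1.0986
print(estimate_true_fraction(answers))   # close to 0.3
```

Each individual answer is deniable, yet the aggregate estimate is unbiased; the price is extra variance (the factor 2 in the debiasing step doubles the standard error of the observed rate).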
19. Important Properties
Robustness to post-processing: if $M$ is $(\varepsilon, \delta)$-DP, then $f(M)$ is $(\varepsilon, \delta)$-DP for any (data-independent) function $f$
Composition: if $M_1, \dots, M_n$ are $(\varepsilon_i, \delta_i)$-DP, then $g(M_1, \dots, M_n)$ is
$\left(\sum_{i=1}^{n} \varepsilon_i, \sum_{i=1}^{n} \delta_i\right)$-DP
Protects against arbitrary side knowledge
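Basic composition is what makes a running "privacy budget" possible. The following is a minimal Python sketch of such a ledger; the class name is hypothetical, and it implements only the simple summation rule stated above (tighter composition theorems exist but are not shown). Post-processing needs no entry, since it costs no additional privacy.

```python
from dataclasses import dataclass, field

@dataclass
class BasicCompositionAccountant:
    """Tracks total (epsilon, delta) under basic sequential composition."""
    spent: list = field(default_factory=list)

    def spend(self, epsilon: float, delta: float = 0.0) -> None:
        if epsilon < 0 or not 0 <= delta <= 1:
            raise ValueError("invalid privacy parameters")
        self.spent.append((epsilon, delta))

    def total(self):
        eps = sum(e for e, _ in self.spent)
        delta = sum(d for _, d in self.spent)
        return eps, delta

acct = BasicCompositionAccountant()
acct.spend(0.5)           # e.g. a Laplace-mechanism query
acct.spend(0.25, 1e-6)    # e.g. an (epsilon, delta)-DP query
acct.spend(0.25, 1e-6)
print(acct.total())       # (1.0, 2e-06)
```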
20. Outline: 3 Privacy for Text
21. User-AI system interaction via natural language
User’s goal: meet some specific need with respect to an
issued query x
Agent’s goal: satisfy the user’s request
Privacy violation: occurs when x is used to make personal
inferences, e.g. when unrestricted PII is present
Mechanism: Modify the query to protect privacy whilst
preserving semantics
Our approach: Metric Differential Privacy
27. Desired Functionality
Intent | Query $x$ | Modified query $\hat{x}$
GetWeather | Will it be colder in Cleveland | Will it be colder in Ohio
PlayMusic | Play Cantopop on lastfm | Play C-pop on lastfm
BookRestaurant | Book a restaurant in Milladore | Book a restaurant in Wood County
SearchCreativeWork | I want to watch Manthan film | I want to watch Hindi film
28. Word Embeddings
Mapping from words into vectors of real numbers (there are many ways to do this!)
e.g. learned models such as Word2Vec, GloVe and fastText
Defines a mapping $\phi : W \to \mathbb{R}^n$
Nearest neighbours are often synonyms
29. Metric Differential Privacy
Recall the definition of DP ...
$\Pr[M(x) \in E] \le e^{\varepsilon} \Pr[M(x') \in E]$ for $x, x' \in X$ s.t. $\|x - x'\|_1 = 1$
This can be rewritten into a single equation as:
$\frac{\Pr[M(x) \in E]}{\Pr[M(x') \in E]} \le e^{\varepsilon \|x - x'\|_1}$
Metric differential privacy generalises this to use any valid metric $d(x, x')$:
$\frac{\Pr[M(x) \in E]}{\Pr[M(x') \in E]} \le e^{\varepsilon d(x, x')}$
(It is easy to see that standard DP is metric DP with $d(x, x') = \|x - x'\|_1$.)
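As a numerical sanity check of the metric-DP inequality, the sketch below assumes the one-dimensional Laplace mechanism M(x) = x + Lap(1/ε) with d(x, x′) = |x − x′|. It evaluates the ratio of output densities on a grid and compares it against $e^{\varepsilon d(x, x')}$. Densities rather than probabilities of sets are compared; a pointwise density bound implies the same bound for every set E.

```python
import numpy as np

def laplace_pdf(y, loc, scale):
    return np.exp(-np.abs(y - loc) / scale) / (2.0 * scale)

def max_density_ratio(x, x_prime, epsilon, grid):
    """Largest ratio p_{M(x)}(y) / p_{M(x')}(y) over the grid of outputs y."""
    scale = 1.0 / epsilon
    return np.max(laplace_pdf(grid, x, scale) / laplace_pdf(grid, x_prime, scale))

epsilon = 0.8
grid = np.linspace(-50.0, 50.0, 20_001)
for x, x_prime in [(0.0, 1.0), (0.0, 3.0), (2.0, -4.5)]:
    ratio = max_density_ratio(x, x_prime, epsilon, grid)
    bound = np.exp(epsilon * abs(x - x_prime))
    print(f"d={abs(x - x_prime):.1f}  max ratio={ratio:.4f}  bound={bound:.4f}")
```

The bound is attained (for outputs lying beyond both inputs), so inputs twice as far apart get exactly twice the privacy loss: this is the distance-dependent guarantee that metric DP formalises.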
32. Privacy in the Space of Word Embeddings [Feyisetan 2019, Feyisetan 2020]
Given:
$w \in W$: word to be “privatised” from word space $W$ (dictionary)
$\phi : W \to Z$: embedding function from word space to embedding space $Z$ (e.g. $\mathbb{R}^n$)
$v = \phi(w)$: corresponding word vector
$d : Z \times Z \to \mathbb{R}$: distance function in embedding space
$\Omega(\varepsilon)$: the DP noise sampling distribution (e.g. $\Omega_i(\varepsilon) = \mathrm{Lap}\left(\frac{1}{\varepsilon n}\right)$, $i = 1, \dots, n$ for $\mathbb{R}^n$)
Metric DP mechanism for word embeddings
1 Perturb the word vector: $\hat{v} = v + \xi$ where $\xi \sim \Omega(\varepsilon)$
2 The new vector $\hat{v}$ will (almost surely) not be a word
3 Project back to $W$: $\hat{w} = \arg\min_{w \in W} d(\hat{v}, \phi(w))$, return $\hat{w}$
What do we need?
$d$ satisfies the axioms of a metric (non-negativity, identity of indiscernibles, symmetry, triangle inequality)
A way to sample from $\Omega$ in the metric space that respects $d$ and gives us $\varepsilon$-metric DP
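The three-step mechanism can be sketched in Python under stated assumptions. The embedding is a small hypothetical dictionary of 2-D vectors (a real system would use, e.g., 300-d GloVe), d is Euclidean distance, and Ω(ε) is taken to be the density proportional to exp(−ε‖ξ‖₂), sampled as a uniformly random direction times a Gamma(n, 1/ε) radius. This noise choice is our assumption for illustration, not necessarily the exact distribution on the slide. The projection step uses exact brute-force search rather than approximate nearest neighbours.

```python
import numpy as np

# Hypothetical toy embedding phi: W -> R^2 (illustrative values only).
EMBEDDING = {
    "cleveland": np.array([1.0, 1.0]),
    "ohio": np.array([1.3, 0.9]),
    "columbus": np.array([1.1, 1.4]),
    "cantopop": np.array([-2.0, 0.5]),
    "c-pop": np.array([-2.2, 0.2]),
}

def sample_noise(dim, epsilon, rng):
    """Draw xi with density proportional to exp(-epsilon * ||xi||_2)."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)             # uniform on the unit sphere
    radius = rng.gamma(shape=dim, scale=1.0 / epsilon)  # radial part of the density
    return radius * direction

def privatize_word(word, epsilon, rng, embedding=EMBEDDING):
    words = list(embedding)
    vectors = np.stack([embedding[w] for w in words])
    v = embedding[word]                                # v = phi(w)
    v_hat = v + sample_noise(v.size, epsilon, rng)     # step 1: perturb
    dists = np.linalg.norm(vectors - v_hat, axis=1)    # step 2: v_hat is (a.s.) not a word
    return words[int(np.argmin(dists))]                # step 3: project back to W

rng = np.random.default_rng(7)
print([privatize_word("cleveland", epsilon=2.0, rng=rng) for _ in range(10)])
```

Because the projection is post-processing of $\hat{v}$, it does not weaken the metric-DP guarantee of the noise step; smaller ε makes substitutions such as "cleveland" to "ohio" more frequent.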
35. Outline: 4 Differential Privacy in Euclidean Spaces
36. Differential Privacy in the Space of Euclidean Word Embeddings
Adding noise to a location always produces
a valid location: a point somewhere on
the earth’s surface
Adding noise to a word embedding
produces a new point in the embedding
space, but it is almost surely not the location of a
valid word embedding
We perform an approximate nearest-neighbour search to
find the nearest valid embedding
The nearest valid embedding could be that of the
original word itself: in that case, the
original word is returned
37. Practical Considerations
To help choose $\varepsilon$, we define:
Uncertainty statistics for the adversary over the outputs
Indistinguishability statistics: plausible deniability
Find a radius of high protection: a guarantee on the likelihood of changing any word in the
embedding vocabulary
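Plausible-deniability statistics of this kind can be estimated by running the mechanism many times on one word. The Python sketch below assumes two such statistics: N_w, the number of runs in which the word comes back unchanged (lower means more deniability), and S_w, the number of distinct words returned. The toy mechanism (keep the word with probability 0.6, otherwise pick a uniformly random vocabulary word) and the vocabulary are hypothetical stand-ins for the embedding mechanism.

```python
import random
from collections import Counter

def deniability_stats(mechanism, word, trials, rng):
    """Return (N_w, S_w): unchanged-output count and number of distinct outputs."""
    outputs = Counter(mechanism(word, rng) for _ in range(trials))
    return outputs[word], len(outputs)

VOCAB = ["cleveland", "ohio", "columbus", "akron", "toledo"]  # hypothetical vocabulary

def toy_mechanism(word, rng, keep_prob=0.6):
    """Hypothetical stand-in: keep the word w.p. keep_prob, else a uniform random word."""
    return word if rng.random() < keep_prob else rng.choice(VOCAB)

rng = random.Random(0)
n_w, s_w = deniability_stats(toy_mechanism, "cleveland", trials=1000, rng=rng)
print(n_w, s_w)
```

Sweeping ε and recording these statistics for every vocabulary word gives an empirical way to pick an ε whose worst-case N_w is acceptable.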
38. Euclidean Experiments: Setup
Dataset | IMDb | Enron | InsuranceQA
Task type | Sentiment analysis | Author identification | Question answering
Evaluation metric | Accuracy | Accuracy | MAP, MRR
Training set size | 25,000 | 8,517 | 12,887
Test set size | 25,000 | 850 | 1,800
Total word count | 5,958,157 | 307,639 | 92,095
Vocabulary size | 79,428 | 15,570 | 2,745
Sentence length | µ = 42.27, σ = 34.38 | µ = 30.68, σ = 31.54 | µ = 7.15, σ = 2.06
Scenario 1: Train-time protection: little access to public data (10%), but abundant
access to private training data (90%); the model is trained on the combined dataset
(i.e. public subset + perturbed private subset)
Scenario 2: Test-time protection: models are trained on the complete training set and evaluated
on privatized versions of the test sets
We used 300-d GloVe word embeddings with biLSTM models
39. Results
IMDb reviews – Accuracy vs baseline for different values of ε
[Figure: two panels, Accuracy (at training time) and Accuracy (at test time); x-axis: epsilon (200 to 1000); y-axis: accuracy (0 to 1); series: Accuracy, Baseline]
40. Results
Enron emails – Accuracy vs baseline for different values of ε
[Figure: two panels, Accuracy (at training time) and Accuracy (at test time); x-axis: epsilon (200 to 1000); y-axis: accuracy (0 to 1); series: Accuracy, Baseline]
41. Results
InsuranceQA – MAP/MRR scores for different values of ε on the dev set
[Figure: two panels, Scores for dev at training time and Scores for dev at test time; x-axis: epsilon (200 to 1000); y-axis: 0 to 1; series: MAP on dev, MRR on dev, MAP baseline, MRR baseline]
42. Privacy Evaluation
In the previous experiments, we didn’t explicitly evaluate privacy
Problem: $\varepsilon$ is an arbitrary number that is hard to interpret
This is especially true in metric DP, since $\varepsilon$ is on a different scale (it multiplies distances in the embedding space)
As we have seen, there are empirical ways to calibrate $\varepsilon$ according to statistics of the word
embeddings
But how do we convince stakeholders that the privacy guarantees hold and that there
are no bugs?
Solution: machine auditors, i.e. machine learning algorithms designed to carry out different types of
privacy attacks on the data
48. Machine Auditors
Probabilistic record linkage auditing attack
Objective: link a user in a public dataset to a user in a (leaked) private dataset.
Attack simulation: simulate public and “leaked” datasets by randomly splitting
an initial dataset. The attack takes advantage of rare words and queries issued
by users. A vector of word counts can be extracted from user queries and used to
perform the linkage.
Assumptions: the attacker is able to narrow the attack set (using side knowledge)
Evaluation: how many accurate links can the attacker reconstruct?
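The core step of the record-linkage auditor can be sketched in Python under assumptions of our own. Each user is a bag of words from their queries; word counts are IDF-weighted so that rare words dominate; each public user is linked to the most cosine-similar leaked user. The toy queries and user IDs are hypothetical, and a real audit would apply this to the random split described above and compare accuracy on raw versus privatized text.

```python
import math
from collections import Counter

def idf_weights(user_docs):
    """Inverse document frequency over users: rare words get larger weights."""
    df = Counter(w for words in user_docs.values() for w in set(words))
    n = len(user_docs)
    return {w: math.log(n / c) + 1.0 for w, c in df.items()}

def vectorize(words, idf):
    counts = Counter(words)
    return {w: c * idf.get(w, 0.0) for w, c in counts.items()}

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link(public, leaked):
    """Map each public user ID to the most similar leaked user ID."""
    idf = idf_weights({**{("pub", k): v for k, v in public.items()},
                       **{("leak", k): v for k, v in leaked.items()}})
    leaked_vecs = {k: vectorize(v, idf) for k, v in leaked.items()}
    links = {}
    for uid, words in public.items():
        pv = vectorize(words, idf)
        links[uid] = max(leaked_vecs, key=lambda k: cosine(pv, leaked_vecs[k]))
    return links

# Hypothetical split of the same three users' queries into two datasets.
public = {"u1": "weather in milladore today".split(),
          "u2": "play cantopop on lastfm".split(),
          "u3": "book a table for two".split()}
leaked = {"a": "play cantopop hits".split(),
          "b": "book a restaurant for two".split(),
          "c": "milladore weather tomorrow".split()}
truth = {"u1": "c", "u2": "a", "u3": "b"}

links = link(public, leaked)
accuracy = sum(links[u] == truth[u] for u in truth) / len(truth)
print(links, accuracy)
```

Rare tokens such as "milladore" carry most of the linking signal, which is exactly what a word-substitution mechanism is meant to blur.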
49. Machine Auditors
Membership auditing attack [Shokri et al. ’17, Song & Shmatikov ’18]
Objective: identify whether an individual’s data (queries) were used in the
training set of an ML model.
Attack simulation: train an ML model on queries from m users. Train “shadow”
models using data from a different set of n users. The attack model is a classifier
built using the outputs of the shadow models
Assumptions: the attacker is able to narrow the attack set (using side knowledge)
Evaluation: can the attacker correctly separate the m users inside the model’s dataset from
users outside it?
50. Outline: 5 Differential Privacy in Hyperbolic Spaces
51. Hyperbolic Spaces
(a) Projection of a point in the Lorentz model $\mathbb{H}^n$ to the Poincaré model
(b) WebIsADb is-a relationships in the GloVe vocabulary on the $\mathbb{B}^2$ Poincaré disk
Continuous analog of a tree structure
Natural language captures hypernymy and hyponymy → embeddings require fewer dimensions
Use models of hyperbolic space: projections into Euclidean space
52. Hyperbolic Differential Privacy
Distances in the $n$-dimensional Poincaré ball are given by:
$d_{\mathbb{B}^n}(u, v) = \operatorname{arcosh}\left(1 + 2\frac{\|u - v\|^2}{(1 - \|u\|^2)(1 - \|v\|^2)}\right)$
Claim: $d_{\mathbb{B}^n}(u, v)$ is a valid metric. Proof (via the Lorentzian model) in the paper
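The distance formula and its Lorentz-model connection can be checked numerically. The Python sketch below is an illustration, not the paper's proof. It implements $d_{\mathbb{B}^n}$, the lift from the Poincaré ball to the hyperboloid (Lorentz) model, and the projection back, $x_{1:n}/(1 + x_0)$, which is the map pictured in panel (a) of the previous slide. It then confirms that the Lorentz distance arcosh(−⟨x, y⟩_L) gives the same value. All function names are ours.

```python
import numpy as np

def poincare_distance(u, v):
    """d_B(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    num = 2.0 * np.dot(u - v, u - v)
    den = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return float(np.arccosh(max(1.0 + num / den, 1.0)))  # clamp guards rounding

def to_lorentz(p):
    """Lift a Poincare-ball point onto the hyperboloid model H^n."""
    p = np.asarray(p, float)
    sq = np.dot(p, p)
    return np.concatenate([[(1.0 + sq) / (1.0 - sq)], 2.0 * p / (1.0 - sq)])

def to_poincare(x):
    """Project a hyperboloid point back to the ball: x_{1:n} / (1 + x_0)."""
    return x[1:] / (1.0 + x[0])

def lorentz_distance(x, y):
    """arcosh(-<x, y>_L) with the Minkowski inner product -x0*y0 + sum xi*yi."""
    inner = -x[0] * y[0] + np.dot(x[1:], y[1:])
    return float(np.arccosh(max(-inner, 1.0)))

u, v = np.array([0.1, 0.5]), np.array([-0.3, 0.2])
print(poincare_distance(u, v), lorentz_distance(to_lorentz(u), to_lorentz(v)))
```

Distances blow up near the boundary of the ball (from the origin, d = 2 artanh‖x‖), which is what lets a tree-like hierarchy fit in few dimensions.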
53. Hyperbolic Noise
Recall that for Euclidean metric DP, we use Laplacian
noise to achieve $\varepsilon$-mDP, i.e.:
$\xi \sim \mathrm{Lap}\left(\frac{1}{\varepsilon n}\right)$
We derive the Hyperbolic Laplace distribution:
$p(x \mid \mu = 0, \varepsilon) = \frac{1 + \varepsilon}{2\,{}_2F_1(1, \varepsilon; 2 + \varepsilon; -1)} \left(-\frac{2}{\|x\| - 1} - 1\right)^{-\varepsilon}$
where ${}_2F_1(a, b; c; z)$ is the hypergeometric function
For sampling, we developed a Lorentzian Metropolis-Hastings sampler (see paper)
[Figure: two-dimensional plot, both axes from −0.4 to 0.4]
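In one dimension the factor $-\frac{2}{|x| - 1} - 1$ equals $\frac{1 + |x|}{1 - |x|}$, so the density is proportional to $\exp(-\varepsilon\, d(0, x))$ with $d(0, x) = 2\,\mathrm{artanh}|x| = \ln\frac{1 + |x|}{1 - |x|}$. The Python sketch below checks numerically, for this one-dimensional case, that the stated constant normalizes the density, using SciPy's `hyp2f1` and `quad`. It then draws samples with a plain random-walk Metropolis-Hastings chain on (−1, 1); this simple sampler is our stand-in for illustration, not the Lorentzian sampler from the paper.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import hyp2f1

def hyperbolic_laplace_pdf(x, epsilon):
    """One-dimensional density from the slide, centred at mu = 0 on (-1, 1)."""
    r = abs(x)
    const = (1.0 + epsilon) / (2.0 * hyp2f1(1.0, epsilon, 2.0 + epsilon, -1.0))
    return const * (-2.0 / (r - 1.0) - 1.0) ** (-epsilon)

def total_mass(epsilon):
    # The density is symmetric: integrate over [0, 1) and double.
    half, _ = quad(hyperbolic_laplace_pdf, 0.0, 1.0, args=(epsilon,))
    return 2.0 * half

def mh_samples(epsilon, n, step=0.2, seed=0):
    """Random-walk Metropolis-Hastings targeting the density on (-1, 1)."""
    rng = np.random.default_rng(seed)
    log_p = lambda x: -epsilon * np.log((1.0 + abs(x)) / (1.0 - abs(x)))  # -eps * d(0, x)
    x, out = 0.0, np.empty(n)
    for i in range(n):
        prop = x + step * rng.normal()
        # Proposals outside the ball have zero density and are always rejected.
        if abs(prop) < 1.0 and np.log(rng.uniform()) < log_p(prop) - log_p(x):
            x = prop
        out[i] = x
    return out

for eps in (0.5, 1.0, 2.0, 8.0):
    print(eps, total_mass(eps), np.mean(np.abs(mh_samples(eps, 20_000))))
```

As ε grows, samples concentrate near the centre, mirroring the Euclidean Laplace mechanism but measured in hyperbolic distance.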
56. Hyperbolic Privacy Experiments 1
Task: obfuscation vs. Koppel’s authorship attribution algorithm
Datasets: PAN@CLEF tasks; metric: correct author predictions (lower is better)
ε | Pan-11 small | Pan-11 large | Pan-12 set-A | Pan-12 set-C | Pan-12 set-D | Pan-12 set-I
0.5 | 36 | 72 | 4 | 3 | 2 | 5
1 | 35 | 73 | 3 | 3 | 2 | 5
2 | 40 | 78 | 4 | 3 | 2 | 5
8 | 65 | 116 | 4 | 5 | 4 | 5
∞ | 147 | 259 | 6 | 6 | 6 | 12
Correct author predictions (lower is better)
57. Hyperbolic Privacy Experiments 2
Task: expected privacy vs Euclidean baseline
Datasets: 100/200/300d GloVe embeddings
                        expected value of Nw
ε       worst-case Nw   hyp-100   euc-100   euc-200   euc-300
0.125   134             1.25      38.54     39.66     39.88
0.5     148             1.62      42.48     43.62     43.44
1       172             2.07      48.80     50.26     53.82
2       297             3.92      92.42     93.75     90.90
8       960             140.67    602.21    613.11    587.68
Privacy comparisons (lower Nw is better)
61. Example: Differentially Private SGD
Algorithm 1: Differentially Private SGD
Input: dataset z = (z_1, ..., z_n), per-example loss ℓ
Hyperparameters: learning rate η, mini-batch size m, number of epochs T, noise variance σ², clipping norm L
Initialize w ← 0
for t ∈ [T] do
    for k ∈ [n/m] do
        Sample S ⊂ [n] with |S| = m uniformly at random
        Let g ← (1/m) Σ_{j∈S} clip_L(∇ℓ(z_j, w)) + (2L/m) N(0, σ²I)
        Update w ← w − ηg
return w
5+ hyper-parameters affecting both privacy and utility
For deep learning applications we only have empirical (not analytic) utility
How do we find the hyperparameters that give us an optimal trade-off?
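A minimal NumPy sketch of Algorithm 1, assuming a logistic-regression loss ℓ and synthetic data (both illustrative choices; helper names are hypothetical). It follows the pseudocode line by line, including the (2L/m) noise scaling shown above. It does not compute the resulting ε, which requires a separate privacy accountant, so it only shows how η, m, T, σ and L enter the training loop.

```python
import numpy as np


def clip_l2(v, L):
    """clip_L from Algorithm 1: rescale v so that its Euclidean norm is at most L."""
    norm = np.linalg.norm(v)
    return v if norm <= L else v * (L / norm)


def logistic_grad(x, y, w):
    """Per-example gradient of log(1 + exp(-y <w, x>)) with label y in {-1, +1}."""
    return -y * x / (1.0 + np.exp(y * (x @ w)))


def dp_sgd(X, y, eta, m, T, sigma, L, seed=0):
    """Algorithm 1 (DP-SGD) for logistic regression; privacy accounting not included."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)                                   # Initialize w <- 0
    for _ in range(T):                                # for t in [T]
        for _ in range(n // m):                       # for k in [n/m]
            S = rng.choice(n, size=m, replace=False)  # |S| = m, uniformly at random
            clipped = np.sum([clip_l2(logistic_grad(X[j], y[j], w), L) for j in S], axis=0)
            noise = rng.normal(0.0, sigma, size=d)    # N(0, sigma^2 I)
            g = clipped / m + (2.0 * L / m) * noise
            w = w - eta * g                           # Update w <- w - eta * g
    return w


# Usage on hypothetical, linearly separable 2-D data.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
w = dp_sgd(X, y, eta=0.5, m=10, T=20, sigma=1.0, L=1.0)
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```

Each of η, m, T, σ and L changes both the accuracy of `w` and the privacy guarantee, which is why the trade-off has to be searched over jointly.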
62. The Privacy-Utility Pareto Front
[Figure: Hyper-parameter Space; Privacy Loss vs. Error, with the Pareto-Optimal Points marked]
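The Pareto-optimal points are those that no other evaluated setting beats in both privacy loss and error. A minimal sketch of extracting them from evaluated (privacy loss, error) pairs, assuming both objectives are minimized; the function name and the example pairs are hypothetical.

```python
def pareto_front(points):
    """Return the Pareto-optimal (privacy_loss, error) pairs, both objectives minimized.

    After sorting by privacy loss (ties broken by error), a point is kept only if
    its error is strictly lower than that of every point kept before it; any other
    point is matched or beaten in both objectives by an earlier one.
    """
    front, best_error = [], float("inf")
    for privacy_loss, error in sorted(points):
        if error < best_error:
            front.append((privacy_loss, error))
            best_error = error
    return front


# Hypothetical (epsilon, classification error) pairs from five hyper-parameter settings.
evaluated = [(0.5, 0.30), (1.0, 0.20), (1.0, 0.25), (2.0, 0.22), (4.0, 0.10)]
print(pareto_front(evaluated))  # [(0.5, 0.3), (1.0, 0.2), (4.0, 0.1)]
```

The single sort plus one sweep makes this O(n log n) for two objectives.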
71. Bayesian Optimization
Gradient-free optimization of black-box functions
Widely used in applications (hyper-parameter optimization (HPO) in ML, scheduling and planning, experimental design, ...)
In multi-objective problems, BO aims to learn the Pareto front with as few evaluations as possible
81. DPareto
Repeat:
  1. For each objective (privacy, utility):
     a. Fit a surrogate model (a Gaussian process, GP) to the hyper-parameter settings evaluated so far
     b. Compute the predictive distribution from the GP mean and variance functions
  2. Use the posteriors of the surrogate models to form an acquisition function
  3. Evaluate the next point at the estimated global maximum of the acquisition function
until the evaluation budget is exhausted
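A compact NumPy sketch of the loop above. The following are our own simplifying assumptions, not necessarily DPareto's actual choices: both objectives are (privacy loss, error) pairs to be minimized; hyper-parameters are scaled to the unit cube; each GP uses a fixed-length-scale RBF kernel; the acquisition is the hypervolume improvement a candidate would bring if it reached its optimistic lower-confidence-bound prediction in both objectives; and the "global maximum" is approximated by the best of a random candidate set. `toy_objective` is a hypothetical stand-in for training a DP model and measuring its ε and error.

```python
import numpy as np


def pareto_front(points):
    """Non-dominated (privacy_loss, error) pairs, both minimized, sorted by privacy loss."""
    front, best = [], float("inf")
    for p, e in sorted(points):
        if e < best:
            front.append((p, e))
            best = e
    return front


def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front of `points` and bounded above by `ref`."""
    hv, prev_e = 0.0, ref[1]
    for p, e in pareto_front(points):
        if p < ref[0] and e < prev_e:
            hv += (ref[0] - p) * (prev_e - e)
            prev_e = e
    return hv


def rbf(A, B, length=0.3):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)


def gp_predict(X, y, Xs, noise=1e-4):
    """Zero-mean GP regression on standardized targets: predictive mean and std at Xs."""
    mu, sd = y.mean(), y.std() + 1e-12
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    C = np.linalg.cholesky(K)
    alpha = np.linalg.solve(C.T, np.linalg.solve(C, (y - mu) / sd))
    v = np.linalg.solve(C, Ks.T)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return mu + sd * (Ks @ alpha), sd * np.sqrt(var)


def dpareto_sketch(objective, dim, n_init, budget, ref, n_cand=500, kappa=2.0, seed=0):
    """BO loop in the spirit of the DPareto slide; objective(x) -> (privacy_loss, error)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_init, dim))
    Y = np.array([objective(x) for x in X], dtype=float)
    while len(X) < budget:
        cand = rng.uniform(size=(n_cand, dim))
        # 1. One GP surrogate per objective: predictive mean and std at the candidates.
        (m0, s0), (m1, s1) = (gp_predict(X, Y[:, i], cand) for i in range(2))
        # 2. Acquisition: hypervolume gained if a candidate hit its optimistic prediction.
        observed = [tuple(row) for row in Y]
        base = hypervolume_2d(observed, ref)
        lcb = np.column_stack([m0 - kappa * s0, m1 - kappa * s1])
        acq = [hypervolume_2d(observed + [tuple(q)], ref) - base for q in lcb]
        # 3. Evaluate the true objectives at the best candidate.
        x_next = cand[int(np.argmax(acq))]
        X = np.vstack([X, x_next])
        Y = np.vstack([Y, objective(x_next)])
    return X, Y


def toy_objective(x):
    # Hypothetical trade-off: x[0] acts as a privacy-loss knob, x[1] only adds error.
    return np.array([x[0], 1.0 - np.sqrt(x[0]) + 0.5 * x[1]])


X, Y = dpareto_sketch(toy_objective, dim=2, n_init=5, budget=25, ref=(1.1, 1.6))
print("final hypervolume:", hypervolume_2d([tuple(r) for r in Y], (1.1, 1.6)))
```

Hypervolume (the area dominated by the front up to a reference point) is the same kind of quantity as the "PF hypervolume" used to compare DPareto with random sampling. It never decreases as points are added, so a faster rise means fewer evaluations are needed to recover the front.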
82. DPareto vs Random Sampling
[Figure, three panels. Hypervolume Evolution: PF hypervolume vs. sampled points for MLP1 and MLP2, random sampling (RS) vs. Bayesian optimization (BO). MLP2 Pareto Fronts: classification error vs. ε (log scale) for the initial points, +256 RS and +256 BO. LogReg+SGD Samples: classification error vs. ε (log scale) for 1500 RS vs. 256 BO samples.]
84. Summary: Privacy Enhancing Technologies
Privacy
Privacy risks can be counter-intuitive and tricky to formalize
High-dimensional data and side knowledge make privacy hard
Semantic guarantees (e.g., DP) behave better than syntactic ones (e.g., k-anonymization)
Differential privacy is a mature privacy-enhancing technology
Metric DP provides local plausible deniability; accuracy can be good even in cases with an infinite number of outcomes
Empirical privacy-utility trade-off evaluation enables application-specific decisions
Bayesian optimization provides a computationally efficient method to recover the Pareto front (especially with a large number of hyper-parameters)