6. THE BIG PICTURE
• Apple crash reporter
• Windows error reporting
• Ubuntu Apport
• Gnome BugBuddy
• Mozilla / Google Breakpad
• many others
• Chilimbi and colleagues ’09
• Elbaum and Diep ’05
• Hilbert and Redmiles ’00
• Liblit and colleagues ’05
• Pavlopoulou and Young ’99
• many others
12. Handling concerns in practice
PRIVACY CONCERNS
• Ignore them
• Privacy policies
• Collect limited amounts of information
  • less likely to be sensitive
  • can rely on user checking
21. INTUITION
[Figure: the input domain, containing the inputs that cause F and the inputs that satisfy F’s path condition. The sensitive input (I) that causes F and the anonymized input (I’) that also causes F both satisfy the path condition.]
22. CASTRO AND COLLEAGUES’ TECHNIQUE
(PATH CONDITION GENERATION)
Path condition: set of constraints on a program’s
inputs that encode the conditions necessary for a
specific path to be executed.
23. CASTRO AND COLLEAGUES’ TECHNIQUE
(PATH CONDITION GENERATION)
boolean foo(int x, int y, int z) {
  if (x <= 5) {
    int a = x * 2;
    if (y + a > 10) {
      if (z == 0) {
        return true;
      }
    }
  }
  return false;
}
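To make the definition of a path condition concrete, the following minimal Java sketch instruments foo by hand so that each branch taken appends one constraint over the symbolic inputs i1, i2, i3 (standing for x, y, z). The class name PathConditionSketch and the string encoding of constraints are assumptions made for illustration; a real tool would derive the constraints automatically through symbolic execution rather than through hand-written instrumentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: hand-instrumented version of foo.
class PathConditionSketch {
    // Returns the constraints on i1, i2, i3 (the symbolic names for x, y, z)
    // that hold along the path executed for the given concrete input.
    static List<String> pathCondition(int x, int y, int z) {
        List<String> pc = new ArrayList<>();
        if (x <= 5) {
            pc.add("i1 <= 5");
            int a = x * 2;                      // symbolic state: a -> i1*2
            if (y + a > 10) {
                pc.add("i2+i1*2 > 10");
                if (z == 0) {
                    pc.add("i3 == 0");
                } else {
                    pc.add("!(i3 == 0)");
                }
            } else {
                pc.add("!(i2+i1*2 > 10)");
            }
        } else {
            pc.add("!(i1 <= 5)");
        }
        return pc;
    }

    public static void main(String[] args) {
        // prints: i1 <= 5 && i2+i1*2 > 10 && i3 == 0
        System.out.println(String.join(" && ", pathCondition(5, 3, 0)));
    }
}
```

Note that the constraint for the second branch is expressed over the inputs (i2+i1*2) rather than over the local variable a, which is what the symbolic state makes possible.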
34. CASTRO AND COLLEAGUES’ TECHNIQUE
(PATH CONDITION GENERATION)
boolean foo(int x, int y, int z) {
  if (x <= 5) {
    int a = x * 2;
    if (y + a > 10) {
      if (z == 0) {
        return true;
      }
    }
  }
  return false;
}
Input (sensitive): 5 3 0
Symbolic State: x→i1, y→i2, z→i3, a→i1*2
Path Condition: i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0
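Once the path condition i1 <= 5 ∧ i2+i1*2 > 10 ∧ i3 == 0 is known, any other input that satisfies it follows the same path through foo. The sketch below illustrates that idea under loud assumptions: the class AnonymizedInputSketch, the method names, and the bounded brute-force search are all hypothetical stand-ins for the constraint solver a real implementation would use. It simply picks a satisfying input that differs from the original in as many positions as possible.

```java
// Hypothetical illustration only: brute force replaces a constraint solver.
class AnonymizedInputSketch {
    static boolean foo(int x, int y, int z) {
        if (x <= 5) {
            int a = x * 2;
            if (y + a > 10) {
                if (z == 0) {
                    return true;
                }
            }
        }
        return false;
    }

    // The path condition from the slide, written as a predicate over i1, i2, i3.
    static boolean satisfiesPathCondition(int i1, int i2, int i3) {
        return i1 <= 5 && i2 + i1 * 2 > 10 && i3 == 0;
    }

    // Searches [-bound, bound]^3 for an input that satisfies the path condition
    // and differs from the original in the largest number of positions.
    // Returns null if no satisfying input exists within the bound.
    static int[] anonymize(int[] original, int bound) {
        int[] best = null;
        int bestDiff = -1;
        for (int i1 = -bound; i1 <= bound; i1++) {
            for (int i2 = -bound; i2 <= bound; i2++) {
                for (int i3 = -bound; i3 <= bound; i3++) {
                    if (!satisfiesPathCondition(i1, i2, i3)) {
                        continue;
                    }
                    int diff = (i1 != original[0] ? 1 : 0)
                             + (i2 != original[1] ? 1 : 0)
                             + (i3 != original[2] ? 1 : 0);
                    if (diff > bestDiff) {
                        bestDiff = diff;
                        best = new int[] {i1, i2, i3};
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] anonymized = anonymize(new int[] {5, 3, 0}, 20);
        System.out.println(java.util.Arrays.toString(anonymized));
    }
}
```

In this example the third input cannot change, because the constraint i3 == 0 pins it to the original value. Only the first two values can be replaced.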
71. ASSUMPTIONS
1. The failure f is observable and can be detected with an
assertion.
‣ common to all debugging techniques; holds in most, if not all, cases.
2. Any input that satisfies the path condition results in f.
• Non-determinism
‣ common to all debugging techniques; requires a deterministic
replay mechanism
• Implicit checks (e.g., division by zero)
‣ likely that they do not involve relevant inputs
‣ make implicit checks explicit (e.g., 100/x → assert x != 0)
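The 100/x example can be sketched in Java as follows. The class and method names are hypothetical. Because Java's assert statement is disabled unless the JVM is started with -ea, the sketch uses an explicit branch that throws instead. The point, as far as the slide suggests, is that the check becomes an ordinary branch on x, which path-condition generation can presumably record like any other branch.

```java
// Hypothetical illustration only.
class ExplicitCheckSketch {
    // Implicit check: the division throws ArithmeticException when x == 0,
    // but no branch on x appears in the source code.
    static int implicitCheck(int x) {
        return 100 / x;
    }

    // Explicit check: the same failure is now guarded by a visible branch on x.
    static int explicitCheck(int x) {
        if (x == 0) {
            throw new AssertionError("expected x != 0");
        }
        return 100 / x;
    }

    public static void main(String[] args) {
        System.out.println(explicitCheck(4)); // prints 25
    }
}
```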
83. EVALUATION
1. Feasibility: Can the approach generate, in a reasonable amount of time, anonymized inputs that reproduce a failure?
2. Strength: How much information about the original inputs is revealed?
3. Effectiveness: Are the anonymized inputs safe to send to developers?
4. Improvement: Does the use of path condition relaxation and breakable input conditions provide any benefits over the basic approach?
94. RQ2: STRENGTH
• Average % Bits Revealed: measures how many inputs satisfy the path condition
  • annotated range: little information revealed to lots of information revealed
• Average % Residue: measures how much of the anonymized input is identical to the original input
  • example inputs I and I’: AAAAAA secret AAAAAA ... AAAAAA and BBBBBB secret BBBBBB ... BBBBBB
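As a rough illustration of the residue metric, the sketch below computes the percentage of bytes in an anonymized input that are identical to the byte at the same offset in the original. The position-wise comparison, the class name ResidueSketch, and the shortened example strings are all assumptions. The slides do not say exactly how residue is computed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: one possible reading of "% residue".
class ResidueSketch {
    // Percentage of bytes in the anonymized input that equal the byte
    // at the same offset in the original input.
    static double percentResidue(byte[] original, byte[] anonymized) {
        if (anonymized.length == 0) {
            return 0.0;
        }
        int shared = Math.min(original.length, anonymized.length);
        int same = 0;
        for (int i = 0; i < shared; i++) {
            if (original[i] == anonymized[i]) {
                same++;
            }
        }
        return 100.0 * same / anonymized.length;
    }

    public static void main(String[] args) {
        byte[] original = "AAAAAAsecretAAAAAA".getBytes(StandardCharsets.US_ASCII);
        byte[] anonymized = "BBBBBBsecretBBBBBB".getBytes(StandardCharsets.US_ASCII);
        // 6 of 18 bytes are unchanged, so roughly a third of the input is residue.
        System.out.println(percentResidue(original, anonymized));
    }
}
```

Under this reading, the unchanged "secret" in the slide's example is exactly the part that counts as residue.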
106. RQ3: EFFECTIVENESS
HTMLPARSER
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>james clause @ gatech | home</title>
<style type="text/css" media="screen" title="">
<!--/*--><![CDATA[<!--*/
body {
margin: 0px;
...
/*]]>*/-->
</style>
</head>
<body>
...
</body>
The portions of the inputs that remain after
anonymization tend to be structural in nature and
therefore are safe to send to developers
119. RELATED WORK
• Castro and colleagues ’08
• Broadwell and colleagues ’03
  • Uses dynamic taint analysis to identify and remove sensitive information in crash dumps.
• Wang and colleagues ’08
  • Uses a combination of dynamic taint analysis, symbolic execution, and Q/A with a client machine to construct anonymized inputs.
• Data set anonymization techniques (e.g., k-anonymization)
  • Budi and colleagues ’11
  • Grechanik and colleagues ’11
• Dynamic symbolic execution techniques
123. FUTURE WORK
• Additional quality metrics that:
  • consider additional aspects of privacy loss
  • consider the relative sensitivity of different inputs
  • are intuitive and easy to use
• Conduct additional (human) studies
  • additional (larger) subjects
• Investigate the combination of anonymization and minimization
126. SUMMARY
1. An approach for automatically anonymizing failure-inducing inputs
  • extends Castro and colleagues’ technique through the novel concepts of path condition relaxation and breakable input conditions
2. An empirical evaluation that demonstrates that, for the subjects considered, our approach is:
  • feasible: generates anonymized inputs in < 10 minutes
  • effective: anonymized inputs did not contain sensitive information
  • an improvement over the state of the art