Dynamic taint analysis is a well-known information flow analysis problem with many possible applications. Taint tracking allows for analysis of application data flow by assigning labels to data, and then propagating those labels through data flow. Taint tracking systems traditionally compromise among performance, precision, soundness, and portability. Performance can be critical, as these systems are often intended to be deployed to production environments, and hence must have low overhead. To be deployed in security-conscious settings, taint tracking must also be sound and precise. Dynamic taint tracking must be portable in order to be easily deployed and adopted for real world purposes, without requiring recompilation of the operating system or language interpreter, and without requiring access to application source code.
We present Phosphor, a dynamic taint tracking system for the Java Virtual Machine (JVM) that simultaneously achieves our goals of performance, soundness, precision, and portability. Moreover, to our knowledge, it is the first portable general purpose taint tracking system for the JVM. We evaluated Phosphor's performance on two commonly used JVM languages (Java and Scala), on two successive revisions of two commonly used JVMs (Oracle's HotSpot and OpenJDK's IcedTea) and on Android's Dalvik Virtual Machine, finding its performance to be impressive: runtime overhead as low as 3% (53% on average; 220% at worst) on the DaCapo macro benchmark suite.
Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs
1. Phosphor: Illuminating Dynamic
Data Flow in Commodity JVMs
Jonathan Bell and Gail Kaiser
Columbia University, New York, NY USA
OOPSLA 2014 @_jon_bell_ October 22, 2014
2. Dynamic Data Flow Analysis: Taint Tracking
[Diagram: Inputs → Application → Outputs, highlighting a flagged (“tainted”) input and the output that is derived from tainted input]
3. Taint Tracking: Applications
• End-user privacy testing: Does this application
send my personal data to remote servers?
• SQL injection attack avoidance: Do SQL queries
contain raw user input?
• Debugging: Which inputs are relevant to the
current (crashed) application state?
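The SQL injection question above can be illustrated with a very small sketch. It is not how Phosphor stores tags (Phosphor instruments byte code rather than wrapping values); it only shows the idea of flagging a source and checking at a sink. All names here (`TaintedString`, `fromUser`, `literal`, `isSafeToExecute`) are hypothetical.

```java
// Hypothetical sketch: TaintedString, fromUser, literal and isSafeToExecute are
// illustrative names, not part of Phosphor or any library.
class SinkCheckSketch {
    /** A string paired with a single taint flag. */
    static final class TaintedString {
        final String value;
        final boolean tainted;

        TaintedString(String value, boolean tainted) {
            this.value = value;
            this.tainted = tainted;
        }

        /** Concatenation propagates taint: the result is tainted if either operand is. */
        TaintedString concat(TaintedString other) {
            return new TaintedString(value + other.value, tainted || other.tainted);
        }
    }

    /** Source: everything that arrives from the user is flagged. */
    static TaintedString fromUser(String s) {
        return new TaintedString(s, true);
    }

    /** Constants written by the developer carry no taint. */
    static TaintedString literal(String s) {
        return new TaintedString(s, false);
    }

    /** Sink check: a query is only safe if no raw user input flowed into it. */
    static boolean isSafeToExecute(TaintedString query) {
        return !query.tainted;
    }

    public static void main(String[] args) {
        TaintedString query = literal("SELECT * FROM users WHERE name = '")
                .concat(fromUser("x' OR '1'='1"))
                .concat(literal("'"));
        System.out.println(isSafeToExecute(query)); // prints "false"
    }
}
```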
4. Qualities of a Successful Analysis
9. “Normal” Taint Tracking
• Associate tags with data, then propagate the tags
• Approaches:
• Operating System modifications [Vandebogart ’07],
[Zeldovich ’06]
• Language interpreter modifications [Chandra ’07],
[Enck ’10], [Nair ’07], [Son ’13]
• Source code modifications [Lam ’06], [Xu ’06]
• Binary instrumentation of applications [Clause ’07],
[Cheng ’06], [Kemerlis ’12]
Hard to be sound, precise, and performant
Not portable
10. Phosphor
• Leverages the benefits of interpreter-based
approaches (information about variables) while
remaining fully portable
• Instruments all byte code that runs in the JVM
(including the JRE API) to track taint tags
• Adds a shadow variable for each variable, to hold its taint tag
• Adds propagation logic
11. Key contribution:
How do we efficiently store meta-data
for every variable without modifying the
JVM itself?
12. JVM Type Organization
• Primitive Types
• int, long, char, byte, etc.
• Reference Types
• Arrays, instances of classes
• All reference types are assignable to
java.lang.Object
13. Phosphor’s taint tag storage
                 Local variable         Method argument        Return value  Operand stack               Field
Object           Stored as a field of the object (all locations)
Object array     Stored as a field of each object (all locations)
Primitive        Shadow variable        Shadow argument        "Boxed"       Below the value on stack    Shadow field
Primitive array  Shadow array variable  Shadow array argument  "Boxed"       Array below value on stack  Shadow array field
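To make the storage scheme for primitives and primitive arrays concrete, here is a hand-written approximation of what a class might look like after instrumentation. It is only a sketch: Phosphor rewrites byte code, not source, and the class `Account`, its members, and the `_tag` naming are assumptions made for illustration (the `TaintedInt` name and the tag-before-value argument order follow the later slides).

```java
// Hypothetical sketch of shadow storage; Account and its members are illustrative.
class ShadowStorageSketch {
    /** "Boxed" primitive return value: the value together with its tag. */
    static final class TaintedInt {
        final int taint;
        final int val;

        TaintedInt(int taint, int val) {
            this.taint = taint;
            this.val = val;
        }
    }

    static final class Account {
        int balance;
        int balance_tag;                 // shadow field for the primitive field

        byte[] history = new byte[4];
        int[] history_tag = new int[4];  // shadow array field: one tag per element

        /** Each primitive argument is preceded by a shadow argument holding its tag. */
        void deposit(int amount_tag, int amount) {
            balance_tag = balance_tag | amount_tag; // assumed tag combination: bitwise OR
            balance = balance + amount;
        }

        /** Element stores update the value array and the shadow array together. */
        void record(int index, int entry_tag, byte entry) {
            history_tag[index] = entry_tag;
            history[index] = entry;
        }

        /** A primitive return value is boxed with its tag. */
        TaintedInt getBalance() {
            return new TaintedInt(balance_tag, balance);
        }
    }
}
```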
14. Taint Propagation
• Modify all byte code instructions to be taint-aware
by adding extra instructions
• Examples:
• Arithmetic -> combine tags of inputs
• Load variable to stack -> Also load taint tag to
stack
• Modify method calls to pass taint tags
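The propagation rules above can be sketched at the source level. This is an assumption-laden illustration, not Phosphor's generated code: tags are modeled as `int` bit sets combined with bitwise OR, and the method `scale` and helper `combine` are hypothetical names.

```java
// Hypothetical source-level view of taint propagation; names are illustrative.
class PropagationSketch {
    /** Assumed combination rule: tags are bit sets, combined with bitwise OR. */
    static int combine(int tag1, int tag2) {
        return tag1 | tag2;
    }

    // Original method, for comparison:
    //   static int scale(int x, int k) { int y = x * k; return y + 1; }

    /**
     * Instrumented approximation. Every primitive argument is preceded by its tag,
     * and the result is returned as {tag, value} for the purposes of this sketch.
     */
    static int[] scale(int x_tag, int x, int k_tag, int k) {
        // Arithmetic: the tag of the result combines the tags of the inputs.
        int y_tag = combine(x_tag, k_tag);
        int y = x * k;

        // The constant 1 carries the empty tag (0), so the tag is unchanged.
        int result_tag = combine(y_tag, 0);
        int result = y + 1;
        return new int[] { result_tag, result };
    }

    public static void main(String[] args) {
        int[] r = scale(1, 3, 2, 4);
        System.out.println(r[0] + " " + r[1]); // prints "3 13"
    }
}
```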
16. Challenge 1: Type Mayhem
[Diagram: which types carry an extra (shadow) variable]
• Primitive types: always has extra variable
• Primitive arrays (instanceof java.lang.Object): always has extra variable
• Instances of classes (Objects) (instanceof java.lang.Object): never has extra variable
• java.lang.Object: sometimes has extra variable!
17. Challenge 1: Type Mayhem
// Original:
byte[] array = new byte[5];
Object ret = array;
return ret;
// Instrumented:
int[] array_tag = new int[5];
byte[] array = new byte[5];
Object ret = new TaintedByteArray(array_tag, array);
Solution 1: Box taint tag with array when we lose
type information
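A runnable sketch of the boxing idea follows. The class name `TaintedByteArray` and the `(tags, values)` constructor order come from the slide; its fields, the `make` and `tagAt` helpers, and the unboxing via `instanceof` are assumptions added for illustration.

```java
// Hypothetical sketch of boxing a primitive array with its shadow tag array.
class ArrayBoxingSketch {
    /** Keeps the tag array and the value array together once the static type is lost. */
    static final class TaintedByteArray {
        final int[] taint;
        final byte[] val;

        TaintedByteArray(int[] taint, byte[] val) {
            this.taint = taint;
            this.val = val;
        }
    }

    /** Instrumented approximation of: byte[] array = new byte[5]; Object ret = array; return ret; */
    static Object make() {
        int[] array_tag = new int[5];
        byte[] array = new byte[5];
        array_tag[2] = 1; // pretend element 2 was derived from tainted input
        // Assigning to Object loses the "primitive array" type, so box tags and values together.
        Object ret = new TaintedByteArray(array_tag, array);
        return ret;
    }

    /** When the array type is recovered, the box is opened again to reach the tags. */
    static int tagAt(Object maybeBoxed, int index) {
        if (maybeBoxed instanceof TaintedByteArray) {
            return ((TaintedByteArray) maybeBoxed).taint[index];
        }
        return 0; // ordinary objects carry no shadow array in this sketch
    }
}
```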
18. Challenge 2: Native Code
We can’t instrument everything!
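20. Challenge 2: Native Code
// Application code that ends up calling a native method:
public int hashCode() {
    return super.hashCode() * field.hashCode();
}
// A native method (cannot be instrumented):
public native int hashCode();
Solution: Wrappers. Rename every method, and leave a
wrapper behind
// Wrapper: calls the native method and boxes the result with tag 0
public TaintedInt hashCode$$wrapper() {
    return new TaintedInt(0, hashCode());
}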
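The wrapper above can be tried out in plain Java. Since a real native method would need JNI, this sketch uses `System.identityHashCode` as a stand-in for code that cannot be instrumented; `nativeLikeHash` and the fields of `TaintedInt` are hypothetical.

```java
// Hypothetical sketch of a wrapper around uninstrumentable (native) code.
class NativeWrapperSketch {
    /** Boxed int return value: tag first, then value, as on the slide. */
    static final class TaintedInt {
        final int taint;
        final int val;

        TaintedInt(int taint, int val) {
            this.taint = taint;
            this.val = val;
        }
    }

    /** Stand-in for a native method: its body is outside the reach of instrumentation. */
    static int nativeLikeHash(Object o) {
        return System.identityHashCode(o);
    }

    /** Wrapper with the instrumented calling convention: result boxed with tag 0. */
    static TaintedInt nativeLikeHash$$wrapper(Object o) {
        return new TaintedInt(0, nativeLikeHash(o));
    }
}
```

Instrumented callers would invoke the `$$wrapper` variant and always receive a tag alongside the value, even though the native body itself never tracked one.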
23. Challenge 2: Native Code
Wrappers work both ways: native code can still call a
method with the old signature
public int[] someMethod(byte in)
{
    return someMethod$$wrapper(0, in).val;
}
public TaintedIntArray someMethod$$wrapper(int in_tag, byte in)
{
    // The original method "someMethod", but with taint tracking
}
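A complete version of this pair might look as follows. The signatures and the `.val` field follow the slide; the method body (returning the input and its double) and the `taint` field of `TaintedIntArray` are invented purely so the example runs.

```java
// Hypothetical sketch: the method body is made up; only the signatures follow the slide.
class TwoWayWrapperSketch {
    /** Boxed int[] return value: per-element tags plus the values. */
    static final class TaintedIntArray {
        final int[] taint;
        final int[] val;

        TaintedIntArray(int[] taint, int[] val) {
            this.taint = taint;
            this.val = val;
        }
    }

    /** Old signature, kept so that uninstrumented (native) callers still link. */
    public static int[] someMethod(byte in) {
        // Callers that know nothing about tags pass the empty tag and get the bare value.
        return someMethod$$wrapper(0, in).val;
    }

    /** The original logic, extended with taint tracking. */
    public static TaintedIntArray someMethod$$wrapper(int in_tag, byte in) {
        int[] val = { in, in * 2 };        // illustrative computation
        int[] taint = { in_tag, in_tag };  // both elements are derived from "in"
        return new TaintedIntArray(taint, val);
    }
}
```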
25. Design Limitations
• Tracking through native code
• Return value’s tag becomes the combination of the tags
of all parameters (heuristic); not found to be a problem
in our evaluation
• Tracks explicit data flow only (not through control
flow)
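Both limitations can be illustrated with a small sketch. The method names are hypothetical, results are returned as `{tag, value}` pairs for brevity, and bitwise OR is assumed as the way tags are combined.

```java
// Hypothetical sketch of the two design limitations; all names are illustrative.
class FlowLimitsSketch {
    /** Explicit flow: the value is copied, so its tag is copied with it. */
    static int[] explicitCopy(int secret_tag, int secret) {
        int out_tag = secret_tag;
        int out = secret;
        return new int[] { out_tag, out };
    }

    /**
     * Implicit flow: "out" depends on "secret" only through the branch taken.
     * Both assignments store constants, which carry the empty tag, so the
     * dependency is not tracked and the result looks untainted.
     */
    static int[] implicitCopy(int secret_tag, boolean secret) {
        int out_tag = 0;
        int out;
        if (secret) {
            out = 1;
        } else {
            out = 0;
        }
        return new int[] { out_tag, out };
    }

    /** Native-code heuristic: the return tag is the combination of all parameter tags. */
    static int nativeReturnTag(int... parameterTags) {
        int combined = 0;
        for (int tag : parameterTags) {
            combined |= tag; // assumed combination rule: bitwise OR
        }
        return combined;
    }
}
```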
27. Soundness & Precision
• DroidBench - series of unit tests for Java taint
tracking
• Passed all except for implicit flows (intended
behavior)
28. Performance
• Macrobenchmarks (DaCapo, Scalabench)
• Microbenchmarks
• Versus TaintDroid [Enck, 2010] on CaffeineMark
• Versus Trishul [Nair, 2008] on JavaGrande
31. Microbenchmarks
[Bar chart: Phosphor (Hotspot 7) and Trishul Relative Overhead. Y-axis: Relative Runtime Overhead, 0% to 120%. X-axis: Arithmetic, Assign, Cast, Create, Exception, Loop, Math, Method, Serial. Series: Phosphor, Trishul.]
32. Microbenchmarks
[Bar chart: Phosphor (Hotspot 7) and Trishul Relative Overhead. Y-axis: Relative Runtime Overhead, 0% to 120%. X-axis: Arithmetic, Assign, Cast, Create, Exception, Loop, Math, Method, Serial. Labels: Phosphor, Trishul, Kaffe.]
33. Microbenchmarks: TaintDroid
• TaintDroid: Taint tracking for Android’s Dalvik VM
[Enck, 2010]
• Not very precise: one tag per array (not per array
element!)
• Applied Phosphor to Android!
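The precision difference between one tag per array and one tag per element can be seen in a small sketch. Neither class reflects the actual TaintDroid or Phosphor implementation; `CoarseArray`, `FineArray`, and the OR-on-store rule for the coarse variant are assumptions for illustration.

```java
// Hypothetical comparison of array tagging granularity; names are illustrative.
class ArrayPrecisionSketch {
    /** One tag for the whole array: any tainted store taints every element. */
    static final class CoarseArray {
        final byte[] val;
        int tag;

        CoarseArray(int length) {
            val = new byte[length];
        }

        void set(int index, int value_tag, byte value) {
            val[index] = value;
            tag |= value_tag; // assumed rule: the array tag accumulates all stored tags
        }

        int tagOf(int index) {
            return tag;
        }
    }

    /** One tag per element, kept in a shadow array of the same length. */
    static final class FineArray {
        final byte[] val;
        final int[] tags;

        FineArray(int length) {
            val = new byte[length];
            tags = new int[length];
        }

        void set(int index, int value_tag, byte value) {
            val[index] = value;
            tags[index] = value_tag;
        }

        int tagOf(int index) {
            return tags[index];
        }
    }
}
```

With a single tainted store to element 0, the coarse variant reports element 3 as tainted as well (a false positive), while the per-element variant does not.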
35. Portability
JVM               Version(s)          Success?
Oracle (Hotspot)  1.7.0_45, 1.8.0_0   Yes
OpenJDK           1.7.0_45, 1.8.0_0   Yes
Android Dalvik    4.3.1               Yes
Apache Harmony    6.0M3               Yes
Kaffe VM          1.1.9               Yes
Jikes RVM         3.1.3               No, but may be possible with more work
36. Future Work & Extension
• This is a general approach for tracking metadata
with variables in unmodified JVMs
• Could track any sort of data in principle
• We have already extended this approach to track
path constraints on inputs
38. Phosphor: Illuminating Dynamic
Data Flow in Commodity JVMs
Jonathan Bell and Gail Kaiser
Columbia University
jbell@cs.columbia.edu @_jon_bell_
https://github.com/Programming-Systems-Lab/Phosphor
[Badge: OOPSLA AEC, Artifact Evaluated: Consistent, Complete, Well Documented, Easy to Reuse]