4. Update Cycle of Apps
> 100K: 10 days
1K - 100K: 17 days
https://www.nowsecure.com/blog/2015/06/09/understanding-android-s-application-update-cycles/
5. Program Analysis Tools for Cost-effective and Timely Identification of Malware and Vulnerable Apps
6. STAMP
• A tool for static analysis of Android apps.
• Was developed as part of a DARPA project at Stanford
• Is licensed by Quixey Inc.
• Can be licensed for commercial use:
http://techfinder.stanford.edu/technology_detail.php?ID=30312
• I was responsible for the end-to-end design and implementation of its core components.
16. API Model
No source-to-sink flow is detected if the List class has no model:
String deviceId = getDeviceId();
List list = new LinkedList();
list.add(deviceId);
String s = (String) list.get(0);
sendToInternet(s);
17. API Model
• Models the flow of values across the app-framework boundary
• Based on the abstract semantics of the analysis
java.util.LinkedList.java:
// A single field stands for every element stored in the list.
private Object elem;
public boolean add(Object elem){
    this.elem = elem;
    return true;
}
public Object get(int i){
    return this.elem;
}
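To see why one `elem` field is enough, here is a minimal sketch (not STAMP's implementation) of flow-insensitive taint propagation that treats the model's field store and load as ordinary "flows to" edges. Variable names mirror the slide-16 snippet; the `Taint` class and the `list.elem` node name are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: taint propagation where calls into LinkedList
// are replaced by the one-field model (add stores elem, get loads elem).
class Taint {
    // Directed "value flows to" edges between abstract locations.
    private final Map<String, Set<String>> edges = new HashMap<>();

    void flow(String from, String to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // All locations reachable from 'start' (worklist closure).
    Set<String> reach(String start) {
        Set<String> seen = new HashSet<>(List.of(start));
        Deque<String> work = new ArrayDeque<>(seen);
        while (!work.isEmpty()) {
            for (String next : edges.getOrDefault(work.pop(), Set.of())) {
                if (seen.add(next)) work.push(next);
            }
        }
        return seen;
    }

    // Edges for the slide-16 program; withModel toggles the LinkedList model.
    static boolean sourceReachesSink(boolean withModel) {
        Taint t = new Taint();
        t.flow("getDeviceId", "deviceId");      // String deviceId = getDeviceId();
        if (withModel) {
            t.flow("deviceId", "list.elem");     // list.add(deviceId): this.elem = elem
            t.flow("list.elem", "s");            // s = list.get(0): return this.elem
        }
        t.flow("s", "sendToInternet");           // sendToInternet(s);
        return t.reach("getDeviceId").contains("sendToInternet");
    }

    public static void main(String[] args) {
        System.out.println("without model: " + sourceReachesSink(false));
        System.out.println("with model:    " + sourceReachesSink(true));
    }
}
```

Because the model keeps a single field, every element added to the list is assumed to be returned by every `get`, which over-approximates but never misses this flow.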
18. Callback Model
Flow is not detected if onClick is not reachable.
button.setOnClickListener(
    new OnClickListener(){
        public void onClick(View v){
            String deviceId = getDeviceId();
            sendToInternet(deviceId);
        }
    });
19. Callback Model
Model for View class:
public void callCallbacks()
{
    onKeyDown(…);
    onTouchDown(…);
    …
}
Example app code:
View v = new View(…);
20. Callback Model
Model for View class:
public void callCallbacks()
{
    onKeyDown(…);
    onTouchDown(…);
    …
}
Example app code:
View v = new View(…);
v.callCallbacks();
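A minimal sketch of why the `callCallbacks()` model matters, framed as call-graph reachability from an entry point. The method names come from the slides; the `entry` node and the string-keyed graph are simplifying assumptions, not STAMP's actual encoding.

```java
import java.util.*;

// Hypothetical sketch: reachability over a tiny call graph, with and without
// the synthetic v.callCallbacks() call that the View model adds to app code.
class CallbackReach {
    static Set<String> reachable(Map<String, List<String>> calls, String entry) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(entry));
        while (!work.isEmpty()) {
            String m = work.pop();
            if (seen.add(m)) work.addAll(calls.getOrDefault(m, List.of()));
        }
        return seen;
    }

    static Map<String, List<String>> callGraph(boolean withCallbackModel) {
        Map<String, List<String>> calls = new HashMap<>();
        List<String> fromEntry = new ArrayList<>(List.of("View.<init>")); // View v = new View(…);
        if (withCallbackModel) fromEntry.add("View.callCallbacks");      // v.callCallbacks();
        calls.put("entry", fromEntry);
        // Model for View: callCallbacks() invokes the overridable event handlers.
        calls.put("View.callCallbacks", List.of("onKeyDown", "onTouchDown"));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(reachable(callGraph(false), "entry"));
        System.out.println(reachable(callGraph(true), "entry"));
    }
}
```

Without the inserted call, the app's handlers never appear in the call graph, so any flow inside them (like the `onClick` leak above) goes undetected.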
21. Callgraph Construction in Java
class A {
    void foo(){ }
}
class B extends A {
    void foo(){ }
}
A x = new B();
x.foo();      // B:foo is called
A y = new A();
y.foo();      // A:foo is called
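A minimal sketch of how a points-to-based call graph resolves `x.foo()`: take the dynamic type of each abstract object the receiver may point to, then walk up the class hierarchy to the nearest declaration of `foo`. The maps hand-encode the slide's two classes; this is an illustration, not Soot's or STAMP's API.

```java
import java.util.*;

// Hypothetical sketch: resolve virtual-call targets from the receiver's points-to set.
class Dispatch {
    // Subclass -> superclass (null for the root of this tiny hierarchy).
    static final Map<String, String> SUPER = new HashMap<>();
    // Class -> methods it declares.
    static final Map<String, Set<String>> DECLARES = new HashMap<>();
    static {
        SUPER.put("A", null);
        SUPER.put("B", "A");
        DECLARES.put("A", Set.of("foo"));
        DECLARES.put("B", Set.of("foo"));  // B overrides foo
    }

    // Walk from the object's dynamic type up to the first class declaring the method.
    static String lookup(String dynamicType, String method) {
        for (String c = dynamicType; c != null; c = SUPER.get(c)) {
            if (DECLARES.getOrDefault(c, Set.of()).contains(method)) return c + ":" + method;
        }
        throw new IllegalStateException("no declaration of " + method);
    }

    // Targets of receiver.method(), given the dynamic types the receiver may point to.
    static Set<String> targets(Set<String> pointsToTypes, String method) {
        Set<String> result = new TreeSet<>();
        for (String t : pointsToTypes) result.add(lookup(t, method));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(targets(Set.of("B"), "foo"));  // x: prints [B:foo]
        System.out.println(targets(Set.of("A"), "foo"));  // y: prints [A:foo]
    }
}
```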
22. Phantom Objects Model
• p does not point to any object
• So, no outgoing call edge from p.foo()
T p = aMethodWithoutModel();
p.foo();
23. Phantom Objects Model
• Returns a special abstract object
• Default model for methods that don't have any model
• Required to build a sound call graph
T aMethodWithoutModel()
{
    // Stands for an abstract object of any subtype of T.
    return new AnySubTypeOfT();
}
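A minimal sketch of the phantom-object default, assuming a hypothetical hierarchy in which `T` has subclasses `T1` (which overrides `foo`) and `T2` (which does not). With an empty points-to set, `p.foo()` gets no call edge; if `p` instead points to a phantom object standing for any subtype of `T`, every implementation of `foo` reachable from `T`'s subtree becomes a target.

```java
import java.util.*;

// Hypothetical sketch: call targets for p.foo() with and without a phantom object.
class Phantom {
    static final Map<String, String> SUPER = Map.of("T1", "T", "T2", "T");
    static final Map<String, Set<String>> DECLARES =
        Map.of("T", Set.of("foo"), "T1", Set.of("foo"), "T2", Set.of());

    // All classes in the subtree rooted at 'root' (root included).
    static Set<String> subtypes(String root) {
        Set<String> result = new TreeSet<>(Set.of(root));
        for (String c : DECLARES.keySet()) {
            for (String s = c; s != null; s = SUPER.get(s)) {
                if (s.equals(root)) { result.add(c); break; }
            }
        }
        return result;
    }

    // Nearest declaration of 'method' at or above 'type'.
    static String lookup(String type, String method) {
        for (String c = type; c != null; c = SUPER.get(c)) {
            if (DECLARES.getOrDefault(c, Set.of()).contains(method)) return c + ":" + method;
        }
        throw new IllegalStateException("no declaration of " + method);
    }

    // pointsTo holds dynamic types; an entry "phantom:T" stands for any subtype of T.
    static Set<String> targets(Set<String> pointsTo, String method) {
        Set<String> result = new TreeSet<>();
        for (String o : pointsTo) {
            Collection<String> types = o.startsWith("phantom:") ? subtypes(o.substring(8)) : Set.of(o);
            for (String t : types) result.add(lookup(t, method));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(targets(Set.of(), "foo"));             // no model: no call edge
        System.out.println(targets(Set.of("phantom:T"), "foo"));  // phantom: T:foo and T1:foo
    }
}
```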
24. Talk Outline
• Models in STAMP
• Analysis Overview
• Implementation and Experiments
• Semantic Signatures for Malware Detection
25. Example
A m = source();
B n = foo(m);
sink(n);
Is there a flow from source to sink? Yes: the model of method foo says that its parameter u flows to its return value.
@Flow(from="u", to="@return")
B foo(A u){
    B y = new B();
    return y;
}
@Flow(from="z", to="!SINK")
void sink(B z){
}
@Flow(from="$SRC", to="@return")
A source(){
    A x = new A();
    return x;
}
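A minimal sketch of how the three `@Flow` models answer this question, assuming a simplified, context- and field-insensitive encoding: each annotation becomes an edge, each call adds argument-to-parameter and return-to-result edges, and the answer is plain graph reachability. STAMP's actual analysis uses the context-free formulation described later; node names such as `source.@return` are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: "does $SRC reach !SINK?" with @Flow models as graph edges.
class FlowModels {
    static void edge(Map<String, List<String>> g, String from, String to) {
        g.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    static boolean flows(boolean useFooModel) {
        Map<String, List<String>> g = new HashMap<>();
        edge(g, "$SRC", "source.@return");      // @Flow(from="$SRC", to="@return") on source
        edge(g, "source.@return", "m");         // A m = source();
        edge(g, "m", "foo.u");                  // argument m passed to parameter u
        if (useFooModel) edge(g, "foo.u", "foo.@return");  // @Flow(from="u", to="@return") on foo
        edge(g, "foo.@return", "n");            // B n = foo(m);
        edge(g, "n", "sink.z");                 // sink(n)
        edge(g, "sink.z", "!SINK");             // @Flow(from="z", to="!SINK") on sink

        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of("$SRC"));
        while (!work.isEmpty()) {
            String v = work.pop();
            if (seen.add(v)) work.addAll(g.getOrDefault(v, List.of()));
        }
        return seen.contains("!SINK");
    }

    public static void main(String[] args) {
        System.out.println(flows(true));   // with foo's model
        System.out.println(flows(false));  // foo's body alone returns a fresh B
    }
}
```

The second call shows why the model matters here: foo's body never propagates u, so only the annotation connects m to n.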
32. Putting Graphs Together
[Figure: the program's assign and new edges combined with the src, xfer, and sink edges from the models; nodes include $SRC, Rsource, x, m, u, Rfoo, y, n, z, and !SINK]
33. Checking CFL Membership
[Figure: the combined graph of assign, new, src, xfer, and sink edges from $SRC to !SINK]
S$SRC->!SINK: src assign new new assign assign assign xfer assign new new assign assign assign sink
34. Checking CFL Membership
There is a flow from $SRC to !SINK iff the string of edge labels S$SRC->!SINK along a path from $SRC to !SINK is in a context-free language C.
S$SRC->!SINK: src assign new new assign assign assign xfer assign new new assign assign assign sink
36. Why Context-free Language?
The context-free language enforces field sensitivity.
Flow exists:
m.f = source();
x = m.f;
sink(x);
Flow does not exist:
m.f = source();
x = m.g;
sink(x);
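A minimal sketch of CFL-reachability for the two snippets above, assuming a simplified grammar that ignores aliasing: `Flow -> assign | Flow Flow | store[f] load[f] | store[f] Flow load[f]`, so a store into field `f` only connects to a load from the same field. The edge encoding and grammar are illustrative; the slides do not spell out STAMP's grammar C.

```java
import java.util.*;

// Hypothetical sketch: naive fixed-point CFL-reachability for the grammar
//   Flow -> assign | Flow Flow | store[f] load[f] | store[f] Flow load[f]
class FieldCfl {
    // Each edge is {label, from, to}; labels are "assign", "store:<f>", or "load:<f>".
    static boolean flows(List<String[]> edges, String src, String dst) {
        Set<List<String>> flow = new HashSet<>();   // derived Flow(u, v) facts
        for (String[] e : edges) if (e[0].equals("assign")) flow.add(List.of(e[1], e[2]));
        boolean changed = true;
        while (changed) {
            Set<List<String>> derived = new HashSet<>();
            // Flow -> Flow Flow
            for (List<String> a : flow)
                for (List<String> b : flow)
                    if (a.get(1).equals(b.get(0))) derived.add(List.of(a.get(0), b.get(1)));
            for (String[] st : edges) {
                if (!st[0].startsWith("store:")) continue;
                String field = st[0].substring(6);
                for (String[] ld : edges) {
                    if (!ld[0].equals("load:" + field)) continue;  // fields must match
                    // Flow -> store[f] load[f]
                    if (st[2].equals(ld[1])) derived.add(List.of(st[1], ld[2]));
                    // Flow -> store[f] Flow load[f]
                    if (flow.contains(List.of(st[2], ld[1]))) derived.add(List.of(st[1], ld[2]));
                }
            }
            changed = flow.addAll(derived);
        }
        return flow.contains(List.of(src, dst));
    }

    // m.f = source(); x = m.<readField>; sink(x);
    static boolean example(String readField) {
        List<String[]> edges = List.of(
            new String[] {"assign", "$SRC", "t"},          // t = source();
            new String[] {"store:f", "t", "m"},            // m.f = t;
            new String[] {"load:" + readField, "m", "x"},  // x = m.f or x = m.g
            new String[] {"assign", "x", "!SINK"});        // sink(x);
        return flows(edges, "$SRC", "!SINK");
    }

    public static void main(String[] args) {
        System.out.println(example("f"));  // matching field: flow exists
        System.out.println(example("g"));  // different field: no flow
    }
}
```

A regular language cannot require every `store[f]` to be matched by a later `load[f]` with arbitrary nesting in between, which is why a context-free language is used.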
37. Need for Context Sensitivity
[Figure: flow graph with nodes $SRC, Rsource, a, b, n, m, pfoo1 (foo's parameter), fooret (foo's return value), and !SINK, plus src and sink edges]
a = source();
n = foo(a);
m = foo(b);
sink(m);
Object foo(Object p){
    return p;
}
38. Need for Context Sensitivity
[Figure: the same flow graph, shown with a second pfoo1/fooret node pair]
a = source();
n = foo(a);
m = foo(b);
sink(m);
Object foo(Object p){
    return p;
}
39. Context Sensitivity in STAMP
• Cloning-based approach: clone a method's subgraph for each of its contexts
• Contexts are k-length sequences of either call sites or abstract objects (i.e., object-sensitive)
• Abstract objects are cloned per context (i.e., heap cloning is supported)
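A minimal sketch of the cloning idea on the slide-37 program, assuming 1-call-site sensitivity (k = 1). With a single copy of foo, both calls share its parameter and return nodes, so `$SRC` spuriously reaches `!SINK` through `m`; giving each call site its own clone removes that path. Node names such as `foo.p@1` are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: context-insensitive vs. cloned (k = 1) flow graphs for
//   a = source(); n = foo(a); m = foo(b); sink(m);   Object foo(Object p) { return p; }
class Cloning {
    static boolean reaches(Map<String, List<String>> g, String from, String to) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(from));
        while (!work.isEmpty()) {
            String v = work.pop();
            if (seen.add(v)) work.addAll(g.getOrDefault(v, List.of()));
        }
        return seen.contains(to);
    }

    static void edge(Map<String, List<String>> g, String from, String to) {
        g.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    static Map<String, List<String>> graph(boolean cloned) {
        Map<String, List<String>> g = new HashMap<>();
        edge(g, "$SRC", "a");                                   // a = source();
        edge(g, "m", "!SINK");                                  // sink(m);
        String[][] calls = {{"1", "a", "n"}, {"2", "b", "m"}};  // {call site, argument, result}
        for (String[] c : calls) {
            // One copy of foo's nodes per call site when cloning; a single shared copy otherwise.
            String ctx = cloned ? "@" + c[0] : "";
            edge(g, c[1], "foo.p" + ctx);                       // pass argument to p
            edge(g, "foo.p" + ctx, "foo.ret" + ctx);            // return p;
            edge(g, "foo.ret" + ctx, c[2]);                     // assign the result
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(reaches(graph(false), "$SRC", "!SINK"));  // spurious flow
        System.out.println(reaches(graph(true), "$SRC", "!SINK"));   // removed by cloning
    }
}
```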
40. Context Sensitivity in STAMP
• Size of resulting relations (e.g., graph) is huge (e.g., billions of edges for k = 2)
• Relations are compactly stored as Binary Decision Diagrams (BDDs)
• Analysis rules are written in Datalog
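The sketch below shows only the Datalog evaluation strategy, using plain hash sets rather than BDDs. It evaluates the hypothetical two-rule program `Flow(x, y) :- Edge(x, y).` and `Flow(x, z) :- Flow(x, y), Edge(y, z).` with semi-naive iteration, where each round joins only the facts derived in the previous round.

```java
import java.util.*;

// Hypothetical sketch: semi-naive bottom-up evaluation of
//   Flow(x, y) :- Edge(x, y).
//   Flow(x, z) :- Flow(x, y), Edge(y, z).
class SemiNaive {
    static Set<List<String>> flow(Set<List<String>> edge) {
        // Index Edge by its first column for the join on y.
        Map<String, List<String>> succ = new HashMap<>();
        for (List<String> e : edge) succ.computeIfAbsent(e.get(0), k -> new ArrayList<>()).add(e.get(1));

        Set<List<String>> all = new HashSet<>(edge);    // first rule
        Set<List<String>> delta = new HashSet<>(edge);  // facts new in the last round
        while (!delta.isEmpty()) {
            Set<List<String>> next = new HashSet<>();
            for (List<String> f : delta) {              // join only the delta with Edge
                for (String z : succ.getOrDefault(f.get(1), List.of())) {
                    List<String> fact = List.of(f.get(0), z);
                    if (!all.contains(fact)) next.add(fact);
                }
            }
            all.addAll(next);
            delta = next;
        }
        return all;
    }

    public static void main(String[] args) {
        Set<List<String>> edge = Set.of(List.of("a", "b"), List.of("b", "c"), List.of("c", "a"));
        System.out.println(flow(edge).size());  // prints 9: every ordered pair over {a, b, c}
    }
}
```

A BDD-backed engine performs the same joins on whole relations at once, which is what makes billions of edges manageable.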
41. Talk Outline
• Models in STAMP
• Analysis Overview
• Implementation and Experiments
• Semantic Signatures for Malware Detection
42. STAMP Implementation
• Analyzes Dalvik bytecode (uses Soot's Dexpler front end)
• Currently has models for ~1300 methods in 176 Android classes
• ~13K lines of Java on top of open-source tools like Soot and Apktool
45. Symantec Experiment
• 84 apps collected from the wild
• Goal: report source-sink flows
o sources: Phone number, Contacts, Call log, SMS
messages, Location
o sinks: Internet, SMS
• 124 actual flows identified using proprietary
tools
46. False Positive/Negatives of STAMP
[Figure: STAMP has a 16% false positive rate and a 14% false negative rate. Green: flows reported by STAMP; red: actual flows that apps have.]
49. Scalability Experiment
• About 90% of apps are 10 MB or smaller.
• Of 100 randomly picked apps, 94 are 10 MB or smaller; for 90 of these 94, STAMP takes at most 10 minutes.
50. Lessons Learned
Manually identifying which models are missing and need to be added is very hard. We therefore enabled STAMP to recommend:
• methods that potentially require models
• potential transfer annotations
• potential callback methods
51. Identifying Missing Transfer Models
Method m potentially requires a model if all of the following hold:
• m is a method in the Android framework
• m does not have a model yet
• one or more parameters of m are tainted
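The three conditions read as a conjunction over facts the analysis already computes. A minimal sketch, with a hypothetical `MethodFacts` class standing in for those facts; the example method names are illustrative only.

```java
import java.util.*;

// Hypothetical sketch: flag framework methods that may need a transfer model.
class MissingTransfer {
    static final class MethodFacts {
        final String name;
        final boolean inFramework;       // declared in the Android framework
        final boolean hasModel;          // a model already exists
        final Set<Integer> taintedArgs;  // indices of parameters that receive tainted values

        MethodFacts(String name, boolean inFramework, boolean hasModel, Set<Integer> taintedArgs) {
            this.name = name;
            this.inFramework = inFramework;
            this.hasModel = hasModel;
            this.taintedArgs = taintedArgs;
        }
    }

    // All three conditions from the slide must hold.
    static boolean needsModel(MethodFacts m) {
        return m.inFramework && !m.hasModel && !m.taintedArgs.isEmpty();
    }

    static List<String> recommend(List<MethodFacts> methods) {
        List<String> out = new ArrayList<>();
        for (MethodFacts m : methods) if (needsModel(m)) out.add(m.name);
        return out;
    }

    public static void main(String[] args) {
        List<MethodFacts> facts = List.of(
            new MethodFacts("LinkedList.add", true, false, Set.of(0)),
            new MethodFacts("TelephonyManager.getDeviceId", true, true, Set.of()),
            new MethodFacts("MyActivity.helper", false, false, Set.of(0)));
        System.out.println(recommend(facts));  // prints [LinkedList.add]
    }
}
```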
52. Identifying Missing Callback Models
Method m is a potential callback if all of the following hold:
• m is a method in the app code
• m is (currently) unreachable
• m overrides a method in the Android framework
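A minimal sketch of the callback check, assuming hypothetical inputs: a superclass map, the set of framework classes, the methods each class declares, and the methods currently reachable. It matches overrides by method name only, which is a simplification; a real check would compare full signatures.

```java
import java.util.*;

// Hypothetical sketch: potential callbacks are unreachable app methods
// that override a method declared in some framework superclass.
class MissingCallback {
    static Set<String> potentialCallbacks(
            Map<String, String> superOf,          // class -> superclass
            Set<String> frameworkClasses,
            Map<String, Set<String>> declares,    // class -> declared method names
            Set<String> reachable) {              // "Class.method" entries in the current call graph
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : declares.entrySet()) {
            String cls = e.getKey();
            if (frameworkClasses.contains(cls)) continue;               // only app code
            for (String m : e.getValue()) {
                if (reachable.contains(cls + "." + m)) continue;        // already reachable
                for (String s = superOf.get(cls); s != null; s = superOf.get(s)) {
                    if (frameworkClasses.contains(s) && declares.getOrDefault(s, Set.of()).contains(m)) {
                        result.add(cls + "." + m);                      // overrides a framework method
                        break;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> declares = Map.of(
            "MyView", Set.of("onKeyDown", "helper"),
            "View", Set.of("onKeyDown"));
        System.out.println(potentialCallbacks(Map.of("MyView", "View"), Set.of("View"), declares, Set.of()));
        // prints [MyView.onKeyDown]
    }
}
```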
53. Lessons Learned
Sometimes a simple tweak to the app can eliminate a lot of imprecision.
Runnable r = new Runnable(){ …};
r.run();
Runnable r = new Runnable(){ …};
myactivity.runOnUiThread(r);
55. Talk Outline
• Models in STAMP
• Analysis Overview
• Implementation and Experiments
• Semantic Signatures for Malware Detection
56. Commercial AV tools: Obfuscated Malware Samples
• 17 known malware apps, each from a
different family
• Three types of syntactic transformations
1. Change the names of components, classes,
methods, and fields.
2. Redirect all invocations to methods of android.*
classes through proxy methods.
3. Encode string constants
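The slide does not say which encoding the experiment used. As a hypothetical illustration of transformation 3, the sketch below replaces a string literal with a Base64-encoded constant that is decoded at run time, so a signature that matches the literal text no longer sees it while the app behaves the same.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of "encode string constants" (Base64 chosen for illustration only).
class StringEncoding {
    // Obfuscator side: turn a literal into its encoded form.
    static String encode(String literal) {
        return Base64.getEncoder().encodeToString(literal.getBytes(StandardCharsets.UTF_8));
    }

    // Runtime side: the rewritten app calls this wherever the literal used to appear.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "android.provider.Telephony.SMS_RECEIVED";
        String hidden = encode(original);
        System.out.println(hidden);                            // what a syntactic scanner now sees
        System.out.println(decode(hidden).equals(original));   // prints true: behavior is unchanged
    }
}
```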
57. Commercial AV tools on Obfuscated Malware
Comparison between Apposcopy and other AV tools on obfuscated malware
58. GoldDream Malware Signature
1. App registers a receiver for system events
such as SMS messages or outgoing phone
calls.
2. When these events trigger, the code in the
receiver starts a background service.
3. The service sends private user information,
such as the phone's unique IMEI number and subscriber ID, to a remote server.
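The slide does not show how this signature is encoded. The sketch below is a hypothetical Java rendering of the three steps as a check over pre-extracted app facts; the `Facts` fields, the event names, and the `DeviceId`/`SubscriberId` source labels are assumptions, not Apposcopy's signature language.

```java
import java.util.*;

// Hypothetical sketch: a GoldDream-style signature as a check over extracted app facts.
class GoldDreamCheck {
    static final class Facts {
        final Map<String, Set<String>> receiverEvents = new HashMap<>(); // receiver -> system events it handles
        final Map<String, Set<String>> startsService = new HashMap<>();  // component -> services it starts
        final Map<String, Set<String>> leaks = new HashMap<>();          // service -> sources reaching the Internet
    }

    static final Set<String> EVENTS = Set.of("SMS_RECEIVED", "NEW_OUTGOING_CALL");
    static final Set<String> PRIVATE_INFO = Set.of("DeviceId", "SubscriberId");

    static boolean matches(Facts f) {
        for (Map.Entry<String, Set<String>> r : f.receiverEvents.entrySet()) {
            if (Collections.disjoint(r.getValue(), EVENTS)) continue;              // 1. system-event receiver
            for (String s : f.startsService.getOrDefault(r.getKey(), Set.of())) {  // 2. starts a service
                Set<String> leaked = f.leaks.getOrDefault(s, Set.of());
                if (!Collections.disjoint(leaked, PRIVATE_INFO)) return true;      // 3. sends private info out
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Facts f = new Facts();
        f.receiverEvents.put("Receiver", Set.of("SMS_RECEIVED"));
        f.startsService.put("Receiver", Set.of("UploadService"));
        f.leaks.put("UploadService", Set.of("DeviceId", "SubscriberId"));
        System.out.println(matches(f));  // prints true
    }
}
```

Because the check is phrased over behavior (events, service starts, flows) rather than names or strings, renaming and string encoding alone do not defeat it.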
63. Mostly-benign Apps
• 11,215 apps from Google Play
• Apposcopy identified 16 apps as malware
• Compared against VirusTotal (which aggregates reports from ~50 AV tools):
• 13 apps are confirmed malicious
• The remaining 3 are classified as adware by VirusTotal tools
64. Acknowledgement
Prof. Alex Aiken
Prof. John Mitchell
Prof. Mayur Naik
Prof. Isil Dillig
Prof. Tom Dillig
Dr. Jason Franklin
Osbert Bastani
Manolis Papadakis
Lazaro Clapp
Patrick Mutchler
Yu Feng
Ravi Mangal
66. Minimize Manual Effort
1. Only a small number of methods require
explicit models.
2. Models are easy to write, debug, and
maintain.
3. STAMP points out methods that might be missing models.
4. Many models can be automatically inferred.
67. Model Inference
[Figure: the combined flow graph of assign, new, src, xfer, and sink edges from $SRC to !SINK]
S$SRC->!SINK: src assign new new assign assign assign xfer assign new new assign assign assign sink
• Osbert Bastani’s work
75. Other Analysis Configuration
• Context sensitivity
• Callsite- or object-sensitivity?
• Use the same type of context sensitivity for all call sites?
• What value of k for context sensitivity?
• Same k for every call site?
• Use heap cloning for all allocation sites (including phantom objects)?
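To make the choice of k concrete, here is a minimal sketch of k-limited call-site contexts, a standard construction that may differ from STAMP's exact one: entering a callee appends the call site to the caller's context and keeps only the most recent k entries. With k = 1 the two calls to foo on the context-sensitivity slides get distinct contexts; any fixed k bounds the number of contexts. Call-site labels such as `c1` are hypothetical.

```java
import java.util.*;

// Hypothetical sketch: k-limited call-string contexts.
class CallStrings {
    // Context of the callee when 'callSite' is invoked from context 'caller'.
    static List<String> push(List<String> caller, String callSite, int k) {
        List<String> ctx = new ArrayList<>(caller);
        ctx.add(callSite);
        // Keep only the k most recent call sites (k = 0 means context-insensitive).
        return List.copyOf(ctx.subList(Math.max(0, ctx.size() - k), ctx.size()));
    }

    public static void main(String[] args) {
        List<String> root = List.of();
        System.out.println(push(root, "c1", 1));                 // [c1]
        System.out.println(push(push(root, "c1", 2), "c2", 2));  // [c1, c2]
        System.out.println(push(push(root, "c1", 1), "c2", 1));  // [c2]
        System.out.println(push(root, "c1", 0));                 // []
    }
}
```

Raising k separates more calling contexts but multiplies the number of cloned nodes, which is why the relation sizes on the BDD slide grow so quickly with k.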
76. Other Analysis Configuration
• Which type of analysis: exhaustive,
demand-driven, client-driven, etc.?
• Resolve reflection?
• Handle exceptional control and data flow?
• How to handle special types of entities:
• Variables of String and StringBuffer types
• String, class constants
• …