Saswat Anand
STAMP:
STatic Analysis of Mobile Programs
Vetting Apps in Appstore
2
Number of Apps in Google Play
~2K new apps/day
3
Update Cycle of Apps
> 100K: 10 days
1K - 100K: 17 days
https://www.nowsecure.com/blog/2015/06/09/understanding-android-s-application-update-cycles/
4
Program Analysis Tools for
Cost-effective and Timely
Identification of
Malware and Vulnerable Apps
5
STAMP
• A tool for static analysis of Android apps.
• Was developed as part of a DARPA project
at Stanford
• Is licensed by Quixey Inc.
• Can be licensed for commercial use:
http://techfinder.stanford.edu/technology_detail.php?ID=30312
• I was responsible for end-to-end design
and implementation of its core components.
6
STAMP
Input: App
Output: source-to-sink flows, e.g.
• Device Id → Internet
• Location → Internet
• Call Log → Internet
7
Static Analysis of Android Apps
Application + Platform (e.g. Android) → Static Analysis → Malware? Fraud app? Insecure?
The platform is:
• Hard to analyze (e.g., native code, reflection)
• Very large
• Mostly irrelevant for analysis
8
Static Analysis of Android Apps
Application → Static Analysis → FALSE NEGATIVE
9
STAMP’s Approach
Application + Models of Platform (e.g. Android) → Static Analysis → Malware? Fraud app? Insecure?
The models:
• Summarize relevant behavior
• Small and simple → scales better
10
Talk Outline
• Models in STAMP
• Analysis Overview
• Implementation and Experiments
• Semantic Signatures for Malware Detection
11
Types of Models
• Taint Model
• API Model
• Callback Model
• Phantom Object Model
12
Taint Source Model
android/telephony/TelephonyManager.java:
@STAMP(flows={@Flow(from="$getDeviceId",to="@return")})
public java.lang.String getDeviceId()
{
return new String();
}
13
Taint Sink Model
android/telephony/SmsManager.java:
@STAMP(flows={@Flow(from="text",to="!sendTextMessage")})
public void sendTextMessage(java.lang.String destinationAddress,
java.lang.String scAddress,
java.lang.String text,
android.app.PendingIntent sentIntent,
android.app.PendingIntent deliveryIntent)
{
}
14
Taint Transfer Model
java/lang/Object.java:
@STAMP(flows = {@Flow(from="this",to="@return")})
public java.lang.String toString()
{
return new String();
}
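The three taint models above share one annotation shape. The following is a minimal sketch, assuming that STAMP and Flow are ordinary runtime-retained Java annotations (their real declarations are not shown in the slides). It classifies a flow by the prefixes used above: a `$` label marks a source, a `!` label marks a sink, and anything else is treated as a transfer. `ModelDemo`, `classify`, `kindOf`, and `describe` are hypothetical names used only for illustration.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Assumed declarations; the slides only show how the annotations are used.
@Retention(RetentionPolicy.RUNTIME)
@interface Flow { String from(); String to(); }

@Retention(RetentionPolicy.RUNTIME)
@interface STAMP { Flow[] flows(); }

class ModelDemo {
    // Stand-ins for the source, sink, and transfer models shown above.
    @STAMP(flows = {@Flow(from = "$getDeviceId", to = "@return")})
    public String getDeviceId() { return new String(); }

    @STAMP(flows = {@Flow(from = "text", to = "!sendTextMessage")})
    public void sendTextMessage(String destinationAddress, String text) { }

    @STAMP(flows = {@Flow(from = "this", to = "@return")})
    public String describe() { return new String(); }

    // Classify one flow by its label prefixes.
    static String classify(Flow f) {
        if (f.from().startsWith("$")) return "source";
        if (f.to().startsWith("!")) return "sink";
        return "transfer";
    }

    // Look up a method by name and classify its first declared flow.
    static String kindOf(String methodName) {
        for (Method m : ModelDemo.class.getDeclaredMethods()) {
            STAMP s = m.getAnnotation(STAMP.class);
            if (m.getName().equals(methodName) && s != null) {
                return classify(s.flows()[0]);
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(kindOf("getDeviceId"));
        System.out.println(kindOf("sendTextMessage"));
        System.out.println(kindOf("describe"));
    }
}
```

How STAMP itself reads these annotations is not stated in the slides; reflection is used here only because it keeps the sketch self-contained.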
15
API Model
No source-to-sink flow is detected if the List class has no model.
String deviceId = getDeviceId();
List l = new LinkedList();
l.add(deviceId);
String s = (String) l.get(0);
sendToInternet(s);
16
API Model
• Models flow of values across app-framework boundary
• Based on abstract semantics of analysis
java/util/LinkedList.java:
private Object elem;
public void add(Object elem){
this.elem = elem;
}
public Object get(int i){
return this.elem;
}
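The model above is deliberately not a faithful list: every element collapses into one field. A small runnable sketch, with `ModelList` as a hypothetical stand-in for the modeled LinkedList, shows that a value passed to `add` still comes back out of `get`, which is the only property a source-to-sink analysis needs from the container.

```java
// Hypothetical stand-in for the LinkedList model on the slide.
class ModelList {
    private Object elem;                                  // all elements collapse into one field
    public void add(Object elem) { this.elem = elem; }
    public Object get(int i) { return this.elem; }        // the index is ignored
}

class ModelListDemo {
    // Mirrors the app code: store a value in the list, then read it back.
    static Object roundTrip(Object value) {
        ModelList l = new ModelList();
        l.add(value);
        return l.get(0);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("device-id"));
    }
}
```

When executed concretely, `get` returns only the last value added. The assumption here is that the analysis treats the field abstractly, so every value ever stored in it is considered a possible result.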
17
Callback Model
Flow is not detected if onClick is not reachable.
button.setOnClickListener(
new OnClickListener(){
public void onClick(View v){
String deviceId = getDeviceId();
sendToInternet(deviceId);
}
});
18
Callback Model
Model for View class:
public void callCallbacks()
{
onKeyDown(…);
onTouchDown(…);
….
}
Example app code:
View v = new View(…);
19
Callback Model
Model for View class:
public void callCallbacks()
{
onKeyDown(…);
onTouchDown(…);
….
}
Example app code:
View v = new View(…);
v.callCallbacks();
20
Callgraph Construction in Java
21
class A {
void foo(){ }
}
class B extends A {
void foo(){ }
}
A x = new B();
x.foo(); // B:foo is called
A y = new A();
y.foo(); // A:foo is called
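The example above can be mimicked with a small sketch that resolves a virtual call from the set of types the receiver may point to. The tables `SUPER` and `DECLARES` and the methods `dispatch` and `targets` are hypothetical; they encode only the two classes from the slide.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CallGraphDemo {
    // Hypothetical class-hierarchy tables for the slide's example: B extends A.
    static final Map<String, String> SUPER = Map.of("B", "A");
    static final Map<String, Set<String>> DECLARES =
            Map.of("A", Set.of("foo"), "B", Set.of("foo"));

    // Walk up from the dynamic type until a class declaring the method is found.
    static String dispatch(String type, String method) {
        for (String t = type; t != null; t = SUPER.get(t)) {
            if (DECLARES.getOrDefault(t, Set.of()).contains(method)) {
                return t + ":" + method;
            }
        }
        return null;
    }

    // One call edge per type in the receiver's points-to set.
    static Set<String> targets(Set<String> pointsTo, String method) {
        Set<String> out = new TreeSet<>();
        for (String type : pointsTo) {
            String target = dispatch(type, method);
            if (target != null) out.add(target);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(targets(Set.of("B"), "foo")); // receiver x
        System.out.println(targets(Set.of("A"), "foo")); // receiver y
        System.out.println(targets(Set.of(), "foo"));    // receiver with no known objects
    }
}
```

An empty points-to set yields no call targets at all, which is the situation the Phantom Objects Model addresses.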
Phantom Objects Model
• p does not point to any object
• So, no outgoing call edge from p.foo()
T p = aMethodWithoutModel();
p.foo();
22
Phantom Objects Model
T aMethodWithoutModel()
{
return new AnySubTypeOfT();
}
• Returns a special abstract object
• Default model for methods that don’t have any model
• Required to build a sound call graph
23
Talk Outline
• Models in STAMP
• Analysis Overview
• Implementation and Experiments
• Semantic Signatures for Malware Detection
24
Example
App code:
m = source();
n = foo(m);
sink(n);
Model of method source:
@Flow(from="$SRC",to="@return")
A source(){
x = new A();
return x;
}
Model of method foo:
@Flow(from="u",to="@return")
B foo(A u){
y = new B();
return y;
}
Model of method sink:
@Flow(from="z",to="!SINK")
void sink(B z){
}
Is there a flow from source to sink? Yes
25
Constructing Graph
@Flow(from="$SRC",to="@return")
A source(){
x = new A();
return x;
}
[Figure, built up over four slides: A --new--> x, x --assign--> Rsource, $SRC --src--> Rsource; the last build adds a second pair of edges labeled new and assign]
29
Constructing Graph
@Flow(from="u",to="@return")
B foo(A u){
y = new B();
return y;
}
[Figure: B --new--> y, y --assign--> Rfoo, u --xfer--> Rfoo, plus a second pair of edges labeled new and assign]
30
Constructing Graph
m = source();
n = foo(m);
sink(n);
[Figure: Rsource --assign--> m, m --assign--> u, Rfoo --assign--> n, n --assign--> z]
31
Putting Graphs Together
[Figure: the subgraphs for source, foo, and the app code joined at their shared nodes Rsource, u, Rfoo, and z, with $SRC --src--> Rsource and z --sink--> !SINK]
32
Checking CFL Membership
[Figure: the combined graph from the previous slide]
S$SRC -> !SINK: src assign new new assign assign assign xfer assign new new assign assign assign sink
33
Checking CFL Membership
There is a flow from $SRC to !SINK iff S$SRC->!SINK is in a context-free language C.
S$SRC -> !SINK: src assign new new assign assign assign xfer assign new new assign assign assign sink
34
Context Free Grammar
35
Why Context-free Language?
Context-free language enforces field-sensitivity.
Flow exists:
m.f = source();
x = m.f;
sink(x);
Flow does not exist:
m.f = source();
x = m.g;
sink(x);
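The two cases above can be reproduced with a small reachability sketch. The slides do not show the actual grammar, so the one used here is an assumption: `flow ::= assign | flow flow | store[f] load[f]`, where a store only matches a load of the same field. The class `FieldFlowDemo`, the record `Edge`, and the node names are hypothetical, and aliasing between base variables is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FieldFlowDemo {
    record Edge(String from, String label, String to) {}

    // Saturate the graph with "flow" edges under the assumed grammar:
    //   flow ::= assign | flow flow | store[f] load[f]
    static Set<Edge> close(Set<Edge> input) {
        Set<Edge> edges = new HashSet<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<Edge> snapshot = new ArrayList<>(edges); // iterate over a copy while adding
            for (Edge e : snapshot) {
                if (e.label().equals("assign")) {
                    changed |= edges.add(new Edge(e.from(), "flow", e.to()));
                }
                for (Edge g : snapshot) {
                    if (!e.to().equals(g.from())) continue;
                    boolean flowFlow = e.label().equals("flow") && g.label().equals("flow");
                    // store[f] followed by load[f] on the same field name
                    boolean matched = e.label().startsWith("store[")
                            && g.label().equals("load[" + e.label().substring(6));
                    if (flowFlow || matched) {
                        changed |= edges.add(new Edge(e.from(), "flow", g.to()));
                    }
                }
            }
        }
        return edges;
    }

    // Encodes: m.<stored> = source(); x = m.<loaded>; sink(x);
    static boolean flows(String stored, String loaded) {
        Set<Edge> edges = close(Set.of(
                new Edge("src", "store[" + stored + "]", "m"),
                new Edge("m", "load[" + loaded + "]", "x"),
                new Edge("x", "assign", "sink")));
        return edges.contains(new Edge("src", "flow", "sink"));
    }

    public static void main(String[] args) {
        System.out.println(flows("f", "f")); // store f, load f
        System.out.println(flows("f", "g")); // store f, load g
    }
}
```

A regular language cannot require that each load matches the field of an earlier store once stores and loads nest, which is why the matching rule is expressed as a context-free production.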
36
Need for Context Sensitivity
a = source();
n = foo(a);
m = foo(b);
sink(m);
Object foo(Object p){
return p;
}
[Figure: flow graph for this code; recoverable labels: pfoo1, fooret, a, m, n, b, Rsource, $SRC, src, !SINK, sink]
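A toy sketch of the problem, assuming as in the slide that foo simply returns its argument. With one shared node for the parameter p, taint from a reaches both results; with one copy of foo per call site, only n is tainted. `ContextDemo`, `Call`, and `analyze` are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ContextDemo {
    // One call to the identity method foo: result = foo(arg).
    record Call(String arg, String result) {}

    static Set<String> analyze(List<Call> calls, Set<String> tainted, boolean contextSensitive) {
        Set<String> out = new TreeSet<>(tainted);
        // Context-insensitive view: a single parameter node merges all arguments.
        boolean sharedParamTainted = calls.stream().anyMatch(c -> tainted.contains(c.arg()));
        for (Call c : calls) {
            // Context-sensitive view: each call site has its own copy of the parameter.
            boolean perCallTainted = tainted.contains(c.arg());
            if (contextSensitive ? perCallTainted : sharedParamTainted) {
                out.add(c.result());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(new Call("a", "n"), new Call("b", "m"));
        System.out.println(analyze(calls, Set.of("a"), false)); // m is reported as tainted
        System.out.println(analyze(calls, Set.of("a"), true));  // m is not reported
    }
}
```

Since sink receives m, the context-insensitive run would report a flow from source to sink that the program cannot exhibit.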
37
Context Sensitivity in STAMP
• Cloning-based approach: clone the subgraph of a method for each of its contexts
• Contexts are either a k-length sequence of call sites or abstract objects (i.e., object-sensitive)
• Abstract objects are cloned per context (i.e., heap cloning is supported)
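A k-length call-site context can be sketched as a bounded list. The convention assumed here, keeping the k most recent call sites, is a common one but is not confirmed by the slides; `CallString` and `push` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class CallString {
    // Append a call site and keep only the k most recent entries.
    static List<String> push(List<String> context, String callSite, int k) {
        List<String> next = new ArrayList<>(context);
        next.add(callSite);
        int from = Math.max(0, next.size() - k);
        return List.copyOf(next.subList(from, next.size()));
    }

    public static void main(String[] args) {
        List<String> ctx = push(List.of(), "c1", 2);
        ctx = push(ctx, "c2", 2);
        ctx = push(ctx, "c3", 2);
        System.out.println(ctx); // the oldest call site has been dropped
    }
}
```

Each distinct context produced this way would get its own clone of the callee's subgraph, which is why the relation sizes grow quickly with k.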
41
Context Sensitivity in STAMP
• Size of resulting relations (e.g., graph) is
huge (e.g., billions of edges for k = 2)
• Relations are compactly stored as Binary
Decision Diagrams (BDDs)
• Analysis rules are written in Datalog
42
Talk Outline
• Models in STAMP
• Analysis Overview
• Implementation and Experiments
• Semantic Signatures for Malware Detection
43
STAMP Implementation
• Analyzes Dalvik bytecode (uses Soot’s
Dexpler front end)
• Currently has models for ~1300 methods in
176 Android classes
• ~13K lines of Java on top of open-source
tools like Soot and Apktool
48
STAMP User Interface
44
STAMP Report for a DenDroid
Spyware Sample
45
Symantec Experiment
• 84 apps collected from the wild
• Goal: report source-sink flows
o sources: Phone number, Contacts, Call log, SMS
messages, Location
o sinks: Internet, SMS
• 124 actual flows identified using proprietary
tools
49
False Positive/Negatives of STAMP
16% / 14%
False positive rate / False negative rate
Green - Flows reported by STAMP
Red - Actual flows that apps have
50
Reasons for False Negatives
51
Reasons for False Positives
52
Scalability Experiment
• About 90% of apps are of size 10M or less.
• Of 100 randomly picked apps, 94 are of size 10M or less; for 90 of those 94, STAMP takes at most 10 minutes.
53
Lessons Learned
Manually identifying which models are missing
and need to be added is very hard.
Enabled STAMP to recommend
• methods that potentially require models
• potential transfer annotations
• potential callback methods
50
Identifying Missing Transfer Models
Method m potentially requires a model if
• m is a method in the Android framework
• m does not have a model yet
• one or more parameters of m are tainted
51
Identifying Missing Callback Models
Method m is a potential callback if
• m is a method in the app code
• m is unreachable (currently)
• m overrides a method in the Android framework
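Both recommendation rules are conjunctions of simple facts about a method, so they can be sketched as predicates. The record `MethodInfo` and its fields are hypothetical; in STAMP these facts would come from the analysis results rather than from hand-filled flags.

```java
class ModelHints {
    // Hypothetical summary of what the analysis knows about one method.
    record MethodInfo(String name, boolean inFramework, boolean hasModel,
                      boolean hasTaintedParam, boolean reachable,
                      boolean overridesFramework) {}

    // Framework method, no model yet, and at least one tainted parameter.
    static boolean mayNeedTransferModel(MethodInfo m) {
        return m.inFramework() && !m.hasModel() && m.hasTaintedParam();
    }

    // App method, currently unreachable, and overriding a framework method.
    static boolean mayBeCallback(MethodInfo m) {
        return !m.inFramework() && !m.reachable() && m.overridesFramework();
    }

    public static void main(String[] args) {
        MethodInfo append = new MethodInfo("StringBuilder.append", true, false, true, true, false);
        MethodInfo onClick = new MethodInfo("MyListener.onClick", false, false, false, false, true);
        System.out.println(mayNeedTransferModel(append));
        System.out.println(mayBeCallback(onClick));
    }
}
```

The two example methods are illustrative only; they are not taken from the experiments in the talk.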
52
Lessons Learned
Sometimes a simple tweak to the app can
eliminate a lot of imprecision.
52
Runnable r = new Runnable(){ …};
r.run();
Runnable r = new Runnable(){ …};
myactivity.runOnUiThread(r);
STAMP’s Limitations
Application + Models of Platform (e.g. Android) → Static analysis → Malware? Fraud app? Insecure?
• Writing the models requires manual effort
• Automatic model inference
10
Talk Outline
• Models in STAMP
• Analysis Overview
• Implementation and Experiments
• Semantic Signatures for Malware Detection
63
Commercial AV Tools on Obfuscated
Malware Samples
• 17 known malware apps, each from a
different family
• Three types of syntactic transformations
1. Change the names of components, classes,
methods, and fields.
2. Redirect all invocations to methods of android.*
classes through proxy methods.
3. Encode string constants
64
Commercial AV tools on Obfuscated
Malware
Comparison between Apposcopy and other AV tools on obfuscated malware
65
GoldDream Malware Signature
1. App registers a receiver for system events
such as SMS messages or outgoing phone
calls.
2. When these events trigger, the code in the
receiver starts a background service.
3. The service sends private user information,
such as the phone's unique IMEI number
and subscriber id to a remote server.
66
GoldDream Malware Signature
GDEvent(SMS_RECEIVED).
GDEvent(NEW_OUTGOING_CALL).
GoldDream :- receiver(r),
icc(SYSTEM, r, e, _),
GDEvent(e),
service(s),
icc(r, s, _, _),
flow(s, DeviceId, s, Internet),
flow(s, SubscriberId, s, Internet).
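To make the rule concrete, here is a sketch that evaluates the same conjunction over hand-written facts. It is not how Apposcopy evaluates signatures; the fact records `IccFact` and `FlowFact` are hypothetical, and the fourth argument of icc, which the rule leaves as a wildcard, is omitted.

```java
import java.util.Set;

class GoldDreamDemo {
    // icc(from, to, event): one component starts another, optionally on an event.
    record IccFact(String from, String to, String event) {}
    // flow(sourceComponent, source, sinkComponent, sink)
    record FlowFact(String srcComp, String source, String sinkComp, String sink) {}

    static final Set<String> GD_EVENTS = Set.of("SMS_RECEIVED", "NEW_OUTGOING_CALL");

    static boolean matches(Set<String> receivers, Set<String> services,
                           Set<IccFact> icc, Set<FlowFact> flows) {
        for (String r : receivers) {
            // receiver(r), icc(SYSTEM, r, e, _), GDEvent(e)
            boolean triggered = icc.stream().anyMatch(c ->
                    c.from().equals("SYSTEM") && c.to().equals(r)
                            && GD_EVENTS.contains(c.event()));
            if (!triggered) continue;
            for (String s : services) {
                // service(s), icc(r, s, _, _)
                boolean started = icc.stream().anyMatch(c ->
                        c.from().equals(r) && c.to().equals(s));
                // both flows must start and end in the same service s
                if (started
                        && flows.contains(new FlowFact(s, "DeviceId", s, "Internet"))
                        && flows.contains(new FlowFact(s, "SubscriberId", s, "Internet"))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<IccFact> icc = Set.of(new IccFact("SYSTEM", "R", "SMS_RECEIVED"),
                                  new IccFact("R", "S", ""));
        Set<FlowFact> flows = Set.of(new FlowFact("S", "DeviceId", "S", "Internet"),
                                     new FlowFact("S", "SubscriberId", "S", "Internet"));
        System.out.println(matches(Set.of("R"), Set.of("S"), icc, flows));
    }
}
```

Because the signature mentions only components, events, and flows, renaming classes or methods does not change whether it matches.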
67
Inter-component Call Graph
68
Commercial AV tools Vs. Apposcopy
on Obfuscated Malware Samples
Comparison between Apposcopy and other AV tools on obfuscated malware
69
Accuracy on Known Malware
70
Mostly-benign Apps
• 11,215 apps from Google Play
• Apposcopy identified 16 apps as malware
• Compared against VirusTotal (provides
aggregated reports from ~50 AV tools)
• 13 apps are confirmed malicious
• Remaining 3 are classified as Adware by
VirusTotal tools
71
Acknowledgement
Prof. Alex Aiken
Prof. John Mitchell
Prof. Mayur Naik
Prof. Isil Dillig
Prof. Tom Dillig
Dr. Jason Franklin
Osbert Bastani
Manolis Papadakis
Lazaro Clapp
Patrick Mutchler
Yu Feng
Ravi Mangal
Questions?
Minimize Manual Effort
1. Only a small number of methods require
explicit models.
2. Models are easy to write, debug, and
maintain.
3. Stamp points out methods that might be
missing models.
4. Many models can be automatically inferred.
62
Model Inference
• Osbert Bastani’s work
[Figure: the combined flow graph from "Checking CFL Membership"]
S$SRC -> !SINK: src assign new new assign assign assign xfer assign new new assign assign assign sink
58
Model Inference
[Figure: the same graph with the body of foo missing; only the nodes u and Rfoo remain for foo]
S$SRC -> !SINK: src assign new new assign assign assign … assign assign sink
59
Model Inference
[Figure: candidate paths between u and Rfoo; edge labels: new, assign, load[f], store[f], ϵ, xfer]
60
Model Inference
[Figure: the graph with the body of foo missing, as on slide 59]
61
Stamp Architecture
Models (java files) + Source code for Stub Android.jar → Merge & compile → Class files for models
App’s APK → jimplify → Jimple code → Client-specific transformation → Transformed Jimple code → Fact generation → Datalog Facts → Analyze → Results
40
Other Analysis Configuration
• Context sensitivity
• Callsite- or object-sensitivity?
• Use same type of context-sensitivity for all
callsites?
• What value of K in context sensitivity?
• Same K for every call site?
• Use heap cloning for all allocation sites (including
phantom objects)?
46
Other Analysis Configuration
• Which type of analysis: exhaustive,
demand-driven, client-driven, etc.?
• Resolve reflection?
• Handle exceptional control and data flow?
• How to handle special types of entities:
• Variables of String and StringBuffer types
• String, class constants
• …
47
