Contenu connexe Similaire à STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis (20) STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis1. STAAF
An Efficient Distributed Framework for Performing
Large-Scale Android Application Analysis
OWASP AppSec USA
Thursday, September 22, 2011
1 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
2. Allow Me to Introduce Myself
Ryan W Smith
VP Engineering at Praetorian
OWASP DFW Chapter Leader
(2011)
Active member of The Honeynet
Project
(2002- )
8+ years of work with DoD,
Intelligence Community,
Federal/State/Local governments,
and Fortune 500 companies
2 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
3. 3 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
4. Presentation Roadmap
• STAAF (Overview)
• Background
• STAAF (Deep Dive)
• Results
• Future Work
• Conclusions
4 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
5. What can STAAF do for you?
Observation #1:
There are a lot of Android app analysis
tools freely available
BUT:
They’re typically designed for single app
analysis
STAAF leverages the power of these tools as modules,
And adds efficiency, scalability, data mgmt and sharing
5 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
6. What can STAAF do for you?
Observation #2:
Higher value analysis can be attained by
analyzing large numbers of applications
over long periods of time
SOLUTION:
Reduce the time and complexity for an
analyst to process large numbers of apps
Goal Analyze 50k apps in less than 2 days and make
the extracted data readily available to analysts
6 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
7. What can STAAF do for you?
Process Process Meaningful
Apps Data Results
Mod #1 Mod A
Mod #2 Mod B
Analytic
Mod #3 Data Mod C
Mod #N Mod Z
Goal Minimize analysts’ effort to extract meaningful
results from a large number of applications
7 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
8. What is STAAF
8 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
9. What is STAAF
9 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
10. What is STAAF
10 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
11. What is STAAF
11 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
12. What is STAAF
12 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
13. What STAAF is NOT
• STAAF is not a stand
alone application
• STAAF is not only a
malware detection or
anti-virus engine
• STAAF is not an
application collection
tool
STAAF is a problem agnostic app analysis framework
13 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
14. Presentation Roadmap
• STAAF (Overview)
• Background
• STAAF (Deep Dive)
• Results
• Future Work
• Conclusions
14 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
15. Android’s Open App Model
• Low barrier to entry
• Apps hosted and installed
from anywhere
• All apps are created equal
• No distinction between core
apps and 3rd party apps
• Accept apps based on:
1. Trust of the source
2. Permissions requested
15 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
16. “Legitimate” Monitoring Apps
Ad/Marketing Networks
Social Gaming Networks
16 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
17. “Not-So-Legitimate” Permission Use
SMS Trojan
– Link to site hosting rogue app for “free movie player”
– Sends 2 Premium SMS messages to a Kazakhstan number
(about $5 per message)
Gemini
– Repackaged apps in Chinese markets
– Sex positions and MonkeyJump2 are known examples
– Central C&C
– Exfiltrates unique device identifiers
– Downloads and Install New Apps (with permission)
DroidDream
– Approx. 50 Malicious apps in official market
– Central C&C
– Exfiltrates unique device identifiers
– Downloads additional code modules
17 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
18. Presentation Roadmap
• STAAF (Overview)
• Background
• STAAF (Deep Dive)
• Results
• Future Work
• Conclusions
18 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
19. STAAF Workflow
Step 0: STAAF components initialized
19 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
20. STAAF Workflow
Step 1: Users sends APKs to be processed
20 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
21. STAAF Workflow
Step 2: Coordinator checks database for previous
results and logs new instance data for each APK
21 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
22. STAAF Workflow
Step 3: Coordinator sends new APKs to the file
repository service
22 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
23. STAAF Workflow
Step 4: Coordinator sends tasking orders to the task
queue
23 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
24. STAAF Workflow
Step 5: Elastic computing nodes pull tasks from their
designated task queue
24 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
25. STAAF Workflow
Step 6: Elastic computing nodes pull in the APK and
related information
25 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
26. STAAF Workflow
Step 6: After processing the elastic computing nodes
push out processed files and analysis results
26 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
27. STAAF Workflow
Step 7: When all tasking is complete elastic
computing nodes notify the coordinator
27 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
28. Task Modules
• Can be registered dynamically
• Task-Oriented
– High level
• What % of apps use
permission X
COMPLEXITY
• What is the most
PRIORITY
common libraries used
– Mid level
• Extract Permissions
• Extract static URLs
• Extract Methods Called
– Low level
• Extract manifest
• Extract Dex bytecode
28 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
29. Deduplication of Effort
• All Intermediate data are cached
for later use
– Extract and convert manifest to
ASCII
– Extract Dex and convert to Smali
and Java
– Compute the control flow graph
from the Dex
• Libraries and shared resources
must only be processed once
• Apps must only be processed
once by each module, ever
Small savings matter at large scales
29 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
30. Distributed Data Sharing
• Sharing app samples is
just the beginning
• Share the entire
process:
– Raw Application
– Extracted Resources
– Raw Data
– Processed Data
• Or set specific limits on
what data is shared
30 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
31. Presentation Roadmap
• STAAF (Overview)
• Background
• STAAF (Deep Dive)
• Results
• Future Work
• Conclusions
31 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
32. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database
1 2h25m 500 1 1 Central
2 2h00m 500 1 2 Central
3 1h56m 500 1 4 Central
4 0h36m 500 1 4 Local
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local
8 0h27m 1722 5 5 Local
9 1h19m 9349 5 10 Local
Achieved 50k apps in ~7 hours*
*Extrapolated from shorter tests
32 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
33. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Central Central
DB DB
1 2h25m 500 1 1 Central
2 2h00m 500 1 2 Central 1 1
Node Node
3 1h56m 500 1 4 Central 5
1 ECU
4 0h36m 500 1 4 Local ECUs
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local Compute Time Compute Time
8 0h27m 1722 5 5 Local 2h25m 0h36m
9 1h19m 9349 5 10 Local
“One EC2 Compute Unit (ECU) provides the equivalent CPU capacity
of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.” -Amazon
33 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
34. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Central Central
DB DB
1 2h25m 500 1 1 Central
2 2h00m 500 1 2 Central 1 4
Node Nodes
3 1h56m 500 1 4 Central
1 ECU 1 ECU
4 0h36m 500 1 4 Local
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local Compute Time Compute Time
8 0h27m 1722 5 5 Local 2h25m 1h56m
9 1h19m 9349 5 10 Local
STAAF is bound by both CPU and database throughput
34 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
35. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Central Local
DB DB
1 2h25m 500 1 1 Central
4 4
2 2h00m 500 1 2 Central
Nodes Nodes
3 1h56m 500 1 4 Central
1 ECU 1 ECU
4 0h36m 500 1 4 Local
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local Compute Time Compute Time
8 0h27m 1722 5 5 Local 1h56m 0h36m
9 1h19m 9349 5 10 Local
By using distributed, local databases STAAF
achieves a significant time performance increase
35 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
36. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Central Local
DB DB
1 2h25m 500 1 1 Central
2 2h00m 500 1 2 Central 1 4
Node Nodes
3 1h56m 500 1 4 Central
1 ECU 1 ECU
4 0h36m 500 1 4 Local
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local Compute Time Compute Time
8 0h27m 1722 5 5 Local 2h25m 0h36m
9 1h19m 9349 5 10 Local
Using by adding multiple processors with local
databases, we achieve near linear scalability
36 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
37. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Local Central
DB DB
1 2h25m 500 1 1 Central
2 2h00m 500 1 2 Central 4 1
Nodes Node
3 1h56m 500 1 4 Central 5
1 ECU
4 0h36m 500 1 4 Local ECUs
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local Compute Time Compute Time
8 0h27m 1722 5 5 Local 0h36m 0h36m
9 1h19m 9349 5 10 Local
By simply increasing the CPU capacity to 5 ECUs, we
achieve the same performance as four 1 ECU nodes
37 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
38. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Central Central
DB DB
1 2h25m 500 1 1 Central
2 2h00m 500 1 2 Central 1 4
Node Nodes
3 1h56m 500 1 4 Central 5 5
4 0h36m 500 1 4 Local ECUs ECUs
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local Compute Time Compute Time
8 0h27m 1722 5 5 Local 0h36m 0h28m
9 1h19m 9349 5 10 Local
Once again, using a central database fails to
achieve linear performance gains
38 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
39. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Central Local
DB DB
1 2h25m 500 1 1 Central
2 2h00m 500 1 2 Central 1 4
Node Nodes
3 1h56m 500 1 4 Central 5 5
4 0h36m 500 1 4 Local ECUs ECUs
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local Compute Time Compute Time
8 0h27m 1722 5 5 Local 0h36m 0h10m
9 1h19m 9349 5 10 Local
By using distributed, local databases we once
again achieve near linear performance gains
39 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
40. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Central Local
DB DB
1 2h25m 500 1 1 Central
2 2h00m 500 1 2 Central 1 4
Node Nodes
3 1h56m 500 1 4 Central 5
1 ECU
4 0h36m 500 1 4 Local ECUs
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local Compute Time Compute Time
8 0h27m 1722 5 5 Local 2h25m 0h10m
9 1h19m 9349 5 10 Local
By increasing CPU capacity, number of processing nodes, and
number of databases, we decreased processing time by 14.5x
40 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
41. Time Trials
STAAF Performance Tests
# Time Apps ECUs Nodes Database Local Local
DB DB
1 2h25m 500 1 1 Central
5 10
2 2h00m 500 1 2 Central
Nodes Nodes
3 1h56m 500 1 4 Central 5 5
4 0h36m 500 1 4 Local ECUs ECUs
5 0h36m 500 5 1 Central
6 0h28m 500 5 4 Central
7 0h10m 500 5 4 Local 1722 Apps 9349 Apps
8 0h27m 1722 5 5 Local Compute Time Compute Time
9 1h19m 9349 5 10 Local 0h27m 1h19m
Larger tests confirm that STAAF continues to scale linearly
41 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
42. Initial Results :: Permissions Requests
53,000 Applications Analyzed Permissions Requested
Android Market: ~48,000 Average: 3
3rd Party Markets: ~5,000 Most Requested: 117
Location Data 11,929 (24%)
Read Contacts 3,636 (8%)
Send SMS 1,693 (4%)
Receive SMS 1,262 (4%)
Record Audio 1,100 (2%)
Read SMS 832 (2%)
Process Outgoing Calls 323 (1%)
42 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
43. Additional Results :: Shared Libraries
53,000 Applications Analyzed
Android Market: ~48,000
3rd Party Markets: ~5,000
com.admob 38% (18,426 apps )
org.apache 8% ( 3,684 apps )
com.google.android 6% ( 2,838 apps )
com.google.ads 6% ( 2,779 apps )
com.flurry 6% ( 2,762 apps )
com.mobclix 4% ( 2,055 apps )
com.millennialmedia 4% ( 1,758 apps)
43 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
44. Permissions Are Not a Good Indicator
Malware only needs a
single permission
zsones
&
Droid Dream
SMS Replicator Fake Security Tool Gemeni
SMS Trojan
44 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
45. Presentation Roadmap
• STAAF (Overview)
• Background
• STAAF (Deep Dive)
• Results
• Future Work
• Conclusions
45 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
46. STAAF’s Future
• Build a publically available
user interface
• Provide a dashboard with
global stats
• Further Tune database
performance issues
• Build more complex analysis
modules
– Static data flow analysis
– Dynamic sandbox analysis
• Expose a public module
interface through UI
46 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
47. Presentation Roadmap
• STAAF (Overview)
• Background
• STAAF (Deep Dive)
• Results
• Future Work
• Conclusions
47 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
48. Final Thoughts
• STAAF is a system of systems and
services, not an application
• STAAF enables large scale Android
application analysis
• STAAF is problem agnostic and
can be tailored to answer many
analytic questions
• STAAF augments the capabilities
of the analyst, it does not replace
them
• STAAF achieves scalable
performance increases by
increasing computer
nodes/power
48 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
49. Q&A
49 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured