Malware Collection and Analysis via Hardware Virtualization
1. Malware Collection and Analysis
via Hardware Virtualization
Tamas K Lengyel
Computer Science and Engineering
11/10/2015
2. Outline
1. Introduction and Problem statement
2. Background, Challenges & Approach
3. Limitations and scope
4. Publications to date
5. Malware collection system & results
6. Malware analysis system & results
7. Hardware and software limitations
8. Contributions
9. Future work
3. Introduction
• 1,000,000 new malware binaries a day
• Thwarting malware requires in-depth
understanding of its operation
• Collect and analyze malware
• Existing tools and techniques are impeded by
modern malware techniques
• Packing, evasion and metamorphism
• Hardware virtualization has been proposed to
counter these techniques
4. Requirements
1. Scalability
Maximizing the number of concurrently active collection
and analysis sessions on limited hardware resources
2. Stealth
Malware must not be able to detect the monitoring
environment
3. Fidelity
The collected data has to be accurate
4. Isolation
Monitoring components have to be securely isolated
and cross-contamination between sessions must be prevented
5. Prominent prior work
• 2005: Vrable et al. - Scalability, fidelity, and
containment in the potemkin virtual honeyfarm
• 2008: Payne et al. - Lares: An architecture for
secure active monitoring using virtualization
• 2008: Dinaburg et al. - Ether: malware analysis
via hardware virtualization extensions
• 2013: Deng et al. - Spider: Stealthy binary
program instrumentation and debugging via
hardware virtualization
6. Problem statement
Developing effective anti-malware technologies
requires the collection and rapid analysis of an
increasing number of malware samples such
that all four requirements are met
simultaneously.
To date, no comprehensive evaluation has been
performed to determine whether virtualization is
an effective platform for the development of
such tools.
8. Challenges
1. Scalability
Disk and memory requirements grow linearly with the number of VMs
2. Stealth
In-guest tools can be detected
3. Isolation
In-guest tools can be disabled
Cross-contamination of VMs over the network
4. Fidelity
Data collection is negatively impacted by challenges 2 and 3 (stealth and isolation)
9. Our approach
1. Study current malware techniques
2. Develop out-of-guest tools
3. Conduct live experiments
4. Evaluate results
5. Study shortcomings and limitations
10. Limitations
1. Definition of malware
Constantly evolving and undefined set
2. Measurements and metrics
Requirements are not always quantifiable
Results are only indicative, not definitive
We work to counter current malware techniques
3. Repeatability of experiments
External entities outside our control
11. Scope
• Malware analysis vs. malware detection
Black Box Analysis
We only aim at collecting relevant information which
may aid malware detection
• Detection of virtualization vs. detection of
monitoring
Virtualization is already widely deployed
• Determining when we collected enough data
Halting problem
12. Publications
• CSET’12: Virtual Machine Introspection in a Hybrid Honeypot
Architecture. Acceptance rate: 48%
• NSS’13: Towards Hybrid Honeynets via Virtual Machine
Introspection and Cloning. Acceptance rate: 24%
• SHCIS’14: Multi-tiered Security Architecture for ARM via the
Virtualization and Security Extensions
• MMF’14: Pitfalls of Virtual Machine Introspection on Modern
Hardware.
• MMF’14: Code Validation for Modern OS Kernels
• ACSAC’14: Scalability, Fidelity and Stealth in the DRAKVUF
Dynamic Malware Analysis system. Acceptance rate: 19.9%
• SHCIS’15: Virtual Machine Introspection with Xen on ARM
• C&TC’15: CloudIDEA: A Malware Defense Architecture for
Cloud Data Centers. Acceptance rate: 38%
13. Malware collection
Primary requirement: capture malware binaries
• Scalability: Deploy copy-on-write disk and
memory sharing
• Stealth: No in-guest agents, no modification to
the hypervisor
• Isolation: External agent + network isolation
• Fidelity: Kernel heap pool-tag scanning
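The copy-on-write idea behind the scalability bullet can be illustrated with a minimal sketch. The class and block contents below are hypothetical and model only the principle: every clone shares one read-only base image and stores just the blocks it has written, so per-clone cost depends on what changed rather than on image size.

```python
class CowDisk:
    """Copy-on-write view over a shared, read-only base image."""

    def __init__(self, base_blocks):
        self._base = base_blocks   # shared by every clone, never modified
        self._delta = {}           # private blocks written by this clone only

    def read(self, index):
        # Prefer the clone's private copy, fall back to the shared base.
        return self._delta.get(index, self._base[index])

    def write(self, index, data):
        # Writes never touch the base image.
        self._delta[index] = data

    def private_blocks(self):
        return len(self._delta)


base = (b"boot", b"kernel", b"userland")    # tuple: immutable shared image
clones = [CowDisk(base) for _ in range(3)]
clones[0].write(2, b"infected")             # only clone 0 diverges
```

After the write, clone 0 holds one private block while the other clones still hold none and keep reading the untouched base.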
18. Malware analysis
Primary requirement: capture useful live data
• Scalability: Re-use CoW techniques from prior
experiments
• Stealth: No in-guest agents, no modification to
the hypervisor, command injection with VMI
• Isolation: VLAN tagging, TCB disaggregation
• Fidelity: Syscalls and kernel heap-allocations
19. Useful data?
The goal is to generate data that is complete enough
to be useful for analysis
Data-collection should be flexible to allow tuning to
specific requirements
Two main objectives defined in prior art:
1. Syscall monitoring
2. Kernel heap monitoring
We also monitor deleted files, which we found to be
a useful addition
21. Syscall trapping
Stealthy breakpoint injection method:
1. Overwrite internal kernel function entry points
with #BP (0xCC)
2. Read/write protect the page with EPT so the guest
cannot see or overwrite the breakpoint
3. When a trap hits, restore the original byte
4. Single-step one instruction
5. Place the breakpoint back again
Can monitor all internal kernel functions, not just
system calls!
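The five steps above can be sketched as a small simulation. This is not DRAKVUF code: the bytearray stands in for guest memory, the class and method names are hypothetical, and serving the original byte to guest reads is an assumption about how the EPT read/write protection is used to keep the breakpoint hidden.

```python
BP = 0xCC  # x86 INT3 opcode, the #BP byte named on the slide


class BreakpointTrap:
    """Toy model of a hidden breakpoint on one byte of guest memory."""

    def __init__(self, memory, address):
        self.memory = memory              # bytearray standing in for guest memory
        self.address = address
        self.original = memory[address]   # saved so it can be restored later
        self.hits = 0
        memory[address] = BP              # step 1: overwrite the entry point

    def guest_read(self, address):
        # Step 2 (assumed behaviour): a guest read of the protected page
        # faults to the monitor, which returns the original byte.
        if address == self.address:
            return self.original
        return self.memory[address]

    def on_breakpoint(self):
        # Steps 3-5: restore the byte, single-step, re-arm the breakpoint.
        self.hits += 1
        self.memory[self.address] = self.original   # step 3
        executed = self.memory[self.address]        # step 4: stand-in for single-stepping
        self.memory[self.address] = BP              # step 5
        return executed


memory = bytearray([0x55, 0x8B, 0xEC, 0x90])  # hypothetical function prologue
trap = BreakpointTrap(memory, 0)
```

In this model the guest always reads 0x55 at the entry point, while the byte actually stored in memory is 0xCC except during the single-step window.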
27. Stalling malware
Standard methods
• Detection of virtualized environments
• Detection of in-guest artifacts
• Sleeping
Advanced methods
• Time-skew detection
• API spamming
28. API spamming
• Repeatedly call monitored APIs which normally
complete fast
• NtCreateSemaphore
• Logging these calls takes the monitor additional time
• Spamming them causes the monitoring session to time out
Use of NtCreateSemaphore within 60 s:
Observed in 45,383 samples, averaging 7.77 calls per sample
Samples significantly above average: 1
Number of calls: 17,453
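One way to single out such a sample from per-sample call counts is sketched below. The function name, the threshold factor and the three small counts are hypothetical; only the 17,453 figure comes from the slide. The sketch compares against the median rather than the average quoted above, because a single extreme sample skews the mean of a small set.

```python
from statistics import median


def flag_spammers(call_counts, factor=100):
    """Return ids of samples whose call count exceeds factor times the median."""
    baseline = median(call_counts.values())
    return sorted(s for s, n in call_counts.items() if n > factor * baseline)


# Hypothetical per-sample NtCreateSemaphore counts for one 60 s session each.
counts = {"s1": 5, "s2": 9, "s3": 7, "s4": 17453}
suspects = flag_spammers(counts)
```

With these counts the median is 8, so the threshold is 800 calls and only "s4" is flagged.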
29. Summary
Hardware virtualization is effective for both
malware collection and analysis
All four requirements can be met simultaneously
using hardware virtualization
The technology is sufficiently flexible to develop
and fine-tune data collection techniques
Major improvement in the arms-race against
malware
31. Hardware limitations on x86
EPT violations report only the start address of the access
Read/write operation may be up to 8 bytes long
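A consequence for a monitor is that it cannot tell from the reported address alone which bytes were touched. A conservative check, sketched here with a hypothetical helper name and the 8-byte bound from the slide, treats any watched byte inside the possible access window as potentially affected.

```python
MAX_ACCESS = 8  # bytes; upper bound on a read/write as stated on the slide


def may_touch(violation_addr, watched_addr, max_access=MAX_ACCESS):
    """True if an access starting at violation_addr could cover watched_addr."""
    return violation_addr <= watched_addr < violation_addr + max_access
```

For example, an access reported at 0xFFC may still reach a watched byte at 0x1001 on the next page, so checking only the page of the reported address is not sufficient.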
32. Hardware limitations on x86
The shared second-level TLB (sTLB) makes
TLB-splitting attacks no longer feasible
The TLB can still be used to hide
mappings from VMI
33. Hardware limitations on ARM
Split-TLB architecture without sTLB
Hardware-assisted address translation is available to the
VMM
The translation is performed as a data-fetch access
• Only hits the dTLB
Hiding code-pages on ARM is possible via split-TLB
attacks
34. Contributions
1. Identified core requirements that must be met
simultaneously
2. Developed and open-sourced the prototypes,
with major contributions to existing systems
3. Performed extensive tests with modern malware
4. Identified hardware and software limitations that
must be addressed when building such systems
35. Future work
• Keeping up with the evolving threat landscape
  • Attacks against the hypervisor and lower layers
  • Data-only malware
  • Stalling malware
• Making use of new and evolving hardware
virtualization extensions
  • Hybrid VMI
• Data-mining the collected information
  • Identifying malware groups
  • Creating IDS/IPS rules
36. Questions?
• Dissertation text available at
http://tklengyel.com/thesis.pdf
• DRAKVUF
http://drakvuf.com
• LibVMI
http://libvmi.com