Agenda
• Detection Challenges
• Machine Learning Approaches
• Modeling Machine Learning classifiers
• Attacks on Machine Learning Defenses
• Real Protect
• Deep Learning in Sandbox
The Age of “Signatures” Is Fading
• Signature-based detection is reactive by nature. Although very precise, it is becoming unsustainable given the sheer number of malware variants and their rate of growth
• Malware authors continuously monitor antivirus vendor detections and release new variants
• Commercial, open source, or underground packers and protectors make producing new variants by repacking trivial
Signatures identify with near certainty that an object is either malicious or clean
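To make the reactive nature of signatures concrete, the sketch below models a signature as an exact SHA-256 digest of a known-bad file. This is a simplifying assumption (real .DAT content is richer than whole-file hashes), and the sample bytes and the `KNOWN_BAD_SHA256` set are hypothetical. Flipping a single byte, which repacking effectively does at scale, produces a variant the lookup no longer recognizes.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as hex."""
    return hashlib.sha256(data).hexdigest()

def signature_match(data: bytes, signatures: set) -> bool:
    """Exact-match lookup: precise, but only for bytes seen before."""
    return sha256_hex(data) in signatures

# Hypothetical known-bad sample and the signature recorded for it.
original = b"MZ\x90\x00" + b"payload-v1" * 16
KNOWN_BAD_SHA256 = {sha256_hex(original)}

# A "new variant": identical content with one byte flipped.
variant = bytearray(original)
variant[-1] ^= 0x01
variant = bytes(variant)

print(signature_match(original, KNOWN_BAD_SHA256))  # True
print(signature_match(variant, KNOWN_BAD_SHA256))   # False
```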
Unpacking Challenges
Think of it as a file inside another executable file,
which can itself be inside yet another executable file.
Think Russian dolls (Matryoshka).
When executed, the “outer” executable unpacks the contents of the “inner” executable into memory and executes it.
Image: https://www.pinterest.com
The innermost executable is the “real” executable!
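The Matryoshka idea can be sketched with a toy packer in which each layer is a hypothetical `TOYPACK` marker followed by zlib-compressed contents. Real packers and protectors typically decrypt or decompress code in memory at run time, so a static peeling loop like this one only illustrates the layering; it is not how production unpackers work.

```python
import zlib

MAGIC = b"TOYPACK"  # hypothetical marker identifying one packed layer

def pack(payload: bytes) -> bytes:
    """Wrap a payload in one toy 'packer' layer (compression only)."""
    return MAGIC + zlib.compress(payload)

def unpack_all(blob: bytes, max_layers: int = 32):
    """Peel layers until the innermost payload is reached.

    Returns (innermost_payload, number_of_layers_removed).
    max_layers guards against maliciously deep nesting.
    """
    layers = 0
    while blob.startswith(MAGIC):
        if layers >= max_layers:
            raise ValueError("too many packing layers")
        blob = zlib.decompress(blob[len(MAGIC):])
        layers += 1
    return blob, layers

real_exe = b"MZ...innermost real executable..."
doll = pack(pack(pack(real_exe)))  # three nested layers
inner, depth = unpack_all(doll)
print(depth, inner == real_exe)  # 3 True
```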
Mimikatz Detection
Resources, strings, packer and compiler details,
compile time, API, and function calls are readily
available for authoring signatures.
A native binary has thousands of
interesting features!
Image: http://www.abcya.com/word_clouds.htm
Sources of Features
Static Analysis (file type, resources, meta-data)
Fuzzy Hashing (identical byte or checksum sequences)
Import Address Hash (function calls, order of function calls)
Dynamic Analysis (file system, registry, network behaviors)
Memory Analysis (process or system memory analysis)
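As an illustration of the import hash idea, here is a minimal sketch following the widely used "imphash" convention: lowercase library names with a .dll, .ocx, or .sys extension dropped, lowercase function names, joined in import order and MD5-hashed. The pefile library exposes a real implementation as `PE.get_imphash()`; this version works on a hand-written import list, ignores ordinal-only imports, and should be read as an approximation. Because the hash depends on order, reordering the same imports yields a different value.

```python
import hashlib

def normalize_imports(imports):
    """Render (dll, function) pairs as 'lib.func' strings, in import order."""
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{func.lower()}")
    return ",".join(parts)

def import_hash(imports):
    """MD5 of the normalized import list (imphash-style approximation)."""
    return hashlib.md5(normalize_imports(imports).encode()).hexdigest()

imports = [
    ("KERNEL32.dll", "CreateFileW"),
    ("KERNEL32.dll", "WriteFile"),
    ("ADVAPI32.dll", "CryptEncrypt"),
]
print(normalize_imports(imports))
# kernel32.createfilew,kernel32.writefile,advapi32.cryptencrypt
print(import_hash(imports) == import_hash(list(reversed(imports))))  # False
```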
Leveraging Multiple
Sources of Knowledge
• Identify a suspicious characteristic or activity
• If existing signature-based methods don’t detect the object, it is given a reputation and confidence level
• Pre-execution: Static file feature extraction
(file type, import hash, entry point, resources, strings,
packer and compiler details, compile time, APIs, section
names)
• Post-execution: Behavioral features and memory analysis
(behavioral sequence, process tree, file system, registry
events, network communication events, mutex, strings from
memory)
A hybrid approach provides
the best classification rates!
Extracting Static Features
• File type, resources, and strings
• Packer and compiler details
• Compile time, entry point
• Import address hash
• Function calls and APIs
Ransomware: CTB-Locker (pre-execution)
Image: http://www.abcya.com/word_clouds.htm
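Many of the static features listed above start from printable strings. A minimal sketch of `strings`-style extraction of ASCII runs follows; the four-byte minimum is a common default rather than a requirement, UTF-16 strings (frequent in Windows binaries) are not handled, and the sample bytes are made up.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return printable ASCII runs of at least min_len bytes, like `strings`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]

# Made-up bytes resembling a fragment of a PE file.
sample = (
    b"\x00\x01MZ\x90\x00This program cannot be run in DOS mode\x00\x00"
    b"kernel32.dll\x00ab\x00CryptEncrypt\x00"
)
features = set(extract_strings(sample))  # bag-of-strings feature set
print(extract_strings(sample))
# ['This program cannot be run in DOS mode', 'kernel32.dll', 'CryptEncrypt']
```

Runs shorter than the minimum, such as the two-byte `MZ` header and `ab`, are dropped, which keeps the feature set from filling up with noise.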
Unsupervised Machine Learning
We are given a large set of dogs of different breeds (Chihuahuas, Beagles, Dachshunds). We can use two features, height and weight, to distinguish them. How can we determine which dog falls into which breed?
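One unsupervised answer to the dog question is k-means clustering. The sketch below runs Lloyd's algorithm on made-up (height in cm, weight in kg) pairs; the deterministic farthest-point initialization is a choice made here for reproducibility, not something the slide specifies. In practice, features measured in different units are usually scaled first, and the algorithm only groups the dogs: naming a cluster "Beagles" still requires a human or some labeled data.

```python
import math

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: assign points to centroids, move centroids to cluster means."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

# Made-up (height cm, weight kg) measurements.
dogs = [
    (16, 2.0), (18, 2.5), (20, 3.0),     # small and light
    (21, 9.0), (22, 10.0), (23, 11.0),   # short but heavier
    (35, 10.0), (37, 11.0), (39, 12.0),  # taller
]
centroids, labels = kmeans(dogs, k=3)
print(labels)
```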
Modeling a Machine Learning Classifier
Input Data
• Executables, compiled code, documents
Feature Engineering
• N-grams, entropy of sections
Labels
• Malicious or clean?
• Belongs to a certain family of malware
• Capabilities (keyloggers, backdoors)
Model
• Assigns a sample to an output class
• Support vector machines, Naïve Bayes,
random forests, neural networks
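Two of the feature-engineering steps named above, byte n-grams and Shannon entropy, fit in a few lines. This is a sketch: applying `shannon_entropy` per PE section requires parsing the section table, which is not shown, and the default n-gram size is an arbitrary illustrative choice. Entropy close to 8 bits per byte often indicates packed or encrypted content.

```python
import math
from collections import Counter

def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams (a common bag-of-bytes feature)."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(data).values())

print(byte_ngrams(b"abab"))                # Counter({b'ab': 2, b'ba': 1})
print(shannon_entropy(b"\x00" * 64))       # 0.0
print(shannon_entropy(bytes(range(256))))  # 8.0
```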
Causative: Poisoning Sample Collections
1. Insert signature fragments into clean files
2. Submit the samples to VirusTotal or any other public malware collection site
3. A trusted vendor starts detecting those files
4. Many vendors reshare the samples and trust the malicious classification
5. A vendor uses the “malicious” samples to train models
6. The model produces potential false positives (FPs) on clean files
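The poisoning steps can be simulated with a deliberately naive learner: a hypothetical `train` that marks any token seen only in malicious samples as malicious. Steps 2 to 5 are collapsed into a single trusted "malicious" label on the tampered sample, and the token names are illustrative, not real signatures. A clean installer that shares one token with the tampered file becomes a false positive after retraining.

```python
def train(samples):
    """Toy learner: a token is 'malicious' if it appears only in malicious samples."""
    malicious_tokens, clean_tokens = set(), set()
    for tokens, label in samples:
        if label == "malicious":
            malicious_tokens |= tokens
        else:
            clean_tokens |= tokens
    return malicious_tokens - clean_tokens

def predict(model, tokens):
    """Flag a sample if it shares any token with the learned malicious set."""
    return "malicious" if tokens & model else "clean"

honest = [
    ({"CryptEncrypt", "DeleteShadowCopies", "ransom_note"}, "malicious"),
    ({"GetMessageW", "CreateWindowExW", "ShellExecuteW"}, "clean"),
]

# Step 1: a clean installer with a signature fragment ("ransom_note") inserted.
# Steps 2-5: the fragment earns it a trusted "malicious" label that is reshared.
poisoned_sample = ({"GetMessageW", "CreateWindowExW", "MsiInstallProductW", "ransom_note"}, "malicious")

clean_model = train(honest)
poisoned_model = train(honest + [poisoned_sample])

# Step 6: an unrelated, legitimate installer.
new_installer = {"CreateWindowExW", "MsiInstallProductW"}
print(predict(clean_model, new_installer))     # clean
print(predict(poisoned_model, new_installer))  # malicious (false positive)
```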
Defenses Against Machine
Learning Attacks
Exploratory attack
• Training data: Prevent the attacker from knowing the training data
• Feature selection: Harden classifiers against attack by using multiple features
Causative attack: The attacker has some degree of control over the training data, so learning should be resilient to poisoning attacks
• Perform empirical analysis of training instances to make learning more resilient
• Human-in-the-loop approach
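One possible form of empirical analysis with a human in the loop is a label-consistency check: flag any training instance whose label disagrees with most of its nearest neighbors and send it for manual review instead of training on it. The sketch below uses k-nearest neighbors over Jaccard distance between token sets; this technique is swapped in for illustration and is not necessarily what the slide's authors use. The last sample is a tampered clean installer of the kind described in the poisoning attack.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def flag_suspicious_labels(samples, k=3):
    """Return indices whose label disagrees with the majority of their k nearest neighbors."""
    flagged = []
    for i, (tokens, label) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        neighbors = sorted(others, key=lambda s: jaccard_distance(tokens, s[0]))[:k]
        disagreements = sum(1 for _, neighbor_label in neighbors if neighbor_label != label)
        if disagreements > k // 2:
            flagged.append(i)  # candidate for human review
    return flagged

training_set = [
    ({"GetMessageW", "CreateWindowExW", "MsiInstallProductW", "RegSetValueExW"}, "clean"),
    ({"GetMessageW", "CreateWindowExW", "MsiInstallProductW", "ShellExecuteW"}, "clean"),
    ({"GetMessageW", "CreateWindowExW", "RegSetValueExW", "ShellExecuteW"}, "clean"),
    ({"CryptEncrypt", "FindFirstFileW", "DeleteShadowCopies", "ransom_note"}, "malicious"),
    ({"CryptEncrypt", "FindFirstFileW", "ransom_note", "InternetOpenW"}, "malicious"),
    ({"CryptEncrypt", "DeleteShadowCopies", "ransom_note", "InternetOpenW"}, "malicious"),
    # Tampered clean installer carrying a "malicious" label.
    ({"GetMessageW", "CreateWindowExW", "MsiInstallProductW", "ransom_note"}, "malicious"),
]
print(flag_suspicious_labels(training_set))  # [6]
```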
Real Protect
• Detects zero-day malware in near real time
• Classification of malware based on behavior and static analysis
• Uses machine learning to automate classification
• Signature-less, small client footprint
• Supports both offline and online (cloud) classification modes
• Improves detection by up to 30% on top of .DAT and McAfee® Global Threat Intelligence detections
• Augments McAfee endpoint security products for Windows
• Produces actionable threat intelligence
• Useful for patient zero discovery, threat actor attribution and forensic investigations
• Available now!
• Standalone: www.mcafee.com/us/downloads/free-tools/raptor.aspx
• Consumer Cloud AV product
• Enterprise availability in McAfee Endpoint Security 10.5 this year
McAfee® Endpoint Security 10 Threat Prevention
Layered Approach
Pre-Execution: Whitelisting (Hash + Cert), .DAT, McAfee Global Threat Intelligence, McAfee Threat Intelligence Exchange (Hash + Cert), Real Protect - Static
Post-Execution: Dynamic App Containment, Real Protect - Behavioral
Threat Prevention | Web Control | Firewall | TIE | Future Modules
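A layered design can be read as a cascade in which each layer either returns a verdict or defers to the next. The sketch below assumes that short-circuit behavior and models only three pre-execution layers; the layer functions, hash placeholders, and the 0.9 threshold are hypothetical and do not reflect how the McAfee product is implemented.

```python
from typing import Callable, List, Optional, Tuple

# Each layer returns "clean", "malicious", or None to defer to the next layer.
Layer = Callable[[dict], Optional[str]]

TRUSTED_HASHES = {"trusted-app-hash"}  # hypothetical whitelist entries
DAT_SIGNATURES = {"known-bad-hash"}    # hypothetical signature entries

def whitelist_layer(sample: dict) -> Optional[str]:
    return "clean" if sample["sha256"] in TRUSTED_HASHES else None

def dat_layer(sample: dict) -> Optional[str]:
    return "malicious" if sample["sha256"] in DAT_SIGNATURES else None

def static_model_layer(sample: dict, threshold: float = 0.9) -> Optional[str]:
    return "malicious" if sample["static_score"] >= threshold else None

def run_layers(sample: dict, layers: List[Tuple[str, Layer]]):
    """Return (deciding_layer, verdict) from the first layer that decides."""
    for name, layer in layers:
        verdict = layer(sample)
        if verdict is not None:
            return name, verdict
    return None, "unknown"  # nothing decided before execution

PRE_EXECUTION = [
    ("whitelist", whitelist_layer),
    ("dat", dat_layer),
    ("static_model", static_model_layer),
]
print(run_layers({"sha256": "never-seen-hash", "static_score": 0.95}, PRE_EXECUTION))
# ('static_model', 'malicious')
```

Under this reading, a sample that falls through with "unknown" is the kind of object the post-execution layers (Dynamic App Containment, Real Protect - Behavioral) would watch once it runs.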
ATDml Technology in a Nutshell
ATDml is a signatureless deep learning classifier that leverages sandboxing technology to achieve a high-precision malware conviction rate
Deep Learning in the Sandbox
Training: Malware samples → Sandbox → Original Binary, Unpacked File, Behavior → Feature Vector → Feature Normalization → Dimensionality reduction → Deep Learning (Input Layer, Hidden Layers, Output Layer) → Trained Parameters
Prediction: Feature Vector + Trained Parameters → Prediction Framework → Prediction
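The prediction half of this pipeline (normalize, reduce dimensionality, run the network) can be sketched as a forward pass in plain Python. Everything concrete here is an assumption: the four behavior counts, the fixed projection matrix standing in for a learned dimensionality reduction such as PCA, and the hand-picked weights standing in for trained parameters. ATDml's actual architecture is not described in the slides.

```python
import math

def zscore(vec, means, stds):
    """Feature normalization: center and scale each feature (0 if std is 0)."""
    return [(x - m) / s if s else 0.0 for x, m, s in zip(vec, means, stds)]

def linear(x, weights, biases=None):
    """Matrix-vector product, optionally plus a bias vector."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    if biases is not None:
        out = [o + b for o, b in zip(out, biases)]
    return out

def relu(v):
    return [max(0.0, z) for z in v]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(features, params):
    x = zscore(features, params["means"], params["stds"])  # normalization
    x = linear(x, params["projection"])                     # 4 -> 2 dimensions
    h = relu(linear(x, params["W1"], params["b1"]))         # hidden layer
    (logit,) = linear(h, params["W2"], params["b2"])        # output layer
    return sigmoid(logit)                                   # score in (0, 1)

# Hypothetical parameters; a real model would learn these during training.
PARAMS = {
    "means": [10, 5, 2, 1],
    "stds": [5, 5, 2, 1],
    "projection": [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]],
    "W1": [[1.0, 0.0], [0.0, 1.0]],
    "b1": [0.0, 0.0],
    "W2": [[1.5, 1.5]],
    "b2": [-1.0],
}

# Hypothetical sandbox counts: files written, registry writes,
# network connections, child processes.
quiet = [10, 5, 2, 1]
noisy = [40, 30, 12, 6]
print(predict(quiet, PARAMS), predict(noisy, PARAMS))
```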
What Are We Going to Demo Here?
1. Advanced ways of evading detection, using a crypter that adds static and behavioral evasion
2. How deep learning in the sandbox is able to detect the most evasive and previously unseen malware
Unmask the Attack
ATDml Value Proposition
1. Zero-day detection by deep analysis: Efficient
classification of new and previously unseen
malware by leveraging deep learning
2. Resilience to evasion: The model is designed to be highly resilient to evasive techniques used to bypass detection
3. Identify intention of attack: Ability to perform malware attribution to identify the intention of the attack
Polymorphic & Metamorphic Malware
Rootkits and bootkits
Sandbox-aware malware
Attacks on Disassembly and Packing
Behavioral Polymorphism
Created using: http://www.abcya.com/word_clouds.htm
Inspired by the inner workings of the human brain
A loose model of the human brain that can be programmed in a computer
A neural network learns from observational data, figuring out its own solution to the problem
Used in areas such as pattern recognition and data classification
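The idea that a network "figures out its own solution" from observational data is visible even in a single artificial neuron. The sketch below trains a perceptron with the classic error-driven update rule on a toy, linearly separable rule (output 1 only when both binary inputs are 1). Deep networks stack many such units with nonlinear activations and are trained with gradient descent instead.

```python
def perceptron_predict(w, b, x):
    """Fire (1) if the weighted sum of inputs plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge weights toward each misclassified example."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            err = y - perceptron_predict(w, b, x)
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                mistakes += 1
        if mistakes == 0:  # every example classified correctly
            break
    return w, b

AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND_DATA)
print([perceptron_predict(w, b, x) for x, _ in AND_DATA])  # [0, 0, 0, 1]
```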
Nop insertion
Register renaming
Junk insertion
Instruction reordering
Encryption
Compression
Branch condition modification
Instruction substitution
OS Fingerprinting
Interaction based
System Tampering
Latent Execution
Hypervisor detection
Basic block reordering
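Several of the techniques listed above, such as NOP insertion and junk insertion, change the bytes (and therefore any exact hash) without changing behavior. A minimal sketch, assuming the code has already been disassembled into instruction strings, shows why normalization before feature extraction helps: dropping NOPs restores the original instruction n-grams. The `NOISE` set is a hypothetical starting point; register renaming or instruction substitution would need further normalization, for example keeping only mnemonics.

```python
import hashlib

NOISE = {"nop"}  # hypothetical set of junk mnemonics to discard

def fingerprint(instructions):
    """Exact hash over the instruction text (fragile, like a byte signature)."""
    return hashlib.sha256("\n".join(instructions).encode()).hexdigest()

def normalize(instructions):
    """Drop instructions whose mnemonic is in NOISE."""
    return [ins for ins in instructions if ins.split()[0] not in NOISE]

def ngram_set(instructions, n=2):
    """Set of consecutive instruction n-grams used as features."""
    return {tuple(instructions[i:i + n]) for i in range(len(instructions) - n + 1)}

original = ["push ebp", "mov ebp, esp", "xor eax, eax", "pop ebp", "ret"]
mutated = ["push ebp", "nop", "mov ebp, esp", "nop", "nop",
           "xor eax, eax", "pop ebp", "nop", "ret"]

print(fingerprint(original) == fingerprint(mutated))            # False
print(ngram_set(normalize(mutated)) == ngram_set(original))     # True
```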