Binary Analysis - Luxembourg
1. Binary Analysis for Vulnerability Detection
National University of Singapore
http://www.comp.nus.edu.sg/~abhik
Visit to University of Luxembourg S&T center, January 2017.
Research project with DSO National Labs, 2013-16.
“TSUNAMi: Trustworthy systems from un-trusted component amalgamations”
National Research Foundation (NRF), 2015-2020.
4. Cybersecurity research
The National Cybersecurity R&D Programme seeks to develop R&D expertise and capabilities in cybersecurity for Singapore. It aims to improve the trustworthiness of cyber infrastructures, with an emphasis on security, reliability, resiliency and usability. S$130 million of funding over five years will be available to support research into both the technological and human-science aspects of cybersecurity, organized in outcome-based R&D themes. The themes are designed to provide an element of operational context, while not restricting “game-changing” ideas from the community.
Cybersecurity research spans six themes:
• Scalable Trustworthy Systems
• Resilient Systems
• Effective Situation Awareness and Attack Attribution
• Combatting Insider Threats
• Threat Detection, Analysis and Defence
• Efficient and Effective Digital Forensics
https://www.nrf.gov.sg/programmes/national-cybersecurity-r-d-programme
5. Outline
• NCR project – Trustworthy systems from Un-trusted Components
• Technical contributions in Binary Analysis
• Technology showcase
• Initiatives – Consortium
8. Use of research in NRF project
• Binary Analysis
o Useful to government agencies for procuring software.
o Deep binary analysis on the evaluation version prior to procurement.
• Binary hardening
o Useful to government agencies on procured software.
• Point technologies from individual work-packages.
9. Contributions
• Binary analysis for
Fuzz testing
Comprehension
Debugging
Patching (latest work)
• Research programme at NUS since 2008, with DRTech, DSO, …
11. Who cares?
A team of hackers won $2 million by building a machine that could hack better than they could.
Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99
DARPA Cyber Grand Challenge -> automation of security [detecting and fixing vulnerabilities in binaries automatically]
12. Fuzz Testing
Pioneered by Barton Miller at Univ. of Wisconsin in 1988.
And now, in 2016 …
Springfield Project – fuzzing as a service
OSS-Fuzz – continuous fuzzing for open-source projects
13. A true story – why fuzz?
• May 4, 2015
o Abhik was preparing lecture notes on fuzzing.
o 11:00 AM – finished deciding on the structure; trying to pick a motivating example for fuzzing to interest the students, since there are so many of them.
o 11:11 AM – I get an email update about the latest incident, an integer overflow in Boeing: a classic case where an automated method for sending out malformed or boundary inputs can reveal errors.
14. Presented by Thuan Pham
(Model-Based) Blackbox Fuzzing
[Diagram: an input model (as used in Peach, Spike, …) guides the mutation of a seed input; some mutated inputs pass all of the program's checks, while others satisfy only some checks.]
15. Presented by Thuan Pham
(Coverage-based) Greybox Fuzzing – AFLFast
[Diagram: seed inputs are enqueued in an input queue; inputs are dequeued and mutated, and “interesting” mutated inputs are put back in the queue.]
17. Problem Statement
• How to direct the exploration to reach certain locations or targets, or to enhance coverage
o in large-scale program binaries
o with highly structured inputs (e.g., multimedia files)
o given an inadequate test suite or seeds.
18. Directed Search in White-box Fuzzing
Apply to the crash reproduction problem
Crash reproduction supports
- in-house debugging and fixing
- vulnerability checking
20. Control Flow Graph Construction
Resolve indirect jumps/calls
[Pipeline: program binaries -> IDA Pro (assembly code, direct jumps/calls) -> CFG generator, with jump table extraction and edge profiling to resolve indirect jumps/calls -> CFG]
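The pipeline above can be sketched in miniature. This is an illustrative toy, not IDA Pro's actual output format: instructions are a dict from address to (mnemonic, operand), and one indirect jump is resolved through a recovered jump table, exactly the role jump table extraction plays in the slide's pipeline.

```python
# Minimal CFG-construction sketch over a toy "disassembly" (assumed encoding,
# not IDA Pro's API): direct jumps/calls give edges immediately, while an
# indirect jump ("jmp*") is resolved via an extracted jump table.

def build_cfg(instructions, jump_tables):
    """instructions: {addr: (mnemonic, operand)}; jump_tables: {addr: [targets]}."""
    edges = set()
    addrs = sorted(instructions)
    for i, addr in enumerate(addrs):
        mnem, op = instructions[addr]
        if mnem == "jmp":                        # direct unconditional jump
            edges.add((addr, op))
        elif mnem == "jcc":                      # conditional: taken + fall-through
            edges.add((addr, op))
            if i + 1 < len(addrs):
                edges.add((addr, addrs[i + 1]))
        elif mnem == "jmp*":                     # indirect jump: use the jump table
            for target in jump_tables.get(addr, []):
                edges.add((addr, target))
        elif i + 1 < len(addrs) and mnem != "ret":
            edges.add((addr, addrs[i + 1]))      # straight-line flow
    return edges

# Toy program: a compare, a conditional branch, and an indirect jump whose
# two targets come from a recovered jump table.
insns = {0x00: ("cmp", None), 0x04: ("jcc", 0x10),
         0x08: ("jmp*", None), 0x10: ("ret", None), 0x20: ("ret", None)}
cfg = build_cfg(insns, {0x08: [0x10, 0x20]})
print(sorted(cfg))
```

Without the jump table, the two edges out of 0x08 would be missing and the CFG would be incomplete, which is why indirect-jump resolution is called out as its own stage.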
21. First-cut Analyzer
• Output of Stage 1: flow structures and input file(s) that can reach the crash module
• Output of Stage 2: refined CFG, MDG and hybrid symbolic file
• Output of Stage 3: crash input(s) and a crash explanation (based on the UNSAT core)
22. UNSAT Core
Notations:
bx: branch instruction
bcx: branch condition at bx
PC: path condition
CC: crash condition
[Diagram: a branch tree over b1, b2, b3, b4 leading to the crash instruction, with taken/not-taken conditions bc1/¬bc1, …, bc4/¬bc4 on the edges.]
First attempt:
PC = bc1 ∧ ¬bc3 ∧ bc4
PC ∧ CC == UNSAT
bc1 contradicts CC
1) Backtrack to b1
2) Take the other branch
Second attempt:
PC' = ¬bc1 ∧ bc2 ∧ bc4
PC' ∧ CC == SAT
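The backtracking step can be sketched concretely. This is a toy over a small integer domain with made-up branch conditions (the real tool works on symbolic path conditions with an SMT solver); it shows how an unsatisfiable PC ∧ CC pins the blame on one branch condition, so the search backtracks exactly there.

```python
# Sketch of the UNSAT-core idea: find which branch condition alone
# contradicts the crash condition CC, flip it, and retry.
# Branch conditions here are illustrative, not from the actual tool.

DOMAIN = range(256)

def satisfiable(conds):
    """Brute-force SAT check over the toy input domain."""
    return any(all(c(x) for c in conds) for x in DOMAIN)

def unsat_core_branch(pc, cc):
    """Index of the first branch condition that by itself contradicts CC."""
    for i, c in enumerate(pc):
        if not satisfiable([c, cc]):
            return i
    return None

bc1 = lambda x: x > 100          # branch condition at b1
bc2 = lambda x: x % 3 == 0       # branch condition at b2
bc3 = lambda x: x % 2 == 0       # branch condition at b3
bc4 = lambda x: x < 200          # branch condition at b4
cc  = lambda x: x <= 50          # crash condition

pc = [bc1, lambda x: not bc3(x), bc4]        # first attempt: bc1 ∧ ¬bc3 ∧ bc4
culprit = unsat_core_branch(pc, cc)          # bc1 contradicts CC
pc2 = [lambda x: not bc1(x), bc2, bc4]       # backtrack to b1, other branch
print("culprit branch index:", culprit)                  # 0
print("second attempt SAT:", satisfiable(pc2 + [cc]))    # True
```

The culprit (bc1: x > 100) can never coexist with the crash condition (x <= 50), so flipping only that branch is enough to make the second attempt satisfiable.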
23. Evaluation
Time bound: 24 hrs. Tools compared: Hercules, Peach, S2E.

Program | Advisory ID | #Seed files
WMP 9.0 | CVE-2014-2671 | 10
WMP 9.0 | CVE-2010-0718 | 10
AR 9.2 | CVE-2010-2204 | 10
RP 1.0 | CVE-2010-3000 | 10
MP 0.35 | CVE-2011-0502 | 10
OV 1.04 | CVE-2010-0688 | 10
31. Key change
Input: Seed Inputs S
1: T✗ = ∅
2: T = S
3: if T = ∅ then
4:   add empty file to T
5: end if
6: repeat
7:   t = chooseNext(T)
8:   p = assignEnergy(t)
9:   for i from 1 to p do
10:    t′ = mutate_input(t)
11:    if t′ crashes then
12:      add t′ to T✗
13:    else if isInteresting(t′) then
14:      add t′ to T
15:    end if
16:  end for
17: until timeout reached or abort-signal
Output: Crashing Inputs T✗
32. Power Schedules
• Constant:
p(i) = α(i)
o AFL uses this schedule (fuzzing ~1 minute)
o α(i) is how AFL judges the fuzzing time for the test exercising path i
• Cut-off Exponential:
p(i) = 0, if f(i) > µ
p(i) = min(α(i)/β · 2^s(i), M), otherwise
where
β is a constant
s(i) is the number of times the input exercising path i has been chosen for fuzzing
f(i) is the number of fuzz exercising path i (path frequency)
µ is the mean number of fuzz exercising a discovered path (average path frequency)
M is the maximum energy expendable on a state
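The cut-off exponential schedule transcribes directly into code. Here alpha_i stands in for AFL's original energy assignment α(i); the concrete numbers in the calls are illustrative.

```python
# Cut-off exponential power schedule, following the slide's definition:
# p(i) = 0 if f(i) > mu, else min(alpha(i)/beta * 2^s(i), M).

def coe_energy(alpha_i, s_i, f_i, mu, beta=1.0, M=1000):
    if f_i > mu:
        return 0                      # high-frequency path: spend no energy
    return min(alpha_i / beta * 2 ** s_i, M)

# A low-frequency path gains exponentially more energy each time it is
# chosen, up to the cap M; a path fuzzed more often than average gets none.
print(coe_energy(alpha_i=10, s_i=3, f_i=5, mu=20))   # 10 * 2^3 = 80.0
print(coe_energy(alpha_i=10, s_i=10, f_i=5, mu=20))  # capped at M = 1000
print(coe_energy(alpha_i=10, s_i=3, f_i=50, mu=20))  # f(i) > mu -> 0
```

The cap M and the cut-off at µ are what keep the fuzzer from sinking all its budget into a few already well-explored paths.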
33. Prioritize low-probability paths [CCS16]
Use a greybox fuzzer which keeps track of the path id for each test.
Find the probability that fuzzing a test t which exercises path π leads to an input which exercises path π'.
Give higher weightage to newly discovered low-probability paths, to gravitate towards them: discover new states in the Markov chain with minimal effort.

void crashme(char *s) {
  if (s[0] == 'b')
    if (s[1] == 'a')
      if (s[2] == 'd')
        if (s[3] == '!')
          abort();
}

8 CVEs in Binutils (3 new over GB fuzzing).
Finds crashes 7x faster compared to plain GB fuzzing.
An independent evaluation found crashes 19x faster on DARPA Cyber Grand Challenge (CGC) binaries.
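The intuition behind favoring low-probability paths shows up even in back-of-envelope arithmetic on crashme: purely random 4-byte inputs must hit all four magic bytes at once, while a fuzzer that retains each newly covered prefix as a seed only needs one correct byte per stage.

```python
# Expected-trials arithmetic for reaching abort() in crashme.
blind = 256 ** 4            # blind random: all 4 magic bytes at once (2^32)
staged = 4 * 256            # one retained prefix per stage: 4 bytes, one at a time
print(blind, staged, blind // staged)   # 4294967296 1024 4194304
```

A four-million-fold gap from a four-byte check is why gravitating toward rarely exercised paths pays off so heavily.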
35. Other works – Crash Bucketing [Upcoming work, FASE17]
[Diagram: failing paths f1–f4 (marked ✗) and a passing path p1 across branches b1–b5.]
• Identify the culprit constraint.
• Use the culprit constraint as the “reason” for the failure.
• Group failing paths having the same “reason” together.
Approaches: point-of-failure based, call-stack based, symbolic-analysis based.
37. Recall CGC
A team of hackers won $2 million by building a machine that could hack better than they could.
Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99
DARPA Cyber Grand Challenge -> automation of security [detecting and fixing vulnerabilities in binaries automatically]
39. Automated Patching
• Automated patching – source code and binaries
o Vulnerability localization [where to fix]
• Hypothesize the error causes – suspect
o Symbolic execution [what values should be returned: angelic values]
• Specification of the suspicious fragment
• Input-output requirements from each test
• Repair constraint
o Program synthesis [which code can return these values]
• Decide operators which can appear in the fix
• Generate a fix by solving the repair constraint.
[Diagram: a buggy program plus failing/passing tests are fed to the patching tool, which produces a patched program.]
40. Example

int is_upward(int inhibit, int up_sep, int down_sep) {
  int bias;
  if (inhibit)
    bias = down_sep;   // fix: bias = up_sep + 100
  else
    bias = up_sep;
  if (bias > down_sep)
    return 1;
  else
    return 0;
}

inhibit | up_sep | down_sep | Observed output | Expected output | Result
1 | 0 | 100 | 0 | 0 | pass
1 | 11 | 110 | 0 | 1 | fail
0 | 100 | 50 | 1 | 1 | pass
1 | -20 | 60 | 0 | 1 | fail
0 | 0 | 10 | 0 | 0 | pass
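The "angelic value" step of the pipeline can be sketched on this example: for each failing test in the table, ask which values of bias at the suspicious statement would make the rest of the program return the expected output. The brute-force range below is an illustrative assumption; real tools derive these values via symbolic execution.

```python
def rest_of_program(bias, down_sep):
    # the code after the suspicious assignment: return 1 if bias > down_sep
    return 1 if bias > down_sep else 0

# The two failing tests from the table: (inhibit, up_sep, down_sep, expected).
failing = [(1, 11, 110, 1), (1, -20, 60, 1)]

for inhibit, up_sep, down_sep, expected in failing:
    # Angelic values: every bias for which the rest of the program yields
    # the expected output (brute-forced over a small illustrative range).
    angelic = [b for b in range(-200, 201)
               if rest_of_program(b, down_sep) == expected]
    print(f"test {(inhibit, up_sep, down_sep)}: "
          f"angelic bias in [{min(angelic)}, {max(angelic)}]")
```

Any fix whose value lands inside these ranges on every failing test repairs them all; the annotated fix bias = up_sep + 100 gives 111 (> 110) and 80 (> 60), so both failing tests pass.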
41. Repair Constraint

int is_upward(int inhibit, int up_sep, int down_sep) {
  int bias;
  if (inhibit)
    bias = f(inhibit, up_sep, down_sep);
  else
    bias = up_sep;
  if (bias > down_sep)
    return 1;
  else
    return 0;
}

Failing test: inhibit == 1, up_sep == 11, down_sep == 110.
Symbolic execution yields the repair constraint: f(1, 11, 110) > 110
42. Conjure up a function
• Instead of solving the repair constraint directly:
• Select primitive components to be used by the synthesized program, based on complexity
• Look for a program that uses only these primitive components and satisfies the repair constraint
o Done via another constraint-solving problem: program synthesis
• Solving the repair constraint is the key, not how it is solved
• Enumerate expressions over a given set of components / operators
o Enforce the axioms of the operators
o If a candidate repair contains a constant, solve for it using SMT

Repair constraint:
f(1,11,110) > 110 ∧ f(1,0,100) ≤ 100 ∧ f(1,-20,60) > 60
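The enumeration idea can be sketched with brute force. This toy restricts candidates to the assumed grammar "variable + constant" and searches for one satisfying the three-conjunct repair constraint; a real repair engine would discharge the constant via an SMT solver instead of looping over it.

```python
# Brute-force component-based synthesis sketch for the repair constraint
# f(1,11,110) > 110  ∧  f(1,0,100) <= 100  ∧  f(1,-20,60) > 60.
from itertools import product

TESTS = [
    ((1, 11, 110), "gt"),   # f(1, 11, 110) >  110
    ((1, 0, 100),  "le"),   # f(1, 0, 100)  <= 100
    ((1, -20, 60), "gt"),   # f(1, -20, 60) >  60
]

def satisfies(f):
    for (i, u, d), rel in TESTS:
        v = f(i, u, d)
        if rel == "gt" and not v > d:
            return False
        if rel == "le" and not v <= d:
            return False
    return True

# Primitive components: the three variables, each combined with a constant.
VARIABLES = [("inhibit",  lambda i, u, d: i),
             ("up_sep",   lambda i, u, d: u),
             ("down_sep", lambda i, u, d: d)]

def synthesize():
    # Enumerate candidates of the form  <variable> + <constant>.
    for (name, var), c in product(VARIABLES, range(201)):
        f = lambda i, u, d, var=var, c=c: var(i, u, d) + c
        if satisfies(f):
            return f"{name} + {c}", f
    return None, None

expr, f = synthesize()
print(expr)   # up_sep + 100
```

No constant works for inhibit (the first two conjuncts contradict), so the search lands on up_sep + 100, which is exactly the annotated developer fix for bias.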
45. Over-fitting Problem in Program Repair
• Searching over arbitrary modifications can lead to undesirable patches, such as deletion of functionality.

Example of an automatically generated patch. Goal of repair tools: make all tests pass. Test: pass if non-zero exit status. Trivial patch: delete exit(-2). Such modifications should be disallowed.

static void BadPPM(char *file) {
  fprintf(stderr, "%s: Not a PPM file.\n", file);
  exit(-2);
}

➢ Derived rules (anti-patterns) that disallow patches causing significant changes to the control flow or data flow of the program.
➢ Benefits of anti-patterns:
○ Can be easily integrated with any automated repair tool
○ Localize better
○ Generate fixes faster
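A minimal sketch of how such a filter can veto the trivial patch: reject any candidate that deletes a statement matching a disallowed pattern. The pattern list here is illustrative, not the paper's actual rule set.

```python
# Toy anti-pattern filter: veto patches that delete control-flow-relevant
# statements (illustrative patterns, not the published anti-pattern list).
import re

ANTI_PATTERNS = [
    r"\bexit\s*\(",    # deleting a program exit usually deletes functionality
    r"\babort\s*\(",
    r"\breturn\b",     # deleting a return changes control flow
]

def violates_anti_pattern(deleted_statements):
    """True if the patch deletes a statement matching a disallowed pattern."""
    return any(re.search(p, stmt)
               for stmt in deleted_statements
               for p in ANTI_PATTERNS)

# The trivial BadPPM "fix" deletes exit(-2) to make the test pass: vetoed.
print(violates_anti_pattern(["exit(-2);"]))         # True
print(violates_anti_pattern(["bias = down_sep;"]))  # False
```

Because the check only inspects the candidate edit, it can sit in front of any repair tool's validation step, which is the integration benefit the slide claims.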
46. “Latest” Results

if (hbtype == TLS1_HB_REQUEST) {
  ...
  memcpy(bp, pl, payload);
  ...
}
(a) The buggy part of the Heartbleed-vulnerable OpenSSL

if (hbtype == TLS1_HB_REQUEST
    && payload + 18 < s->s3->rrec.length) {
  ...
}
(b) A fix generated automatically

if (1 + 2 + payload + 16 > s->s3->rrec.length)
  return 0;
...
if (hbtype == TLS1_HB_REQUEST) {
  ...
}
else if (hbtype == TLS1_HB_RESPONSE) {
  ...
}
return 0;
(c) The developer-provided repair
The Heartbleed Bug is a serious vulnerability in the popular
OpenSSL cryptographic software library. This weakness allows
stealing the information protected, under normal conditions, by the
SSL/TLS encryption used to secure the Internet. SSL/TLS provides
communication security and privacy over the Internet for
applications such as web, email, instant messaging (IM) and some
virtual private networks (VPNs).
--- Source: heartbleed.com
47. • Scalable white-box analysis on binaries

How | Why | For whom
Cluster paths online | Guide search | SW acquisition
Control symbolic variables | Extract semantics | Developers with 3rd-party code
Hybrid symbolic file | | COTS system assembly
Inject path sensitivity into GB fuzzing | |
Collaborators: Marcel Boehme, Satish Chandra (Facebook), Sergey Mechtaev, Van
Thuan Pham, Mukul Prasad (Fujitsu), Shin Hwei Tan, Jooyong Yi, Hiroaki Yoshida
(Fujitsu).
Relevant papers: http://www.comp.nus.edu.sg/~abhik/projects/Repair/index.html
http://www.comp.nus.edu.sg/~abhik/projects/Fuzz/
Editor's notes
This is why the model-based blackbox fuzzing technique comes in.
The technique has been implemented in well-known tools like Peach Fuzzer and Spike. Basically, the idea is to use an input model (some call it an input grammar) which specifies information about the file format, such as the data chunk types and data fields. With the support of an input model, the fuzzing tool can generate more valid and semi-valid inputs; as a result, these inputs can reach deeper program paths and have a better chance of exposing vulnerabilities.
The first and most common technique is blackbox fuzzing. It treats the PUT as a black box and has no information about it. Given a seed input, the tool randomly mutates or modifies some parts of the seed file to generate a massive number of new files, feeds them to the program under test, and monitors the program to detect abnormal behaviours like crashes.
However, since the seed file is randomly mutated, it is very likely that a large portion of the mutated files will be rejected by the parser code, because these files are invalid with respect to the file format.
File processing programs are everywhere.
Even though these programs are carefully tested, according to the data we collected from the US National Vulnerability Database, in the 10 years since 2007 the NVD has assigned CVE IDs to more than 3000 vulnerabilities found in these programs. The number could be much bigger, because we do not know how many vulnerabilities have been discovered but not reported to the NVD; some of them may be sold on the black market, so attackers can use them to exploit the affected programs and attack our systems.
In fact, a large portion of these vulnerabilities has been exposed through crafted files in common media and document formats which we use very often in our daily life, such as MIDI, FLV, PDF and PNG. Because of that, there is a pressing need to design better testing techniques that effectively and efficiently discover such vulnerabilities before attackers can.
Data chunk transplantation is the key idea in our new whitebox fuzzing approach. We call it Model-based Whitebox Fuzzing because it is a combination, with substantial modifications, of model-based blackbox fuzzing and normal whitebox fuzzing.
The model-based blackbox fuzzing side handles the missing-data-chunk problem by implementing the data chunk transplantation idea. Moreover, having the input model, it also enforces the integrity constraints of the generated test cases.
On the other side, whitebox fuzzing supports data chunk transplantation by providing guidance; I will explain how in detail in the next few slides. Moreover, whitebox fuzzing performs concolic exploration to reach potential target crash locations and to generate specific values that can cause the program to crash. In terms of implementation, we build our system on top of Peach Fuzzer, a production-quality fuzzer, and Hercules, a selective and targeted whitebox fuzzer.
Now, let me explain in detail how our system is designed and implemented. First of all, let me explain how the input model is written and how the original version of Peach Fuzzer works. These things are important for fully understanding our approach.
More satisfying to me as a security researcher than any academic award.
Suppose f1 is a failing path. To identify the culprit constraint of f1, out technique explore all paths in DFS search strategy until it finds the closest passing path p. During the exploration, some new failing paths (f2,f3,f4) and some infeasible paths will be traversed/detected. The branch condition of the branch from which the passing path p deviates is identified as culprit constraint.