Binary Analysis - Luxembourg
1. Binary Analysis for Vulnerability Detection
National University of Singapore
http://www.comp.nus.edu.sg/~abhik
Visit to University of Luxembourg S&T center, January 2017.
Research project with DSO National Labs, 2013-16.
“TSUNAMi: Trustworthy systems from un-trusted component amalgamations”
National Research Foundation (NRF), 2015-2020.
4. Cybersecurity research
The National Cybersecurity R&D Programme seeks to develop R&D expertise and capabilities in cybersecurity for Singapore. It aims to improve the trustworthiness of cyber infrastructures, with an emphasis on security, reliability, resiliency and usability. S$130 million of funding over five years will be available to support research into both the technological and human-science aspects of cybersecurity, organized in outcome-based R&D themes. The themes are designed to provide an element of operational context, while not restricting “game-changing” ideas from the community.
Cybersecurity research spans six themes:
• Scalable Trustworthy Systems
• Resilient Systems
• Effective Situation Awareness and Attack Attribution
• Combatting Insider Threats
• Threat Detection, Analysis and Defence
• Efficient and Effective Digital Forensics
https://www.nrf.gov.sg/programmes/national-cybersecurity-r-d-programme
5. Outline
• NCR project – Trustworthy systems from Un-trusted Components
• Technical contributions in Binary Analysis
• Technology showcase
• Initiatives – Consortium
8. Use of research in NRF project
• Binary Analysis
o Useful to government agencies for procuring software.
o Deep binary analysis on the evaluation version prior to procurement.
• Binary hardening
o Useful to government agencies on procured software.
• Point technologies from individual work-packages.
9. Contributions
• Binary analysis for
Fuzz testing
Comprehension
Debugging
Patching (latest work)
• Research programme at NUS since 2008, with DRTech, DSO, …
11. Who cares?
A team of hackers won $2 million by building a machine that could hack better than they could.
Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99
DARPA Cyber Grand Challenge -> automation of security [detecting and fixing vulnerabilities in binaries automatically]
12. Fuzz Testing
Pioneered by Barton Miller at Univ. of Wisconsin in 1988.
And now, in 2016 …
Springfield Project – fuzzing as a service
OSS-Fuzz – continuous fuzzing for open-source projects
13. A true story – why fuzz?
• May 4, 2015
o Abhik was preparing lecture notes on fuzzing.
o 11:00 AM – finished deciding on the structure; trying to pick a motivating example for fuzzing to interest the students, since there are so many of them.
o 11:11 AM – I get an email update about the latest incident, an integer overflow in Boeing: a classic case where an automated method for sending out malformed or boundary inputs can reveal errors.
14. Presented by Thuan Pham
(Model-Based) Blackbox Fuzzing
[Diagram: an input model (as used in Peach, Spike, …) guides the mutation of a seed input; some mutated inputs pass all of the program's checks, while others satisfy only some checks.]
15. Presented by Thuan Pham
(Coverage-based) Greybox Fuzzing – AFLFast
[Diagram: seed inputs are enqueued in an input queue; inputs are dequeued and mutated, and “interesting” mutated inputs are put back in the queue.]
17. Problem Statement
• How to direct the exploration to reach certain locations or targets, or to enhance coverage
o in large-scale program binaries
o with highly structured inputs (e.g., multimedia files)
o given an inadequate test suite or seeds.
18. Directed Search in White-box Fuzzing
Apply to the crash reproduction problem
Crash reproduction supports
- in-house debugging and fixing
- vulnerability checking
20. Control Flow Graph Construction
Resolve indirect jumps/calls
[Pipeline: program binaries -> IDA Pro (assembly code, direct jumps/calls) -> CFG generator, with jump table extraction and edge profiling to resolve indirect jumps/calls -> CFG]
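The pipeline above can be sketched in miniature. This is an illustrative toy, not IDA Pro's actual output format: instructions are a dict from address to (mnemonic, operand), and one indirect jump is resolved through a recovered jump table, exactly the role jump table extraction plays in the slide's pipeline.

```python
# Minimal CFG-construction sketch over a toy "disassembly" (assumed encoding,
# not IDA Pro's API): direct jumps/calls give edges immediately, while an
# indirect jump ("jmp*") is resolved via an extracted jump table.

def build_cfg(instructions, jump_tables):
    """instructions: {addr: (mnemonic, operand)}; jump_tables: {addr: [targets]}."""
    edges = set()
    addrs = sorted(instructions)
    for i, addr in enumerate(addrs):
        mnem, op = instructions[addr]
        if mnem == "jmp":                        # direct unconditional jump
            edges.add((addr, op))
        elif mnem == "jcc":                      # conditional: taken + fall-through
            edges.add((addr, op))
            if i + 1 < len(addrs):
                edges.add((addr, addrs[i + 1]))
        elif mnem == "jmp*":                     # indirect jump: use the jump table
            for target in jump_tables.get(addr, []):
                edges.add((addr, target))
        elif i + 1 < len(addrs) and mnem != "ret":
            edges.add((addr, addrs[i + 1]))      # straight-line flow
    return edges

# Toy program: a compare, a conditional branch, and an indirect jump whose
# two targets come from a recovered jump table.
insns = {0x00: ("cmp", None), 0x04: ("jcc", 0x10),
         0x08: ("jmp*", None), 0x10: ("ret", None), 0x20: ("ret", None)}
cfg = build_cfg(insns, {0x08: [0x10, 0x20]})
print(sorted(cfg))
```

Without the jump table, the two edges out of 0x08 would be missing and the CFG would be incomplete, which is why indirect-jump resolution is called out as its own stage.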
21. First-cut Analyzer
• Output of Stage 1: flow structures and input file(s) that can reach the crash module
• Output of Stage 2: refined CFG, MDG and hybrid symbolic file
• Output of Stage 3: crash input(s) and a crash explanation (based on the UNSAT core)
22. UNSAT Core
Notations:
bx: branch instruction
bcx: branch condition at bx
PC: path condition
CC: crash condition
[Diagram: a branch tree over b1, b2, b3, b4 leading to the crash instruction, with taken/not-taken conditions bc1/¬bc1, …, bc4/¬bc4 on the edges.]
First attempt:
PC = bc1 ∧ ¬bc3 ∧ bc4
PC ∧ CC == UNSAT
bc1 contradicts CC
1) Backtrack to b1
2) Take the other branch
Second attempt:
PC' = ¬bc1 ∧ bc2 ∧ bc4
PC' ∧ CC == SAT
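The backtracking step can be sketched concretely. This is a toy over a small integer domain with made-up branch conditions (the real tool works on symbolic path conditions with an SMT solver); it shows how an unsatisfiable PC ∧ CC pins the blame on one branch condition, so the search backtracks exactly there.

```python
# Sketch of the UNSAT-core idea: find which branch condition alone
# contradicts the crash condition CC, flip it, and retry.
# Branch conditions here are illustrative, not from the actual tool.

DOMAIN = range(256)

def satisfiable(conds):
    """Brute-force SAT check over the toy input domain."""
    return any(all(c(x) for c in conds) for x in DOMAIN)

def unsat_core_branch(pc, cc):
    """Index of the first branch condition that by itself contradicts CC."""
    for i, c in enumerate(pc):
        if not satisfiable([c, cc]):
            return i
    return None

bc1 = lambda x: x > 100          # branch condition at b1
bc2 = lambda x: x % 3 == 0       # branch condition at b2
bc3 = lambda x: x % 2 == 0       # branch condition at b3
bc4 = lambda x: x < 200          # branch condition at b4
cc  = lambda x: x <= 50          # crash condition

pc = [bc1, lambda x: not bc3(x), bc4]        # first attempt: bc1 ∧ ¬bc3 ∧ bc4
culprit = unsat_core_branch(pc, cc)          # bc1 contradicts CC
pc2 = [lambda x: not bc1(x), bc2, bc4]       # backtrack to b1, other branch
print("culprit branch index:", culprit)                  # 0
print("second attempt SAT:", satisfiable(pc2 + [cc]))    # True
```

The culprit (bc1: x > 100) can never coexist with the crash condition (x <= 50), so flipping only that branch is enough to make the second attempt satisfiable.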
23. Evaluation
Time bound: 24 hrs. Tools compared: Hercules, Peach, S2E.

Program | Advisory ID | #Seed files
WMP 9.0 | CVE-2014-2671 | 10
WMP 9.0 | CVE-2010-0718 | 10
AR 9.2 | CVE-2010-2204 | 10
RP 1.0 | CVE-2010-3000 | 10
MP 0.35 | CVE-2011-0502 | 10
OV 1.04 | CVE-2010-0688 | 10
31. Key change
Input: Seed Inputs S
1: T✗ = ∅
2: T = S
3: if T = ∅ then
4:   add empty file to T
5: end if
6: repeat
7:   t = chooseNext(T)
8:   p = assignEnergy(t)
9:   for i from 1 to p do
10:    t′ = mutate_input(t)
11:    if t′ crashes then
12:      add t′ to T✗
13:    else if isInteresting(t′) then
14:      add t′ to T
15:    end if
16:  end for
17: until timeout reached or abort-signal
Output: Crashing Inputs T✗
32. Power Schedules
• Constant:
p(i) = α(i)
o AFL uses this schedule (fuzzing ~1 minute)
o α(i) is how AFL judges the fuzzing time for the test exercising path i
• Cut-off Exponential:
p(i) = 0, if f(i) > µ
p(i) = min(α(i)/β · 2^s(i), M), otherwise
where
β is a constant
s(i) is the number of times the input exercising path i has been chosen for fuzzing
f(i) is the number of fuzz exercising path i (path frequency)
µ is the mean number of fuzz exercising a discovered path (average path frequency)
M is the maximum energy expendable on a state
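The cut-off exponential schedule transcribes directly into code. Here alpha_i stands in for AFL's original energy assignment α(i); the concrete numbers in the calls are illustrative.

```python
# Cut-off exponential power schedule, following the slide's definition:
# p(i) = 0 if f(i) > mu, else min(alpha(i)/beta * 2^s(i), M).

def coe_energy(alpha_i, s_i, f_i, mu, beta=1.0, M=1000):
    if f_i > mu:
        return 0                      # high-frequency path: spend no energy
    return min(alpha_i / beta * 2 ** s_i, M)

# A low-frequency path gains exponentially more energy each time it is
# chosen, up to the cap M; a path fuzzed more often than average gets none.
print(coe_energy(alpha_i=10, s_i=3, f_i=5, mu=20))   # 10 * 2^3 = 80.0
print(coe_energy(alpha_i=10, s_i=10, f_i=5, mu=20))  # capped at M = 1000
print(coe_energy(alpha_i=10, s_i=3, f_i=50, mu=20))  # f(i) > mu -> 0
```

The cap M and the cut-off at µ are what keep the fuzzer from sinking all its budget into a few already well-explored paths.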
33. Prioritize low-probability paths [CCS16]
Use a greybox fuzzer which keeps track of the path id for each test.
Find the probability that fuzzing a test t which exercises path π leads to an input which exercises path π'.
Give higher weightage to newly discovered low-probability paths, to gravitate towards them: discover new states in the Markov chain with minimal effort.

void crashme(char *s) {
  if (s[0] == 'b')
    if (s[1] == 'a')
      if (s[2] == 'd')
        if (s[3] == '!')
          abort();
}

8 CVEs in Binutils (3 new over GB fuzzing).
Finds crashes 7x faster compared to plain GB fuzzing.
An independent evaluation found crashes 19x faster on DARPA Cyber Grand Challenge (CGC) binaries.
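The intuition behind favoring low-probability paths shows up even in back-of-envelope arithmetic on crashme: purely random 4-byte inputs must hit all four magic bytes at once, while a fuzzer that retains each newly covered prefix as a seed only needs one correct byte per stage.

```python
# Expected-trials arithmetic for reaching abort() in crashme.
blind = 256 ** 4            # blind random: all 4 magic bytes at once (2^32)
staged = 4 * 256            # one retained prefix per stage: 4 bytes, one at a time
print(blind, staged, blind // staged)   # 4294967296 1024 4194304
```

A four-million-fold gap from a four-byte check is why gravitating toward rarely exercised paths pays off so heavily.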
35. Other works – Crash Bucketing [Upcoming work, FASE17]
[Diagram: failing paths f1–f4 (marked ✗) and a passing path p1 across branches b1–b5.]
• Identify the culprit constraint.
• Use the culprit constraint as the “reason” for the failure.
• Group failing paths having the same “reason” together.
Approaches: point-of-failure based, call-stack based, symbolic-analysis based.
37. Recall CGC
A team of hackers won $2 million by building a machine that could hack better than they could.
Read more at http://www.businessinsider.sg/forallsecure-mayhem-darpa-cyber-grand-challenge-2016-8/#ZuIF7Dmq3aaCAdaq.99
DARPA Cyber Grand Challenge -> automation of security [detecting and fixing vulnerabilities in binaries automatically]
39. Automated Patching
• Automated patching – source code and binaries
o Vulnerability localization [where to fix]
• Hypothesize the error causes – suspect
o Symbolic execution [what values should be returned: angelic values]
• Specification of the suspicious fragment
• Input-output requirements from each test
• Repair constraint
o Program synthesis [which code can return these values]
• Decide operators which can appear in the fix
• Generate a fix by solving the repair constraint.
[Diagram: a buggy program plus failing/passing tests are fed to the patching tool, which produces a patched program.]
40. Example

int is_upward(int inhibit, int up_sep, int down_sep) {
  int bias;
  if (inhibit)
    bias = down_sep;   // fix: bias = up_sep + 100
  else
    bias = up_sep;
  if (bias > down_sep)
    return 1;
  else
    return 0;
}

inhibit | up_sep | down_sep | Observed output | Expected output | Result
1 | 0 | 100 | 0 | 0 | pass
1 | 11 | 110 | 0 | 1 | fail
0 | 100 | 50 | 1 | 1 | pass
1 | -20 | 60 | 0 | 1 | fail
0 | 0 | 10 | 0 | 0 | pass
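The "angelic value" step of the pipeline can be sketched on this example: for each failing test in the table, ask which values of bias at the suspicious statement would make the rest of the program return the expected output. The brute-force range below is an illustrative assumption; real tools derive these values via symbolic execution.

```python
def rest_of_program(bias, down_sep):
    # the code after the suspicious assignment: return 1 if bias > down_sep
    return 1 if bias > down_sep else 0

# The two failing tests from the table: (inhibit, up_sep, down_sep, expected).
failing = [(1, 11, 110, 1), (1, -20, 60, 1)]

for inhibit, up_sep, down_sep, expected in failing:
    # Angelic values: every bias for which the rest of the program yields
    # the expected output (brute-forced over a small illustrative range).
    angelic = [b for b in range(-200, 201)
               if rest_of_program(b, down_sep) == expected]
    print(f"test {(inhibit, up_sep, down_sep)}: "
          f"angelic bias in [{min(angelic)}, {max(angelic)}]")
```

Any fix whose value lands inside these ranges on every failing test repairs them all; the annotated fix bias = up_sep + 100 gives 111 (> 110) and 80 (> 60), so both failing tests pass.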
41. Repair Constraint

int is_upward(int inhibit, int up_sep, int down_sep) {
  int bias;
  if (inhibit)
    bias = f(inhibit, up_sep, down_sep);
  else
    bias = up_sep;
  if (bias > down_sep)
    return 1;
  else
    return 0;
}

Failing test: inhibit == 1, up_sep == 11, down_sep == 110.
Symbolic execution yields the repair constraint: f(1, 11, 110) > 110
42. Conjure up a function
• Instead of solving the repair constraint directly:
• Select primitive components to be used by the synthesized program, based on complexity
• Look for a program that uses only these primitive components and satisfies the repair constraint
o Done via another constraint-solving problem: program synthesis
• Solving the repair constraint is the key, not how it is solved
• Enumerate expressions over a given set of components / operators
o Enforce the axioms of the operators
o If a candidate repair contains a constant, solve for it using SMT

Repair constraint:
f(1,11,110) > 110 ∧ f(1,0,100) ≤ 100 ∧ f(1,-20,60) > 60
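The enumeration idea can be sketched with brute force. This toy restricts candidates to the assumed grammar "variable + constant" and searches for one satisfying the three-conjunct repair constraint; a real repair engine would discharge the constant via an SMT solver instead of looping over it.

```python
# Brute-force component-based synthesis sketch for the repair constraint
# f(1,11,110) > 110  ∧  f(1,0,100) <= 100  ∧  f(1,-20,60) > 60.
from itertools import product

TESTS = [
    ((1, 11, 110), "gt"),   # f(1, 11, 110) >  110
    ((1, 0, 100),  "le"),   # f(1, 0, 100)  <= 100
    ((1, -20, 60), "gt"),   # f(1, -20, 60) >  60
]

def satisfies(f):
    for (i, u, d), rel in TESTS:
        v = f(i, u, d)
        if rel == "gt" and not v > d:
            return False
        if rel == "le" and not v <= d:
            return False
    return True

# Primitive components: the three variables, each combined with a constant.
VARIABLES = [("inhibit",  lambda i, u, d: i),
             ("up_sep",   lambda i, u, d: u),
             ("down_sep", lambda i, u, d: d)]

def synthesize():
    # Enumerate candidates of the form  <variable> + <constant>.
    for (name, var), c in product(VARIABLES, range(201)):
        f = lambda i, u, d, var=var, c=c: var(i, u, d) + c
        if satisfies(f):
            return f"{name} + {c}", f
    return None, None

expr, f = synthesize()
print(expr)   # up_sep + 100
```

No constant works for inhibit (the first two conjuncts contradict), so the search lands on up_sep + 100, which is exactly the annotated developer fix for bias.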
45. Over-fitting Problem in Program Repair
• Searching over arbitrary modifications can lead to undesirable patches, such as deletion of functionality.

Example of an automatically generated patch. Goal of repair tools: make all tests pass. Test: pass if non-zero exit status. Trivial patch: delete exit(-2). Such modifications should be disallowed.

static void BadPPM(char *file) {
  fprintf(stderr, "%s: Not a PPM file.\n", file);
  exit(-2);
}

➢ Derived rules (anti-patterns) that disallow patches causing significant changes to the control flow or data flow of the program.
➢ Benefits of anti-patterns:
○ Can be easily integrated with any automated repair tool
○ Localize better
○ Generate fixes faster
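A minimal sketch of how such a filter can veto the trivial patch: reject any candidate that deletes a statement matching a disallowed pattern. The pattern list here is illustrative, not the paper's actual rule set.

```python
# Toy anti-pattern filter: veto patches that delete control-flow-relevant
# statements (illustrative patterns, not the published anti-pattern list).
import re

ANTI_PATTERNS = [
    r"\bexit\s*\(",    # deleting a program exit usually deletes functionality
    r"\babort\s*\(",
    r"\breturn\b",     # deleting a return changes control flow
]

def violates_anti_pattern(deleted_statements):
    """True if the patch deletes a statement matching a disallowed pattern."""
    return any(re.search(p, stmt)
               for stmt in deleted_statements
               for p in ANTI_PATTERNS)

# The trivial BadPPM "fix" deletes exit(-2) to make the test pass: vetoed.
print(violates_anti_pattern(["exit(-2);"]))         # True
print(violates_anti_pattern(["bias = down_sep;"]))  # False
```

Because the check only inspects the candidate edit, it can sit in front of any repair tool's validation step, which is the integration benefit the slide claims.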
46. “Latest” Results

if (hbtype == TLS1_HB_REQUEST) {
  ...
  memcpy(bp, pl, payload);
  ...
}
(a) The buggy part of the Heartbleed-vulnerable OpenSSL

if (hbtype == TLS1_HB_REQUEST
    && payload + 18 < s->s3->rrec.length) {
  ...
}
(b) A fix generated automatically

if (1 + 2 + payload + 16 > s->s3->rrec.length)
  return 0;
...
if (hbtype == TLS1_HB_REQUEST) {
  ...
}
else if (hbtype == TLS1_HB_RESPONSE) {
  ...
}
return 0;
(c) The developer-provided repair
The Heartbleed Bug is a serious vulnerability in the popular
OpenSSL cryptographic software library. This weakness allows
stealing the information protected, under normal conditions, by the
SSL/TLS encryption used to secure the Internet. SSL/TLS provides
communication security and privacy over the Internet for
applications such as web, email, instant messaging (IM) and some
virtual private networks (VPNs).
--- Source: heartbleed.com
47. • Scalable white-box analysis on binaries

How | Why | For whom
Cluster paths online | Guide search | SW acquisition
Control symbolic variables | Extract semantics | Developers with 3rd-party code
Hybrid symbolic file | | COTS system assembly
Inject path sensitivity into GB fuzzing | |
Collaborators: Marcel Boehme, Satish Chandra (Facebook), Sergey Mechtaev, Van
Thuan Pham, Mukul Prasad (Fujitsu), Shin Hwei Tan, Jooyong Yi, Hiroaki Yoshida
(Fujitsu).
Relevant papers: http://www.comp.nus.edu.sg/~abhik/projects/Repair/index.html
http://www.comp.nus.edu.sg/~abhik/projects/Fuzz/
Editor's notes
This is why the model-based blackbox fuzzing technique comes in.
The technique has been implemented in well-known tools like Peach Fuzzer and Spike. Basically, the idea is to use an input model (some call it an input grammar) which specifies information about the file format, such as the data chunk types and data fields. With the support of an input model, the fuzzing tool can generate more valid and semi-valid inputs; as a result, these inputs can reach deeper program paths and have a better chance of exposing vulnerabilities.
The first and most common technique is blackbox fuzzing. It treats the PUT as a black box and has no information about it. Given a seed input, the tool randomly mutates or modifies some parts of the seed file to generate a massive number of new files, feeds them to the program under test, and monitors the program to detect abnormal behaviours like crashes.
However, since the seed file is randomly mutated, it is very likely that a large portion of the mutated files will be rejected by the parser code, because these files are invalid with respect to the file format.
File processing programs are everywhere.
Even though these programs are carefully tested, according to the data we collected from the US National Vulnerability Database, in the 10 years since 2007 the NVD has assigned CVE IDs to more than 3000 vulnerabilities found in these programs. The number could be much bigger, because we do not know how many vulnerabilities have been discovered but not reported to the NVD; some of them may be sold on the black market, so attackers can use them to exploit the affected programs and attack our systems.
In fact, a large portion of these vulnerabilities has been exposed through crafted files in common media and document formats which we use very often in our daily life, such as MIDI, FLV, PDF and PNG. Because of that, there is a pressing need to design better testing techniques that effectively and efficiently discover such vulnerabilities before attackers can.
Data chunk transplantation is the key idea in our new whitebox fuzzing approach. We call it Model-based Whitebox Fuzzing because it is a combination, with substantial modifications, of model-based blackbox fuzzing and normal whitebox fuzzing.
The model-based blackbox fuzzing side handles the missing-data-chunk problem by implementing the data chunk transplantation idea. Moreover, having the input model, it also enforces the integrity constraints of the generated test cases.
On the other side, whitebox fuzzing supports data chunk transplantation by providing guidance; I will explain how in detail in the next few slides. Moreover, whitebox fuzzing performs concolic exploration to reach potential target crash locations and to generate specific values that can cause the program to crash. In terms of implementation, we build our system on top of Peach Fuzzer, a production-quality fuzzer, and Hercules, a selective and targeted whitebox fuzzer.
Now, let me explain in detail how our system is designed and implemented. First of all, let me explain how the input model is written and how the original version of Peach Fuzzer works. These things are important for fully understanding our approach.
More satisfying to me as a security researcher than any academic award.
Suppose f1 is a failing path. To identify the culprit constraint of f1, out technique explore all paths in DFS search strategy until it finds the closest passing path p. During the exploration, some new failing paths (f2,f3,f4) and some infeasible paths will be traversed/detected. The branch condition of the branch from which the passing path p deviates is identified as culprit constraint.