1. A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection
Tielei Wang¹, Tao Wei¹, Guofei Gu², Wei Zou¹
¹ Peking University, China
² Texas A&M University, US
2. 2
Checksum – a way to check the integrity of data.
Used in network protocols and files.
[Figure: data → checksum function → data + checksum field]
Fuzzing – generating malformed inputs and
feeding them to the application.
Dynamic Taint Analysis – runs a program and
observes which computations are affected by
predefined taint sources (e.g. input)
3. 3
The input mutation space is enormous.
Most malformed inputs are dropped at an early stage if the program employs a checksum mechanism.
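To make the early-drop problem concrete, here is a minimal sketch of an integrity check that rejects a randomly mutated input. The file layout (payload followed by a 4-byte little-endian byte-sum) and the names `sum_checksum`, `read_le32` and `passes_check` are assumptions made for illustration; they do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksum: sum of all payload bytes modulo 2^32. */
uint32_t sum_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}

/* Reads a 32-bit little-endian value (assumed field encoding). */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the checksum stored in the last 4 bytes matches the
 * recomputed checksum of the payload, 0 otherwise. */
int passes_check(const uint8_t *file, size_t len) {
    assert(file != NULL);
    if (len < 4)
        return 0;
    return sum_checksum(file, len - 4) == read_le32(file + len - 4);
}
```

With this layout, changing any single payload byte changes the sum, so a blind mutation never gets past `passes_check` unless the checksum field is updated as well.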
4. 4
 1  void decode_image(FILE* fd){
 2    ...
 3    int length = get_length(fd);
 4    int recomputed_chksum = checksum(fd, length);
 5    int chksum_in_file = get_checksum(fd);
 6    if(chksum_in_file != recomputed_chksum)  // line 6 checks the integrity of the input
 7      error();
 8    int Width = get_width(input_file);
 9    int Height = get_height(input_file);
10    int size = Width*Height*sizeof(int);
11    int* p = malloc(size);
12    ...
13    for(i=0; i<Height; i++){// read ith row to p
14      read_row(p+Width*i, i, fd);
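The size computation in this example can wrap around when Width and Height are attacker-controlled, so malloc may return a buffer far smaller than the loop expects. The following is a minimal sketch of that effect, assuming a 4-byte int and 32-bit arithmetic; `wrapped_size` and `checked_size` are hypothetical helper names, not part of the slide's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mimics Width*Height*sizeof(int) truncated to 32 bits
 * (assumes sizeof(int) == 4; unsigned arithmetic wraps modulo 2^32). */
uint32_t wrapped_size(uint32_t width, uint32_t height) {
    return (uint32_t)(width * height * 4u);
}

/* Stores the exact size in *out and returns 0, or returns -1 if the
 * size does not fit in 32 bits. */
int checked_size(uint32_t width, uint32_t height, uint32_t *out) {
    assert(out != NULL);
    uint64_t area = (uint64_t)width * (uint64_t)height; /* cannot overflow 64 bits */
    if (area > UINT32_MAX / 4u)
        return -1;
    *out = (uint32_t)(area * 4u);
    return 0;
}
```

For example, a width of 0x10000 and a height of 0x4000 give a wrapped size of 0, while the checked variant reports the overflow.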
5. 5
Infer whether and where a program checks the integrity of its input.
Identify which input bytes can flow into sensitive points:
Taint analysis at byte level – monitors how the application uses the input data.
Create malformed inputs that focus on the “hot bytes”.
Repair checksum fields in the input to expose vulnerabilities.
Fully automatic.
Found 27 new vulnerabilities – in Acrobat Reader, Google Picasa and more.
8. 8
Runs the program with well-formed input.
Execution monitor records:
Which input bytes are related to arguments of API functions (e.g. malloc, strcpy) – “hot bytes” report.
Which bytes each conditional jump instruction depends on (e.g. JZ, JE, JB) – checksum report.
Considers only data flow (no control flow).
9. 9
Instruments instructions – movement (e.g. MOV, PUSH), arithmetic (e.g. SUB, ADD), logic (e.g. AND, XOR).
Taints all values written by an instruction with the union of all taint labels associated with the values used by that instruction.
Also considers the eflags register.
Before: eax {0x6, 0x7}, ebx {0x8, 0x9}
add eax, ebx
After: eax {0x6, 0x7, 0x8, 0x9}, eflags {0x6, 0x7, 0x8, 0x9}
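The union rule can be sketched with a bitset that holds one bit per input byte offset. This is an illustrative model only: the 1024-byte capacity, the `taint_set` type and the function names are assumptions, and the slides do not say how the tool stores taint labels.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INPUT_BYTES 1024u               /* assumed maximum input size */
#define TAINT_WORDS (INPUT_BYTES / 64u)

/* One bit per input byte offset. */
typedef struct { uint64_t bits[TAINT_WORDS]; } taint_set;

void taint_clear(taint_set *s) { memset(s, 0, sizeof *s); }

void taint_add(taint_set *s, unsigned off) {
    assert(off < INPUT_BYTES);
    s->bits[off / 64u] |= (uint64_t)1 << (off % 64u);
}

int taint_has(const taint_set *s, unsigned off) {
    assert(off < INPUT_BYTES);
    return (int)((s->bits[off / 64u] >> (off % 64u)) & 1u);
}

/* Number of distinct input bytes in the set. */
unsigned taint_count(const taint_set *s) {
    unsigned n = 0;
    for (unsigned i = 0; i < TAINT_WORDS; i++)
        for (uint64_t w = s->bits[i]; w != 0; w &= w - 1)
            n++;                        /* clears the lowest set bit each round */
    return n;
}

/* Models `add dst, src`: dst and eflags both receive the union
 * of the labels of the two operands. */
void propagate_add(taint_set *dst, const taint_set *src, taint_set *eflags) {
    for (unsigned i = 0; i < TAINT_WORDS; i++) {
        dst->bits[i] |= src->bits[i];
        eflags->bits[i] = dst->bits[i];
    }
}
```

Applied to the slide's example, labels {0x6, 0x7} on eax and {0x8, 0x9} on ebx leave four labels on both eax and eflags after the add.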
10. 10
Input size is 1024 bytes
 8  int Width = get_width(input_file);
 9  int Height = get_height(input_file);
10  int size = Width*Height*sizeof(int);
11  int* p = malloc(size);
“hot bytes” report:
…
0x8048d5b: invoking malloc: [0x8,0xf]
…
12. 12
Checksum detector: identify potential checksum check points.
The recomputed checksum value depends on many input bytes.
Instruments conditional jumps. Before a jump executes, checks whether the number of taint labels associated with the eflags register exceeds a threshold.
Problem: decompressed bytes can also depend on many input bytes.
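The threshold test can be sketched as a filter over per-jump records. The `jump_record` layout and the function name are hypothetical, and the threshold is left as a parameter because the slides do not state the value used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;        /* address of the conditional jump */
    unsigned num_labels;  /* distinct input bytes tainting eflags at that jump */
} jump_record;

/* Copies the addresses of jumps whose eflags taint count exceeds
 * `threshold` into `out` (capacity at least n) and returns how many
 * candidates were found. */
size_t find_candidates(const jump_record *recs, size_t n,
                       unsigned threshold, uint32_t *out) {
    assert(recs != NULL && out != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (recs[i].num_labels > threshold)
            out[found++] = recs[i].addr;
    }
    return found;
}
```

A jump that depends on all 1024 input bytes would be reported for any reasonable threshold, whereas a jump that depends on two bytes would not.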
16. 16
Refinement:
Well-formed inputs can pass the checksum test, but most malformed inputs cannot.
Run well-formed inputs and identify the always-taken and always-not-taken conditional jump instructions.
Run malformed inputs and again identify the always-taken and always-not-taken conditional jump instructions.
Identify the conditional jump instructions that behave completely differently when processing well-formed and malformed inputs.
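The refinement step can be sketched as a comparison of per-branch statistics collected over the two sets of runs. The `branch_stats` structure and the function names are assumptions for illustration, not the tool's data model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t addr;                        /* address of the conditional jump */
    unsigned good_taken, good_not_taken;  /* counts over well-formed runs */
    unsigned bad_taken, bad_not_taken;    /* counts over malformed runs */
} branch_stats;

enum behavior { MIXED, ALWAYS_TAKEN, ALWAYS_NOT_TAKEN };

/* Classifies one set of runs; a jump that never executed counts as MIXED. */
static enum behavior classify(unsigned taken, unsigned not_taken) {
    if (taken > 0 && not_taken == 0) return ALWAYS_TAKEN;
    if (not_taken > 0 && taken == 0) return ALWAYS_NOT_TAKEN;
    return MIXED;
}

/* Returns 1 if the jump is consistent within each set of runs but
 * behaves in opposite ways across the two sets. */
int is_checksum_check(const branch_stats *b) {
    assert(b != NULL);
    enum behavior good = classify(b->good_taken, b->good_not_taken);
    enum behavior bad = classify(b->bad_taken, b->bad_not_taken);
    return good != MIXED && bad != MIXED && good != bad;
}
```

A jump that is taken in every well-formed run and never taken in any malformed run satisfies this test; a jump that behaves the same way in both sets, or inconsistently within one set, does not.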
18. 18
Checksum detector: checksum field identification
 6  if(chksum_in_file != recomputed_chksum)
 7    error();
The input bytes that affect chksum_in_file are the checksum field.
19. 19
Generates malformed test cases – feeds them to the original or instrumented program.
According to the bypass rules, alters the execution traces at check points – sets the eflags register.
20. 20
All malformed test cases are constructed based on the “hot bytes” information.
Using attack heuristics:
Bytes that influence memory allocation are set to small, large or negative values.
Bytes that flow into string functions are replaced by format specifiers such as %n, %p.
Output – test cases that could cause the program to crash or consume 100% CPU.
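The memory-allocation heuristic can be sketched as overwriting a hot field with a few boundary values. This sketch assumes each hot field is a 32-bit little-endian integer and uses three representative values; the slides specify neither, and the names `ATTACK_VALUES` and `apply_attack` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Representative small, large and negative values (an assumption). */
static const int32_t ATTACK_VALUES[] = { 0, INT32_MAX, -1 };
enum { NUM_ATTACK_VALUES = 3 };

/* Overwrites the 4 bytes at `off` with ATTACK_VALUES[which] in
 * little-endian order. Returns 0 on success, -1 if the index is invalid
 * or the field does not fit inside the buffer. */
int apply_attack(uint8_t *buf, size_t len, size_t off, size_t which) {
    assert(buf != NULL);
    if (which >= NUM_ATTACK_VALUES || off > len || len - off < 4)
        return -1;
    uint32_t u = (uint32_t)ATTACK_VALUES[which];
    for (size_t i = 0; i < 4; i++)
        buf[off + i] = (uint8_t)(u >> (8 * i));
    return 0;
}
```

For a hot range such as [0x8,0xf], a fuzzer built along these lines would emit one test case per value for the field at offset 0x8 and one per value for the field at offset 0xc.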
21. 21
 6  if(chksum_in_file != recomputed_chksum)
 7    error();
 8  int Width = get_width(input_file);
 9  int Height = get_height(input_file);
10  int size = Width*Height*sizeof(int);
11  int* p = malloc(size);
Checksum report
…
0x8048d4f: JZ: 1024: [0x0,0x3ff]
…
Bypass info
0x8048d4f: JZ: always-taken
“hot bytes” report
…
0x8048d5b: invoking malloc: [0x8,0xf]
…
22. 22
 6  if(chksum_in_file != recomputed_chksum)
 7    error();
Bypass info
0x8048d4f: JZ: always-taken
Before executing 0x8048d4f, the fuzzer sets the ZF flag in eflags to the opposite value.
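The bypass amounts to toggling one bit before the jump executes. A minimal sketch follows; ZF is bit 6 of the x86 EFLAGS register, and the function names are hypothetical.

```c
#include <stdint.h>

#define EFLAGS_ZF (1u << 6)   /* zero flag: bit 6 of x86 EFLAGS */

/* Returns eflags with ZF set to the opposite value. */
uint32_t flip_zf(uint32_t eflags) {
    return eflags ^ EFLAGS_ZF;
}

/* JZ is taken exactly when ZF is set. */
int jz_taken(uint32_t eflags) {
    return (eflags & EFLAGS_ZF) != 0;
}
```

When a malformed input fails the comparison, ZF is clear and the JZ would fall through to error(). Flipping ZF makes the jump behave as it does for well-formed inputs, which matches the always-taken entry in the bypass info.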
23. 23
Fixing is expensive – fix checksum fields only in test cases that caused a crash.
How?
Cr – raw data in the checksum field
D – input data protected by the checksum field
Checksum() – the complete checksum algorithm
T – transformation
We want to satisfy the constraint:
Checksum(D) == T(Cr)
24. 24
Using symbolic execution to solve:
Checksum(D) == T(Cr)
Checksum(D) is a constant c that can be determined at runtime:
c == T(Cr)
Only Cr is a symbolic value.
Common transformations (e.g. converting from
hex/oct to decimal), can be solved by existing
solvers (STP).
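To illustrate what solving c == T(Cr) means, the sketch below assumes that the checksum field stores the value as 8 ASCII hex digits and that T parses this text into an integer. It does not use symbolic execution or STP: because this particular T is trivially invertible, the sketch computes Cr directly. The names `transform_hex` and `repair_field` are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* T: converts the raw checksum field (hex text) into an integer. */
uint32_t transform_hex(const char *cr) {
    assert(cr != NULL);
    return (uint32_t)strtoul(cr, NULL, 16);
}

/* Finds a Cr such that T(Cr) == c by inverting T directly.
 * Writes 8 lowercase hex digits plus a terminating NUL into out. */
void repair_field(uint32_t c, char out[9]) {
    assert(out != NULL);
    snprintf(out, 9, "%08" PRIx32, c);
}
```

Here c stands for the checksum recomputed over the malformed data D at runtime; writing the resulting Cr back into the checksum field yields a test case that passes the check in the original program.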
25. 25
If the new test case causes the original program to crash, a potential vulnerability is detected!
32. 32
TaintScope cannot deal with secure integrity check schemes (e.g. cryptographic hash algorithms, digital signatures) – it is impossible to generate valid test cases.
Limited effectiveness when all input data are encrypted (tracking decrypted data).
Checksum check point identification can be affected by the quality of the inputs.
Does not track control flow propagation.
Not all x86 instructions are instrumented by the execution monitor.
33. 33
TaintScope can perform:
Directed fuzzing
Identify which bytes flow into system/library calls.
Dramatically reduce the mutation space.
Checksum-aware fuzzing
Disable checksum checks by control flow alteration.
Generate correct checksum fields in invalid inputs.