The talk focuses on a static code analysis tool, currently under development, aimed specifically at finding vulnerabilities rather than the typical programming errors targeted by the most popular static analysis tools such as Coverity or Klocwork. Along the way, it covers the background needed to understand how these tools work, the difference between tools for finding bugs and tools for finding vulnerabilities, and what the speaker considers a fundamental point: making this kind of tool interactive.
At the end of the talk there will be a short demo of the current tool, along with some bugs/vulnerabilities found with it.
2. Static Analysis Tools
● What are they?
– Tools to find properties of a given piece of
software without actually executing it.
– The “properties” I find in this case are
bugs/vulnerabilities.
● We need good static analysis tools for
performing audits in software.
3. Why?
● Software is becoming bigger and bigger.
● And so, harder to analyze.
– Examples: Firefox, Google Chrome, MS Office...
● Auditing software like this, by hand, is tedious and
takes a long while.
● Fuzzing is good for finding vulnerabilities in such big
products.
– But it is not the solution (neither is SA, I think).
– It is just another useful tool.
7/04/13
4. Why?
● Typical old vulnerabilities easily found by quick
manual code audits are almost gone, bye-bye!
– strcpy, memcpy, sprintf, syslog, etc...
● No vulnerabilities like this in highly audited code
bases (except maybe sudo or freetype...).
– Apache, Firefox, Google Chrome...
● We need better tools.
– My approach: Static analysis (Fugue).
6. What do we need tools for?
● For highlighting possibly error-prone areas of interest.
– Thus, reducing the number of areas the auditor
needs to focus on.
● For "automagically" finding known vulnerabilities.
– For example, bad usage of API calls.
● For matching a vulnerability of type/pattern A in
software B in other software C.
– Vulnerability extrapolation.
● ...
7. What do we need tools for?
● For checking against specific rules or patterns for the
software being audited.
– Different rules apply to each piece of software.
– Vulnerabilities specific to one product.
● For doing all of the previous things against a software
in either binary or source code format.
– Or even both.
● For doing all of this interactively.
– Why is IDA the best disassembler out there?
8. Interactivity is key
● We need automatic tools that can be
corrected by a human.
– The tool will make mistakes a human can
recognize.
● We need to let the human identify and
correct those mistakes “somehow”.
● We also need a way to let the auditor
decide what (s)he is interested in and what
is not.
9. Bug/Vulnerability Finding Tools
● There are plenty of bug finding tools:
– Coverity, Klocwork, Fortify, CodeSonar, etc...
● They all find different bugs.
– There is no tool A that finds a superset of bugs found by
B and/or C.
● They're good at finding bugs (and some
vulnerabilities).
● But they are focused on a different audience...
– In my opinion, bug and vulnerability finding tools are
different because of this.
10. Bug finding tools → Developers
● They try to find any kind of software defect.
● They try to minimize the complexity of alerts.
● They try to reduce the number of false positives to the
minimum possible.
– Sometimes, even dropping checkers that can find awesome
bugs but whose false positive ratio is “high”.
● They tend to remove anything the developers cannot
understand or that can be too hard to understand.
– Otherwise, every bug would be, blindly, considered a false
positive and the tool would be, finally, ignored.
11. Vuln finding tools → Auditors
● I'm not interested in every kind of software defect (e.g.,
division by zero). Only “theoretically” exploitable ones.
– Or perhaps yes: vulns in exception handlers...
● I don't mind analyzing 100 false positives if for every 100 I
get one awesome vulnerability.
● I don't mind having to spend a day or a week
understanding what a complex checker said if it's worth it.
– If it's really a vulnerability, it's even better.
– The harder it is to find, the lower the chances that somebody
else found it.
12. How to do it?
● Steps:
– Identify the source code
– Parse the source code
– Translate the source code
– Understand the program
– Run checkers against the program
– Interact with the auditor
– Go to “Run checkers” or “Parse the source code” again...
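The steps above can be sketched as a toy pipeline. Everything here is illustrative: the function names and data shapes are made up for the sketch, not taken from Fugue.

```python
# Toy sketch of the pipeline described above; every step is a stub.
# All names here are illustrative, not Fugue's actual API.

def identify_sources(target):
    return [target]                 # real source, disassembly or decompiled code

def parse(source):
    return {"ast": source}          # stand-in for a real AST

def translate(asts):
    return {"functions": asts}      # internal form: CFGs, call graph, ...

def run_checkers(program):
    return ["finding-1"]            # whatever the checkers report

def interact(findings):
    return findings                 # the auditor filters/corrects the results

def run_pipeline(target):
    asts = [parse(s) for s in identify_sources(target)]
    program = translate(asts)
    findings = run_checkers(program)
    # In practice this loops: auditor feedback sends us back to
    # "run checkers" or even "parse the source code".
    return interact(findings)

print(run_pipeline("main.c"))       # ['finding-1']
```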
13. Identifying the source
● A tool like this must be able to identify the source before anything
else.
– The "source" can be either real source code (C/C++/...),
disassembly code or decompiled code.
● If the tool cannot handle both source code and binaries, it will
be too restricted.
● Identifying the "source" is not as easy as it may sound at
first...
– Correct disassembly, for example, is a problem.
– Auditor's interaction is required.
– Complete or partial source code.
● Include paths, conditional compilation, etc...
14. Parsing the source
● Typical misconception/false statement:
“Parsing source code is an already
solved problem”
16. Parsing source code
● Writing a parser for one compiler is a big task, but can be done
“easily”.
● Writing a parser for *any* compiler's accepted source code is a huge
task.
– You must accept and parse even malformed code.
– Examples: MS Visual C++ precompiled headers.
● You can write whatever you want before the first include.
● A parser for just one compiler doesn't have these kinds of problems.
– You just accept what you consider OK.
● For finding vulnerabilities, your parser must accept anything you feed
it.
17. Writing a parser
● You need to parse “the source” to get the AST.
– Abstract Syntax Tree. More on this later...
● I don't like to reinvent the wheel, and I don't
recommend you do either.
– Don't write your own parser.
– No.
– Really.
● Use an existing parser that can handle as
many “dialects” as possible.
18. “Writing” a parser
● For my 1st prototype, I used pycparser.
– OK for a quick prototype, not for the final tool.
● It would be a bad choice for many reasons, like:
– It only accepts well formed C.
● I wrote “filters” to “clean” the not accepted C...
– It only accepts C source for which all types are known.
– If just one error happens during parsing, it stops and cannot
recover from it.
– I patched it to try to recover from errors. But sometimes it is
simply not possible.
19. “Writing” a parser
● Fugue uses libclang. It accepts virtually anything.
– Very good at recovering from errors.
– Talking about C source code, it "swallows" almost
anything.
– It also supports C++ and Objective-C.
● Proved to be good in real scenarios: e.g., Klocwork uses it.
● If you happen to have a rich uncle, Edison Design Group
C++ frontend is, probably, the best choice.
– Proved to be good in real scenarios: e.g., Coverity uses
it.
20. A “parser” for binaries
● You need to parse "disassembly" to get the
AST (Abstract Syntax Tree).
● Parsing disassembly is, in my opinion, far
easier than parsing real source code.
– The code is not that flexible.
● But there are problems:
– Many different assemblies: ARM, 8086, 8087,
AMD64, MIPS, PPC, etc...
21. A “parser” for binaries
● What to do? Intermediate representations.
– Translators of assembly.
– Examples:
● REIL (Zynamics).
22. A “parser” for binaries
● My idea: instead of writing a translator for the processors you want, use
existing tools.
– Decompilers. [Public] decompilers for x86 and ARM exist (Hex-Rays).
● Using them "could be" a good idea.
– Hex-Rays decompilers export an API to get the AST for a function.
– Just what I want.
● Problems:
– The decompilers are written for humans to understand the code.
– Not written for programs to find vulnerabilities.
– A bad decompiler assumption may generate a lot of false positives.
● Example: GCC.
23. GCC and decompiled code
● Given this example C source code, my
prototype found (only) 3 errors.
24. GCC and decompiled code
● However, running my tool against the
decompiled code for this toy program, 4
errors appeared.
● Notice the warning for “init_proc” function.
25. GCC and decompiled code
● Why this false positive? Because of a bad
decompiler assumption:
● The function “init_proc” returns void, not int.
26. More problems with decompilers
● This problem is easy to identify and fix.
● What about this one?
[Side-by-side comparison on slide: Source Code | Decompiled Code]
27. Problems with decompiled code
● It isn't a bug in the decompiler, nor a
bad assumption.
● It is a compiler optimization.
● It is only noticeable in real source code.
– With the source code, it is very easy to identify:
dead code.
● NOTE: Having both source code and
binaries, this (and other optimizations) can
be detected and used.
28. Translating the “source”
● No matter how, we have the AST (Abstract
Syntax Tree).
– What is this?
29. Abstract Syntax Tree
● Extracted from Wikipedia:
“In computer science, an abstract syntax tree (AST), or
just syntax tree, is a tree representation of the abstract
syntactic structure of source code written in a
programming language. Each node of the tree denotes a
construct occurring in the source code. The syntax is
'abstract' in the sense that it does not represent every
detail that appears in the real syntax.”
30. Example AST
● An AST for the
following code:
while b != 0
if a > b
a = a – b
else
b = b – a
return a
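To get a feel for what an AST looks like in practice, Python's built-in `ast` module can parse an equivalent snippet. Python is used here only as a convenient stand-in for whatever language the tool actually parses:

```python
import ast

code = """
while b != 0:
    if a > b:
        a = a - b
    else:
        b = b - a
"""
tree = ast.parse(code)
# The top-level statement is the while loop; its condition
# "b != 0" is represented as a Compare node: the tree keeps
# the abstract structure, not the concrete syntax.
loop = tree.body[0]
print(type(loop).__name__)        # While
print(type(loop.test).__name__)   # Compare
```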
31. Translating the source
● Every tool I use will have a different AST.
– Example: libclang and Hex-Rays decompiler.
● Need to translate the different ASTs
supported to an internal AST format.
– Not hard, but tough.
● We have it! What's next? Typical error:
– Why do anything else? Just use the AST for
finding bugs! Let's write checkers now!
33. Using the AST for finding bugs
● Do not use the AST for finding bugs.
– You're using the wrong tool for this task.
● Use the AST to build the CFG.
– Control Flow Graph, more on this later.
● However, ASTs are good for:
– Finding and enforcing specific code styles.
– Indenting source code.
– Writing source-to-source translators
– ...
34. Using the AST
● You have the AST for every function in either the
binary or the code base you want to audit.
● With the internal representation of the AST many other
things are still needed:
– The call graph of the program. Sort of easy, but not
always: function pointers, virtual functions,
constructors/destructors, etc...
– The control flow graph (CFG) of every function.
● Identify basic blocks and relationships between them.
– ...
35. More things...
● More things still needed…
– The super control flow graph of the program.
● A call graph where every called function's CFG is
expanded in the call graph.
– The data dependency graph of the program.
● How argument A in function B travels over function
C and affects var D of function E...
● IMO, the hardest task.
● Those tasks aren't easy at all.
– I'll explain some of them in the next slides...
36. Understanding the program
● The Call Graph of the program is needed.
– Why? To know every possible function path in
the program.
● To build it we can, simply:
– Visit every node in every function's AST.
– Save a list of all functions referenced from
every function visited.
● That's it. The easiest way.
– It's not complete... but it's “good enough” to start.
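A minimal sketch of that approach, using Python functions and the built-in `ast` module in place of the tool's internal AST. The idea is the same: visit every node of every function and record the functions it references (and, as the slide says, this misses calls through function pointers and friends):

```python
import ast

# Toy code base; Python stands in for whatever language is analyzed.
code = """
def a(): b(); c()
def b(): c()
def c(): pass
"""

call_graph = {}
for node in ast.walk(ast.parse(code)):
    if isinstance(node, ast.FunctionDef):
        # Visit every node of this function's AST and save the
        # names of all directly-called functions.
        callees = {n.func.id for n in ast.walk(node)
                   if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}
        call_graph[node.name] = callees

print(call_graph)   # {'a': {'b', 'c'}, 'b': {'c'}, 'c': set()}
```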
37. Understanding the program
● Next thing needed: The CFG (Control Flow
Graph).
● What is this? Wikipedia to the rescue:
– “A control flow graph (CFG) in computer
science is a representation, using graph
notation, of all paths that might be traversed
through a program during its execution.“
38. Control Flow Graph
● A CFG for the
following code:
while b != 0
if a > b
a = a – b
else
b = b – a
return a
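The same example, written down as an explicit CFG. The blocks and edges are hand-built here; a real tool derives them from the AST:

```python
# Hand-built CFG for the GCD-style example above.
# Nodes are basic blocks; edges are possible control transfers.
cfg = {
    "entry":     ["loop_test"],
    "loop_test": ["if_test", "return"],   # b != 0 ? body : exit
    "if_test":   ["then", "else"],        # a > b ?
    "then":      ["loop_test"],           # a = a - b
    "else":      ["loop_test"],           # b = b - a
    "return":    [],                      # return a
}

# Every path that might be traversed starts at "entry";
# a simple reachability walk over the graph:
def reachable(cfg, start):
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(cfg[node])
    return seen

print(sorted(reachable(cfg, "entry")))
```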
39. Understanding the program
● Let's say, no matter how, that our tool
“understands” the program:
– We know every possible path in the program.
– We know how a variable X in function Y travels
and is used in the complete program.
● The next step is to convert the code, from
the AST of every basic block of the CFG, to
another form that is easier to analyze.
– Why?
40. The AST, again...
● We “could” write simple checkers with the
CFG and the AST of every instruction of
every basic block, but I do not recommend
it.
– An AST can be very complex even for not so
complex expressions.
– Example:
● signed int u = (float)x * y + func()
● VarDecl → Assignment → Cast → VarRef →
BinaryOperator → VarRef → BinaryOperator →
CallExpr.
41. Understanding the program
● We need something that makes the
analysis easier.
● Typical forms of code aimed to make
analysis easier:
– 3AC: Three Address Code.
– SSA: Static Single Assignment form.
● What are they?
42. Three Address Code
● Definition by Wikipedia:
– “In computer science, three-address code (often
abbreviated to TAC or 3AC) is a form of representing
intermediate code used by compilers to aid in the
implementation of code-improving transformations.
Each instruction in three-address code can be
described as a 4-tuple: (operator, operand1, operand2,
result).“
● Basically, every instruction is represented as “more
instructions”, but each of them will have only one operator, at
most 2 operands and a result.
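A toy lowering of an expression into 3AC with temporaries, again using Python's `ast` module as the input AST. A real tool would lower its own internal AST and handle far more node types:

```python
import ast
import itertools

temps = itertools.count()

def lower(node, code):
    """Lower an expression AST node to 3AC; return the name holding its value."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.BinOp):
        left = lower(node.left, code)
        right = lower(node.right, code)
        t = f"t{next(temps)}"
        op = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}[type(node.op)]
        code.append((op, left, right, t))   # (operator, operand1, operand2, result)
        return t
    raise NotImplementedError(type(node))

code = []
lower(ast.parse("x * y + z", mode="eval").body, code)
for op, a, b, r in code:
    print(f"{r} = {a} {op} {b}")
# t0 = x * y
# t1 = t0 + z
```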
44. Static Single Assignment form
● What is SSA?
– “Static single assignment form (often abbreviated as
SSA form or simply SSA) is a property of an
intermediate representation (IR), which says that each
variable is assigned exactly once. Existing variables in
the original IR are split into versions, new variables
typically indicated by the original name with a subscript
in textbooks, so that every definition gets its own
version.”
● Pretty similar to 3AC but creating different versions of the
variables, instead of temporary ones.
– There are more differences, though...
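For straight-line 3AC, the renaming is easy to sketch. This toy ignores control flow entirely: real SSA also needs phi nodes at control-flow joins:

```python
# Toy SSA renaming for straight-line 3AC: each assignment to a
# variable creates a new version; uses refer to the latest version.
def to_ssa(code):
    version = {}
    def use(v):
        # Unversioned names (parameters, constants) pass through.
        return f"{v}{version[v]}" if v in version else v
    out = []
    for op, a, b, dest in code:
        a, b = use(a), use(b)
        version[dest] = version.get(dest, 0) + 1
        out.append((op, a, b, f"{dest}{version[dest]}"))
    return out

code = [("+", "a", "b", "x"),   # x = a + b
        ("*", "x", "c", "x"),   # x = x * c   (x reassigned)
        ("-", "x", "1", "y")]   # y = x - 1
for op, a, b, r in to_ssa(code):
    print(f"{r} = {a} {op} {b}")
# x1 = a + b
# x2 = x1 * c
# y1 = x2 - 1
```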
45. Understanding the program
● In my opinion, it doesn't matter which form
you use:
– Both are great enough for the task.
● We just need that:
– Every instruction does one and *only* one
action.
● No side effects.
– And every instruction has, at most, 2
operands, 1 operator and a result.
46. Writing checkers to find vulns
● A bug finding tool finds software defects in any part of
the source.
– The more code you check, the better.
● A vulnerability finding tool should not, in my opinion...
– Client side code: I'm not interested in stack overflows
reading configuration files that I cannot influence from
remote.
– Server side: I'm not interested in bugs related to parsing
configuration files, environment variables, etc...
47. Writing checkers to find vulns
● ...however, I may be interested in such bugs if
I'm auditing privileged local applications.
– For example: any suid tool, like sudo.
● In short:
– It will depend on the kind of application (or which
part of the application) we're auditing.
– It changes from application to application.
– The tool must interact with the auditor.
● Not the checker itself, but must know “where”.
48. Writing checkers to find vulns
● In a vulnerability finding tool we need to tell
the tool what areas we're interested in.
– Is this a remote application? Only focus on what
can be influenced from remote.
– Is this a local SUID binary? Focus on whatever area
the user can feed input to.
● So, what do we need? First of all, a way to tell
the tool: this is the area I'm interested in.
– Interactivity with the auditor.
49. Writing checkers to find vulns
● One example with Evince, a document viewer.
● Running an earlier version of my tool, a
curious bug was found:
50. Writing checkers to find vulns
● Big mistake: "n" comes from a font file and, instead
of using Min, the developer used Max.
– So great. Bravo!
● However, we cannot forge a DVI file with an embedded
font (this code parses fonts) so, while an obvious bug,
unfortunately, it isn't a vulnerability.
● My tool wasted time finding non remotely exploitable
bugs. This is bad.
● Interactivity is needed.
51. Writing checkers to find vulns
● For this, the auditor needs to identify the program's entry
point(s).
– Example: Find vulnerabilities starting from function
"recv_data" in the call graph.
– “Oh, BTW, I only control arg1 and arg3, not arg2”.
● We need a way to say: Analyze all functions called from
this "data entry point".
– And not those completely uninteresting functions that
deal with parsing local fonts, environment variables,
etc... as with the Evince example.
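One simple way to honour such an entry point is to restrict the analysis to the part of the call graph reachable from it. A sketch follows; all function names here are made up for illustration:

```python
# Restrict analysis to functions reachable from the auditor-chosen
# entry point. Names like "recv_data" are illustrative only.
call_graph = {
    "main":         ["parse_config", "recv_data"],
    "parse_config": ["read_env"],        # local input: uninteresting here
    "read_env":     [],
    "recv_data":    ["parse_packet"],    # remote input: interesting
    "parse_packet": ["copy_field"],
    "copy_field":   [],
}

def functions_to_audit(call_graph, entry):
    # Plain graph reachability from the data entry point.
    seen, stack = set(), [entry]
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.add(f)
            stack.extend(call_graph[f])
    return seen

print(sorted(functions_to_audit(call_graph, "recv_data")))
# ['copy_field', 'parse_packet', 'recv_data']
```

The config/environment parsing functions drop out automatically, which is exactly what the Evince example needed.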
52. Writing checkers to find vulns
● Also, we need a way to let the auditor determine what
an external function/function pointer does.
– Example: It reserves/frees memory, executes code, loads a
library, etc...
● If not, our tool will fail to find even the simplest bugs in
real world scenarios.
– At Infiltrate 2011, Halvar Flake (Thomas Dullien) showed a
bug that, in his opinion, cannot be handled by today's static
analysis tools (because of machine state handling).
– I'll show you even easier examples of what cannot be
handled by any current static analysis tool.
56. Problems writing checkers
● There are 2 types of checkers:
intraprocedural and interprocedural.
● Intraprocedural ones only check what
happens inside one function.
● Interprocedural ones check what happens
when var A travels to function B and is
assigned to var C, and so on, and so on...
57. “Hello World” checker
● Writing a "hello world" like checker: finding
uninitialized variable usages (intraprocedural).
● Seems easy at first. Happens to be not so easy.
● Why?
– One of the many problems: Path explosion.
● Suppose we have a function F0 with 10 basic blocks
and 20 edges. Analyzing all possible paths is doable
in a reasonable amount of time.
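A minimal sketch of such a checker over straight-line 3AC. It deliberately ignores control flow, which is exactly the part that makes the real problem hard:

```python
# Toy uninitialized-variable checker over straight-line 3AC.
# A real checker must do this along every CFG path, which is
# where path explosion comes in.
def check_uninitialized(code, params=()):
    initialized = set(params)       # parameters count as initialized
    warnings = []
    for op, a, b, dest in code:
        for v in (a, b):
            # Skip numeric constants; flag names never assigned so far.
            if not v.isdigit() and v not in initialized:
                warnings.append(f"use of uninitialized variable '{v}'")
        initialized.add(dest)
    return warnings

code = [("+", "a", "1", "x"),   # x = a + 1  -- 'a' never assigned
        ("*", "x", "b", "y")]   # y = x * b  -- 'b' is a parameter
print(check_uninitialized(code, params=("b",)))
# ["use of uninitialized variable 'a'"]
```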
● Now let's see a “sort of complex” function...
59. The Acrobat Reader function
● The number of possible paths in this function is so big
we cannot traverse all of them in an acceptable time.
– Probably, impossible.
● We have to find solutions. One of them is “Sensitive
analysis”.
– Flow-Sensitive, path-sensitive, context-sensitive.
– Simply put, we need to make the number of paths we
have to traverse smaller.
● For this type of analysis to be possible we need to abstract
all predicates in the function (remember 3AC/SSA?).
60. Sensitive analysis
● How to do it? Just my opinion, one idea:
– Find in which basic blocks "local variables" are used and which
predicates depend on them.
● I'm not even talking at this point about interprocedural
analysis.
– Find the paths between the entry point, the basic blocks where the
local vars are used and the function's exit points.
– Then, remove all the other nodes to generate a smaller CFG. If
there are unconnected nodes add the basic blocks and relations
needed to connect them.
– Hopefully, we will have a shorter version of the CFG with only
what we need.
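One way to sketch that reduction: keep only the blocks that lie on some entry-to-exit path and are connected to a block of interest. This toy works on a hand-built CFG; a real implementation must also repair connectivity, as noted above:

```python
# Toy CFG slicing: keep only blocks on some entry -> interesting -> exit path.
def predecessors(cfg):
    preds = {n: set() for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    return preds

def reachable(edges, starts):
    seen, stack = set(), list(starts)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(edges[n])
    return seen

def slice_cfg(cfg, entry, interesting, exits):
    preds = predecessors(cfg)
    fwd = reachable(cfg, [entry])           # reachable from the entry point
    bwd = reachable(preds, exits)           # can reach an exit point
    via = (reachable(cfg, interesting)      # reachable from an interesting block
           | reachable(preds, interesting)) # ...or can reach one
    return (fwd & bwd) & via

cfg = {
    "entry": ["a", "b"],
    "a":     ["use_v"],    # block where the local variable is used
    "b":     ["exit"],     # irrelevant branch: gets sliced away
    "use_v": ["exit"],
    "exit":  [],
}
print(sorted(slice_cfg(cfg, "entry", ["use_v"], ["exit"])))
# ['a', 'entry', 'exit', 'use_v']
```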
61. And even more problems...
● Suppose that we have, finally, our "hello world"
intraprocedural checker.
– Finally! My first one took me a long time...
● Now, we should make it interprocedural.
● Very often, a variable is declared in a function A,
travels over function B, C, ..., until it's used in function
Y.
● We need to control "the machine state".
– There is no “state” but “many possible states”.
62. Problems, problems, problems...
● Do you remember the path explosion
problem? Think about it in interprocedural
analysis.
– Horrible.
● Think about it controlling “the state”.
– Terrible.
● Let's talk a bit more about the state...
63. Problems, problems, problems...
● How many possible machine states may we have?
– We cannot control all of them. Impossible.
– Possible paths depend on machine states so, again,
we cannot control all the possible paths.
– We may guess the limits and try partial solutions.
● Predicate abstraction, opaque predicates, etc...,
and symbolic execution.
64. Symbolic execution
● During symbolic execution we try to find if a particular
state S0 is possible for function F0 (let's say we're only
talking about intraprocedural analysis).
● We can abstract the predicates and the computational
operations that affect them, and generate formulas
to prove satisfiability using a SAT/SMT solver.
– Some people say it isn't the way to go... (e.g., Coverity).
– Others do use this way (Goanna, for example).
– I really don't know.
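A toy version of the idea: abstract the path predicates and ask whether any input satisfies all of them at once. Brute force over a small domain stands in here for the SAT/SMT solver, which does the same job over symbolic formulas instead of enumerating inputs:

```python
import itertools

# Path predicates abstracted from a (hypothetical) path through F0:
#   if (x > 5) ... if (x + y == 10) ... if (y < 0) ...
predicates = [
    lambda x, y: x > 5,
    lambda x, y: x + y == 10,
    lambda x, y: y < 0,
]

def path_feasible(predicates, domain):
    # Brute-force "solver": try every (x, y) pair in the domain.
    for x, y in itertools.product(domain, repeat=2):
        if all(p(x, y) for p in predicates):
            return (x, y)     # a witness: the path is reachable
    return None               # no witness found: path infeasible in this domain

print(path_feasible(predicates, range(-20, 21)))
# (11, -1): x > 5, x + y == 10, y < 0 all hold
```

An SMT solver such as Z3 answers the same satisfiability question symbolically, without enumerating, which is what makes the approach usable on real predicates.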
65. Fugue: Current state, future directions and goals
● Current state: far from finished.
● I don't really know when I'll finish it, if at all. Really.
– But... I would like to release “something” in one year.
● Anyway, even if I finish it... I can't be sure it will find
awesome bugs.
– But it amazes me that even the most rudimentary
(current & past) versions of the tool actually find real
bugs.