POLITECNICO DI MILANO

              Facoltà di Ingegneria dell’Informazione

            Corso di Laurea in Ingegneria Informatica

            Dipartimento di Elettronica e Informazione




Detecting aliased stale pointers via static analysis: An architecture

 independent practical application of pointer analysis and graph

                theory to find bugs in binary code




Relatore: Prof. Stefano Zanero

Correlatore: Ing. Federico Maggi



                                               Tesi di Laurea di:

                            Giovanni Gola, matricola 717847

                           Vincenzo Iozzo, matricola 713583



                Anno Accademico 2009-2010
Acknowledgements

The authors would like to thank Thomas Dullien, Julien Vanegue, Ralf-Philipp

Weinmann and Tim Kornau for their suggestions and help while researching

the topic.

The authors would also like to thank the thesis advisors from Politecnico di

Milano Stefano Zanero and Federico Maggi.

Finally we want to thank all the people who have reviewed the paper.




Contents

1 Introduction

2 Static Analysis
  2.1 General knowledge
  2.2 Pointer Analysis
  2.3 Conclusions and contributions

3 Preprocessing stage
  3.1 The REIL intermediate language
  3.2 Static Single Assignment (SSA) Form
    3.2.1 Graph theory overview
    3.2.2 Computing SSA Form

4 Analysis stage
  4.1 Pointer analysis
    4.1.1 Analysis features
    4.1.2 MonoREIL
  4.2 Intraprocedural analysis
  4.3 Interprocedural analysis
  4.4 C++ peculiarities

5 Stale pointers detection

6 Results and future work
List of Figures

 2.1 Data-flow equations
 2.2 Pointer analyses

 3.1 List of REIL instructions
 3.2 REIL translation of a function
 3.3 Non-local variable
 3.4 SSA Form of a REIL function

 4.1 REIL Instructions transformations
 4.2 Transfer functions for common instructions
 4.3 Transfer functions for φ-node instructions
 4.4 Intraprocedural analysis example
 4.5 PCG used in our algorithm
 4.6 Computing f1() to f2() alias trees
 4.7 Computing f1() to f3() alias trees
 4.8 Computing f2() to f4() alias trees through the leftmost edge
 4.9 Computing f2() to f4() alias trees through the rightmost edge
 4.10 Effects of combine() on functions alias trees

 5.1 Example of callgraph
 5.2 Callgraph with relevant functions in red
 5.3 Pruned callgraph
 5.4 Pruned callgraph ready for bug detection step
 5.5 Alias verification equations
Chapter 1


Introduction


In the era of cloud computing and Internet-connected devices, attacks on such devices are becoming increasingly profitable. Web clients such as PCs, mobile phones, UMPCs, tablets and netbooks are a precious source of information. Although they are built on diverse architectures, including but not limited to the x86, x86-64, ARM and PowerPC families, they necessarily share a base set of network-facing software, first and foremost web browsers. Applications like Firefox, Safari, Internet Explorer and Google Chrome, together with complementary software including JavaScript engines, Adobe Flash, Adobe AIR, Adobe Reader and the Java Runtime Environment, come preinstalled with almost every OS. Because of the size of these applications, developers tend to build complex custom memory management systems on top of the native memory allocators. Notable examples are NS_Alloc()/NS_Free() in the XPCOM library, PR_NEW/PR_DELETE in the NSPR library, JS_malloc()/JS_free() in the SpiderMonkey JavaScript engine, and TCmalloc() in WebKit and the V8 JavaScript engine.


   The complexity and heterogeneity of memory management in large code bases result in numerous memory corruption vulnerabilities, such as uninitialized memory, double frees and dangling pointers. Dangling pointers are a particularly insidious type of flaw, also known as "use-after-free" bugs or "stale pointers", which occur when pointer variables reference memory areas that have already been freed. A quick search of the Common Vulnerabilities and Exposures (CVE) list yields 162 reports for web browsers and more than 470 results for the three most popular browser add-ons, i.e., Adobe Flash, Adobe Reader and the Java Runtime Environment (between 2006 and 2010).


   The large number of use-after-free bugs reported in recent years makes them one of the most attractive attack vectors against client systems. Although use-after-free bugs are scattered throughout the code base, finding them, like other memory corruption bugs of the same kind, is arguably more difficult than finding overflows, because they are temporal memory errors: detecting them effectively requires understanding how a custom memory management system works internally in order to identify its logical pitfalls. So far, the only effective solution has been manual code review, which tries to spot deviations from the rules developers were supposed to follow while writing code. This process is clearly cumbersome. A wide variety of approaches has been proposed to spot memory flaws automatically. In particular, an analyst can tackle the problem of finding dangling pointers either with dynamic analysis techniques, such as fuzzing, or by means of static analysis. Even though fuzzing has proven to be an effective method for finding such bugs, it suffers from several intrinsic limitations. For example, relying on input to exercise code paths results in poor coverage of the application code and limits the depth of the exercised code paths, leaving part of the code unexplored. Moreover, the random nature of the generated input makes the running time virtually unbounded and unrelated to the code coverage the analysis reaches over time. We propose a practical approach to automatically find use-after-free conditions in large binary code bases using static analysis and graph theory.




Chapter 2


Static Analysis


2.1     General knowledge

Static analysis is the automated process of extracting semantic information about a program without executing it. Since it does not need to execute the binary, static analysis offers a number of potential advantages over its dynamic counterpart:


   • Architecture independence: the analysis, even if it might be specific to an

      instruction set or language, can be implemented on top of any framework

      and run on any machine;


   • Running time: time complexity of a static analysis algorithm may range

      from linear to exponential, but it will process the entire binary in finite

      time;


   • Code coverage and depth of analysis: since it does not rely on input to exercise code paths, static analysis can achieve full code coverage. This makes it particularly useful for analyzing large applications with complex path-triggering conditions.



Although the static analysis problem has been proven to be undecidable in theory, it can be reformulated as an over-approximation of the original problem that is guaranteed to terminate in finite time. The term "static analysis" is in fact very broad: it covers various implementation techniques that make static analysis feasible. Model checking was historically the first of these techniques to appear. It arose in an attempt to solve the so-called Concurrent Program Verification problem. A model checker checks the correctness of a formula expressed in temporal logic, which makes it effective at uncovering hard-to-find concurrency errors. As stated in Chapter 1, use-after-free bugs are temporal memory errors and, as such, can be expressed in temporal logic: a program point at which a pointer is dereferenced is reached after a program point at which the same pointer is freed. Model checking turns out to be very precise in unveiling temporal memory errors, but it has non-negligible drawbacks. The major one is the need for source code. The model checker simply tries to solve the system of pre- and post-conditions of each function; to do so, these pre- and post-conditions must first be specified by annotating the source code of the program under analysis, a process that is extremely time-consuming. Moreover, in the general case, model checking algorithms have exponential time complexity.
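The temporal property above can be illustrated on a single concrete execution trace. The following Python sketch is only a toy, neither a model checker nor the analysis proposed in this work: it assumes a hypothetical trace of `(operation, pointer, source)` events, where the operation is `alloc`, `alias`, `free` or `deref`, and reports every dereference that is reached after the object it refers to has been freed, including through an alias.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical event format: (operation, pointer, source pointer or None).
Event = Tuple[str, str, Optional[str]]


def find_use_after_free(trace: List[Event]) -> List[int]:
    """Return the indices of 'deref' events that touch an already freed object."""
    points_to: Dict[str, int] = {}  # pointer name -> abstract object id
    freed: Set[int] = set()         # ids of the objects freed so far
    next_obj = 0
    violations: List[int] = []
    for i, (op, ptr, src) in enumerate(trace):
        if op == "alloc":
            points_to[ptr] = next_obj          # fresh object for every allocation
            next_obj += 1
        elif op == "alias":
            points_to[ptr] = points_to[src]    # ptr = src: one object, two names
        elif op == "free":
            freed.add(points_to[ptr])
        elif op == "deref":
            # Temporal property: a dereference reached after a free of the same object.
            if points_to.get(ptr) in freed:
                violations.append(i)
    return violations


trace: List[Event] = [
    ("alloc", "p", None),
    ("alias", "q", "p"),
    ("free", "p", None),
    ("deref", "q", None),  # q is a stale pointer: it aliases the freed object
]
print(find_use_after_free(trace))  # [3]
```

A static analysis has to establish the same property over all paths rather than a single trace, which is why alias information is central to the rest of this work.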

Another very important static analysis method is abstract interpretation. In an algorithm based on abstract interpretation, the program semantics is over-approximated by a set of monotonic functions, which the algorithm uses to transform ordered sets representing the results of the analysis. It can be viewed as a partial execution of the program that tracks only part of the information about its semantics, without performing all the calculations. The constraints of monotonicity and order guarantee that the analysis halts in finite time. This well-known technique is mainly used in compilers, for optimization tasks, and in debugging. Every analysis that uses the abstract interpretation pattern to gather information about the possible set of values (of registers, variables, memory locations, etc.) computed at a given point in a program belongs to the data-flow family of analyses. In particular, a data-flow analysis algorithm usually walks the control flow graph (CFG) of a program. At each program point (instruction, assignment, basic block and so forth), the algorithm applies the data-flow equations (see Figure 2.1, where $\mathit{pred}_b$ denotes the set of predecessors of node b) to the state (ordered set) associated with it. The analysis is repeated on every node until the sets stabilize. As previously stated, the data-flow equations must be monotonic and the sets must carry an order relation, at least a partial one. If these conditions hold, and the ordered sets contain no infinite ascending chains, the repeated analysis reaches the so-called fixpoint and the algorithm can stop.

$$\mathit{out}_b = \mathit{trans}_b(\mathit{in}_b)$$
$$\mathit{in}_b = \mathit{join}_{p \in \mathit{pred}_b}(\mathit{out}_p)$$

                       Figure 2.1: Data-flow equations
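To make the data-flow equations concrete, the following Python sketch solves them for reaching definitions, a classic forward data-flow analysis, by round-robin iteration. The CFG, block names and definition labels are made up for illustration; $\mathit{join}$ is set union and $\mathit{trans}_b$ is the usual gen/kill function. Both are monotone over the finite powerset of definitions, so the iteration stops at a fixpoint.

```python
from typing import Dict, List, Set, Tuple

Sets = Dict[str, Set[str]]

# Hypothetical CFG (block -> predecessors) with a loop between "loop" and "body".
preds: Dict[str, List[str]] = {
    "entry": [],
    "loop": ["entry", "body"],
    "body": ["loop"],
    "exit": ["loop"],
}
# gen[b]: definitions made in b; kill[b]: other definitions of the same variables.
gen: Sets = {"entry": {"d1:x", "d2:y"}, "loop": set(), "body": {"d3:x"}, "exit": set()}
kill: Sets = {"entry": {"d3:x"}, "loop": set(), "body": {"d1:x"}, "exit": set()}


def reaching_definitions(preds: Dict[str, List[str]], gen: Sets, kill: Sets) -> Tuple[Sets, Sets]:
    in_: Sets = {b: set() for b in preds}
    out: Sets = {b: set() for b in preds}
    changed = True
    while changed:  # repeat until every set is stable (the fixpoint)
        changed = False
        for b in preds:
            # in_b = join over predecessors p of out_p, with join = union
            new_in = set().union(*(out[p] for p in preds[b]))
            # out_b = trans_b(in_b) = gen_b ∪ (in_b \ kill_b)
            new_out = gen[b] | (new_in - kill[b])
            if new_in != in_[b] or new_out != out[b]:
                in_[b], out[b] = new_in, new_out
                changed = True
    return in_, out


in_, out = reaching_definitions(preds, gen, kill)
print(sorted(in_["exit"]))  # ['d1:x', 'd2:y', 'd3:x']
```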



2.2      Pointer Analysis

We now focus on data-flow analysis of pointer values, known as pointer analysis. As one may immediately notice, pointer analysis fits our need to track the values assigned to pointers in order to later check for use-after-free conditions. Several dimensions affect the cost/precision trade-offs of pointer analysis, and how an analysis addresses each of them helps to categorize it. The dimensions we consider are:



   • Scope: a static analysis algorithm can either be engineered to analyze a single function only, or extended to cover multiple functions. The former is called intraprocedural pointer analysis, the latter interprocedural analysis;

   • Flow-sensitivity: a flow-sensitive analysis takes into account the order of statements in a program and can therefore compute a solution for each program point, whereas a flow-insensitive analysis computes a single solution either for the whole program or for each procedure. The immediate consequence is a higher degree of precision for flow-sensitive approaches. On the other hand, flow-insensitive analyses scale much better in both time and space, and are therefore a better choice for analyzing very large programs;

   • Context-sensitivity: context-sensitive analyses take the calling context of a function into account. This means that the parameters passed on different calls of the same function can be distinguished, and the results can be returned to the actual caller. Context-sensitivity offers a higher degree of precision and, if properly implemented, has only a mild impact on speed;

   • Heap modeling: an accurate pointer analysis should rely on a representation of the entire heap space. This is a non-trivial issue that constitutes a branch of static analysis in its own right, known as shape analysis. Although shape analysis is an active research area, it scales very poorly to real-life programs;

   • Aggregate modeling: a very important factor affecting the precision of pointer analysis is how the elements of aggregates are distinguished. An extremely precise model, in which every single object can be distinguished, could be obtained by running a full-blown shape analysis, which, as just stated, is not feasible for our purposes. A very fast but imprecise model could collapse all the elements of an aggregate into one object. This would introduce excessive noise into the analysis and would lead to a situation in which no heap object can be distinguished from another;

   • Alias representation: indicates whether alias pairs or points-to pairs are maintained during the analysis. Alias pairs represent alias relations explicitly, whereas points-to data is a more compact representation.
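The flow-sensitivity and alias-representation dimensions can be illustrated with the toy straight-line program `p = &a; q = p; p = &b`. The Python sketch below is illustrative only and is not the analysis described later in this work: the flow-sensitive version keeps one points-to map per program point, the flow-insensitive version computes a single Andersen-style (inclusion-based) map for the whole program, and the hypothetical helper `alias_pairs` derives explicit alias pairs from a points-to map.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

PointsTo = Dict[str, Set[str]]
# ("addr", p, a) means p = &a; ("copy", q, p) means q = p.
Stmt = Tuple[str, str, str]
program: List[Stmt] = [("addr", "p", "a"), ("copy", "q", "p"), ("addr", "p", "b")]


def flow_sensitive(prog: List[Stmt]) -> List[PointsTo]:
    """One points-to map per program point; an assignment replaces the old targets."""
    state: PointsTo = {}
    per_point: List[PointsTo] = []
    for kind, dst, src in prog:
        state = {k: set(v) for k, v in state.items()}  # start from the previous point
        state[dst] = {src} if kind == "addr" else set(state.get(src, set()))
        per_point.append(state)
    return per_point


def flow_insensitive(prog: List[Stmt]) -> PointsTo:
    """A single map for the whole program (Andersen-style inclusion constraints)."""
    state: PointsTo = {}
    changed = True
    while changed:
        changed = False
        for kind, dst, src in prog:
            new = {src} if kind == "addr" else set(state.get(src, set()))
            if not new <= state.setdefault(dst, set()):
                state[dst] |= new
                changed = True
    return state


def alias_pairs(points_to: PointsTo) -> Set[Tuple[str, str]]:
    """Explicit may-alias pairs derived from the more compact points-to map."""
    return {(x, y) for x, y in combinations(sorted(points_to), 2) if points_to[x] & points_to[y]}


print(alias_pairs(flow_sensitive(program)[-1]))  # set()
print(alias_pairs(flow_insensitive(program)))    # {('p', 'q')}
```

At the end of the program the flow-sensitive result correctly reports no alias, while the flow-insensitive one reports a spurious `p`/`q` alias: this is the precision cost traded for scalability.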



   A lot of research has been done on pointer analysis in the last twenty-five years. Nowadays basic intraprocedural analyses are implemented in almost every compiler, whereas interprocedural algorithms are still at the research stage. Figure 2.2 summarizes the interprocedural analyses proposed so far. Although each has its own pros and cons, they all share a few limitations that make them unsuitable for our purposes. Their major drawback is the need for source code, which, from an analyst's point of view, is usually impossible to obtain. Moreover, they are often built to analyze source code translated into a subset of the original language. The latter limitation also affects the few interprocedural analyses proposed to work at the assembly level, such as the one by Naeem et al. [3]. Reducing a real assembly language, with hundreds of instructions, to such a narrow subset proves infeasible in real scenarios.

                         Figure 2.2: Pointer analyses



2.3      Conclusions and contributions

In terms of efficiency and scalability, intraprocedural analysis is reliable enough to be adopted with minor modifications that enable it to deal with more expressive assembly languages. Interprocedural analysis at the assembly level, on the other hand, is still at an early stage of development. Therefore, we propose a new tree-based, context-sensitive interprocedural analysis targeting assembly languages.




Chapter 3


Preprocessing stage


3.1     The REIL intermediate language

The Reverse Engineering Intermediate Language (REIL) [6] is a platform-independent intermediate language that aims to simplify static code analysis algorithms, such as gadget-finding algorithms for return-oriented programming. It abstracts away the details of specific assembly languages to facilitate cross-platform analysis of disassembled binary code.

   REIL performs a simple one-to-many mapping of native CPU instructions to sequences of simple atomic instructions. Memory access is explicit, and every instruction has exactly one effect on the program state. This contrasts sharply with native assembly instruction sets, where the exact behaviour of instructions is often influenced by CPU flags or other preconditions.

   All instructions use a three-operand format. For instructions that do not use all three operands, placeholder operands of a special type called ε are used where necessary. Each of the 17 different REIL instructions has exactly one mnemonic that specifies its effect on the program state.


The REIL VM


To define the runtime semantics of the REIL language, it is necessary to define a virtual machine (REIL VM) that specifies how REIL instructions behave when interacting with memory or registers.

   The names of REIL registers follow the convention t-number, like t0, t1, t2. The actual size of these registers is specified upon use and not defined a priori (in practice, only register sizes between 1 byte and 16 bytes have been used). Registers of the original CPU can be used interchangeably with REIL registers.

   The REIL VM uses a flat memory model without alignment constraints.

The endianness of REIL memory accesses equals the endianness of memory

accesses of the source platform.
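A minimal interpreter can help clarify these VM semantics. The Python sketch below covers only a subset of the instructions listed in Figure 3.1 and makes several simplifying assumptions that are not part of REIL itself: instructions are tuples, `None` stands for the ε operand, a `JCC` target is an index into the instruction list rather than a REIL address, and every memory access uses a single fixed word size. Memory is a flat, byte-addressed map without alignment constraints, with little-endian byte order as on an x86 source platform.

```python
from typing import Callable, Dict, List, Tuple, Union

Operand = Union[int, str, None]           # integer literal, register name, or None for ε
Instr = Tuple[str, Operand, Operand, Operand]

# Binary operations y = f(x1, x2); BSH shifts left for x2 >= 0 and right otherwise.
ARITH: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda x, y: x + y,
    "SUB": lambda x, y: x - y,
    "MUL": lambda x, y: x * y,
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
    "BSH": lambda x, y: x << y if y >= 0 else x >> -y,
}


class ReilVM:
    """Toy interpreter for a subset of REIL with a single word size."""

    def __init__(self, word_size: int = 4, byteorder: str = "little") -> None:
        self.regs: Dict[str, int] = {}
        self.mem: Dict[int, int] = {}      # flat, byte-addressed, no alignment constraints
        self.word_size = word_size
        self.byteorder = byteorder         # same endianness as the source platform
        self.mask = (1 << (8 * word_size)) - 1

    def val(self, op: Operand) -> int:
        return op if isinstance(op, int) else self.regs.get(op, 0)

    def load(self, addr: int) -> int:
        data = bytes(self.mem.get(addr + k, 0) for k in range(self.word_size))
        return int.from_bytes(data, self.byteorder)

    def store(self, addr: int, value: int) -> None:
        for k, byte in enumerate((value & self.mask).to_bytes(self.word_size, self.byteorder)):
            self.mem[addr + k] = byte

    def run(self, code: List[Instr]) -> None:
        pc = 0
        while pc < len(code):
            op, a, b, c = code[pc]
            pc += 1
            if op in ARITH:
                self.regs[c] = ARITH[op](self.val(a), self.val(b)) & self.mask
            elif op == "BISZ":
                self.regs[c] = 1 if self.val(a) == 0 else 0
            elif op == "STR":
                self.regs[c] = self.val(a)
            elif op == "LDM":
                self.regs[c] = self.load(self.val(a))
            elif op == "STM":
                self.store(self.val(c), self.val(a))
            elif op == "JCC":
                if self.val(a) != 0:
                    pc = self.val(c)       # toy convention: target is an instruction index
            elif op != "NOP":
                raise NotImplementedError(op)


vm = ReilVM()
vm.run([
    ("STR", 0x1000, None, "t0"),       # t0 = 0x1000
    ("STM", 0x11223344, None, "t0"),   # mem[t0] = 0x11223344
    ("LDM", "t0", None, "t1"),         # t1 = mem[t0]
    ("ADD", "t1", 1, "eax"),           # eax = t1 + 1
])
print(hex(vm.regs["eax"]), vm.mem[0x1000])  # 0x11223345 68
```

The lowest-addressed byte holds 0x44 (68), showing the little-endian layout inherited from the source platform.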


REIL instructions


REIL instructions can loosely be grouped into five different categories according to the type of the instruction (see Figure 3.1).

   Arithmetic and bitwise instructions take two input operands and one output operand. Input operands are either integer literals or registers; the output operand is a register. None of the operands has any size restriction. However, arithmetic and bitwise operations can impose a minimum output operand size




Arithmetic instructions      Operation
ADD x1, x2, y                y = x1 + x2
SUB x1, x2, y                y = x1 − x2
MUL x1, x2, y                y = x1 · x2
DIV x1, x2, y                y = ⌊x1 / x2⌋
MOD x1, x2, y                y = x1 mod x2
BSH x1, x2, y                y = x1 · 2^x2 if x2 ≥ 0;  y = ⌊x1 / 2^(−x2)⌋ if x2 < 0

Bitwise instructions         Operation
AND x1, x2, y                y = x1 & x2
OR x1, x2, y                 y = x1 | x2
XOR x1, x2, y                y = x1 ⊕ x2

Logical instructions         Operation
BISZ x1, ε, y                y = 1 if x1 = 0;  y = 0 if x1 ≠ 0
JCC x1, ε, y                 transfer control flow to y iff x1 ≠ 0

Data transfer instructions   Operation
LDM x1, ε, y                 y = mem[x1]
STM x1, ε, y                 mem[y] = x1
STR x1, ε, y                 y = x1

Other instructions           Operation
NOP ε, ε, ε                  no operation
UNDEF ε, ε, y                undefined instruction
UNKN ε, ε, ε                 unknown instruction


           Figure 3.1: List of REIL instructions






or a maximum output operand size relative to the sizes of the input operands.

   Note that certain native instructions, such as FPU instructions and multimedia instruction set extensions, cannot be translated to REIL code yet. Another limitation is that some instructions that are close to the underlying hardware, such as privileged instructions, cannot be translated to REIL; similarly, exceptions are not handled. All of these cases require an explicit and accurate modelling of the respective hardware features.

   An example of a function translated from x86 assembly language to REIL is shown in Figure 3.2.




                 Figure 3.2: REIL translation of a function






3.2      Static Single Assignment (SSA) Form

3.2.1      Graph theory overview

The algorithm for building SSA Form relies on the dominator tree and dom-

inance frontiers in order to identify merge points.

The following notions are required to understand the algorithms for SSA trans-

lation and how bug detection works:


   • Dominance Relation: in a control flow graph, a node D dominates a node N if every path from the start node to N must go through D. This is written D dom N. By definition, every node dominates itself;


   • Strict Dominance Relation: A node D strictly dominates a node N if D

      dominates N and D does not equal N;


   • Immediate Dominator : The immediate dominator or idom of a node N is

      the unique node that strictly dominates N but does not strictly dominate

      any other node that strictly dominates N. Not all nodes have immediate

      dominators;


   • Dominator Tree: The dominator tree of a graph is a tree where each

      node’s children are those nodes it immediately dominates;


   • Dominance Frontier: the dominance frontier of a node S is the set of all nodes N such that S dominates a predecessor of N but does not strictly dominate N. More intuitively, it is the set of nodes where the dominance of S stops;


   • Iterated Dominance Frontier: formally, it is the transitive closure of the dominance frontier relation. It is computed as follows. Let $DF(S) = \bigcup_{x \in S} DF(x)$ be the dominance frontier of a set of nodes S. The iterated dominance frontier is

     $$DF^{+}(S) = \lim_{i \to \infty} DF_i(S), \qquad DF_1(S) = DF(S), \qquad DF_{i+1}(S) = DF\big(S \cup DF_i(S)\big).$$


3.2.2    Computing SSA Form

Static Single Assignment (SSA) form is an intermediate representation of a function's flow graph that is very frequently used in compiler optimization. SSA form imposes a naming convention on the function's variables such that each variable name corresponds to the value produced at a single definition point. Another advantage of SSA form is the ability to identify merge points inside a function's flow graph and mark them with so-called φ-functions.

In our prototype, all function flow graphs inside a binary are translated into SSA form before the analysis proceeds. There are three well-known variants of SSA form (minimal, semi-pruned and pruned), which differ in the efficiency of the construction algorithm and in the number of φ-functions in the resulting graph. We chose to implement the semi-pruned SSA form as a good trade-off between precision and performance.



                             Figure 3.3: Non-local variable

   To reduce the number of φ-functions inside a flow graph, the pruned SSA form uses liveness analysis to determine which variables are still live at a given merge point. To improve performance, the semi-pruned SSA form replaces liveness analysis with the concept of non-locals. A non-local is a variable that is used inside a basic block but defined elsewhere, i.e., a variable used in a block before any definition of it in that same block, so that its value comes from a different basic block (see Figure 3.3). Note that non-locals are only a conservative approximation of a full-blown liveness analysis: every variable that is live on entry to some block is a non-local, but a non-local need not be live at every merge point. Hence the semi-pruned form may still contain φ-functions that are not strictly needed. The algorithm proposed by Briggs et al. [2] to compute non-locals is the following:




  non-locals ← ∅
  for each block B do
    killed ← ∅
    for each instruction z ← x op y in B do
      if x ∉ killed then
        non-locals ← non-locals ∪ {x}
      end if
      if y ∉ killed then
        non-locals ← non-locals ∪ {y}
      end if
      killed ← killed ∪ {z}
    end for
  end for
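For illustration, the same computation can be written in Python. The instruction encoding is a hypothetical simplification: `(z, x, y)` stands for `z ← x op y`, integer operands are literals, and `None` marks an unused operand. In the semi-pruned form, φ-functions only need to be considered for the variables this function returns.

```python
from typing import Dict, List, Set, Tuple, Union

Operand = Union[str, int, None]       # register name, integer literal, or unused
Instr = Tuple[str, Operand, Operand]  # (z, x, y) stands for z <- x op y


def non_locals(blocks: Dict[str, List[Instr]]) -> Set[str]:
    """Variables that some block uses before defining them itself (Briggs et al.)."""
    result: Set[str] = set()
    for instrs in blocks.values():
        killed: Set[str] = set()  # variables already defined in this block
        for z, x, y in instrs:
            for operand in (x, y):
                if isinstance(operand, str) and operand not in killed:
                    result.add(operand)
            killed.add(z)
    return result


blocks: Dict[str, List[Instr]] = {
    "B0": [("t0", "esp", 4), ("t1", "t0", None)],  # t0 = esp + 4; t1 = t0
    "B1": [("eax", "t1", 1)],                       # eax = t1 + 1
    "B2": [("t2", "eax", "ecx")],                   # t2 = eax op ecx
}
print(sorted(non_locals(blocks)))  # ['eax', 'ecx', 'esp', 't1']
```

Here `t0` is defined and consumed inside `B0` only, so no φ-function is ever considered for it.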


   In our implementation, the algorithm maintains three precomputed data structures: a list of addresses at which to insert φ-functions, and two hash maps that keep track, respectively, of all previously created variables and of the next variable name to be assigned. The first data structure is built by computing, for every live variable in the flow graph, the iterated dominance frontier of the blocks that assign it. The rest of the algorithm recursively walks the dominator tree and renames variables in the original graph: whenever a new assignment or a φ-function is found, a variable with a new name is created, and the results are propagated to the children in the tree. The pseudocode, as adapted in our implementation, is the following:

  for each variable v do
    Let A(v) be the set of blocks containing an assignment to v
    Place a φ-function for v in the iterated dominance frontier of A(v)
  end for
  for each variable v do
    Counters[v] ← 0
    Stack[v] ← ∅
  end for
  Let start be the root node of the dominator tree; RENAME(start)

  RENAME(block):
  for each φ-function v ← φ(...) in block do
    i ← Counters[v]
    Replace v with v_i in the new graph
    Stack[v].push(i)
    Counters[v] ← i + 1
  end for
  for each instruction v ← x op y in block do
    i ← Stack[x].top(), c ← Stack[y].top()
    Replace x with x_i and y with y_c in the new graph
    i ← Counters[v]
    Replace v with v_i in the new graph
    Stack[v].push(i)
    Counters[v] ← i + 1
  end for
  for each successor s of block do
    j ← position of block in the predecessor array of s (by convention, the j-th operand of each φ-function in s corresponds to the j-th predecessor)
    for each φ-function p in s do
      v ← j-th operand of p
      Replace v with v_i where i ← Stack[v].top()
    end for
  end for
  for each child c of block in the dominator tree do
    RENAME(c)
  end for
  for each instruction v ← x op y or v ← φ(...) in block do
    Stack[v].pop()
  end for
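The renaming phase can be sketched in Python as follows. This is a simplified sketch rather than our implementation: instructions are hypothetical `(dest, [sources])` pairs, φ-functions are given per block as the list of variables they define (e.g. as placed with the iterated dominance frontier), the dominator tree is given as a children map, and a use with no dominating definition, such as a function input, keeps its original name. It also assumes that no block appears twice in the predecessor list of another block.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]  # (dest, [sources]): dest <- op(sources)


def rename(blocks: Dict[str, List[Instr]], phis: Dict[str, List[str]],
           preds: Dict[str, List[str]], dom_children: Dict[str, List[str]], start: str):
    counters: Dict[str, int] = defaultdict(int)       # next index for each variable
    stacks: Dict[str, List[int]] = defaultdict(list)  # indices of the reaching definitions
    succs = {b: [s for s in blocks if b in preds[s]] for b in blocks}
    ssa_blocks: Dict[str, List[Instr]] = {b: [] for b in blocks}
    # One record per φ-function: original variable, new name, one argument per predecessor.
    ssa_phis = {b: [{"var": v, "dest": v, "args": [v] * len(preds[b])} for v in phis.get(b, [])]
                for b in blocks}

    def new_name(v: str) -> str:
        i = counters[v]
        stacks[v].append(i)
        counters[v] = i + 1
        return f"{v}{i}"

    def current(v: str) -> str:
        # Uses with no dominating definition (e.g. function inputs) keep their name.
        return f"{v}{stacks[v][-1]}" if stacks[v] else v

    def visit(block: str) -> None:
        defined: List[str] = []
        for phi in ssa_phis[block]:
            phi["dest"] = new_name(phi["var"])
            defined.append(phi["var"])
        for dest, srcs in blocks[block]:
            renamed = [current(s) for s in srcs]      # rename uses before the definition
            ssa_blocks[block].append((new_name(dest), renamed))
            defined.append(dest)
        for s in succs[block]:
            j = preds[s].index(block)                 # position of block among s's predecessors
            for phi in ssa_phis[s]:
                phi["args"][j] = current(phi["var"])
        for child in dom_children.get(block, []):
            visit(child)
        for v in defined:                             # leaving the block: pop its definitions
            stacks[v].pop()

    visit(start)
    return ssa_blocks, ssa_phis


blocks = {
    "entry": [("x", ["a"])],       # x = a
    "left": [("x", ["x", "c"])],   # x = x op c
    "right": [],
    "join": [("y", ["x"])],        # y = x
}
preds = {"entry": [], "left": ["entry"], "right": ["entry"], "join": ["left", "right"]}
dom_children = {"entry": ["left", "right", "join"]}
phis = {"join": ["x"]}             # x is assigned in entry and left; DF+ of these blocks is {join}
ssa, ssa_phis = rename(blocks, phis, preds, dom_children, "entry")
print(ssa_phis["join"])  # [{'var': 'x', 'dest': 'x2', 'args': ['x1', 'x0']}]
```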


 An example of REIL code translated into SSA form is shown in Figure 3.4.







               Figure 3.4: SSA Form of a REIL function




Chapter 4


Analysis stage


4.1     Pointer analysis

In our work we implemented both intraprocedural and interprocedural pointer analysis in order to track object aliases and thus reason about possible dangling pointer conditions. The intraprocedural analysis is performed on top of MonoREIL [6], an abstract interpretation framework based on REIL. In the following two subsections we briefly describe the main features of our analysis and MonoREIL. We then focus on the intraprocedural pointer analysis algorithm (Section 4.2), and finally explain the interprocedural algorithm (Section 4.3).




4.1.1    Analysis features

Data-flow analysis and abstract interpretation algorithms have a number of characterizing properties, the three most relevant being flow, context and path sensitivity. An algorithm is said to be path-sensitive if it computes different analysis information depending on the predicates at conditional branch instructions. The intraprocedural algorithm used in our work merges the results of the analysis at the function's merge points, which effectively makes it path-insensitive: we are not able to distinguish the code paths that lead to the presence of a given alias.

Moreover, our algorithm is flow-insensitive, because the analysis does not track code locations. That is, the analysis cannot say after which statement a given variable became an alias of another one.

The main drawback of the path and flow insensitivity of our algorithm is an increased number of false positives, since we cannot determine whether a specific path leading to a stale pointer condition is feasible. Nonetheless, the performance gain obtained with this implementation outweighs the increase in false positives. Moreover, a number of empirical studies [9] [10] [11] [12] have shown that flow-sensitivity yields only a minimal improvement in precision.

Our interprocedural algorithm works by merging the trees generated in each function; therefore, the flow-insensitivity of the intraprocedural analysis and the nature of the merging we perform make it both flow- and path-insensitive. The same considerations about performance gain and precision loss made for the intraprocedural analysis apply to the interprocedural part.

The algorithm performs the analysis on the procedural call graph (PCG) of the binary. The PCG distinguishes function parameters and call sites: every edge in the PCG is labelled with the parameters passed to the called function. This property guarantees that our analysis is context-sensitive. Context-sensitivity is crucial, because the ability to distinguish the function parameters of each call prevents ambiguity and imprecision when tracking aliases across functions.
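The effect of context-sensitivity can be seen on a toy example in which an identity-like callee, here called `ident`, is invoked from two call sites, `r1 = ident(a)` and `r2 = ident(b)`. The Python sketch below is only an illustration, not our interprocedural algorithm, and all names in it are hypothetical: the context-insensitive version keeps a single summary for the callee, while the context-sensitive version, like a PCG whose edges carry the actual parameters, maps each call's result back to its own caller.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical call sites of `ident`: (site label, actual argument, result variable).
CallSite = Tuple[str, str, str]
call_sites: List[CallSite] = [("site1", "a", "r1"), ("site2", "b", "r2")]


def context_insensitive(sites: List[CallSite]) -> Dict[str, Set[str]]:
    # One summary for the callee: every actual argument flows into the same formal,
    # and the merged return value flows back to every caller.
    merged = {arg for _, arg, _ in sites}
    return {ret: set(merged) for _, _, ret in sites}


def context_sensitive(sites: List[CallSite]) -> Dict[str, Set[str]]:
    # Each call-graph edge carries its own actual argument, so each result is
    # mapped back only to the call that produced it.
    return {ret: {arg} for _, arg, ret in sites}


def may_alias(result: Dict[str, Set[str]], var: str, obj: str) -> bool:
    return obj in result[var]


print(may_alias(context_insensitive(call_sites), "r1", "b"))  # True (spurious)
print(may_alias(context_sensitive(call_sites), "r1", "b"))    # False
```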

Another problem of pointer analysis is dealing with data structures, which can make aliases difficult to track. To deal with this issue we resorted to two strategies.

The first one consists of tracking the size of objects whenever possible, that is, whenever no range analysis is required. This way we can recognize whether a given heap location belongs to a specific object, and therefore properly track aliases for it.
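The size check can be sketched as follows. This is a minimal Python illustration, assuming each tracked object is known by a base (a concrete or abstract address) and a size recovered, for example, from the constant passed to its allocation call; the `TrackedObject` type, `owner_of` and the sample addresses are hypothetical. A heap location is attributed to an object only when it falls inside the half-open interval `[base, base + size)`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass(frozen=True)
class TrackedObject:
    """Hypothetical record for an object whose size is statically known."""
    name: str
    base: int  # start of the allocation (concrete or abstract address)
    size: int  # size in bytes, e.g. the constant passed to the allocator

    def contains(self, address: int) -> bool:
        # Half-open interval: base <= address < base + size
        return self.base <= address < self.base + self.size

def owner_of(address: int, objects: Iterable[TrackedObject]) -> Optional[TrackedObject]:
    """Return the tracked object a heap location belongs to, or None if unknown."""
    for obj in objects:
        if obj.contains(address):
            return obj
    return None

objects = [TrackedObject("a", 0x1000, 0x20), TrackedObject("b", 0x2000, 0x10)]
```

With these sample objects, `0x1018` is attributed to `a`, while `0x2010` (one byte past the end of `b`) belongs to no tracked object, so no alias is recorded for it.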

The second strategy is to model widely used data structures, such as linked lists and vectors, in order to track the objects stored in them. It must be noted that not all data structures are covered; therefore some aliases may be missed by our analysis.

These two strategies allow us to avoid heap modeling entirely, which greatly simplifies the analysis.




4.1.2     MonoREIL

MonoREIL is an abstract interpretation framework that performs fixed-point iteration until a final state is reached. MonoREIL operates on the control flow graph of a function, which can be walked in an arbitrary order depending on the analysis to be performed. For the framework to work, an analysis must define a lattice, its elements and a function that combines the elements. Every analysis starts from an initial state, which can be chosen arbitrarily. Finally, the effects of REIL instructions on the lattice need to be modelled.
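The ingredients listed above (a graph, a lattice with a combine operation, an initial state and per-instruction transfer functions) are those of a classic monotone dataflow framework. The following Python sketch of a forward worklist solver is an illustration under our own assumptions and is not MonoREIL's API: the name `solve`, the dictionary-based graph and the default set-union join are all hypothetical.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Hashable, List

Node = Hashable

def solve(
    succs: Dict[Node, List[Node]],
    entry: Node,
    initial: FrozenSet,
    transfer: Callable[[Node, FrozenSet], FrozenSet],
    join: Callable[[FrozenSet, FrozenSet], FrozenSet] = lambda x, y: x | y,
) -> Dict[Node, FrozenSet]:
    """Forward worklist iteration until no output state changes (a fixed point)."""
    preds: Dict[Node, List[Node]] = {n: [] for n in succs}
    for n, ss in succs.items():
        for s in ss:
            preds.setdefault(s, []).append(n)
    out: Dict[Node, FrozenSet] = {n: frozenset() for n in preds}  # bottom element
    worklist = deque(preds)  # visit every node at least once
    while worklist:
        n = worklist.popleft()
        # Input state: the initial state at the entry, joined with predecessor outputs.
        inp = initial if n == entry else frozenset()
        for p in preds[n]:
            inp = join(inp, out[p])
        new = transfer(n, inp)
        if new != out[n]:
            out[n] = new
            worklist.extend(succs.get(n, []))  # successors must be revisited
    return out
```

Termination relies on exactly the property discussed next: with monotone transfer functions over a lattice without infinite ascending chains, each output can only grow a finite number of times.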

To guarantee the termination of the analysis, the lattice has to satisfy the ascending chain condition, that is, it has to be a Noetherian lattice. If the condition is violated, it is not possible to guarantee that there exist two states in the analysis, $p_{n-1}$ and $p_n$, such that $p_{n-1} = p_n$.

In the following section we show that our analysis satisfies the requirement and

therefore is always guaranteed to terminate.




4.2      Intraprocedural analysis

Alias set analysis is a well-known variation of pointer analysis which provides a higher degree of precision while avoiding performance bottlenecks. Intuitively, an alias set is the set of all local pointer variables that point to a given object. The strength of the analysis lies in the fact that whenever it is uncertain whether a given variable x points to a concrete object, instead of creating a may-point-to or must-point-to set, it creates two alias sets, only one of which contains x.

Our analysis computes the alias sets for each function in the binary so that they can later be combined to reason about the existence of dangling pointers, by propagating the aliases that are still alive between functions in the binary call graph.

We have adapted the algorithm proposed in [3] to fit our purposes and scope. It can be proved that our analysis reaches a fixed point and, since our transfer functions are distributive, the fixed point computed by the alias set dataflow algorithm corresponds to the merge-over-all-paths dataflow value [7].

In order to analyze the functions, we have to further simplify our intermediate language so that it can be expressed by means of a very simple grammar:

$$s ::= v_1 \leftarrow v_2 \mid v \leftarrow h \mid h \leftarrow v \mid v \leftarrow \mathit{null} \mid v \leftarrow \mathit{new}$$

where h represents any heap location, null represents a null pointer and new represents a newly created object.

To simplify REIL code so that it can be expressed with the above grammar, we created transformation functions for every REIL instruction in our MonoREIL algorithm. Figure 4.1 shows the transformations we apply to REIL instructions.
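As an illustration of how the rows of Figure 4.1 fold REIL into the grammar above, here is a minimal Python sketch. It assumes a REIL instruction is given as a mnemonic plus three operand strings (the empty string standing for ε) and encodes statements as tuples such as `("copy", dst, src)`; both encodings and the name `simplify` are hypothetical. Arithmetic and bitwise results are keyed by a symbolic expression string, mirroring "y is added to the alias set of x1 + x2", and reading "y is removed from all alias sets" as an assignment of null is our interpretation.

```python
from typing import Optional, Tuple

# Hypothetical statement encodings, one per grammar production:
#   ("copy", v1, v2)   v1 <- v2
#   ("load", v, h)     v  <- h
#   ("store", h, v)    h  <- v
#   ("null", v)        v  <- null
# (v <- new is produced from calls to allocators/constructors, not from REIL opcodes.)
Stmt = Tuple[str, ...]

BINARY_OPS = {"ADD": "+", "SUB": "-", "MUL": "*", "DIV": "/", "MOD": "%",
              "BSH": "bsh",  # shift left or right depending on the sign of x2
              "AND": "&", "OR": "|", "XOR": "^"}

def simplify(mnemonic: str, x1: str, x2: str, y: str) -> Optional[Stmt]:
    """Map one REIL instruction (operands as strings, "" for epsilon) to a statement."""
    if mnemonic in BINARY_OPS:
        # y aliases whatever the expression "x1 op x2" evaluates to
        return ("copy", y, f"{x1} {BINARY_OPS[mnemonic]} {x2}")
    if mnemonic == "STR":
        return ("copy", y, x1)
    if mnemonic == "LDM":
        return ("load", y, f"mem[{x1}]")
    if mnemonic == "STM":
        return ("store", f"mem[{y}]", x1)
    if mnemonic in ("BISZ", "UNDEF"):
        return ("null", y)  # y is removed from all alias sets
    if mnemonic in ("JCC", "NOP", "UNKN"):
        return None  # no effect on alias sets
    raise ValueError(f"unknown REIL mnemonic: {mnemonic}")
```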

It must be noted that we consider an object to be newly created only when it is the return value of either a constructor or an allocation function. Although both constructors and allocation functions are partially recognized automatically by our software, they need to be manually indicated by the user.

We treat code blocks differently depending on whether we are dealing with a simple assignment or a φ-function. In the former case, for a given node we first merge all the influencing states by performing a union on the sets, and then we apply the equations shown in Figure 4.2.

We create a new alias set for every newly allocated object; we then store in the appropriate alias set all the variables that alias one of the object's aliases; finally, whenever a heap location not previously known is found, we create two alias sets, one with the location and another one without it.
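The three equations of Figure 4.2 translate almost literally into code. The sketch below is a minimal Python rendering under our own assumptions: statements are encoded as hypothetical tuples such as `("new", v)`, `("copy", v1, v2)` and `("load", v, h)`, and alias sets are `frozenset`s of variable names. As in Figure 4.2, stores and null assignments fall into the "otherwise" case.

```python
from typing import FrozenSet, Set, Tuple

AliasSet = FrozenSet[str]
# Hypothetical statement encodings:
#   ("new", v)        v  <- new
#   ("copy", v1, v2)  v1 <- v2
#   ("load", v, h)    v  <- h
#   ("store", h, v)   h  <- v
#   ("null", v)       v  <- null
Stmt = Tuple[str, ...]

def gen(s: Stmt) -> Set[AliasSet]:
    """[[s]]_gen: a fresh singleton alias set for v <- new, nothing otherwise."""
    if s[0] == "new":
        return {frozenset({s[1]})}
    return set()

def transfer_set(s: Stmt, a: AliasSet) -> Set[AliasSet]:
    """[[s]]_a: effect of one statement on a single alias set a."""
    kind = s[0]
    if kind == "copy" and s[2] in a:   # v1 <- v2 with v2 in a
        return {a | {s[1]}}
    if kind == "load":                 # v <- h: v may or may not point to this object
        return {a, a | {s[1]}}
    return {a}                         # otherwise: a is unchanged

def transfer(s: Stmt, l: Set[AliasSet]) -> Set[AliasSet]:
    """[[s]]_l: [[s]]_gen united with the image of every alias set in l."""
    result = gen(s)
    for a in l:
        result |= transfer_set(s, a)
    return result

state = transfer(("new", "p"), set())         # {{p}}
state = transfer(("copy", "q", "p"), state)   # {{p, q}}
state = transfer(("load", "r", "h0"), state)  # {{p, q}, {p, q, r}}
```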

In the latter case, we can safely assume that φ-functions are found at merge points in the control flow graph of the function, that is, where we need to combine several incoming states in our lattice. In our analysis, the lattice is the set of all alias sets. Figure 4.3 shows the combine function for merge points.



Each state is first pruned, that is, we remove all the aliases that do not belong to the set of variables of the node's strict dominators. Once the alias set has been pruned, it is updated by adding all destination variables whose values are being assigned from variables already in the alias set.
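Figure 4.3 can be sketched in the same style. The code below assumes the caller supplies, for the φ-node and the predecessor being processed, the set of variables defined in the node's strict dominators and the list of `(y_i, x_i)` pairs, where `x_i` is the φ argument flowing in from that predecessor. How those inputs are computed (dominator tree, liveness) is outside the sketch, and all names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

AliasSet = FrozenSet[str]

def phi_transfer_set(a: AliasSet,
                     sdom_vars: FrozenSet[str],
                     phi_pairs: Iterable[Tuple[str, str]]) -> Set[AliasSet]:
    """[[phi]]_a(a, pred): prune a to dominating variables, then add phi targets.

    phi_pairs holds (y_i, x_i) for each phi-function y_i = phi(..., x_i, ...)
    whose argument x_i comes from the predecessor being processed.
    """
    pruned = a & sdom_vars                          # drop aliases not defined in strict dominators
    added = {y for (y, x) in phi_pairs if x in a}   # y_i joins a if its incoming x_i was in a
    return {pruned | added}

def phi_transfer(l: Set[AliasSet],
                 sdom_vars: FrozenSet[str],
                 phi_pairs: Iterable[Tuple[str, str]]) -> Set[AliasSet]:
    """[[phi]]_l(l, pred): apply phi_transfer_set to every alias set in l."""
    pairs = list(phi_pairs)  # iterated once per alias set
    result: Set[AliasSet] = set()
    for a in l:
        result |= phi_transfer_set(a, sdom_vars, pairs)
    return result
```

For instance, at a block containing `x3 = phi(x1, x2)` where only `p` is defined in the strict dominators, the state `{{p, x1}}` arriving from the first predecessor becomes `{{p, x3}}`: `x1` is pruned and `x3` is added in its place.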

We defined the elements of the lattice so that each element is a set of linked lists, where each linked list corresponds to an object to which all the variables in the list alias.

The reason for choosing a set of linked lists over other data structures is performance: it can be proved that, when the analysis is carried out on a graph in SSA form, operations only need to be performed on the head of each list, thus saving look-up time. For further optimization, once the analysis of a given function is complete, we transform each alias set into a tree-like structure, which makes it easier to perform the interprocedural analysis discussed in the following section.



A sample run of the algorithm can be seen in Figure 4.4.




4.3      Interprocedural analysis

At the end of the intraprocedural alias analysis, the resulting alias lists of each function are used to construct points-to trees that make the alias relationships between variables explicit. In such a points-to tree, each node represents a distinct variable and its children represent the variables pointing to it, so that siblings are equivalent aliases.

For each function we extract its parameters and its return value. Given that information, the interprocedural analysis algorithm walks down the procedure call graph, updating a set of points-to trees for the object to be tracked until the final state of the analysis is reached. We propose an implementation of our algorithm on top of BinNavi.

The interprocedural analysis, as opposed to the intraprocedural one, is run on a Procedure Call Graph (PCG). A PCG is identical to a call graph, except that it has an edge for each call site, and every edge is labelled with the variables of the source node that act as parameters in the target node (see Figure 4.5).
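A PCG can be represented with very little machinery. The following minimal Python sketch assumes that call sites are identified by the address of the call instruction and that each edge stores its label as formal-to-actual parameter bindings; the class names, field names and sample addresses are hypothetical. Because two calls to the same callee produce two distinct edges, their parameter bindings are never conflated, whereas a plain call graph would collapse them into a single caller/callee pair.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass(frozen=True)
class CallEdge:
    caller: str
    callee: str
    call_site: int   # address of the call instruction (hypothetical identifier)
    bindings: tuple  # edge label: ((formal, actual), ...)

@dataclass
class PCG:
    edges: List[CallEdge] = field(default_factory=list)
    out_edges: Dict[str, List[CallEdge]] = field(default_factory=lambda: defaultdict(list))

    def add_call(self, caller: str, callee: str, call_site: int,
                 bindings: Dict[str, str]) -> None:
        """Add one edge per call site, labelled with its formal -> actual bindings."""
        edge = CallEdge(caller, callee, call_site, tuple(sorted(bindings.items())))
        self.edges.append(edge)
        self.out_edges[caller].append(edge)

g = PCG()
g.add_call("f1", "f2", 0x401000, {"arg0": "p"})
g.add_call("f1", "f2", 0x401020, {"arg0": "q"})
```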



At each iteration, the algorithm connects the points-to trees containing the incoming parameters to the points-to trees of the previous iteration on the graph. These are updated by connecting the trees containing the formal parameters of the current function to the nodes corresponding to the incoming parameters, as indicated in the edge label. If the node corresponding to the formal parameter is the root node of a points-to tree, then the tree is appended to the node corresponding to the incoming parameter and the result is added to the newly generated set of trees. Otherwise, if it is not a root node, the points-to tree containing it is added to the new state set, with the node replaced by the incoming parameter. The points-to tree containing the incoming parameter is also copied into the new set of trees, and the sub-tree rooted at the node containing the incoming parameter is detached from that copy (see Figure 4.6, Figure 4.7, Figure 4.8 and Figure 4.9). When a merge point is met, the resulting sets of trees are computed separately, one for each incoming edge, and a set union is performed, so that duplicate trees are removed. Sub-trees of trees contained in the set are also removed. Additional trees can be removed from the set in order to further reduce space requirements (see Figure 4.10).
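The merge-point step can be sketched as follows. This minimal Python illustration assumes a points-to tree is encoded as an immutable `(variable, children)` pair whose children are kept as a sorted tuple, so that structurally equal trees compare equal; the encoding, the helper names and the reading of "sub-tree" as "equal to a proper sub-tree of another tree in the set" are our assumptions. The additional pruning of trees without aliases of the tracked object is not shown.

```python
from typing import Iterable, Iterator, Set, Tuple

# A points-to tree: (variable, children), children being a sorted tuple of trees.
Tree = Tuple[str, tuple]

def tree(var: str, *children: Tree) -> Tree:
    """Build a tree with a canonical (sorted) child order."""
    return (var, tuple(sorted(children)))

def subtrees(t: Tree) -> Iterator[Tree]:
    """Yield every proper sub-tree of t."""
    for child in t[1]:
        yield child
        yield from subtrees(child)

def combine(per_edge: Iterable[Set[Tree]]) -> Set[Tree]:
    """Union the tree sets computed on each incoming edge, then drop nested sub-trees."""
    union: Set[Tree] = set()
    for trees in per_edge:
        union |= trees               # duplicates disappear here
    nested = {s for t in union for s in subtrees(t)}
    return union - nested
```

For example, if one incoming edge yields the trees `o(p, q)` and `r(s)` and the other yields `o(p, q)` and the single node `q`, the combined set is `{o(p, q), r(s)}`: the duplicate `o(p, q)` is kept once and `q`, already a sub-tree of `o(p, q)`, is dropped.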

Since the algorithm never walks more than once on a node that is not involved in a cycle, it is safe to remove from that node's set of points-to trees the trees that do not contain aliases of the tracked object. Similarly, the points-to trees of the previous step are extended with the points-to trees containing the returned variable.




Once the fixed point has been reached, the resulting set of trees is an interprocedural set of points-to trees containing all of the aliases of the object to be tracked.


4.4     C++ peculiarities

In dealing with C++ we had to take into account a number of characteristics of the language. One of the problems of complex applications written in C++ is the frequent use of smart pointer interfaces, that is, C++ classes used to provide memory safety in terms of object lifetime. In order to deal with smart pointers, we require the user to specify which functions shall be considered the constructor and the destructor of the object to be analyzed, and whether the object has multiple constructors or destructors. To improve the precision of our analysis, we used well-known techniques explained in [8] to identify constructors and destructors of objects in the binary whenever possible. These requirements are necessary to keep the analysis as application-independent as possible, without constraining our work to one specific kind of smart pointer or memory management architecture.

User interaction is also needed to handle custom allocators: the user is asked to confirm whether the allocation and deallocation functions identified by our tool are the correct ones.




  Arithmetic instructions         Operation

  ADD x1, x2, y                   y is added to the alias set of $x_1 + x_2$
  SUB x1, x2, y                   y is added to the alias set of $x_1 - x_2$
  MUL x1, x2, y                   y is added to the alias set of $x_1 \cdot x_2$
  DIV x1, x2, y                   y is added to the alias set of $\lfloor x_1 / x_2 \rfloor$
  MOD x1, x2, y                   y is added to the alias set of $x_1 \bmod x_2$
  BSH x1, x2, y                   y is added to the alias set of $x_1 \cdot 2^{x_2}$ if $x_2 \geq 0$, or of $\lfloor x_1 / 2^{-x_2} \rfloor$ if $x_2 < 0$

  Bitwise instructions            Operation

  AND x1, x2, y                   y is added to the alias set of $x_1 \,\&\, x_2$
  OR x1, x2, y                    y is added to the alias set of $x_1 \mid x_2$
  XOR x1, x2, y                   y is added to the alias set of $x_1 \oplus x_2$

  Logical instructions            Operation

  BISZ x1, ε, y                   y is removed from all alias sets
  JCC x1, ε, y                    does not affect alias sets

  Data transfer instructions      Operation

  LDM x1, ε, y                    y is added to the alias set of $\mathrm{mem}[x_1]$
  STM x1, ε, y                    $\mathrm{mem}[y]$ is added to the alias set of $x_1$
  STR x1, ε, y                    y is added to the alias set of $x_1$

  Other instructions              Operation

  NOP ε, ε, ε                     does not affect alias sets
  UNDEF ε, ε, y                   y is removed from all alias sets
  UNKN ε, ε, ε                    does not affect alias sets

                    Figure 4.1: REIL instruction transformations







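$$[\![s]\!]_{gen} \triangleq \begin{cases} \{\{v\}\} & \text{if } s = v \leftarrow \mathit{new} \\ \emptyset & \text{otherwise} \end{cases}$$

$$[\![s]\!]_a(a) \triangleq \begin{cases} \{a \cup \{v_1\}\} & \text{if } s = v_1 \leftarrow v_2 \wedge v_2 \in a \\ \{a,\ a \cup \{v\}\} & \text{if } s = v \leftarrow h \\ \{a\} & \text{otherwise} \end{cases}$$

$$[\![s]\!]_l(l) \triangleq [\![s]\!]_{gen} \cup \bigcup_{a \in l} [\![s]\!]_a(a)$$

        Figure 4.2: Transfer functions for common instructions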



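$$[\![\phi]\!]_a(a, pred) \triangleq \{(a \cap \mathrm{vars}(\mathrm{sdom}(\phi))) \cup \{y_i : y_i \leftarrow x_i \in \mathrm{livevars}(\phi, pred) \wedge x_i \in a\}\}$$

$$[\![\phi]\!]_l(l, pred) \triangleq \bigcup_{a \in l} [\![\phi]\!]_a(a, pred)$$

         Figure 4.3: Transfer functions for φ-node instructions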




               Figure 4.4: Intraprocedural analysis example





                Figure 4.5: PCG used in our algorithm




             Figure 4.6: Computing f1() to f2() alias trees






              Figure 4.7: Computing f1() to f3() alias trees




Figure 4.8: Computing f2() to f4() alias trees through the leftmost edge




Figure 4.9: Computing f2() to f4() alias trees through the rightmost edge








        Figure 4.10: Effects of combine() on functions alias trees




Chapter 5


Stale pointers detection

Detecting use-after-free conditions means verifying whether there are any code paths in which an alias of an object is used after the object itself was freed. In order to reason about this condition, we first prune the call graph of the binary (see Figure 5.1), so that only the functions that use aliases of the object we are interested in, or that are linked to a function using an alias, are preserved. This can be trivially done by walking the call graph and eliminating all the functions that are neither successors nor predecessors of the procedures where an object alias appears (see Figure 5.2 and Figure 5.3).
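The pruning step can be sketched with two breadth-first walks, one along call edges and one against them. This minimal Python illustration assumes the call graph is a dictionary from each function to the functions it calls, and reads "successors" and "predecessors" transitively; the function names and the sample graph are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

def reachable(graph: Dict[str, List[str]], starts: Iterable[str]) -> Set[str]:
    """All nodes reachable from starts (starts included) by breadth-first search."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        n = queue.popleft()
        for m in graph.get(n, []):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen

def prune(callgraph: Dict[str, List[str]], relevant: Set[str]) -> Dict[str, List[str]]:
    """Keep only functions that are (transitive) predecessors or successors of a relevant one."""
    reverse: Dict[str, List[str]] = {}
    for caller, callees in callgraph.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    keep = reachable(callgraph, relevant) | reachable(reverse, relevant)
    return {f: [g for g in callgraph.get(f, []) if g in keep] for f in keep}

cg = {"main": ["init", "work"], "init": ["log"], "work": ["use_obj"],
      "use_obj": ["helper"], "log": []}
pruned = prune(cg, {"use_obj"})  # init and log are neither callers nor callees of use_obj
```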




                       Figure 5.1: Example of callgraph





             Figure 5.2: Callgraph with relevant functions in red




                        Figure 5.3: Pruned callgraph


   Finally we simply mark calls to the destructors on the pruned callgraph

(see Figure 5.4).

The rest of the algorithm walks the cross references to the object destructors backwards, that is, it computes all the functions that at some program point invalidate the concrete object. We call them destructor aliases.
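Walking the cross references backwards amounts to a reverse reachability query from the destructors. A minimal Python sketch, assuming cross references are available as a caller-to-callees dictionary and that the user-supplied destructors are given by name; the function name and sample names are hypothetical, and including the destructors themselves in the result is our choice.

```python
from collections import deque
from typing import Dict, List, Set

def destructor_aliases(callgraph: Dict[str, List[str]], destructors: Set[str]) -> Set[str]:
    """The destructors plus every function that directly or transitively calls one."""
    callers: Dict[str, List[str]] = {}
    for caller, callees in callgraph.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    found: Set[str] = set(destructors)
    queue = deque(found)
    while queue:
        f = queue.popleft()
        for c in callers.get(f, []):   # follow one cross reference backwards
            if c not in found:         # the visited set also guards against recursion cycles
                found.add(c)
                queue.append(c)
    return found

cg = {"main": ["cleanup", "other"], "cleanup": ["Obj::~Obj"], "other": ["log"]}
```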




          Figure 5.4: Pruned callgraph ready for bug detection step

For each function that calls a destructor alias, we verify whether the concrete object itself or one of its aliases is used. To do so, we build the dominator tree of the function's flow graph and verify the conditions shown in Figure 5.5. We use the following notation: B is a generic basic block, F is the basic block that either calls the destructor or destroys the concrete object, and v is an object alias. $\mathrm{dom}(B)$ denotes the basic blocks dominated by node B. $v \in B$ is the relation that represents the use of variable v in basic block B. Finally, $\mathrm{succ}(B)$ denotes the successors of node B.


 Type of warning                                  Condition

 v is a stale pointer                             if $v \in B \wedge B \in \mathrm{dom}(F)$
 v may be a stale pointer                         if $v \in B \wedge B \notin \mathrm{dom}(F) \wedge B \in \mathrm{succ}(F)$
 v might be a memory leak                         if $v \in B \wedge F \notin \mathrm{dom}(B) \wedge F \in \mathrm{succ}(B)$
 v is a memory leak                               if $v \in B \wedge F \notin \mathrm{dom}(B) \wedge F \notin \mathrm{succ}(B)$
 v is neither a stale pointer nor a memory leak   otherwise

                    Figure 5.5: Alias verification equations
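The conditions of Figure 5.5 can be checked with a small amount of graph code. The following Python sketch is one possible reading, under several assumptions of ours: dominance is reflexive (a block dominates itself), $\mathrm{succ}(B)$ is read as the set of blocks reachable from B through one or more edges, the rows of the figure are tested in order with the first match winning, and per-block variable uses are supplied by the caller. Dominators are computed with the classic iterative dataflow algorithm rather than from an explicit dominator tree; all names are hypothetical.

```python
from typing import Dict, List, Set

def dominators(cfg: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Map each block to the set of blocks that dominate it (iterative algorithm)."""
    nodes = set(cfg) | {s for ss in cfg.values() for s in ss}
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, ss in cfg.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in preds[n]]
            new = ({n} | set.intersection(*ps)) if ps else {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

def successors(cfg: Dict[str, List[str]], b: str) -> Set[str]:
    """Blocks reachable from b through at least one edge (our reading of succ)."""
    seen: Set[str] = set()
    stack = list(cfg.get(b, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(cfg.get(n, []))
    return seen

def classify(cfg: Dict[str, List[str]], entry: str, uses: Dict[str, Set[str]],
             free_block: str, v: str) -> Dict[str, str]:
    """Apply the Figure 5.5 rows, in order, to every block B that uses v."""
    dom = dominators(cfg, entry)
    dominated_by = lambda x: {n for n, ds in dom.items() if x in ds}  # dom(x)
    F = free_block
    dom_F, succ_F = dominated_by(F), successors(cfg, F)
    result: Dict[str, str] = {}
    for B, used in uses.items():
        if v not in used:
            continue
        dom_B, succ_B = dominated_by(B), successors(cfg, B)
        if B in dom_F:
            result[B] = "v is a stale pointer"
        elif B in succ_F:  # here B is not in dom(F)
            result[B] = "v may be a stale pointer"
        elif F not in dom_B and F in succ_B:
            result[B] = "v might be a memory leak"
        elif F not in dom_B and F not in succ_B:
            result[B] = "v is a memory leak"
        else:
            result[B] = "v is neither a stale pointer nor a memory leak"
    return result
```

In a diamond-shaped CFG where only one branch frees the object, a use in the join block is reported as "may be a stale pointer", because the other branch reaches it without passing through F.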




Chapter 6


Results and future work


In this paper we have targeted a widely known cause of security flaws. We have shown that it is feasible to collect enough data, in terms of alias sets, on a C++ binary to discover stale pointer bugs at the interprocedural level.

We have implemented our work on top of BinNavi, using REIL as the intermediate language and MonoREIL as the monotone solver framework for our algorithms. Our approach of verifying only one type of object at a time allowed us to drastically reduce the execution time and the number of false positives to analyze; nonetheless, we realize that this approach is suboptimal for scenarios in which developers have to fix bugs in their software, because in that case the analysis would have to be run multiple times.

From our test results on a set of samples we built, it is clear that the prime cause of false positives is the lack of flow-sensitivity of our analysis. One of the primary goals of future work in this direction is to use an SMT solver to verify path feasibility.



The principal source of false negatives in our analysis is the heavy presence of function pointers and complex data structures in C++ code: in those cases we were not able to obtain enough information either on the alias sets or on the relationships between functions. Some techniques exist to deal with these problems; we did not implement them because their results are far from satisfying and they could dramatically increase the number of false positives in our analysis.

Finally, we plan to augment our analysis by increasing the number of data structures handled by our algorithm and by performing range analysis in order to track a higher number of aliases.




Bibliography

[1] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck: "Efficiently computing static single assignment form and the control dependence graph." ACM Transactions on Programming Languages and Systems, 13(4):451-490, Oct 1991.

[2] Preston Briggs, Keith D. Cooper, Timothy J. Harvey, and L. Taylor Simpson: "Practical improvements to the construction and destruction of static single assignment form." Software-Practice and Experience, 28(8):859-881, Jul 1998.

[3] Nomair A. Naeem and Ondrej Lhotak: "Efficient Alias Set Analysis Using SSA Form." International Symposium on Memory Management (ISMM), pp. 79-88, 2009.

[4] Xiaodong Ma, Ji Wang, and Wei Dong: "Computing Must and May Alias to Detect Null Pointer Dereference." Leveraging Applications of Formal Methods (ISOLA), pp. 252-261, 2008.

[5] Sean Heelan: "Finding use-after-free bugs with static analysis."

[6] Thomas Dullien and Sebastian Porst: "REIL: A platform-independent intermediate representation of disassembled code for static code analysis." CanSecWest 2009.

[7] J. B. Kam and J. D. Ullman: "Monotone data flow analysis frameworks." Acta Informatica, 1977.

[8] Paul Vincent Sabanal and Mark Vincent Yason: "Reversing C++." Black Hat DC 2007.

[9] Michael Hind: "Pointer Analysis: Haven't We Solved The Problem Yet?" ACM Transactions on Programming Languages and Systems, June 2001.

[10] M. Hind, M. Burke, P. Carini and J.-D. Choi: "Interprocedural pointer alias analysis." ACM Transactions on Programming Languages and Systems, Apr. 1993.

[11] M. Hind and A. Pioli: "Which Pointer Analysis Should I Use?" International Symposium on Software Testing and Analysis, Aug. 2000.

[12] M. Hind and A. Pioli: "Evaluating The Effectiveness of Pointer Alias Analysis." Science of Computer Programming, Jan. 2001.




How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Dernier (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

Stale pointers are the new black - white paper

  • 1. POLITECNICO DI MILANO, Facoltà di Ingegneria dell’Informazione, Corso di Laurea in Ingegneria Informatica, Dipartimento di Elettronica e Informazione. Detecting aliased stale pointers via static analysis: An architecture independent practical application of pointer analysis and graph theory to find bugs in binary code. Advisor: Prof. Stefano Zanero. Co-advisor: Ing. Federico Maggi. Thesis by: Giovanni Gola, student ID 717847; Vincenzo Iozzo, student ID 713583. Academic Year 2009-2010
  • 2. Acknowledgements The authors would like to thank Thomas Dullien, Julien Vanegue, Ralf-Philipp Weinmann and Tim Kornau for their suggestions and help while researching the topic. The authors would also like to thank their thesis advisors at Politecnico di Milano, Stefano Zanero and Federico Maggi. Finally, we want to thank all the people who have reviewed the paper.
  • 4. Contents
      1 Introduction
      2 Static Analysis
        2.1 General knowledge
        2.2 Pointer Analysis
        2.3 Conclusions and contributions
      3 Preprocessing stage
        3.1 The REIL intermediate language
        3.2 Single Static Assignment (SSA) Form
          3.2.1 Graph theory overview
          3.2.2 Computing SSA Form
      4 Analysis stage
        4.1 Pointer analysis
          4.1.1 Analysis features
          4.1.2 MonoREIL
        4.2 Intraprocedural analysis
  • 5. Contents (continued)
        4.3 Interprocedural analysis
        4.4 C++ peculiarities
      5 Stale pointers detection
      6 Results and future work
  • 6. List of Figures
      2.1 Data-flow equations
      2.2 Pointer analyses
      3.1 List of REIL instructions
      3.2 REIL translation of a function
      3.3 Non-local variable
      3.4 SSA Form of a REIL function
      4.1 REIL instruction transformations
      4.2 Transfer functions for common instructions
      4.3 Transfer functions for φ-node instructions
      4.4 Intraprocedural analysis example
      4.5 PCG used in our algorithm
      4.6 Computing f1() to f2() alias trees
      4.7 Computing f1() to f3() alias trees
      4.8 Computing f2() to f4() alias trees through the leftmost edge
      4.9 Computing f2() to f4() alias trees through the rightmost edge
      4.10 Effects of combine() on function alias trees
  • 7. List of Figures (continued)
      5.1 Example of callgraph
      5.2 Callgraph with relevant functions in red
      5.3 Pruned callgraph
      5.4 Pruned callgraph ready for bug detection step
      5.5 Alias verification equations
  • 8. Chapter 1 Introduction In the era of cloud computing and internet-connected devices, attacks against such devices are becoming increasingly profitable. Nowadays web clients, such as PCs, mobile phones, UMPCs, tablets and netbooks, are a precious source of information. Although they are built on diverse architectures, including but not limited to the x86, x86-64, ARM and PowerPC families, they necessarily share a common base of network-facing software, first and foremost web browsers. Applications like Firefox, Safari, Internet Explorer and Google Chrome, and complementary software including JavaScript engines, Adobe Flash, Adobe AIR, Adobe Reader and the Java Runtime Environment, come preinstalled on almost every OS. Because of the size of these applications, developers are bound to adopt complex, custom memory management systems built on top of native memory allocators. Notable examples are NS_Alloc()/NS_Free() in the XPCOM library, PR_NEW/PR_DELETE in the NSPR library, JS_malloc()/JS_free() in the SpiderMonkey JavaScript engine,
  • 9. and TCMalloc in WebKit and the V8 JavaScript engine. The complexity and heterogeneity of memory management in large code bases result in numerous memory corruption vulnerabilities, such as uninitialized memory, double frees and dangling pointers. Dangling pointers, also known as "use-after-free" bugs or "stale pointers", are a particularly insidious type of flaw: they occur when a pointer variable still references a memory area that has already been freed. A quick search on the Common Vulnerabilities and Exposures (CVE) list yields 162 reports for web browsers and more than 470 results for the three most popular browser add-ons, i.e., Adobe Flash, Adobe Reader and the Java Runtime Environment (between 2006 and 2010). The astonishing number of use-after-free bugs reported in recent years makes them one of the most attractive attack vectors against client systems. Although use-after-free bugs are scattered throughout the code base, finding them is arguably more difficult than finding overflows: they are temporal memory errors and, as such, detecting them effectively requires understanding how a custom memory management system works internally, in order to identify logical pitfalls. In practice, the only effective solution has been manual code review, which tries to recognize deviations from the rules that developers were supposed to follow while writing code. This process is clearly cumbersome. A wide variety of approaches has been proposed to spot memory flaws automatically. In particular, an analyst can tackle the problem of finding dangling pointers either with dynamic analysis techniques, such as fuzzing, or by means of static analysis. Even though
  • 10. fuzzing has proven to be an effective method for finding such bugs, it suffers from several intrinsic limitations. For example, relying on input to exercise code paths yields scarce coverage of the application code and limits the depth of the exercised paths, leaving parts of the code unexplored. Moreover, the random nature of the generated input makes the running time virtually unbounded, with no direct relation to the code coverage the analysis reaches over time. We propose a practical approach to automatically find use-after-free conditions in large binary code bases using static analysis and graph theory.
  • 12. Chapter 2 Static Analysis 2.1 General knowledge Static analysis is the automated process of extracting semantic information about a program without executing it. Since it does not need to execute the binary, static analysis offers a number of potential advantages over its dynamic counterpart: • Architecture independence: the analysis, even if it might be specific to an instruction set or language, can be implemented on top of any framework and run on any machine; • Running time: the time complexity of a static analysis algorithm may range from linear to exponential, but the algorithm processes the entire binary in finite time; • Code coverage and depth of analysis: since it does not rely on input to exercise code paths, static analysis can achieve total code coverage. This fact
  • 13. makes it particularly useful for analyzing large applications with complex path-triggering conditions. Although the static analysis problem has been proven to be undecidable in theory, it can be formally reformulated as an over-approximation of the original problem that is guaranteed to halt in finite time. In fact, the term "static analysis" is rather broad: it covers various implementation techniques that make static analysis feasible. Model checking was chronologically the first of these techniques to appear. It arose from attempts to solve the so-called concurrent program verification problem. A model checker checks whether a formula expressed in temporal logic holds, and can therefore uncover hard-to-find concurrency errors. As stated in Chapter 1, use-after-free bugs are temporal memory errors and, as such, can be expressed in terms of temporal logic: a program point at which a pointer is dereferenced is reached after a program point at which the same pointer is freed. Model checking turns out to be very precise in unveiling temporal memory errors, but it has non-negligible drawbacks. The major one is the need for source code. The model checker simply tries to resolve the system of pre- and post-conditions of each function; to do so, the pre- and post-conditions have to be specified beforehand by annotating the source code of the program to analyze. The annotation process is also extremely time-consuming. Moreover, in the general case, the time complexity of model checking algorithms is exponential. Another very important static analysis method is so-called Abstract Interpretation. When running an algorithm based on Abstract Interpretation,
  • 14. the program semantics is over-approximated by a set of monotonic functions that the algorithm uses to transform ordered sets, which hold the results of the analysis. It can be viewed as a partial execution of the program that tracks only part of the information about its semantics, without performing all the calculations. Monotonicity and the ordering of the sets (provided the sets have finite height) ensure that the analysis halts in finite time. This well-known technique is mainly used in compilers, for optimization tasks, and in debugging. Every analysis that uses the abstract interpretation pattern to gather information about the possible set of values (of registers, variables, memory locations, etc.) computed at a given point in a program belongs to the "data-flow" family of analyses. In particular, a data-flow analysis algorithm usually walks the control flow graph (CFG) of a program. At each program point (instruction, assignment, basic block and so forth), the algorithm applies the data-flow equations of Figure 2.1 to the state (ordered set) associated with it, where $\mathit{pred}_b$ denotes the predecessors of $b$ in the CFG:
      $$\mathit{out}_b = \mathit{trans}_b(\mathit{in}_b) \qquad \mathit{in}_b = \mathrm{join}_{p \in \mathit{pred}_b}(\mathit{out}_p)$$
      Figure 2.1: Data-flow equations
    The analysis is repeated on every node until the sets stabilize. As previously stated, the data-flow equations must be monotonic and the sets must carry an order relation, at least a partial one. If these conditions hold, repeating the analysis reaches the so-called fixpoint and the algorithm can stop.
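The fixpoint iteration described above can be sketched as a worklist algorithm. This is a minimal Python illustration, not the authors' implementation: it assumes a forward "may" analysis whose facts are sets, uses set union as the join, and uses a gen/kill transfer function (one common monotonic choice for $\mathit{trans}_b$). The names `solve_forward`, `gen` and `kill` are hypothetical; the example tracks which pointers may have been freed on entry to each block.

```python
from collections import deque


def solve_forward(cfg, gen, kill):
    """Solve in_b = join(out_p for p in pred_b), out_b = trans_b(in_b).

    cfg:  dict mapping each block to the list of its successors.
    gen:  dict mapping a block to the facts it adds.
    kill: dict mapping a block to the facts it removes.
    join is set union and trans_b(X) = gen_b | (X - kill_b), which is monotonic.
    """
    preds = {b: [] for b in cfg}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].append(b)

    empty = frozenset()
    in_ = {b: empty for b in cfg}
    out = {b: empty for b in cfg}
    worklist = deque(cfg)  # visit every block at least once
    while worklist:
        b = worklist.popleft()
        in_[b] = empty.union(*(out[p] for p in preds[b]))           # join
        new_out = gen.get(b, empty) | (in_[b] - kill.get(b, empty))  # transfer
        if new_out != out[b]:  # state changed: successors must be revisited
            out[b] = new_out
            for s in cfg[b]:
                if s not in worklist:
                    worklist.append(s)
    return in_, out


# Hypothetical CFG: B1 frees p, B2 re-allocates p, both paths merge in B3.
cfg = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"], "B3": []}
gen = {"B1": frozenset({"p"})}   # free(p)
kill = {"B2": frozenset({"p"})}  # p = malloc(...)
in_, out = solve_forward(cfg, gen, kill)
print(sorted(in_["B3"]))  # ['p']: p may be freed when B3 is reached
```

Because the join is union, p is reported as possibly freed at B3 even though one of the two incoming paths re-allocates it: the merge keeps every fact that holds on at least one path, which is the kind of path-insensitive behaviour discussed in Section 4.1.1.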
  • 15. 2.2 Pointer Analysis We now focus on the data-flow analysis of pointer values, known as Pointer Analysis. Pointer Analysis fits our need to track the values assigned to pointers in order to later check for use-after-free conditions. Several dimensions affect the cost/precision trade-offs of pointer analysis, and how an analysis addresses each of them helps to categorize it. The dimensions we consider are: • Scope: a static analysis algorithm can either be engineered to perform the analysis within a single function only, or extend the analysis to cover multiple functions. The former is called intraprocedural pointer analysis, the latter interprocedural analysis; • Flow-sensitivity: a flow-sensitive analysis takes into account the order of statements in a program, and can therefore compute a solution for each program point, whereas a flow-insensitive analysis computes a solution either for the whole program or for each procedure. The immediate consequence is a higher degree of precision for flow-sensitive approaches. On the other hand, flow-insensitive analyses scale much better in both time and space, and are therefore a better choice for analyzing very large programs;
  • 16. • Context-sensitivity: in context-sensitive analyses the calling context of a function is considered. This means that values passed in different calls to the same function can be distinguished, and results are returned only to the actual caller. Context-sensitivity offers a higher degree of precision and, if properly implemented, has only a mild impact on speed; • Heap modeling: an accurate pointer analysis should rely on a representation of the entire heap space. This is a non-trivial issue, and constitutes a static analysis branch in its own right, known as Shape Analysis. Even though shape analysis is the subject of active research, it scales very poorly to real-life programs; • Aggregate modeling: a very important factor affecting the precision of pointer analysis is how elements of aggregates are distinguished. An extremely precise model, in which every single object can be distinguished, could be achieved by running a full-blown shape analysis; as just stated, however, shape analysis is not a feasible solution for our purposes. A very fast but imprecise model could collapse all the elements of an aggregate into one object. This would introduce excessive noise into the analysis and would lead to a situation in which no heap object is discernible from another; • Alias representation: indicates whether alias pairs or points-to pairs are maintained during the analysis. Alias pairs represent alias relations explicitly, whereas points-to data is a more compact representation.
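To make the flow-sensitivity and alias-representation dimensions concrete, the following is a minimal sketch of an inclusion-based, flow-insensitive and context-insensitive points-to analysis in the style of Andersen's algorithm. It is a well-known textbook technique, not the analysis proposed in this work. The statement encoding and the function names `andersen` and `alias_pairs` are assumptions made for illustration: points-to sets are the compact representation, and explicit alias pairs are derived from them afterwards.

```python
from itertools import combinations


def andersen(stmts):
    """Flow-insensitive, inclusion-based points-to analysis.

    stmts is a list of (kind, lhs, rhs) tuples:
      ("addr",  p, x)  means  p = &x
      ("copy",  p, q)  means  p = q
      ("load",  p, q)  means  p = *q
      ("store", p, q)  means  *p = q
    Statement order is ignored: we iterate until no points-to set changes.
    """
    pts = {}

    def get(v):
        return pts.setdefault(v, set())

    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in stmts:
            if kind == "addr":
                targets, new = [lhs], {rhs}
            elif kind == "copy":
                targets, new = [lhs], set(get(rhs))
            elif kind == "load":
                targets, new = [lhs], set().union(*(get(o) for o in list(get(rhs))))
            elif kind == "store":
                targets, new = list(get(lhs)), set(get(rhs))
            else:
                raise ValueError(f"unknown statement kind: {kind}")
            for t in targets:
                before = len(get(t))
                get(t).update(new)
                changed |= len(get(t)) != before
    return pts


def alias_pairs(pts):
    """Derive explicit alias pairs: p and q may alias if their sets overlap."""
    names = sorted(v for v in pts if pts[v])
    return {(a, b) for a, b in combinations(names, 2) if pts[a] & pts[b]}


stmts = [
    ("copy", "q", "p"),     # q = p   (listed before p is assigned)
    ("addr", "p", "obj1"),  # p = &obj1
    ("addr", "r", "obj2"),  # r = &obj2
    ("addr", "pp", "p"),    # pp = &p
    ("store", "pp", "r"),   # *pp = r, so p may also point to obj2
]
pts = andersen(stmts)
print(sorted(pts["q"]))          # ['obj1', 'obj2']
print(sorted(alias_pairs(pts)))  # [('p', 'q'), ('p', 'r'), ('q', 'r')]
```

If the list is read as straight-line code, `q = p` executes before p is assigned, yet q is still reported as aliasing p because statement order is ignored; a flow-sensitive analysis would not report that pair. This is the precision that flow-insensitivity trades away for scalability.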
  • 17. A great deal of research has been done on Pointer Analysis over the last twenty-five years. Nowadays moderately precise intraprocedural analyses are implemented in almost every compiler, whereas interprocedural algorithms are still at the research stage. Figure 2.2 summarizes the interprocedural analyses proposed so far.
      Figure 2.2: Pointer analyses
    Each has its own pros and cons, but they all share a few limitations that make them unsuitable for our purposes. Their major drawback is the need for source code, which, from an analyst's point of view, is usually practically impossible to obtain. Moreover, they are often built to analyze source code translated into a sub-language of the original language. The latter limitation also affects the few interprocedural analyses proposed to work at the assembly level, like the one by Naeem et al. [3]. Reducing a real assembly language, with hundreds of instructions, to a very narrow sub-language proves to be infeasible in real scenarios.
  • 18. 2.3 Conclusions and contributions Intraprocedural analysis is efficient and scalable enough to be implemented, with minor modifications, for more expressive assembly languages. Interprocedural analysis at the assembly level, on the other hand, is still at an early stage of development. Therefore we propose a new tree-based, context-sensitive interprocedural analysis targeting assembly languages.
  • 20. Chapter 3 Preprocessing stage 3.1 The REIL intermediate language The Reverse Engineering Intermediate Language (REIL) [6] is a platform-independent intermediate language that aims to simplify static code analysis algorithms, such as gadget-finding algorithms for return-oriented programming. It abstracts away the specifics of various assembly languages to facilitate cross-platform analysis of disassembled binary code. REIL performs a simple one-to-many mapping of native CPU instructions to sequences of simple atomic instructions. Memory access is explicit. Every instruction has exactly one effect on the program state. This contrasts sharply with native assembly instruction sets, where the exact behaviour of instructions is often influenced by CPU flags or other pre-conditions. All instructions use a three-operand format. For instructions where some of the three operands are not used, placeholder operands of a special type
  • 21. called ε are used where necessary. Each of the 17 different REIL instructions has exactly one mnemonic, which specifies the effect of the instruction on the program state. The REIL VM To define the runtime semantics of the REIL language, it is necessary to define a virtual machine (REIL VM) that specifies how REIL instructions behave when interacting with memory or registers. REIL registers are named t followed by a number, e.g. t0, t1, t2. The actual size of these registers is specified upon use and not defined a priori (in practice, only register sizes between 1 byte and 16 bytes have been used). Registers of the original CPU can be used interchangeably with REIL registers. The REIL VM uses a flat memory model without alignment constraints. The endianness of REIL memory accesses matches that of the source platform. REIL instructions REIL instructions can loosely be grouped into five different categories according to the type of the instruction (see Figure 3.1). Arithmetic and bitwise instructions take two input operands and one output operand. Input operands are either integer literals or registers; the output operand is a register. None of the operands have any size restrictions. However, arithmetic and bitwise operations can impose a minimum output operand size
  • 22.
      Arithmetic instructions        Operation
        ADD   x1, x2, y    $y = x_1 + x_2$
        SUB   x1, x2, y    $y = x_1 - x_2$
        MUL   x1, x2, y    $y = x_1 \cdot x_2$
        DIV   x1, x2, y    $y = \lfloor x_1 / x_2 \rfloor$
        MOD   x1, x2, y    $y = x_1 \bmod x_2$
        BSH   x1, x2, y    $y = x_1 \cdot 2^{x_2}$ if $x_2 \geq 0$; $y = \lfloor x_1 / 2^{-x_2} \rfloor$ if $x_2 < 0$
      Bitwise instructions           Operation
        AND   x1, x2, y    $y = x_1 \mathbin{\&} x_2$
        OR    x1, x2, y    $y = x_1 \mid x_2$
        XOR   x1, x2, y    $y = x_1 \oplus x_2$
      Logical instructions           Operation
        BISZ  x1, ε, y     $y = 1$ if $x_1 = 0$; $y = 0$ if $x_1 \neq 0$
        JCC   x1, ε, y     transfer control flow to y iff $x_1 \neq 0$
      Data transfer instructions     Operation
        LDM   x1, ε, y     $y = \mathrm{mem}[x_1]$
        STM   x1, ε, y     $\mathrm{mem}[y] = x_1$
        STR   x1, ε, y     $y = x_1$
      Other instructions             Operation
        NOP   ε, ε, ε      no operation
        UNDEF ε, ε, y      undefined instruction
        UNKN  ε, ε, ε      unknown instruction
      Figure 3.1: List of REIL instructions
  • 23. or a maximum output operand size relative to the sizes of the input operands. Note that certain native instructions, such as FPU instructions and multimedia instruction set extensions, cannot be translated to REIL code yet. Another limitation is that some instructions that are close to the underlying hardware, such as privileged instructions, cannot be translated to REIL; similarly, exceptions are not handled. All of these cases require an explicit and accurate modelling of the respective hardware features. An example of a function translated from x86 assembly language to REIL is shown in Figure 3.2.
      Figure 3.2: REIL translation of a function
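The semantics listed in Figure 3.1 can be illustrated with a toy interpreter. This is a simplified sketch, not the REIL VM of any existing tool: it assumes unbounded non-negative integers (no operand sizes or truncation), a dictionary as a flat memory that stores whole values rather than bytes, `None` for the empty operand ε, and JCC targets expressed as indices into the instruction list. The function name `run_reil` and the tuple encoding of instructions are hypothetical.

```python
def run_reil(program, regs=None, mem=None):
    """Interpret a list of (mnemonic, x1, x2, y) REIL-like instructions.

    Operands are int literals, register names (str) or None for the empty
    operand. Returns the final register file and memory.
    """
    regs = dict(regs or {})
    mem = dict(mem or {})

    def val(op):
        return regs[op] if isinstance(op, str) else op

    binops = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
        "MUL": lambda a, b: a * b,
        "DIV": lambda a, b: a // b,
        "MOD": lambda a, b: a % b,
        # BSH: left shift for x2 >= 0, right shift (floor division) for x2 < 0
        "BSH": lambda a, b: a << b if b >= 0 else a >> -b,
        "AND": lambda a, b: a & b,
        "OR": lambda a, b: a | b,
        "XOR": lambda a, b: a ^ b,
    }

    pc = 0
    while pc < len(program):
        mnem, x1, x2, y = program[pc]
        pc += 1
        if mnem in binops:
            regs[y] = binops[mnem](val(x1), val(x2))
        elif mnem == "BISZ":
            regs[y] = 1 if val(x1) == 0 else 0
        elif mnem == "JCC":
            if val(x1) != 0:
                pc = val(y)  # target is an index into `program` here
        elif mnem == "LDM":
            regs[y] = mem[val(x1)]
        elif mnem == "STM":
            mem[val(y)] = val(x1)
        elif mnem == "STR":
            regs[y] = val(x1)
        elif mnem == "NOP":
            pass
        else:  # UNDEF, UNKN and anything else are out of scope for this sketch
            raise NotImplementedError(mnem)
    return regs, mem


# Hand-written REIL-like sequence (not actual translator output):
# load a value through a pointer, add 4, copy it to eax, then test it.
prog = [
    ("LDM", "ebx", None, "t0"),   # t0 = mem[ebx]
    ("ADD", "t0", 4, "t1"),       # t1 = t0 + 4
    ("STR", "t1", None, "eax"),   # eax = t1
    ("BISZ", "eax", None, "t2"),  # t2 = 1 iff eax == 0
    ("BSH", "eax", -1, "t3"),     # t3 = eax >> 1
]
regs, mem = run_reil(prog, regs={"ebx": 0x1000}, mem={0x1000: 38})
print(regs["eax"], regs["t2"], regs["t3"])  # 42 0 21
```

Since every instruction has exactly one effect, each branch of the interpreter writes at most one register or memory cell, which is what makes REIL convenient as the input of the data-flow analyses in Chapter 4.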
  • 24. 3.2 Single Static Assignment (SSA) Form 3.2.1 Graph theory overview The algorithm for building SSA Form relies on the dominator tree and dominance frontiers in order to identify merge points. The following notions are required to understand the algorithms for SSA translation and how bug detection works: • Dominance Relation: in a Control Flow Graph, a node D dominates a node N if every path from the start node to N must go through D. This is written D dom N. By definition, every node dominates itself; • Strict Dominance Relation: a node D strictly dominates a node N if D dominates N and D does not equal N; • Immediate Dominator: the immediate dominator (idom) of a node N is the unique node that strictly dominates N but does not strictly dominate any other node that strictly dominates N. Not all nodes have immediate dominators (the start node, for example, has none); • Dominator Tree: the dominator tree of a graph is a tree where each node's children are the nodes it immediately dominates; • Dominance Frontier: the dominance frontier of a node S is the set of all nodes N such that S dominates a predecessor of N but does not strictly
  • 25. dominate N. More intuitively, it is the set of nodes where S's dominance stops; • Iterated Dominance Frontier: informally, it is the transitive closure of the dominance frontier relation. Let $DF(S) = \bigcup_{x \in S} DF(x)$ be the dominance frontier of a set of nodes $S$. The iterated dominance frontier $DF^{+}(S)$ is the limit of the increasing sequence $DF_1 = DF(S)$, $DF_{i+1} = DF(S \cup DF_i)$. 3.2.2 Computing SSA Form Single Static Assignment (SSA) Form is an intermediate representation of a function graph that is very frequently used in compiler optimization. SSA Form imposes a naming convention on the function variables such that each variable name corresponds to the value produced at a single definition point. Another advantage of SSA Form is the ability to identify merge points inside a function flow graph and mark them with so-called φ-functions. In our prototype, the flow graphs of all the functions in a binary are translated into SSA Form before the analysis proceeds. There are three well-known variants of SSA Form (minimal, semi-pruned and pruned), which differ in the efficiency of the construction algorithm and in the number of φ-functions in the resulting graph. We chose to implement the semi-pruned SSA Form as a good trade-off between precision and performance. In order to reduce the number of φ-functions inside a flow graph, the pruned SSA Form employs liveness analysis to determine which variables are still alive
  • 26. at a given merge point. To improve performance, the semi-pruned SSA Form replaces liveness analysis with the concept of non-locals. A non-local is a variable that is used inside a basic block before being defined in that block, that is, a variable whose value comes from a different basic block (see Figure 3.3).
      Figure 3.3: Non-local variable
    It must be noted that the set of non-locals is only a coarse, conservative approximation of a full-blown liveness analysis, so the semi-pruned form may still contain φ-functions that are not strictly needed. The algorithm proposed by Briggs et al. [2] is the following:
      non-locals ← ∅
      for each block B do
          killed ← ∅
          for each instruction z ← x op y in B do
              if x ∉ killed then
                  non-locals ← non-locals ∪ {x}
              end if
              if y ∉ killed then
                  non-locals ← non-locals ∪ {y}
  • 27. (continued)
              end if
              killed ← killed ∪ {z}
          end for
      end for
    In our implementation the algorithm maintains three pre-computed data structures: a list of the addresses at which φ-functions must be inserted, and two hash maps that keep track, respectively, of all previously created variables and of the next variable name to be assigned. The first data structure is built by computing the iterated dominance frontier of the definition sites of every live variable in the flow graph. The rest of the algorithm recursively walks the dominator tree, renaming variables in the original graph: whenever a new assignment or a φ-function is found, a variable with a new name is created, and the result is propagated to the node's children in the tree. The pseudo-code as adapted in our implementation is the following:
      for each variable v do
          Let A(v) be the set of blocks containing assignments to v
          Place a φ-function for v in the iterated dominance frontier of A(v)
      end for
      for each variable v do
          Counters[v] ← 0
          Stack[v] ← ∅
      end for
      Let start be the root node of the dominator tree; RENAME(start)
      RENAME(block):
          for each φ-function v ← φ(...) in block do
              i ← Counters[v]
  • 28. 3.2. Single Static Assignment (SSA) Form Replace v with vi in the new graph Stack[v].push(i) Counters[v ] i +1 end for for each instruction, v x op y in block do i Stack[x].first(), c Stack[y ].first() Replace x with xi and y with yc in the new graph i Counters[v ] Replace v with vi in the new graph Stack[v].push(i) Counters[v ] i +1 end for for each successor s of block do j block variables position index, corresponding to the position of block in the parents array of s. This is just a convention for each -function p in s do v j th operand of p Replace v with vi where i Stack[x].first() end for end for for each child c of block in the dominator tree do RENAME(c) end for for each instruction v x op y ||v (...) in block do Stack[v ].pop() end for An example of REIL code translated in SSA Form is shown in Figure 3.4 29
  • 29. CHAPTER 3. Preprocessing stage Figure 3.4: SSA Form of a REIL function 30
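The φ-placement step of the semi-pruned construction can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not the BinNavi-based implementation: each basic block is a list of hypothetical `(dest, src1, src2)` tuples, the dominance frontier `df` of every block is taken as already computed, and only non-local variables (in the sense of Briggs et al.) receive φ-functions, which is what makes the form semi-pruned.

```python
def non_locals(blocks):
    """Briggs et al.: variables read in some block before being defined there.

    blocks maps a block name to a list of (dest, src1, src2) instructions;
    a missing operand is None.
    """
    result = set()
    for instrs in blocks.values():
        killed = set()
        for dest, *srcs in instrs:
            for src in srcs:
                if src is not None and src not in killed:
                    result.add(src)
            killed.add(dest)
    return result


def iterated_df(df, defsites):
    """Iterated dominance frontier DF+ of a set of blocks (worklist version)."""
    idf, work = set(), list(defsites)
    while work:
        block = work.pop()
        for y in df.get(block, ()):
            if y not in idf:
                idf.add(y)      # a phi placed in y is itself a new definition,
                work.append(y)  # so the frontier of y must be visited as well
    return idf


def place_phis(blocks, df):
    """Map each non-local variable to the blocks that need a phi-function for it."""
    defsites = {}
    for name, instrs in blocks.items():
        for dest, *_ in instrs:
            defsites.setdefault(dest, set()).add(name)
    return {v: iterated_df(df, defsites.get(v, set())) for v in non_locals(blocks)}


# Diamond: entry -> {left, right} -> join
blocks = {
    "entry": [("x", "a", "b")],
    "left": [("x", "x", "c")],
    "right": [("y", "x", "x")],
    "join": [("z", "x", "y")],
}
df = {"entry": set(), "left": {"join"}, "right": {"join"}, "join": set()}
phis = place_phis(blocks, df)
print(phis["x"], phis["y"], "z" in phis)  # {'join'} {'join'} False
```

Here `z` gets no φ-function at all because it is never read outside the block that defines it, whereas a minimal (non-pruned) construction would consider every assigned variable.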
Chapter 4 Analysis stage

4.1 Pointer analysis

In our work we implemented both intraprocedural and interprocedural pointer analysis in order to track object aliases and thus be able to reason about possible dangling pointer conditions. The intraprocedural analysis is performed on top of MonoREIL [6], an abstract interpretation framework based on REIL. In the following two subsections we briefly describe the main features of our analysis and MonoREIL. We then focus on the intraprocedural pointer analysis algorithm, and in the third section we explain the interprocedural algorithm.

4.1.1 Analysis features

Dataflow analysis and abstract interpretation algorithms have a number of properties that characterize them. Among those, the three most relevant are flow, context and path sensitivity. An algorithm is said to be path-sensitive if it computes different pieces of analysis information depending on the predicates at conditional branch instructions. The intraprocedural algorithm used in our work merges the results of the analysis at the function merge points, which effectively makes it path-insensitive: we are not able to discern the code paths that lead to the presence of a given alias. Moreover, our algorithm is flow-insensitive, since during the analysis we do not track code locations. That is, the analysis cannot say after which statement a given variable became an alias of another one.

The main problem deriving from the path and flow insensitivity of our algorithm is the increased number of false positives that can appear in our analysis, because we are not able to tell whether a specific path leading to a stale pointer condition is feasible. Nonetheless, the performance gain obtained with this implementation of the algorithm significantly outweighs the increase in the number of false positives. Moreover, a number of empirical studies [9] [10] [11] [12] have shown that the improvement offered by flow-sensitivity is minimal in terms of precision.

Our interprocedural algorithm works by merging the trees generated in each function; therefore the flow-insensitivity of the intraprocedural analysis and the nature of the merging we perform make it both flow- and path-insensitive. The same considerations made for the intraprocedural analysis on performance gain and precision loss apply to the interprocedural part of our analysis.

The algorithm performs the analysis on the procedure call graph (PCG) of the binary. The PCG allows us to discern function parameters and calling locations, since every edge in the PCG is labelled with the parameters passed to a given function. This property guarantees that our analysis is context-sensitive. Context-sensitivity is crucial, because the ability to discern the function parameters of each call prevents ambiguity and imprecision in tracking aliases between functions.

Another problem of pointer analysis is dealing with data structures, which can make it difficult to track aliases. In order to deal with this nuance we resorted to two strategies. The first one consists of tracking the size of objects whenever possible, that is, when there is no need to perform range analysis; this way we are able to recognize whether a given heap location belongs to a specific object and therefore to properly track aliases for it. The second strategy is to model widely used data structures, such as linked lists, vectors and similar ones, in order to track the objects stored in them. Note that not all data structures are covered, therefore some aliases may be missed by our analysis. These two strategies allow us to avoid heap modeling completely, thus greatly simplifying the analysis.

4.1.2 MonoREIL

MonoREIL is an abstract interpretation framework that performs fixed-point iteration until a final state is reached. MonoREIL operates on the control flow graph of a function, which can be walked in an arbitrary order depending on the analysis to be performed. The framework requires the definition of a lattice, of its elements and of a formula that combines the elements. Every analysis starts from an initial state, which can be arbitrary. Finally, the effects of REIL instructions on the lattice need to be modelled.

To guarantee the termination of the analysis the lattice has to satisfy the ascending chain condition, that is, it has to be a Noetherian lattice. If the condition is violated, it is not possible to guarantee that there exist two states $p_{n-1}$ and $p_n$ in the analysis such that $p_{n-1} = p_n$. In the following section we show that our analysis satisfies this requirement and is therefore always guaranteed to terminate.

4.2 Intraprocedural analysis

Alias set analysis is a well-known variation of pointer analysis which offers a higher degree of precision while avoiding performance bottlenecks. Intuitively, an alias set is the set of all local pointer variables that point to a given object. The strength of the analysis lies in the fact that whenever there is some degree of uncertainty about whether a given variable x points to a concrete object, instead of creating a may- or must-point-to set, it creates two alias sets, only one of which contains x.

Our analysis computes the alias sets for each function in the binary so that they can later be combined in order to reason about the existence of dangling pointers, by propagating live aliases between functions in the binary call graph. We have adapted the algorithm proposed in [3] to fit our purposes and scope.

Our analysis is guaranteed to reach a fixed point; moreover, since our transfer functions are distributive, the fixed point computed by the alias set dataflow algorithm coincides with the merge-over-all-paths dataflow value [7].

In order to analyze the functions we have to further simplify our intermediate language so that it can be expressed by means of a very simple grammar:

\[ s ::= v_1 \leftarrow v_2 \mid v \leftarrow h \mid h \leftarrow v \mid v \leftarrow \mathit{null} \mid v \leftarrow \mathit{new} \]

where h represents any heap location, null represents a null pointer and new represents a newly created object. To simplify REIL code so that it can be expressed with the above grammar, we created transformation functions for every REIL instruction in our MonoREIL algorithm. Figure 4.1 shows the transformations we apply to REIL instructions.

Note that we consider an object to be newly created only when it is the return value of either a constructor or an allocation function. Although they are partially recognized in an automated fashion by our software, both constructors and allocation functions need to be manually indicated by the user.

We treat code blocks differently depending on whether we are dealing with a simple assignment or a φ-function. In the former case we first merge all the influencing states of a given node, performing a union on the sets, and then we apply the equations shown in Figure 4.2.
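The fixed-point iteration performed by a monotone framework such as MonoREIL can be illustrated with a generic forward worklist solver. This is a hypothetical Python sketch, not MonoREIL's API: `transfer` models the effect of a node on the lattice, `join` combines incoming states, and termination relies on the lattice satisfying the ascending chain condition (in the example, finite sets ordered by inclusion).

```python
from collections import deque


def solve(cfg, entry, init, bottom, transfer, join):
    """Forward worklist iteration until no out-state changes (a fixed point).

    cfg maps each node to its successors; transfer(node, in_state) returns
    the node's out-state; join(x, y) combines two lattice elements.
    """
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds.setdefault(s, []).append(n)
    out = {n: bottom for n in preds}
    work = deque(preds)  # every node is visited at least once
    while work:
        n = work.popleft()
        in_state = init if n == entry else bottom
        for p in preds[n]:
            in_state = join(in_state, out[p])
        new_out = transfer(n, in_state)
        if new_out != out[n]:  # the state grew: successors must be revisited
            out[n] = new_out
            for s in cfg.get(n, ()):
                if s not in work:
                    work.append(s)
    return out


# A loop: entry -> loop -> (loop | exit); each node "generates" some facts
cfg = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
gen = {"entry": {"a"}, "loop": {"b"}, "exit": set()}
out = solve(cfg, "entry", frozenset(), frozenset(),
            lambda n, s: s | frozenset(gen[n]), lambda x, y: x | y)
print(sorted(out["exit"]))  # ['a', 'b']
```

The back edge on `loop` forces a second visit of that node, after which no state changes and the iteration stops.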
We create a new alias set for every newly allocated object; we then store in the appropriate alias set all the variables that alias one of the object's aliases; finally, whenever a heap location not previously known is found, we create two alias sets, one with the location and one without it.

In the latter case, φ-functions are found at merge points of the control flow graph of the function, that is, where two or more incoming states of our lattice must be combined. In our analysis the lattice is the set of all alias sets. Figure 4.3 shows the combine function for merge points. Each state is first pruned, that is, we remove all the aliases that do not belong to the set of variables of the node's strict dominators. Once the alias set has been pruned, it is updated by adding all the destination variables whose values are assigned from variables already in the alias set.

We defined the elements of the lattice so that each element is a set of linked lists. Each lattice element corresponds to an object to which the variables in the linked list are aliases. The reason for choosing a set of linked lists over other data structures is the performance gain: it can be proved that running the analysis on a graph in SSA form allows operations to be performed only on the head of the list, thus saving look-up time. Nonetheless, for further optimization, when the analysis of a given function is complete we transform each alias set into a tree-like structure, which makes it easier to perform the interprocedural analysis discussed in the following section.
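The merge-point step described above (prune, then add φ destinations) can be sketched as follows. This is a hypothetical Python version that uses frozensets in place of the linked lists of our implementation; `sdom_vars` (the variables defined in the strict dominators of the φ block) and the per-predecessor `(dest, src)` pairs of the φ-functions are assumed to be computed beforehand.

```python
def phi_transfer(alias_sets, sdom_vars, phi_copies):
    """Combine step for one incoming edge of a merge point.

    alias_sets: alias sets (frozensets) flowing in from the predecessor
    sdom_vars:  variables defined in the strict dominators of the phi block
    phi_copies: (dest, src) pairs, one per phi-function, for this predecessor
    """
    result = set()
    for a in alias_sets:
        pruned = a & sdom_vars  # drop aliases not visible at the merge point
        added = {dest for dest, src in phi_copies if src in a}
        result.add(frozenset(pruned | added))
    return result


def merge(incoming, sdom_vars):
    """Union of the per-edge results; incoming maps pred -> (alias_sets, phi_copies)."""
    out = set()
    for alias_sets, phi_copies in incoming.values():
        out |= phi_transfer(alias_sets, sdom_vars, phi_copies)
    return out


# p3 = phi(p1, p2); q is defined in a strict dominator of the merge block
incoming = {
    "left": ({frozenset({"p1", "q"})}, [("p3", "p1")]),
    "right": ({frozenset({"q"})}, [("p3", "p2")]),
}
state = merge(incoming, sdom_vars={"q"})
print(sorted(sorted(a) for a in state))  # [['p3', 'q'], ['q']]
```

The two resulting alias sets record that, after the merge, `p3` may or may not point to the object referred to by `q`, which is exactly the kind of uncertainty alias set analysis expresses with two sets instead of a may-point-to relation.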
A sample run of the algorithm can be seen in Figure 4.4.

4.3 Interprocedural analysis

At the end of the intraprocedural alias analysis, the resulting alias lists of each function are used to construct points-to tree structures that make the alias relationships between variables explicit. In such a points-to tree, each node represents a distinct variable and its children represent the variables pointing to it, so that siblings are equivalent aliases.

For each function we extract its parameters and its return value. Given that information, the interprocedural analysis algorithm walks down the procedure call graph, updating a set of points-to trees for the object that needs to be tracked, until the final state of the analysis is reached. We propose an implementation of our algorithm on top of BinNavi.

The interprocedural analysis, as opposed to the intraprocedural one, is run on a Procedure Call Graph (PCG). A PCG is identical to a call graph, except that it has an edge for each call site, and every edge is labelled with the variables of the source node that act as parameters in the target node (see Figure 4.5).

At each iteration, the algorithm connects the points-to trees containing the incoming parameters to the points-to trees of the previous iteration on the graph. These are updated by connecting the trees containing the formal parameters of the current function to the nodes corresponding to the incoming parameters, as indicated in the edge label. If the node corresponding to the formal parameter is the root node of a points-to tree, then the tree is appended to the node corresponding to the incoming parameter and the result is added to the newly generated set of trees. Otherwise, if it is not a root node, the points-to tree containing it is added to the new state set, with the node replaced by the incoming parameter. The points-to tree containing the incoming parameter is also copied into the new set of trees, and the sub-tree whose root node is the node containing the incoming parameter is detached from that copy of the tree (see Figure 4.6, Figure 4.7, Figure 4.8 and Figure 4.9).

When a merge point is met, the resulting sets of trees are computed separately, one for each incoming edge, and a set union is performed, so that duplicate trees are removed. Moreover, trees that are sub-trees of other trees in the set are also removed. Additional trees can be removed from the set in order to further reduce space requirements (see Figure 4.10). Since the algorithm never walks more than once on a node that is not involved in a cycle, it is safe to remove from the set of points-to trees of such a node the trees that do not contain aliases to the tracked object.

Similarly, the points-to trees of the previous step are extended with the points-to trees containing the returned variable. Once the fixed point has been reached, the resulting set of trees is an interprocedural set of points-to trees containing all of the aliases of the object to be tracked.
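The merge-point handling of the points-to trees is the most mechanical part of the walk and can be sketched in Python. In this hypothetical sketch a tree is a nested tuple `(variable, children)` with sorted children, so that structurally equal trees compare equal; the optional `tracked` filter, which drops trees containing no alias of the tracked object, should only be applied at nodes not involved in a cycle, as explained above. The tuple encoding is our assumption, not the data structure of the BinNavi implementation.

```python
def tree(var, *children):
    """Immutable points-to tree: children are the variables pointing to var."""
    return (var, tuple(sorted(children)))


def subtrees(t):
    """Yield t and every tree rooted at one of its descendants."""
    yield t
    for child in t[1]:
        yield from subtrees(child)


def variables(t):
    return {node[0] for node in subtrees(t)}


def combine(per_edge_sets, tracked=None):
    """Union the tree sets computed for each incoming edge, dropping duplicates
    and trees that are proper sub-trees of another tree in the set."""
    merged = set().union(*per_edge_sets)
    nested = {s for t in merged for s in subtrees(t) if s != t}
    result = merged - nested
    if tracked is not None:  # only safe at nodes outside cycles
        result = {t for t in result if variables(t) & tracked}
    return result


obj_tree = tree("obj", tree("p"), tree("q"))
other = tree("r", tree("s"))
edge_a = {obj_tree, other}
edge_b = {obj_tree, tree("p")}  # tree("p") is a sub-tree of obj_tree
print(combine([edge_a, edge_b]) == {obj_tree, other})  # True
print(combine([edge_a, edge_b], tracked={"q"}) == {obj_tree})  # True
```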
4.4 C++ peculiarities

In dealing with C++ we had to take into account a certain number of characteristics of the language. One of the problems of complex applications written in C++ is the frequent use of smart pointer interfaces, that is, C++ classes used to provide memory safety in terms of object lifetime. In order to deal with smart pointers we require the user to specify which functions shall be considered the constructor and the destructor of the object to be analyzed, and whether there are multiple constructors or destructors for an object. In order to improve the precision of our analysis we used well-known techniques explained in [8] to identify constructors and destructors of objects in the binary whenever possible. These requirements are necessary to keep the analysis as application-independent as possible, without constraining our work to one specific kind of smart pointer or memory management architecture. User interaction is also needed to handle custom allocators: the user is asked to confirm whether or not the allocation and deallocation functions identified by our tool are the correct ones.
Arithmetic instructions      Operation
ADD x1, x2, y                y is added to the alias set of $x_1 + x_2$
SUB x1, x2, y                y is added to the alias set of $x_1 - x_2$
MUL x1, x2, y                y is added to the alias set of $x_1 \cdot x_2$
DIV x1, x2, y                y is added to the alias set of $\lfloor x_1 / x_2 \rfloor$
MOD x1, x2, y                y is added to the alias set of $x_1 \bmod x_2$
BSH x1, x2, y                y is added to the alias set of $x_1 \cdot 2^{x_2}$ if $x_2 \geq 0$, or of $\lfloor x_1 / 2^{-x_2} \rfloor$ if $x_2 < 0$

Bitwise instructions         Operation
AND x1, x2, y                y is added to the alias set of $x_1 \,\&\, x_2$
OR x1, x2, y                 y is added to the alias set of $x_1 \mid x_2$
XOR x1, x2, y                y is added to the alias set of $x_1 \oplus x_2$

Logical instructions         Operation
BISZ x1, ε, y                y is removed from all alias sets
JCC x1, ε, y                 does not affect alias sets

Data transfer instructions   Operation
LDM x1, ε, y                 y is added to the alias set of mem[x1]
STM x1, ε, y                 mem[y] is added to the alias set of x1
STR x1, ε, y                 y is added to the alias set of x1

Other instructions           Operation
NOP ε, ε, ε                  does not affect alias sets
UNDEF ε, ε, y                y is removed from all alias sets
UNKN ε, ε, ε                 does not affect alias sets

Figure 4.1: REIL instruction transformations
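Figure 4.1 is essentially a lookup from REIL opcode to alias-set effect. The following hypothetical Python encoding is a sketch of that table only: the tuple format of the returned effects is our own convention, and operands such as `x1 + x2` are kept symbolic rather than evaluated, since no range analysis is performed.

```python
# Opcodes whose result y joins the alias set of the expression "x1 <op> x2"
ALIAS_OF_EXPR = {"ADD", "SUB", "MUL", "DIV", "MOD", "BSH", "AND", "OR", "XOR"}
KILLS_TARGET = {"BISZ", "UNDEF"}    # y is removed from all alias sets
NO_EFFECT = {"JCC", "NOP", "UNKN"}  # alias sets are left untouched


def alias_effect(opcode, x1, x2, y):
    """Translate one REIL instruction into its effect on the alias sets."""
    if opcode in ALIAS_OF_EXPR:
        return ("add", y, (opcode, x1, x2))
    if opcode == "LDM":
        return ("add", y, ("mem", x1))  # y aliases mem[x1]
    if opcode == "STM":
        return ("add", ("mem", y), x1)  # mem[y] aliases x1
    if opcode == "STR":
        return ("add", y, x1)           # plain register-to-register copy
    if opcode in KILLS_TARGET:
        return ("kill", y)
    if opcode in NO_EFFECT:
        return ("none",)
    raise ValueError(f"unknown REIL opcode: {opcode}")


print(alias_effect("LDM", "t1", None, "t2"))  # ('add', 't2', ('mem', 't1'))
```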
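\[
[\![s]\!]_{gen} \triangleq
\begin{cases}
\{\{v\}\} & \text{if } s = v \leftarrow \mathit{new} \\
\emptyset & \text{otherwise}
\end{cases}
\]

\[
[\![s]\!]_{a}(a) \triangleq
\begin{cases}
\{a \cup \{v_1\}\} & \text{if } s = v_1 \leftarrow v_2 \wedge v_2 \in a \\
\{a,\ a \cup \{v\}\} & \text{if } s = v \leftarrow h \\
\{a\} & \text{otherwise}
\end{cases}
\]

\[
[\![s]\!]_{l}(l) \triangleq [\![s]\!]_{gen} \cup \bigcup_{a \in l} [\![s]\!]_{a}(a)
\]

Figure 4.2: Transfer functions for common instructions

\[
[\![\phi]\!]_{a}(a, pred) \triangleq \{ (a \cap vars(sdom(\phi))) \cup \{ y_i : y_i \leftarrow x_i \in livevars(\phi, pred) \wedge x_i \in a \} \}
\]

\[
[\![\phi]\!]_{l}(l, pred) \triangleq \bigcup_{a \in l} [\![\phi]\!]_{a}(a, pred)
\]

Figure 4.3: Transfer functions for φ-node instructions

Figure 4.4: Intraprocedural analysis example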
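The transfer functions of Figure 4.2 translate almost directly into code. The hypothetical Python sketch below assumes statements have already been reduced to the five forms of the grammar of Section 4.2 and encoded as tuples (the encoding is ours); because the input is in SSA form, a defined variable never already belongs to an alias set, so no kill step is needed.

```python
def transfer_stmt(l, stmt):
    """[[s]]_l(l) = [[s]]_gen united with [[s]]_a(a) for every a in l (Figure 4.2).

    l is a set of alias sets (frozensets); stmt is one of
    ("copy", v1, v2), ("load", v, h), ("store", h, v), ("null", v), ("new", v).
    """
    kind = stmt[0]
    out = set()
    if kind == "new":  # [[s]]_gen: a fresh object gets the alias set {v}
        out.add(frozenset({stmt[1]}))
    for a in l:  # [[s]]_a applied to each existing alias set
        if kind == "copy" and stmt[2] in a:
            out.add(a | {stmt[1]})  # v1 <- v2 with v2 in a
        elif kind == "load":
            out.add(a)              # v <- h: v may or may not alias the object,
            out.add(a | {stmt[1]})  # so both alternatives are kept
        else:
            out.add(a)
    return out


l = {frozenset({"p"})}
print(sorted(sorted(a) for a in transfer_stmt(l, ("load", "v", "h0"))))
# [['p'], ['p', 'v']]
```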
Figure 4.5: PCG used in our algorithm

Figure 4.6: Computing f1() to f2() alias trees

Figure 4.7: Computing f1() to f3() alias trees

Figure 4.8: Computing f2() to f4() alias trees through the leftmost edge

Figure 4.9: Computing f2() to f4() alias trees through the rightmost edge

Figure 4.10: Effects of combine() on function alias trees
Chapter 5 Stale pointers detection

Detecting use-after-free conditions means verifying whether there are any code paths in which an alias of an object is used after the object itself has been freed. In order to reason about this condition, we first prune the call graph of the binary (see Figure 5.1), so that only the functions that use aliases of the object we are interested in, or that are linked to a function using an alias, are preserved. This can be done trivially by walking the call graph and eliminating all the functions that are neither successors nor predecessors of the procedures where an object alias appears (see Figure 5.2 and Figure 5.3).

Figure 5.1: Example of callgraph
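The pruning step amounts to two reachability walks over the call graph, one along call edges and one against them. A minimal Python sketch, assuming a hypothetical call graph given as a dictionary from each function to its callees:

```python
from collections import deque


def reach(graph, starts):
    """All nodes reachable from starts (starts included), by breadth-first search."""
    seen, work = set(starts), deque(starts)
    while work:
        n = work.popleft()
        for m in graph.get(n, ()):
            if m not in seen:
                seen.add(m)
                work.append(m)
    return seen


def prune_callgraph(callgraph, alias_users):
    """Keep only functions that use an alias or transitively call/are called by one."""
    callers = {}
    for f, callees in callgraph.items():
        for g in callees:
            callers.setdefault(g, set()).add(f)
    keep = reach(callgraph, alias_users) | reach(callers, alias_users)
    return {f: [g for g in callees if g in keep]
            for f, callees in callgraph.items() if f in keep}


cg = {
    "main": ["init", "process", "log"],
    "init": ["parse"],
    "process": ["use_obj"],
    "use_obj": ["helper"],
    "log": [], "parse": [], "helper": [],
}
print(sorted(prune_callgraph(cg, {"use_obj"})))  # ['helper', 'main', 'process', 'use_obj']
```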
Figure 5.2: Callgraph with relevant functions in red

Figure 5.3: Pruned callgraph

Finally, we mark the calls to the destructors on the pruned callgraph (see Figure 5.4). The rest of the algorithm walks the cross references to the object destructors backwards, that is, it computes all the functions that at some program point invalidate the concrete object. We call them function aliases.

Figure 5.4: Pruned callgraph ready for bug detection step
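Computing the function aliases is a backward walk from the destructors over the reversed call edges. A hypothetical Python sketch, where the call graph maps each function to its callees and the destructor names are supplied by the user, as required in Section 4.4:

```python
def function_aliases(callgraph, destructors):
    """Functions that transitively call one of the destructors, i.e. that may
    invalidate the concrete object at some program point."""
    callers = {}
    for f, callees in callgraph.items():
        for g in callees:
            callers.setdefault(g, set()).add(f)
    result, work = set(), list(destructors)
    while work:
        g = work.pop()
        for f in callers.get(g, ()):
            if f not in result:
                result.add(f)
                work.append(f)
    return result


cg = {
    "main": ["process", "shutdown"],
    "process": ["use_obj"],
    "shutdown": ["release"],
    "release": ["Obj::~Obj"],
    "use_obj": [],
    "Obj::~Obj": [],
}
print(sorted(function_aliases(cg, {"Obj::~Obj"})))  # ['main', 'release', 'shutdown']
```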
For each function that calls a destructor alias, we verify whether the concrete object itself or one of its aliases is used. To do so, we build the dominator tree of the function's flow graph and verify the conditions shown in Figure 5.5. We use the following notation: B is a generic basic block, F is the basic block that either calls the destructor or destroys the concrete object, and v is an object alias. dom(B) denotes the basic blocks dominated by node B, $v \in B$ denotes the use of variable v in basic block B, and succ(B) denotes the successors of node B.

Type of warning                                   Condition
v is a stale pointer                              if $v \in B \wedge B \in dom(F)$
v may be a stale pointer                          if $v \in B \wedge B \notin dom(F) \wedge B \in succ(F)$
v might be a memory leak                          if $v \in B \wedge F \notin dom(B) \wedge F \in succ(B)$
v is a memory leak                                if $v \in B \wedge F \notin dom(B) \wedge F \notin succ(B)$
v is neither a stale pointer nor a memory leak    otherwise

Figure 5.5: Alias verification equations
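The checks of Figure 5.5 can be sketched for a single function in Python. The sketch assumes a CFG given as a dictionary from block to successors, reads succ(·) as the set of blocks reachable through at least one edge, evaluates the rows of the table top to bottom, and computes dominators with the classic iterative dataflow algorithm rather than through an explicit dominator tree; these choices, and the decision not to report a use inside F itself, are our assumptions and not necessarily those of the original tool.

```python
from collections import deque


def dominators(cfg, entry):
    """Iterative dataflow: dom[n] is the set of blocks that dominate n."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = (set.intersection(*incoming) if incoming else set()) | {n}
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def reachable(cfg, start):
    """Blocks reachable from start through at least one edge."""
    seen, work = set(), deque(cfg.get(start, ()))
    while work:
        n = work.popleft()
        if n not in seen:
            seen.add(n)
            work.extend(cfg.get(n, ()))
    return seen


def classify(cfg, entry, f, b):
    """Classify a use of alias v in block b, given the block f that frees the object."""
    if b == f:
        return "neither"  # assumption: uses inside F itself are not reported
    dom = dominators(cfg, entry)
    if f in dom[b]:                  # B in dom(F)
        return "stale pointer"
    if b in reachable(cfg, f):       # B not in dom(F), B in succ(F)
        return "may be a stale pointer"
    if b not in dom[f]:              # F not in dom(B)
        if f in reachable(cfg, b):   # F in succ(B)
            return "might be a memory leak"
        return "memory leak"
    return "neither"


cfg = {
    "entry": ["X", "Y", "Z"],
    "X": ["F"], "Y": ["F", "D"], "Z": [],
    "F": ["B"], "B": ["D"], "D": [],
}
for blk in ["B", "D", "X", "Z", "entry"]:
    print(blk, classify(cfg, "entry", "F", blk))
# B stale pointer
# D may be a stale pointer
# X might be a memory leak
# Z memory leak
# entry neither
```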
Chapter 6 Results and future work

In this paper we have targeted a widely known cause of security flaws. We have shown that it is feasible to collect enough data, in terms of alias sets, on a C++ binary to discover stale pointer bugs at the interprocedural level. We have implemented our work on top of BinNavi, using REIL as the intermediate language and MonoREIL as the monotone solver framework for our algorithms.

Our approach of verifying only one type of object at a time allowed us to drastically reduce the execution time and the number of false positives to analyze; nonetheless, we realize that this approach is suboptimal when developers have to fix bugs in their own software, because in that case the analysis would have to be run multiple times.

From our test results on a set of samples we built, it is clear that the prime cause of false positives is the lack of flow sensitivity of our analysis. One of the primary goals of future work in this direction is to use an SMT solver in order to verify path feasibility.
The principal source of false negatives in our analysis is the heavy use of function pointers and complex data structures in C++ code: in those cases we were not able to obtain enough information either on the alias sets or on the relationships between functions. Some techniques exist to deal with these problems; we did not implement them because their results are far from satisfying and they could dramatically increase the number of false positives in our analysis.

Finally, we plan to extend our analysis by increasing the number of data structures handled by our algorithm and by performing range analysis in order to track a higher number of aliases.
Bibliography

[1] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck: "Efficiently computing static single assignment form and the control dependence graph." ACM Transactions on Programming Languages and Systems, 13(4):451-490, Oct 1991.

[2] Preston Briggs, Keith D. Cooper, Timothy J. Harvey, and L. Taylor Simpson: "Practical improvements to the construction and destruction of static single assignment form." Software: Practice and Experience, 28(8):859-881, Jul 1998.

[3] Nomair A. Naeem and Ondřej Lhoták: "Efficient alias set analysis using SSA form." International Symposium on Memory Management (ISMM), pp. 79-88, 2009.

[4] Xiaodong Ma, Ji Wang, and Wei Dong: "Computing must and may alias to detect null pointer dereference." Leveraging Applications of Formal Methods (ISOLA), pp. 252-261, 2008.

[5] Sean Heelan: "Finding use-after-free bugs with static analysis."
[6] Thomas Dullien and Sebastian Porst: "REIL: A platform-independent intermediate representation of disassembled code for static code analysis." CanSecWest, 2009.

[7] J. B. Kam and J. D. Ullman: "Monotone data flow analysis frameworks." Acta Informatica, 1977.

[8] Paul Vincent Sabanal and Mark Vincent Yason: "Reversing C++." Black Hat DC, 2007.

[9] Michael Hind: "Pointer analysis: Haven't we solved the problem yet?" ACM Transactions on Programming Languages and Systems, Jun 2001.

[10] M. Hind, M. Burke, P. Carini, and J.-D. Choi: "Interprocedural pointer alias analysis." ACM Transactions on Programming Languages and Systems, Apr 1993.

[11] M. Hind and A. Pioli: "Which pointer analysis should I use?" International Symposium on Software Testing and Analysis, Aug 2000.

[12] M. Hind and A. Pioli: "Evaluating the effectiveness of pointer alias analysis." Science of Computer Programming, Jan 2001.