2. Basics
A format specifier tells a formatting function what type of data the program is reading as input or writing as
output. Specifiers usually begin with the ‘%’ character.
Format specifiers indicate the location and method to translate a piece of data (such as a number) to
characters. Examples: %s, %d, %f.
A format string is a control parameter used by a class of functions in the input/output libraries of C
and many other programming languages. Format strings contain format specifiers.
Example of a statement containing format specifiers
3. Introduction
An uncontrolled format string is a software vulnerability discovered in the late ‘80s that can be used to
crash the program or make it execute harmful code.
Attacking by exploiting an uncontrolled format string is known as a Format String Attack.
Most of these attacks target programs written in C, because the problem stems from the use of unchecked user input
as the format string parameter in certain functions that perform formatting, such as printf().
A malicious user may use the %s and %x format tokens, among others, to print data from the call stack or
possibly other locations in memory.
One may also write arbitrary data to arbitrary locations using the %n format token, which
commands printf() and similar functions to write the number of bytes formatted to an address stored on
the stack.
In essence, the format string exploit occurs when the submitted data of an input string is evaluated as a
command by the application. In this way, the attacker could execute code, read the stack, or cause a segmentation
fault in the running application, causing new behaviors that could compromise the security or the stability
of the system.
4. Components
It is important to identify, locate and understand the attack. To understand the attack, it’s necessary to
understand the components that constitute it.
The Format Function is an ANSI C conversion function, such as printf or fprintf, which converts a primitive
variable of the programming language into a human-readable string representation.
The Format String is the argument of the Format Function. It is an ASCIIZ (null-terminated) string which contains text and
format parameters, like: printf("The magic number is: %d\n", 1911);
The Format String Parameter, such as %x or %s, defines the type of conversion the format function performs.
[Figure: some examples of format functions which, if not treated, can be attacked]
[Figure: how to verify whether the format function accepts and parses the format string parameters]
5. Uncommon Formats and Format Options
In order to fully leverage the power of the format, we need to review the full list of formats and format options.
%n : Saving the Number of Bytes:
The formatted printing functions allow you to save the total number of bytes formatted so far into a variable. There is a decent
chance you've never heard of this format, but it is surprisingly useful for certain tasks. For example,
given a format and its arguments, it is not obvious how long the output will be until it is actually
formatted. Here's a basic example of using %n:
The %n format matches an address, specifically the address of an integer, where the number of bytes
formatted up to that point is stored. So, for example, running this program, we get:
Note that %n does not actually produce anything in the output: it is not printable. Instead, it only
has a side effect. OK, so why does this format exist? Well, there are some really practical uses. For
example, consider counting the digits of a number read in using scanf():
7. Format Flag and Argument Options:
Another tool we will need is the set of extra format options that give finer control over the
formatted output. So far you are fairly familiar with the conversion formats:
•%d : signed number
•%u : unsigned number
•%x : hexadecimal number
•%f : floating point number
•%s : string conversion
What you might not be aware of is that there are many more options to change the formatting. Here's a sample
program that will illuminate some of these so-called "flag" options:
* The first flag option is the "#", which is used to add prefix formatting. In the case of printing in hexadecimal it
will add '0x' to the start of non-zero values. That's pretty useful.
* The next option is adding a number prior to the conversion argument, as in %#50x. This conversion will right
adjust the output such that the whole number, including the prefix, takes up 50 characters. If you add a leading 0 to the
width, as in %#050x, the format will fill those blank spaces with 0's.
* Perhaps the least familiar option is the m$ format, where m is some number. It allows you to refer to
a specific argument being passed. In the example above, we refer to the same argument twice using two different
conversion formats. This is really useful for not having to pass the same argument multiple times; however,
when you use the $ references, you have to use them for all the format arguments.
* Finally, we have the half-conversion option h, which says to convert only half the typical size. In this case, since
we are working with 4-byte integer values, that means formatting a 2-byte short value when using one h,
or a single 1-byte char value with two, hh.
9. Flag Options for Strings:
With strings, things are similar but a bit different. Here's some example code:
* Like with numbers, we can specify a width to right adjust the string up to some specified size,
but we can't fill that space with 0's. Instead the space is filled with spaces.
* Unlike with integers (but as with floating point numbers), we can also truncate the length of
the output with the . option. The number following the . says how many bytes from the string should
be used, and this can be combined with the right adjustment. Interestingly, the right adjustment can
be flipped to left adjustment with a minus sign.
10. Buffer overflow
In information security and programming, a buffer overflow, or buffer overrun, is an anomaly where a
program, while writing data to a buffer, overruns the buffer's boundary
and overwrites adjacent memory locations.
Buffers are areas of memory set aside to hold data, often while moving it from one section of a program to
another, or between programs. Buffer overflows can often be triggered by malformed inputs; if one
assumes all inputs will be smaller than a certain size and the buffer is created to be that size, then an
anomalous transaction that produces more data could cause it to write past the end of the buffer.
If this overwrites adjacent data or executable code, this may result in erratic program behavior, including
memory access errors, incorrect results, and crashes. Exploiting the behavior of a buffer overflow is a
well-known security exploit.
Programming languages commonly associated with buffer overflows include C and C++, which provide no
built-in protection against accessing or overwriting data in any part of memory and do not automatically
check that data written to an array is within the boundaries of that array. Bounds checking can prevent
buffer overflows, but requires additional code and processing time.
11. Example for Buffer overflow
In the following example expressed in C, a program has two variables which are adjacent in memory: an
8-byte-long string buffer, A, and a two-byte unsigned integer, B.
Initially, A contains nothing but zero bytes, and B contains the number 1979.
Now, the program attempts to store the null-terminated string "excessive" with ASCII encoding in the A
buffer.
"excessive" is 9 characters long and encodes to 10 bytes including the null terminator, but A can take only
8 bytes. Because the program does not check the length of the string, the copy also overwrites the value of B.
B's value has now been inadvertently replaced by a number formed from part of the character string. In
this example, "e" followed by a zero byte would become 25856 (0x6500) when B is read as a two-byte big-endian integer.
Writing data past the end of allocated memory can sometimes be detected by the operating system to
generate a segmentation fault error that terminates the process.
To prevent the buffer overflow from happening in this example, the call to strcpy could be replaced with
strlcpy, which takes the maximum capacity of A (including a null-termination character) as an additional
parameter and ensures that no more than this amount of data is written to A:
When available, the strlcpy library function is preferred over strncpy, which does not null-terminate the
destination buffer if the source string's length is greater than or equal to the size of the buffer (the third
argument passed to the function). In that case A may not be null-terminated and cannot be treated as a valid
C-style string.
13. Using formats in an exploit
Now that we've had a whirlwind tour of formats you've never heard of nor ever really wanted to use, how
can we use them in an exploit? Here's the program we are going to exploit.
This is a rather contrived example of using sprintf() to do a copy. One might think that because
the first sprintf() uses the %.400s format, an overflow of buffer or
outbuff is not possible. For example, this does not cause a segmentation fault:
True, we can't overflow buffer, but we can overflow outbuff because buffer is treated as
the format string. For example, what if the input was like:
And if we look at the dmesg output: [dmesg is a command on most Unix-like operating systems that prints
the message buffer of the kernel]
We see that we overwrote the instruction pointer with a bunch of 0x20 bytes, or spaces!
Now, the goal is to overwrite the return address with something useful, like the address of
bad().
To do this, we need the right amount of format padding to reach the return address. We can find it by first using 0xdeadbeef and
checking the dmesg output:
So if we use a 505-byte-wide %d format, the next 4 bytes we write are the return address. Adding that, we get what we want:
We can also get this to execute a shell in the normal way.
15. Preventing Format String Vulnerabilities
Always specify the format string as part of the program, not as an input. Most format string vulnerabilities are
solved by specifying “%s” as the format string and passing the data string as an argument, not as the format string.
If possible, make the format string a constant and extract all the variable parts as other arguments to the call.
This is difficult to do with some internationalization libraries.
If the above two practices are not possible, use defenses such as Format_Guard. Such defenses are rarely chosen at design time,
but they can be a way to keep using a legacy application while keeping costs down, and to increase trust that a third-party
application will be safe.