SQL that isn't caught by WAFs but also isn't used (yet) by attackers! Why detecting SQLi is good, and why doing it with regular expressions is hard. And re-introducing libinjections which is a new way of detecting SQLi attacks.
This is a mashup of my Black Hat USA 2012 and DEFCON 20 talks, refreshed and updated.
5. All photos where taken by me last week
on Martha's Vineyard, MA or from
previous conferences.
Unless it's a picture of a clown
Or otherwise noted.
Then it came from The Internet.
Thanks Internet!
6. The Next 30 Minutes
• Why detecting SQLi is Important
• SQL you didn't know about
• Why detecting SQLi is a hard problem
• Why current solutions aren't so good
• The libinjection algorithm and library
7. Why Detect SQLi
• Even if you know 100% of your queries
are parameterized...
• Knowing who and how often SQLi
attacks is still good to know
• Graph the attacks! Make security
visible! Great internal PR.
8. So Let's Detect SQLi !!!!
It's Easy to Get Started
with Regular Expressions
s/UNIONs+(ALL)?/i
‣ At least two open source WAF
use regular expressions.
‣ Failure cases in closed-source
WAFs also indicate regexp.
12. MySQL NULL Alias
MySQL NULL can written as N
case sensitive. n is not a null.
This means any WAF that does a
"to_lower" on the user input and
looks for "null" will miss this case.
16. Floating Point, Oracle
• Everyone of the previous formats can
have a trailing [dDfF]
• But why use numbers as all?
Special literals, maybe case sensitive:
• binary_double_infinity
• binary_double_nan
• binary_float_infinity
• binary_float_nan
20. MySQL # Comment
• '#' signals an till-end-of-line Comment
• Well used in SQLi attacks
• However... '#' is an operator in PgSQL.
Beware that s/#.*n// will delete
code that needs inspecting.
• Lots of other MySQL comment
oddities:
http://dev.mysql.com/doc/refman/5.6/
en/comments.html
21. PGSQL Comments
• Besides the usual '--' comment
• PgSQL has nested C-Style Comments
•/* foo /* bar */ */
• Careful! What happens when you
'remove comments' in
/* /* */ UNION ALL /* */ */
25. MySQL Ad-Hoc
Charset
• _charset'....'
• _latin1'.....'
• _utf8'....'
26. PGSQL Dollar Quoting
From http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-COMMENTS
A dollar-quoted string constant consists of a dollar sign
($), an optional "tag" of zero or more characters, another
dollar sign, an arbitrary sequence of characters that
makes up the string content, a dollar sign, the same tag
that began this dollar quote, and a dollar sign. For
example, here are two different ways to specify the
string "Dianne's horse" using dollar quoting:
$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$
Want more fun? They can be nested!
27. PGSQL Unicode
From http://www.postgresql.org/docs/9.1/static/sql-syntax-
lexical.html emphasis mine:
... This variant starts with U& (upper or lower case U followed by ampersand) immediately before the
opening double quote, without any spaces in between, for example U&"foo". (Note that this creates an
ambiguity with the operator &. Use spaces around the operator to avoid this problem.) Inside the quotes,
Unicode characters can be specified in escaped form by writing a backslash followed by the four-digit
hexadecimal code point number or alternatively a backslash followed by a plus sign followed by a six-digit
hexadecimal code point number. For example, the identifier "data" could be written as
U&"d0061t+000061"
The following less trivial example writes the Russian word "slon" (elephant)
in Cyrillic letters:
U&"0441043B043E043D"
If a different escape character than backslash is
desired, it can be specified using the UESCAPE clause
after the string, for example:
U&"d!0061t!+000061" UESCAPE '!'
28. Oracle Q String
http://docs.oracle.com/cd/B28359_01/appdev.111/b28370/
fundamentals.htm#autoId6
q'!...!' notation allows use of single quotes inside literal
string_var := q'!I'm a string!';
You can use delimiters [, {, <, and (, pair them with ], }, >, and ),
pass a string literal representing a SQL statement to a
subprogram, without doubling the quotation marks around
'INVALID' as follows:
func_call(q'[SELECT index_name FROM user_indexes
WHERE status ='INVALID']');
30. Ridiculous Operators
• != not equals, standard • ||/ cube root (pgsql)
• <=> (mysql) • ** exponents (oracle
• <> (mssql)
• ^= (oracle)
• !>, !< not less than
(mssql)
• / oracle
• !! factorial (pgsql)
• |/ sqaure root (pgsql)
31. Expressions!
• Using the common query extension of
"OR 1=1"
• Besides using literals, one can use functions:
•COS(0) = SIN(PI()/2)
•COS(@VERSION) = -
SIN(@VERSION + PI()/2)
32. EXCEPT (mssql)
MINUS (Oracle)
• Like UNION, UNION ALL
• But returns all results from first query
minus/except the ones from the
second query
• There is also INTERSECT and MINUS
as well.
• I think someone clever could use these,
typically not in WAF rules.
33. Side Note: "IN" lists
• e.g. ....WHERE id IN (1,2,3,4) ....
• These have to be manually created.
• There is no API or parameter binding
for this construct in any
platform,framework or language.
• There is no consistent, safe way to
make this (other than convention,
36. Unicorn Alert!
‣ At Black Hat USA 2005,
Hanson and Patterson presented:
Guns and Butter: Towards Formal Axioms
of Validation (http://bit.ly/OBe7mJ)
‣ …formally proved that for any regex validator,
we could construct either a safe query which
would be flagged as dangerous, or a dangerous
query which would be flagged as correct.
‣ (summary from libdejector documentation)
39. Key Insight
‣ A SQLi attack must be parsed as SQL within
the original query.
‣ SQL has a rigid syntax
‣ it works, or it's a syntax error.
‣ Compare this to HTML/XSS rules
‣ "Is it a SQLi attack?" becomes
"Could it be a SQL snippet?"
40. Only 3 Contexts
User input is only "injected" into SQL in three
ways:
‣ As-Is
‣ Inside a single quoted string
‣ Inside a double quoted string
Means we have to parse input three times.
Compare to XSS
41. Identification of
SQL snippets
without context is hard
‣ 1-917-660-3400 my phone number or
... an arithmetic expression in SQL?
‣ @ngalbreath my twitter account or
... a SQL variable?
‣ English-like syntax and common keywords:
union, group, natural, left, right, join, top,
table, create, in, is, not, before, begin, between
42. Existing SQL Parsers
‣ Only parse their flavor of SQL
‣ Not well designed to handle snippets
‣ Hard to extend
‣ Worried about correctness
... so I wrote my own!
... so I wrote my own!
43. Tokenization
‣ Converts input into a stream of tokens
‣ Uses "master list" of keywords and functions
across all databases.
‣ Handles comments, string, literals, weirdos.
44. 5000224' UNION USER_ID>0--
[ ('...500224', string),
('UNION', union operator),
('USER_ID', name),
('>', operator),
('0', number),
('--.....', comment) ]
45. Meet the Tokens
‣ none/name ‣ group-like operation
‣ variable ‣ union-like operator
‣ string ‣ logical operator
‣ regular operator ‣ function
‣ unknown ‣ comma
‣ number ‣ semi-colon
‣ comment ‣ left parens
‣ keyword ‣ right parens
46. Merging,
Specialization,
Disambiguation
‣ "IS", "NOT" ==> "IS NOT" (single op)
‣ "NATURAL", "JOIN" => "NATURAL JOIN"
‣ ("+", operator) -> ("+", unary operator)
‣ (COS, function), (1, number) ==>
(COS, name), (1, number)
functions must be followed with a
parenthesis!
47. Folding
‣ This step actually isn't needed to detect, but
Text
is needed to reduce false positives.
‣ Converts simple arithmetic expressions into a
single value, and does not try to evaluate.
‣ 1-917-660-3400 -> "1"
pics courtesy The Internet
pics
48. Knows nothing about SQLi
‣ So far this is purely a parsing problem.
‣ Knows nothing about SQLi (which is evolving)
‣ Can be 100% tested against any SQL input
(not SQLi) for correctness.
‣ Language independent test cases
$ cat test-tokens-numbers-floats-003.txt
--TEST--
floating-point parsing test
--INPUT--
SELECT .0;
--EXPECTED--
k SELECT
1 .0
; ;
49. Fingerprints
‣ The token types of a user input form a hash or
a fingerprint.
‣ -6270" UNION ALL SELECT 5594, 5594, 5594, 5594, 5594, 5594,
5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594,
5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594,
5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594, 5594,
5594, 5594, 5594, 5594, 5594# AND "JWWQ"="JWWQ
‣ becomes "sUk1,1,1,1,1,1,1,1,&"
‣ Now let's generate fingerprints from
Real World Data.
‣ Can we distinguish between SQLi and benign
input?
pics courtesy The Internet
50. Training on SQLi
‣ Parse known SQLi attacks from
‣ SQLi vulnerability scanners
‣ Published reports
‣ SQLi How-Tos and Cheet Sheets
‣ > 32,000 total
‣ Since Black Hat, donations from
‣ modsecurity
‣ Qualys
‣ > 50,000 total
51. Training on Real Input
‣ 100s of Millions of user inputs from Etsy's
access logs were also parsed.
‣ Large enough to get a good sample (Top 50
USA site)
‣ Old enough to have lots of odd ways of query
string formatting.
‣ Full text search with an diverse subject
domain
52. How many tokens are
needed to determine if
user input is SQLi or not?
55. The Library
On GitHub Now
~500 Lines of Code
One file + data
No memory allocation
No threads
No external dependencies
Fixed stack size
>100k checks a second
56. tada
#include "sqlparse.h"
#include <string.h>
int main()
{
const char* ucg = "1 OR 1=1";
// input should be normalized, upper-cased
// You can use sqli_normalize
// if you don't have your own function
sfilter sf;
return is_sqli(&sf, ucg, strlen(ucg));
}
$ gcc -Wall -Wextra sample.c sqlparse.c
$ ./a.out
$ echo $?
1
58. What's Next?
• Change API to allow passing in
fingerprint data or a function. Allows
upgrades without code changes.
• Can we reduce the number of tokens?
String, variables, numbers are all just
values.
• Folding of comma-separated values?
1,2,3,4 => 1
• Can we just eliminate all parenthesis?
59. Help!
• More SQLi from the field please!
• False positives welcome
• More test cases with exotic SQL to test
parser.
• Ports to other languages (the language-
neutral test framework should make
this easier).
• Compiling on Windows (mostly tested
on Mac OS X and Linux)
60. Do you have a
commercial WAF?
• Set up a dead page and let me poke it.
• Curious on false positives and false
negatives.
61. Slides and Source Code:
http://www.client9.com/libinjection/
Nick Galbreath
@ngalbreath
nickg@client9.com
Oct 25, OWASP USA, Austin, Texas
Continuous Deployment and Security