2 Roads to Redemption - Thoughts on XSS and SQLIA

2 Roads to Redemption
Thoughts on fixing SQLIA and XSS

Florian Thiel,
TU Berlin, 12/16/2008
florian.thiel ät noroute.de

OWASP Top 10 2007

1. XSS
2. Injection Flaws
3. Malicious File Execution
4. Insecure Direct Object Reference
5. Cross-Site Request Forgery

Source: OWASP (Open Web Application Security Project), which works on awareness and best
practices for web application security (WAS) with an open approach.
The list is based on MITRE Vulnerability Trends 2006.
- slightly biased, not based on a controlled population (various research bodies)

I will talk about the first two. OWASP differentiates between XSS and injection flaws (such as
SQLIA), but I’ll show you that they have much in common.

Both vulnerabilities are very common, and high-profile websites have been attacked.

I’ll explain the background and then what I’m going to do about it.

© by xkcd: http://xkcd.com/327/

Let’s start with SQLIA. Who knows this comic strip from XKCD? This is a really simple form of
SQLIA. It shows one of the threats of SQLIA, namely the threat to data integrity.

"SELECT firstname FROM Students
WHERE (login = '%s');" % login

SELECT firstname FROM Students WHERE
(login = 'Robert'); DROP TABLE Students; -- ');
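How the attack string ends up inside the statement can be reproduced with plain string formatting. This is a minimal sketch: `build_query` is a hypothetical helper that mirrors the template on the slide, and no database is involved. It only shows the text a SQL engine would receive.

```python
# Hypothetical helper mirroring the slide: user input is pasted into the
# SQL text with the % operator, so quotes in the input become SQL syntax.
TEMPLATE = "SELECT firstname FROM Students WHERE (login = '%s');"


def build_query(login):
    """Build the query the naive way (this is the vulnerable pattern)."""
    return TEMPLATE % login


benign = build_query("Robert")
attack = build_query("Robert'); DROP TABLE Students; -- ")

print(benign)
print(attack)
```

The second query contains a complete extra statement because the input closed the string literal and the parenthesis on its own.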


SQLIA threats

• data integrity
• confidentiality
• new attack vector

XSS. Infamous with PHP due to its origins. Not so much cross-site today...

“This issue isn't just about scripting, and
there isn't necessarily anything cross site
about it. So why the name? It was coined
earlier on when the problem was less
understood, and it stuck. Believe me, we have
had more important things to do than think
of a better name. <g>.”
-- Marc Slemko, Apache.org

The real problem is getting code included in a web page so that it is interpreted by the web
browser. The “cross-site” part comes from frames, which were popular in 1996, and from include() in PHP.

XSS SQLIA

eval(‘user input’) 1), 2)

1) the essence of injections
2) limited only by the execution environment

The common root of the two problems is that you basically eval user input. In XSS you do
this directly (aka ‘by design’); in SQL it is a matter of missing sophistication in app design.

Failure to sanitize data
into a different plane

This is the weakness that actually appears in apps. The name comes from the Common Weakness
Enumeration (CWE).

What’s perfectly fine in one plane might not be in the other one. Injecting SQL via XSS does
no harm (because the browser cannot make sense of it).

For SQL: From User Input or HTML to SQL Engine
For XSS: From HTML, Strings to Web Browser (HTML, JS, Flash, ... engine)
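The two planes can be made concrete by sending the same input to both. The following is a sketch under assumptions: the `Students` table, the sample input and the page fragment are made up for illustration. It uses only the standard library (`sqlite3` placeholders for the SQL plane, `html.escape` for the browser plane).

```python
import html
import sqlite3

# One input that is hostile in both planes (made up for illustration).
user_input = "<script>alert('x')</script>' OR '1'='1"

# Plane 1: SQL engine. The ? placeholder keeps the value as data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (login TEXT, firstname TEXT)")
conn.execute("INSERT INTO Students VALUES (?, ?)", ("robert", "Robert"))
rows = conn.execute(
    "SELECT firstname FROM Students WHERE login = ?", (user_input,)
).fetchall()

# Plane 2: web browser. HTML escaping turns markup characters into entities.
page = "<p>No student named %s</p>" % html.escape(user_input)

print(rows)
print(page)
```

Each plane needs its own treatment: the placeholder does nothing for the HTML output, and the HTML escaping would not be the right encoding for the SQL engine.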

technical non-solutions

• addslashes() or any one-size-fits-all
• blacklisting (IPS, validation, etc.)

PHP: escaping fails for numeric fields (no quotes) and against encoding tricks.
IDS: blacklists can be evaded with char(114,111,111,116) (“root” in MySQL), with
white space, or, for 1=1, with “something” = “some”+“thing”.
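Both failure modes from the notes above can be tried out in a few lines. This is a sketch with stand-ins: `escape_quotes` is not PHP's addslashes() and `passes_blacklist` is not a real IDS rule; both are hypothetical helpers, and SQLite (where `||` concatenates strings) replaces MySQL.

```python
import sqlite3


def escape_quotes(value):
    """Stand-in for a one-size-fits-all filter: doubles single quotes."""
    return value.replace("'", "''")


def passes_blacklist(value):
    """Naive blacklist: rejects the well-known tautology 1=1."""
    return "1=1" not in value.replace(" ", "")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (id INTEGER, firstname TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)", [(1, "Robert"), (2, "Alice")]
)

TEMPLATE = "SELECT firstname FROM Students WHERE id = %s"

# Numeric field: the template has no quotes to break out of, so the
# quote filter changes nothing and the input is still parsed as SQL.
numeric_payload = "0 OR 1=1"
leaked = conn.execute(TEMPLATE % escape_quotes(numeric_payload)).fetchall()

# Blacklist evasion: the same tautology written without the text "1=1".
evasive_payload = "0 OR 'some' || 'thing' = 'something'"
evaded = passes_blacklist(evasive_payload)
leaked_again = conn.execute(TEMPLATE % evasive_payload).fetchall()
```

In both cases every row of the table comes back, although a filter was in place.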

technical solutions

• AntiSamy
• ReForm
• prepared statements
• Safe Query Objects
• ...

AntiSamy from OWASP is an interesting project that supplies “profiles” for pages that need some
markup. ReForm, again from OWASP, is a standard library that escapes HTML in a defined way
and is available for multiple languages.
Safe Query Objects are an extension of the prepared statements idea that adds types. This makes
constraint checking possible.
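The step from prepared statements to typed query objects might look roughly like this. The sketch is only loosely in the spirit of Safe Query Objects and is not the original design: `StudentByLogin`, its pattern and the `Students` table are assumptions made for illustration.

```python
import re
import sqlite3


class StudentByLogin:
    """Hypothetical query object: fixed SQL text plus a typed parameter."""

    SQL = "SELECT firstname FROM Students WHERE login = ?"
    LOGIN_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")

    def __init__(self, login):
        # Constraint check: the parameter type is narrower than "string".
        if not self.LOGIN_PATTERN.fullmatch(login):
            raise ValueError("not a valid login: %r" % login)
        self.login = login

    def run(self, conn):
        # Prepared-statement style: the value is bound, never spliced in.
        return conn.execute(self.SQL, (self.login,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (login TEXT, firstname TEXT)")
conn.execute("INSERT INTO Students VALUES (?, ?)", ("robert", "Robert"))
rows = StudentByLogin("robert").run(conn)
```

The bound parameter alone already prevents the injection; the pattern adds the constraint checking that a bare prepared statement does not provide.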

Looks good!

only half-way there

Technical solutions are becoming more and more sophisticated. There’s light at the end of the
tunnel, but...

WP MU < 2.6 XSS
“In /wp-admin/wpmu-blogs.php an attacker can
inject javascript code, the input variables "s" and
"ip_address" of GET method aren't properly
sanitized.”
--[Full-disclosure], Sept 2008

This is just an example: a recent WP exploit. It has nothing to do with missing technology; it
is a failure to do sanitization correctly.

The solutions are here. They’re just not evenly distributed yet!
-- paraphrasing William Gibson

William Gibson knew it: we have the techniques, but not everybody is aware of them. Not evenly
distributed yet, heh? So let’s talk about approaches to get there, and about the interesting
part (tm).

The interesting* part

* what my thesis is really about

This part is shorter and has less data, more fuzzy assumptions.

Developers more Code

I see two roads to go from here. The first one is to educate developers and make it easier for
them to use the future/the tools. Reduce oversights!

The second one is a technical solution. Write code that takes care of correct application of
measures.

Common to both: Take away some cognitive load from the developers.

Helping developers

• raise awareness
• facilitate detection/motivate reviews
• motivate repair

Awareness: “You wouldn’t recognize iambic pentameter if it bit you on the butt”. We need
patterns to find vulnerabilities. (There are broken examples in the wild; many books have
them.) But: developers have to be constantly aware of the critical spots in their code. That’s
where annotations come in.

Mark the spots that directly interact with user input, and add rich annotations to tell the
developer about his/her surroundings.

Facilitate detection: If you annotate critical spots, you’ll find inconsistencies more easily.
Adding annotations gives you extra value for reviews and shows up in version control (VC).

Motivate repair: annotations may show you how many critical sections you have in your code.
Tool support will make you embarrassed!

# @userinput(data, source="webform",
#            type="username")
# [insert data into query, ignore
#  non-alphanums]
def insertAlphaNum(query, data):
    # [make sure data is canonical]
    c_data = data.toCharSet(...)
    c_data = c_data.replace(...)
    ...
    # [insert data into query]
    # @output(target=sql,
    #         type="username")
    query.prepare(...)
    query.insert(c_data, ...)
    ...
An example:
The @userinput annotation says that data is user input and comes from a web form. The
idea is to spread knowledge about the data throughout the source code. You usually have to
know more about data than its type (it’s “string”, anyway).

This can also provide support for tooling in the future. The [] part comes from structured
programming. We want to have a light-weight “proof” for code sections. The idea is stepwise
abstraction: look at what the innermost annotation says and check the (fewer than 10) lines of
code. Then you can use the abstraction given in the annotation as the effect of the code. The
idea is to keep mental load low.

Also, this facilitates reuse. If you change a part of code, you only have to see if the
surrounding annotation still holds.

If you’re constructing code, you can use this the other way round. Give the effect of the
outermost annotation, write the next abstraction refinement, prove this, refine further, etc.

This is used in Cleanroom software development, which has a really good track record for
defect-free software.
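In a language with decorators, the @userinput annotation from the example could be made machine-readable, which is one way the tool support mentioned earlier might start. This is only a sketch: the `userinput` decorator, the `CRITICAL_SECTIONS` registry and `report` are hypothetical names, and a plain list stands in for the query object.

```python
CRITICAL_SECTIONS = []  # registry a review tool could list or count


def userinput(param, source, type):
    """Hypothetical annotation: marks a function that handles user input."""
    def mark(func):
        func.userinput = {"param": param, "source": source, "type": type}
        CRITICAL_SECTIONS.append(func)
        return func
    return mark


@userinput("data", source="webform", type="username")
def insert_alpha_num(query, data):
    # [insert data into query, ignore non-alphanums]
    c_data = "".join(ch for ch in data if ch.isalnum())
    # The list "query" is a stand-in for bound query parameters.
    return query + [c_data]


def report():
    """One line per annotated function, e.g. for a review checklist."""
    return [
        "%s: %s from %s"
        % (f.__name__, f.userinput["type"], f.userinput["source"])
        for f in CRITICAL_SECTIONS
    ]


print(report())
print(len(CRITICAL_SECTIONS), "critical section(s)")
```

The count at the end is the kind of number that could motivate repair; the bracketed effect comments stay ordinary comments for human reviewers.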

What do you use to
communicate critical sections?

Audience participation: How many of you work in a multi-person project? How do you tell
your colleagues about critical sections? Documentation? Annotations? Do you have a technical
solution?

Would you use annotations?

Your requirements?

Does this annotation thing make sense to you? Would you use it? How easy would it have to
be? What functions would you like?

I promised you another approach. Let’s just motivate this a bit...

GET /en-us/library/aa287673(VS.71).aspx HTTP/1.1
Host: msdn.microsoft.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.3) Gecko/2008092414 Firefox/3.0.3
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.google.de/search?q=http+request+header+example&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
Cache-Control: max-age=0

This is the zoo!

There are many sources of potential user input: basically everything that is external to the
system.
- HTTP header parameters (anyone know exploits for the Referer? I would expect some for ad
companies :-)
- Cookies (the Cookie Monster didn’t know, but cookies can be evil)
- Environment variables (maybe)
- Forms

We have to trace EVERY SINGLE occurrence of user input! -> Flow analysis

Tainting functionality only takes us so far. String-based taint lacks granularity.
Character-based tainting is better but still cannot determine whether the programmer is doing
the right thing.

Oh, and types. We still need the programmer to decide on sensible types for EVERY SINGLE
input value and implement the sanitization correctly.
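String-based tainting, and its limits, can be sketched in a few lines. Everything here is an assumption for illustration: the `Tainted` class, the `from_request` source, the `sanitize_username` helper and the `sql_sink` check are hypothetical, and only concatenation with + propagates the taint in this sketch.

```python
class Tainted(str):
    """String-based taint: the whole value is either tainted or not."""

    def __add__(self, other):
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        return Tainted(str.__add__(other, self))


def from_request(headers, cookies, form):
    """Everything external is a source: headers, cookies and form fields."""
    merged = {}
    for plane in (headers, cookies, form):
        for key, value in plane.items():
            merged[key] = Tainted(value)
    return merged


def sanitize_username(value):
    """Returns a plain str; whether this is sufficient is not checked."""
    return "".join(ch for ch in value if ch.isalnum())


def sql_sink(fragment):
    """Sink check: refuse values that still carry taint."""
    if isinstance(fragment, Tainted):
        raise ValueError("tainted data reached the SQL sink")
    return fragment


data = from_request(
    {"Referer": "http://example.invalid/"},
    {"session": "abc"},
    {"login": "rob'ert"},
)
unsafe = "WHERE login = '" + data["login"] + "'"
safe = "WHERE login = '" + sanitize_username(data["login"]) + "'"
```

The sink rejects `unsafe` and accepts `safe`, but it would accept the output of any function that returns a plain string, which is exactly the point above: the taint check cannot tell whether the sanitizer does the right thing for the type at hand.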

Current approaches

• global XSS filter (HTML escapes) on/off
• default sanitization of all data

That’s how it’s done currently, e.g. in CodeIgniter (a PHP framework) or Django (Python). Global
sanitization does not allow for some markup. It does not address the SQLIA problem either.
There are SQLIA attacks that rely on the differing execution contexts of the validator and the
DB.
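The on/off character of a global filter shows up even in a toy renderer. This is a sketch, not the API of CodeIgniter or Django: `Safe` and `render` are hypothetical names, and the only real library call is `html.escape`.

```python
import html


class Safe(str):
    """Marker for markup the developer explicitly vouches for."""


def render(template, **values):
    """Escape every value by default; only Safe values pass through as-is."""
    escaped = {
        name: value if isinstance(value, Safe) else html.escape(str(value))
        for name, value in values.items()
    }
    return template.format(**escaped)


escaped_page = render("<p>{comment}</p>", comment="<b>hi</b>")
trusted_page = render("<p>{comment}</p>", comment=Safe("<b>hi</b>"))

print(escaped_page)
print(trusted_page)
```

A value is either escaped completely or trusted completely; there is no way to say "some markup, but only this much".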

Current approaches

Not flexible enough!

Helping the framework

• machines are good at doing repetitive
work!
• if they just knew enough...

OK, the other approach. Why let humans do what machines are best at? Oh, they don’t know
enough to be good enough.

Rich Types

• if we had a “firstname” type
• and one for “XML”
• and one for an “ebay-style post”
• we could do flexible validation/sanitization

Let’s educate the framework. What if we had semantic types that could do more than naïve
operations?

What we’d get

• Types for SQL prepared statements
• Types for AntiSamy/Template engine
• Types for future backends
• Types/Constraints for forms (XForms?)
• rich constraints on complex types

We could use something like Safe Query Objects because we would know enough.
We could automatically use AntiSamy without custom profile specifications.
We could do rich validation in forms, maybe with XForms engines.
We could have internal constraints on types, maybe good for overall system security.

What it’d look like

class MyTextField(models.Field):
    # may only contain <H1>
    sqlserializer = SQLFilter(type="html")    # to SQL
    htmlserializer = AntiSamy("H1Profile")    # to HTML
    validator = HtmlValidator(tagsAllowed=("h1",))

This is a very simple example: a text field that should only contain <H1>s. The SQL serializer
can take care of charset conversion (encode everything).
The htmlserializer only lets H1s through.
The validator can make sure that input is valid HTML (all characters are encoded).
There is some redundancy, but that’s good for security, right?
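A self-contained version of the same idea, without a framework behind it, could look like this. It is a sketch under stated assumptions: `H1Text` and its methods are hypothetical, the regular expression is a crude stand-in for a real HTML validator such as AntiSamy, and only the exact tags <h1> and </h1> are let through.

```python
import html
import re


class H1Text:
    """Hypothetical rich type: text that may contain <h1> elements only."""

    TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")

    def __init__(self, raw):
        self.raw = raw

    def validate(self):
        """True if every tag found in the value is an h1 tag."""
        return all(
            m.group(1).lower() == "h1" for m in self.TAG.finditer(self.raw)
        )

    def to_sql(self):
        """Value tuple to bind to a prepared-statement placeholder."""
        return (self.raw,)

    def to_html(self):
        """Escape everything, then re-enable the plain <h1> and </h1> tags."""
        escaped = html.escape(self.raw)
        return (
            escaped.replace("&lt;h1&gt;", "<h1>").replace("&lt;/h1&gt;", "</h1>")
        )


good = H1Text("<h1>Title</h1>")
bad = H1Text("<h1>T</h1><script>x</script>")
```

The validator and the HTML serializer overlap on purpose: even if a caller forgets to call validate(), to_html() still neutralizes everything that is not a plain h1 tag.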

Drawbacks
• needs decent infrastructure from
framework
• needs good type catalogue to be easy
enough to use
• what about HTTP headers, cookies?
• simpler approaches available (Django)

We would need quite a heavyweight framework where we could attach validators/filters/
whatever to all data interactions.

Nobody would use it if the default types were not covered. Maybe create a public catalogue
usable in different languages, with central updates.

To be consistent, HTTP headers and cookies would need types. As long as applications don’t
do crazy things, we could think up a generic type for that...

There’s Django: They don’t allow any HTML and stuff. Use RST, Textile, Markdown instead.
You have to explicitly say if you want something not HTML-escaped. Secure by default.

Benefits for Django:
- Better form validation
- internal type checking
- allow HTML in apps

Is it worth it?

What do you think? How difficult is it? Does it provide good benefits? Enough to be useful? I
need my approach to be actually adopted by developers.

Questions?

Coming to an end. Do you have more questions or suggestions?

Thank You!

Then I’d like to thank you for attending and for your input.

This presentation is
licensed under a Creative
Commons BY-SA license.
Attribution for pictures through links.

Slides, materials, progress etc. can be found @
http://www.noroute.de/blog/diplomathesis
