Explore the importance of matching escape functions properly. Learn more about how this impacts cross site scripting. Examples in EJS and JavaScript.
NOTE: There are animated gifs that add some fun. You'll get all the meat viewing online. Download it if you want to see the GIFs.
1. XSS and How to Escape
Tyler Peterson
@managerJS
2. Bottom Line Up-Front (BLUF)
• Anything from the user is
definitely unsafe.
• Don’t render unsafe data
into script tags.
• Do render html-escaped
data into html tags
(including meta-tags).
3. Strange Bedfellows
• XSS is a common attack vector
• Escaping is a commonly used countermeasure
• Can be effective but not a perfect fit
4. Escaping is All About Magic
• Every interesting computing
context is a mix of data and magic.
• The magic is triggered by special
data sequences.
• It’s just like a wizard that can carry
out a normal conversation with no
magic, then invoke magic using
key-words.
5. Escaping (itself)
• Escaping prepares data to enter a context with
different magic rules.
Escaping Algorithm:
Deathstar to Hogwarts
Replace all protocol
droids with house elves.
6. Magic Mismatch
• When you don’t properly prepare data for the new
context you end up triggering magic by accident.
7. (Or On Purpose…)
• Nefarious folks capitalize on our escaping mistakes
and abuse magic we fail to protect.
9. Real Magic
• RegExp: punctuation is magic, others are not
(mostly)
• HTML: <, >, and & are magic
• URL: /, &, ?, and # are magic
• SQL: ' and ` are magic
10. Really Real Magic
• RegExp: different things are magic inside a
character class. E.g. - as in [-a-z]
• HTML: Different things are magic inside a tag
definition. e.g. =, ", '.
• URL: Different things are magic in host, path,
query, and fragment. E.G. the first ? begins the
query, but they aren’t special in the fragment.
• SQL: Different things are magic inside a string: ''
is an escaped quote.
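Two of these context shifts can be checked directly in Node. This is a small sketch; the example URL and the variable names are illustrative and not from the talk.

```javascript
// Between two characters in a character class a hyphen means "range",
// but as the first character of the class it is plain data.
const range = /[a-z]/;        // a through z only
const withHyphen = /[-a-z]/;  // a through z, or a literal hyphen

const hyphenInRange = range.test('-');     // false
const hyphenAsData = withHyphen.test('-'); // true

// The first ? begins the query; a ? after # is just fragment data.
// (example.com is a placeholder host.)
const url = new URL('https://example.com/path?a=1#frag?b=2');
const query = url.search;  // '?a=1'
const fragment = url.hash; // '#frag?b=2'

console.log(hyphenInRange, hyphenAsData, query, fragment);
```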
11. Even Simple Languages are Hard
• RegExp, HTML, URL, and SQL have magic that can
be difficult to reason about.
• Programming languages, like JavaScript, are even
more complicated.
– More contexts with different nuances
– More magic and less data
12. Enter Cross Site Scripting (XSS)
others making your page misbehave
13. Simple Data Rendering Example
// template.ejs
var lang = "<%- locale %>";
• NOTE: I’m using EJS in these examples but the
problems I illustrate are fundamental.
14. Simple XSS Attack
• Attacker sends
locale = 'en"; doEvil(); "throw away string literal'
• You render
var lang = "en"; doEvil(); "throw away string
literal";
15. Simple Escaping Countermeasure
// template.ejs
var lang = "<%= locale %>";
You changed this to EJS’s back fat arrow.
This does HTML escaping of the string
before rendering it.
16. Attack Foiled!
• This XSS vulnerability is closed.
• The doEvil() function or code is NOT invoked.
17. How Did it Work?
$ node
> var ejs = require('ejs')
undefined
> var locale = 'en"; doEvil(); "throw away string literal'
undefined
> ejs.render('var lang = "<%- locale %>";', {locale: locale})
'var lang = "en"; doEvil(); "throw away string literal";'
> ejs.render('var lang = "<%= locale %>";', {locale: locale})
'var lang = "en&#34;; doEvil(); &#34;throw away string literal";'
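To see why the second render is inert, it helps to spell out what an HTML escape does. The sketch below uses a hand-written `escapeHtml` as a stand-in for `<%=`. It is an assumption for illustration, not EJS's actual source, though it replaces the same five characters.

```javascript
// Hypothetical stand-in for the escaping that <%= applies.
const HTML_ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&#34;',
  "'": '&#39;',
};

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

const locale = 'en"; doEvil(); "throw away string literal';
const rendered = 'var lang = "' + escapeHtml(locale) + '";';
console.log(rendered);
// var lang = "en&#34;; doEvil(); &#34;throw away string literal";
```

The attacker's quotes no longer end the string literal, so `doEvil();` stays inside the string as data.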
18. How Good is the Fix?
• No hands necessary, but please reflect:
– Have you ever done this?
– How certain are you that this is a high-quality fix?
19. Sidebar Example
• You are hosting a 1 day, 2 event, classic video
game tournament.
– Contra
– Classic Doom
• No cheat codes allowed
– Cheat codes are like magic
– You can escape the codes to render them inert
20. Escape the Konami Code
• What are you going to look for?
↑↑↓↓←→←→BA
• You foil this cheat by inserting two “start”
commands right before the A.
21. Escape the Doom Clipping Code
• What are you going to look for?
– idspispopd
• You foil this by inserting any letter (but d) before
the final d.
22. Flint’s Dad Mixes It Up
• Your less adept colleague mixes up the cheat
detectors.
• They work great but can’t stop the cheating.
23. Same Thing Happens With Escaping
• Sometimes we escape till it works, but it’s really
not right.
24. Escaping Has Sharp Edges
• Most escape algorithms treat most data the same,
because most data is non-magic most of the time.
• The characters they treat differently—the edge
cases—are the most important parts to consider.
• Take away:
– It’s not enough to casually test an application of
escaping.
– You need to thoroughly understand the old context, the
new context, and the joining algorithm.
25. Back to XSS: <%= Worked, but…
• What if you had a number instead of a string?
> var onServer = '6; doEvil();'
undefined
> ejs.render('var count = <%= onServer %>', {onServer: onServer})
'var count = 6; doEvil();'
26. Missed Some Magic
• The fix worked at first because " is magical in
JavaScript and HTML
• The fix failed because ; is only magical in
JavaScript
27. Good Enough?
• So, you’re kinda safe as long as you are using
strings OR at least match the untrusted string with
a RegEx like /[^;'"]*/ and use the matched text
instead of the full text.
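The regular expression above can be tried in the REPL. This is a sketch with `safePrefix` as a hypothetical helper name. Note that it can return an empty string, which the caller still has to handle.

```javascript
// Keep only the leading run of characters that are not ; ' or ".
function safePrefix(untrusted) {
  return String(untrusted).match(/[^;'"]*/)[0];
}

const count = safePrefix('6; doEvil();');           // '6'
const lang = safePrefix('en"; doEvil(); "ignored'); // 'en'
const empty = safePrefix('"; doEvil();');           // ''
console.log(count, lang, empty);
```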
28. My Tools Have Betrayed Me?!
• Why is EJS so broken? Why doesn’t escaping help
me escape?
• It isn’t broken.
• Escaping isn’t a security measure. It only ferries
data between magical worlds.
29. Escaping Must Match Context
For escaping to be reliable you
have to match the new data
context with the escaping
algorithm.
• The problem is that <%=
(back fat arrow) is an HTML
escape and you are rendering
text into a JavaScript
execution context.
30. The (Nonexistent) JavaScript Escape
• So just switch to using JavaScript escape. Well,
there isn’t a standard JavaScript escape function
so you can’t.
• What’s more, JavaScript has so many contexts
that you shouldn’t write one.
31. What’s the Right Way™?
• Render into HTML with an HTML escape
// template.ejs
<meta name="lang" content="<%= lang %>">
• HTML escape replaces &, <, >, ", and '. An attacker
can’t end the attribute.
32. The Right Way™ to Read:
var metas = document.getElementsByTagName('meta');
var i, l = metas.length, lang;
for (i = 0; i < l; ++i) {
  if (metas[i].getAttribute('name') === 'lang') {
    lang = metas[i].getAttribute('content');
  }
}
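If the loop is needed in more than one place, it can be wrapped in a small helper. This is a sketch, assuming a hypothetical `readMeta` name and a `doc` parameter (normally `document`) so that it can be exercised outside a browser.

```javascript
// Returns the content of the first <meta> tag with the given name,
// or undefined when there is no such tag.
function readMeta(name, doc) {
  const metas = doc.getElementsByTagName('meta');
  for (let i = 0; i < metas.length; ++i) {
    if (metas[i].getAttribute('name') === name) {
      return metas[i].getAttribute('content');
    }
  }
  return undefined;
}

// In the browser: var lang = readMeta('lang', document);
```

Passing the document in keeps the helper free of globals, at the cost of one extra argument at each call site.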
33. The Right Way™ is Kinda Yucky
• No wonder we take short-cuts.
• Really is a good way to match escaping algorithms.
• Read is awkward from scratch.
34. I Lied About JavaScript
• The standard, safe way to encode data in
JavaScript is JSON.stringify().
• JSON is a form of escaping so you must be careful
not to double escape.
• Forces all data into a string context.
• npm/js-string-escape is similar and popular
35. JSON Example
> ejs.render('var count = <%- JSON.stringify(number) %>', {number: '6; doEvil()'})
'var count = "6; doEvil()"'
• JSON does the escaping so EJS doesn’t have to (in
fact MUST NOT).
• Notice that JSON added quotes.
36. Show of Hands
• Who likes the Right Way™?
• Who’s going to use the JSON Way?
37. OK, I Lied About JSON Being Safe
• You’re not rendering into a JavaScript context.
• You’re rendering into a JavaScript context through
an HTML context.
• Both magics can apply!
38. Here’s Your Template
<!DOCTYPE html>
<html>
<head><title>JSON Demo</title></head>
<body>
<script type="text/javascript">
var locale = <%- JSON.stringify(locale) %>;
</script>
<h1>Was it Safe?</h1>
</body>
</html>
39. These Attacks Work
• '</script><h1>embarrassing content</h1>'
• '</script><script>doEvil();</script><script>'
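Both attacks rely on the fact that `JSON.stringify` leaves `<` and `/` alone, so the HTML parser sees a closing tag before JavaScript ever runs. The sketch below reproduces the rendered script element by string concatenation, which stands in for the template here.

```javascript
const locale = '</script><script>doEvil();</script><script>';

// What <%- JSON.stringify(locale) %> would place inside the script element.
const json = JSON.stringify(locale);
const page =
  '<script type="text/javascript">\nvar locale = ' + json + ';\n</script>';

// The first </script> in the output is the attacker's, not the template's.
const firstClose = page.indexOf('</script>');
const attackerClose = page.indexOf(locale);
console.log(page);
console.log(firstClose === attackerClose); // true
```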
40. Rendering into a Script Tag is Doomed
• Adding <%= makes the happy path fail
• < and " are magical to HTML
– so you have to escape them.
– But the browser doesn’t replace the entity reference so
JavaScript sees an & and chokes.
• Even if you found a way to do it
– Would you remember it?
– Would your team-mates understand and perpetuate it?
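The failing happy path can be reproduced without a browser. Inside a script element the entity references are not decoded, so JavaScript receives them verbatim. In this sketch the `source` string is written out by hand as what `<%= JSON.stringify('en') %>` would render, and `new Function` is used only to ask the engine whether the text parses.

```javascript
// A perfectly innocent value, JSON-encoded and then HTML-escaped.
const source = 'var locale = &#34;en&#34;;';

let parses = true;
let errorName = null;
try {
  new Function(source); // compiles the text without running it
} catch (err) {
  parses = false;
  errorName = err.name;
}
console.log(parses, errorName); // false SyntaxError
```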
42. Rules For Us Mere Mortals
• The more you have to reason about the security of
a fix the less secure it is.
• Every step in logic is an opportunity for error and
exploit.
• In general, straightforward and yucky is more
secure than well reasoned and slick.
• You can be slick, but you’re taking on risk.
43. Take Away: XSS Abuses Magic
• A cross-site scripting attack is normally magic
masquerading as data.
44. Related: De-taint
• Block the bad data at the front door.
• No general solution.
• Ideally unnecessary.
• Escaping errors abound, so still a good idea to
use.
45. Example: De-taint locale
var pat = /[a-zA-Z_]{2,5}/
pat.exec('en_US"; evil()')[0] // "en_US"
• Effectively limits evil.
• Can accidentally be too restrictive, so be liberal.
• Evil looking inputs are sometimes valid, so this
can’t be your only solution.
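Because the pattern above is unanchored, it returns the first matching run wherever it occurs in the input. One possible tightening is to anchor it and reject anything that does not match as a whole. This is a sketch; `detaintLocale` and the `'en'` fallback are hypothetical choices.

```javascript
// Accept the value only if the entire string is 2 to 5 letters or underscores.
const LOCALE_PATTERN = /^[a-zA-Z_]{2,5}$/;

function detaintLocale(untrusted, fallback) {
  const text = String(untrusted);
  return LOCALE_PATTERN.test(text) ? text : fallback;
}

console.log(detaintLocale('en_US', 'en'));          // en_US
console.log(detaintLocale('en_US"; evil()', 'en')); // en
```

Falling back to a default rather than trimming the input means a malformed value is never partially trusted.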
46. Final Recommendation
• HTML escape data into meta tags and retrieve
them from JavaScript.
• Pick a safe way and stick to it. No shortcuts.
## XSS and Escaping — strange bedfellows
XSS is a common exploit vector.
Escaping is sometimes used to mitigate it.
Escaping isn’t a security feature.
Like hiding your key under a mat instead of above the doorpost.
Can be effective but not really a great fit.
Look at Escaping Itself
Escaping isn’t about security. It’s about the handoff between two different magical realms. As long as data flows about in a single realm it is very unlikely to be misunderstood or exploited.
When data transitions from one magical realm to another it changes rules for interpretation. Escaping is about re-encoding the data in a way that preserves the intended nature of the data.
Most data is interpreted in the same way across realms. So, you encode most of the data as plain data. Some bits have magical meanings in one realm but not the other and here’s where the problems arise.
Real Magic, not like that fiction we were just talking about.
## Really Real Magic
Even these have special contexts inside them
## Simple Example
You have a value on the server (like locale) that you want accessible on the client. You realize that you’re building the whole page in EJS anyway so why not plop a script tag on the page and pop a var into it? So, we render it right into some JavaScript like this:
Which is valid AND EVIL code.
## Simple Countermeasure
Does <%= Do the Necessary Escaping? Erm…
What if we use the escaping capability of EJS? Are we safe? Sorta.
TODO show it being foiled.
## How did that work?
Let’s bust out the REPL.
You see that using the back fat arrow (<%=) does prevent the evil from running in this case. But it isn’t really a safe technique in general.
## How Good is the Fix
You don’t have to raise your hands. This is a gotcha. But think to yourself: have I ever done this? Have I ever used escaping in a similar way? How certain am I that this is a high-quality fix?
## Pretend You’re Hosting a Tournament
Switching gears for a moment:
Pretend you’re hosting a classic gaming tournament. There are two events: Contra and Classic Doom. One wrinkle: No cheating allowed.
Cheating is like magic. It normally works by the game scanning the player’s input and matching it to special sequences that unleash magic abilities.
Now suppose you have a system that will monitor the player’s input and allow you to intervene if you detect cheating. You create filters that will detect a cheat about to be executed and insert other moves to prevent it from making it through to the vulnerable vintage game.
## Break the Konami Code
Maybe it messes up the game a bit, but they were about to cheat so they deserve it.
## Break the Doom Clipping Code
This has no effect on honest players but foils the cheat.
## Enter Flint’s Dad
Suppose now that you are called away on urgent business. Your countermeasures are ready to go; they just need to be installed. Your less adept colleague doesn’t realize that there’s a significant difference between the countermeasures and plugs the Konami countermeasure into the Doom console and the Doom countermeasure into the Contra console.
The cheating continues unhindered.
## Total Mismatch
How could this even happen? How could anyone think that the countermeasures were equivalent? I mean, why would the cords for the Konami measure fit into the Doom playing system?
The more you understand escaping the more ridiculous our common usages of it become.
## Most Escaping is the Same
Most escaping is the same, because the non-magic data makes up the majority of any escaping algorithm. So, it’s easy to guess at the proper escaping function and fail to identify the mismatch by testing. In order to spot the mismatch you have to understand the escaping algorithm and be sure to round-trip data that has the key magical elements in it.
Most escaping is the same, but it’s the differences that kill you.
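A round-trip check along those lines might look like the sketch below. `escapeHtml` and `unescapeHtml` are hand-written stand-ins (assumptions for illustration, not a library API). The point is that the samples deliberately include every character the algorithm treats as magic.

```javascript
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&#34;', "'": '&#39;' };

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => ENTITIES[ch]);
}

// Inverse mapping, standing in for what an HTML parser does with these entities.
function unescapeHtml(text) {
  const reverse = Object.fromEntries(
    Object.entries(ENTITIES).map(([ch, entity]) => [entity, ch])
  );
  return String(text).replace(/&(?:amp|lt|gt|#34|#39);/g, (entity) => reverse[entity]);
}

const samples = [
  'plain text',
  'a < b > c',
  'Tom & Jerry',
  '"quoted"',
  "it's",
  '&lt; already escaped',
];

const results = samples.map((sample) => {
  const escaped = escapeHtml(sample);
  return {
    sample,
    inert: !/[<>"']/.test(escaped),               // no raw HTML magic survives
    roundTrips: unescapeHtml(escaped) === sample, // data comes back unchanged
  };
});

console.log(results.every((r) => r.inert && r.roundTrips)); // true
```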
## It Worked, But…
Returning to our previous example:
Remember I’m asserting that JavaScript has many magics and it is easy to mismatch escaping algorithms to it.
Number: What if you wanted to do the same thing with a number? Continuing in the REPL, the sample attack would look like this:
Notice that the escaping doesn’t help because there are no quotes in the attack string. In order for escaping to really work it would have to escape semicolons, too.
I imagine he was trying to stay cool. Not a good way.
## Escaping Must Match Context
In this case the context is JavaScript and the algorithm is HTML. Close. But missed it by that much.
## JavaScript Escape
I hope you will believe these statements by the end of this presentation.
## What’s the Right Way?
The Right Way™ to do this is to render it into a meta tag like this:
SEE SLIDE
Notice that here the escaping algorithm (HTML) matches the data context (HTML).
Then you get the value using code like this:
SEE SLIDE
For other ideas on how to get meta data from the DOM using JavaScript you can always search Stack Overflow.
## I Lied About JavaScript
There is a standard way to encode data from the JavaScript execution space into a safe, serializable format: JSON. So, if you’re dead set to render into a script tag you should at least do it like this:
Moving the script to head does no good
## Remember: XSS Abuses Magic
A cross-site scripting attack is normally magic masquerading as data.
## Related: De-taint
If you properly escape data as it changes contexts then de-taint isn’t strictly necessary.
No de-taint function or library of functions could generally guarantee that a slip up in escaping couldn’t be exploited.
Data traverses many layers with different magic sequences. You can’t eliminate all of these sequences from every input type. For example, a wiki article on XSS might contain XSS example code. The same wiki may contain example DB exploit code for the DB it uses. In these cases you can’t remove it from the input. You must escape it properly upon display.
So, de-tainting is neither necessary nor sufficient.
I still recommend you use it where possible.
## Example: De-taint locale
Before using the value of the locale header you could match it against a list of valid locales, or against a RegExp of alphanumerics, spaces, and hyphens. This would close the hole for nearly any imaginable XSS exploit via that parameter.
Not all inputs can be so restricted. That’s why de-tainting is not a sufficient strategy.
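The list-of-valid-locales variant could be as small as the sketch below. The locale list, the `pickLocale` name, and the default are all hypothetical. A real application would use whatever locales it actually ships.

```javascript
// Hypothetical set of locales the application supports.
const SUPPORTED_LOCALES = new Set(['en_US', 'en_GB', 'es_MX', 'fr_FR']);
const DEFAULT_LOCALE = 'en_US';

// Return the input only when it is exactly one of the known locales.
function pickLocale(untrusted) {
  return SUPPORTED_LOCALES.has(untrusted) ? untrusted : DEFAULT_LOCALE;
}

console.log(pickLocale('fr_FR'));          // fr_FR
console.log(pickLocale('en"; doEvil();')); // en_US
```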
## Final recommendation:
Study this out and understand it all at once. Decide on the safe-side method you want to use. Then use it unerringly. For me, I recommend HTML-escaping data into the content attribute of a meta tag.