Efficient Context-aware Output Escaping
for JavaScript Template Engines
PRESENTED BY
Nera Liu, Adonis Fung, and Albert Yu
Paranoids Labs, Yahoo!
SEPT 24, 2015
Problem Statement
How to defend against XSS in JavaScript template engines using contextual analysis?
Outline: Background, Related Work & Implementation > Design > Evaluation > Conclusion
Background, Related Work &
Implementation
What is Cross Site Scripting (XSS)?
Given no proper output filtering:
<h1>Hello <?php echo $_GET['name']; ?></h1>
A typical attack vector arrives through the query string, as the
XXX in victim.com/?name=XXX:
"'><script>alert(1)</script>
HTML of victim.com ends up being:
<h1>Hello "'><script>alert(1)</script></h1>
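The same flaw can be sketched in JavaScript. This is a minimal illustration only: renderGreeting is a hypothetical stand-in for the PHP page above, not code from the talk.

```javascript
// Hypothetical stand-in for the PHP page: the query-string value is
// concatenated into the markup with no output filtering at all.
function renderGreeting(name) {
  return '<h1>Hello ' + name + '</h1>';
}

// The attack vector from the slide, as it would arrive in ?name=...
const payload = '"\'><script>alert(1)</script>';
const html = renderGreeting(payload);

// The <script> element is now part of the page markup.
console.log(html);
```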
Cross-Site Scripting (XSS) & OWASP Top 10
■ Ranked No. 3 in the OWASP Top 10 Web Application Security Risks (2013)
■ Root Cause
● Untrusted inputs are executed as scripts under a victim’s origin/domain.
■ Consequences
● Cookie theft and leakage of private user data.
● Full control of the web content, including defacement.
Screen-captured from https://www.owasp.org/index.php/Top_10_2013-A3-Cross-Site_Scripting_(XSS)
How to defend against XSS?
- Filtering at the Front Gate
Image from Rob, On guard, 2007, flickr.com, License: Creative Commons
Image from 呉 松本, Pipes! Pipes! Pipes!, 2009, flickr.com, License: Creative Commons
How to defend against XSS?
- Systems are getting more complicated
Consider the internal data flow of your web application:
● with databases
● with APIs
● with browsers
● …
With all of these interconnected, how would you design filtering rules that work for both APIs and databases?
Input Filtering
- Limitations
Fundamental Limitations
- There is NO universal filtering rule that is both flexible and secure
e.g., the filtering needed for <a href="..."> ≠ the filtering needed for <div>...</div>
- It is impossible to settle at the front gate
- how data will be further mangled downstream,
- or how it will be output in the resulting HTML
- As a result, input filtering is subject to both XSS attacks and over-filtering issues
■ Template Engines
● Handlebars, DustJS
- Escape & < > " ' ` into &amp; &lt; &gt; &quot; &#39; &#96;
- {{untrustedData}} is escaped by default.
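A minimal sketch of such a default ("blind") escaping filter, covering exactly the six characters listed above. The name blindEscape is an assumption for illustration; this is not the Handlebars or Dust.js source.

```javascript
// Character-to-entity map for the six characters named on the slide.
const ENTITY_FOR = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '`': '&#96;'
};

// Hypothetical blind escaper: the same rule is applied to every output
// expression, regardless of where it lands in the HTML.
function blindEscape(value) {
  return String(value).replace(/[&<>"'`]/g, (ch) => ENTITY_FOR[ch]);
}

console.log(blindEscape('<script>alert(1)</script>'));
```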
How to defend against XSS?
- Output Filtering in Template Engines
The industry is shifting from input filtering to output filtering
Image from Tom Page, CRW_1978, 2008, flickr.com, License: Creative Commons
Image from john, Secure, 2009, flickr.com, License: Creative Commons
Are your web applications safe now?
Not yet!!!
Most Template Engines are still vulnerable!
- Blindly escaping
Blindly escaping & < > " ' ` does not stop XSS
- {{url}} is untrusted user input (assumed from here on)
- e.g., {{url}} is javascript:alert(1), or
- {{url}} is # onclick=alert(1)
→ Solution: Context-Aware Output Escaping
(aka contextual escaping)
A template is typically written like so:
<a href={{url}}>{{data}}</a>
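To see why blind escaping fails here, the sketch below binds both payloads into the template above. blindEscape and render are hypothetical helpers written for this illustration, not the engines' real code.

```javascript
// Hypothetical blind escaper for & < > " ' ` (same rule everywhere).
function blindEscape(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;', '`': '&#96;' };
  return String(value).replace(/[&<>"'`]/g, (ch) => map[ch]);
}

// Stand-in for rendering <a href={{url}}>{{data}}</a>.
function render(url, data) {
  return '<a href=' + blindEscape(url) + '>' + blindEscape(data) + '</a>';
}

// Neither payload contains any of the six escaped characters,
// so both pass through the filter unchanged.
const viaProtocol = render('javascript:alert(1)', 'click');
const viaAttribute = render('# onclick=alert(1)', 'click');

console.log(viaProtocol);
console.log(viaAttribute);
```

The first result carries a javascript: URI; the second smuggles in an extra onclick attribute because the attribute value is unquoted.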
Related Work
- Template Engines vs Contextual Escaping
No Contextual Escaping: Handlebars, LinkedIn Dust.js (making use of the blindly-escaping filter)
Partial Automatic Contextual Escaping: Ember.js [1], Facebook React [2], Google Angular.js [3]
Automatic Contextual Escaping: Google Closure, Google Go Template [4]
Notes:
[1] Ember.js does not apply contextual filtering rules in <style>, <script> and style attributes.
[2] Facebook React does not apply contextual filtering rules in <style>, <script>, style attributes and URI contexts.
[3] AngularJS does not apply contextual filtering rules in style attributes.
[4] Google Go Template is not a JavaScript Template Engine.
Secure Handlebars
- Software Architecture
[Figure: architecture diagram. (1) Offline: a Handlebars template goes through the Contextual Analyzer (Handlebars Template Parser, Handlebars Template AST, AST Walker, and the Context Parser, i.e., an HTML5 Parser w/auto HTML canonicalization plus a CSS Parser), which outputs a Handlebars template w/filter markups; the Pre-compiler/Compiler turns it into a Template Spec. (2) Online: the Runtime combines the Template Spec with (possibly untrusted) Data and the Contextual XSS Filters (registered as helpers/callbacks) to produce HTML.]
Our solution (comprised of the blue boxes) rewrites templates before Handlebars
Demonstration
- Handlebars vs. Secure Handlebars
■ Handlebars with Default Escaping.
■ Secure-Handlebars with Contextual Escaping.
Demo videos: original handlebars, secure handlebars
Express Secure Handlebars
var express = require('express');
// Simply replace the original express-handlebars with express-secure-handlebars.
// Our implementation preprocesses the template(s) before passing them to the
// original handlebars compiler.
// var exphbs = require('express-handlebars');
var exphbs = require('express-secure-handlebars');
Design
Our Approach
What are the ingredients?
● Template Parser & Walker
○ for extracting template markups
● Standard Compliant Context Parsers
○ for analyzing output contexts
○ for auto-correcting browser quirks
● Context-sensitive XSS Filters
○ for applying contextual filtering rules to defend against XSS!
Image from Andrea Goh, baking ingredients, 2012, flickr.com, License: Creative Commons
Design
Template Parser and Walker
Template Parser & Walker
■ Extract template markups and build an AST for further contextual analysis
<div style="{{cssContext}}">{{htmlContext}}</div>
{{#if data}}
<a href="{{uriContext}}">link</a>
{{else}}
<div>Data not found</div>
{{/if}}
[Figure: the resulting AST. Root R has the children H, T/C, H, T/H, H, T/B; the branching node T/B has one branch with H, T/U, H and another branch with H.]
Legend:
R: Root, H: HTML context, T/C: template output in CSS context, T/H: template output in
HTML context, T/U: template output in URI context, T/B: a branching node in template
Context Parsers (HTML, CSS etc.)
■ Template walker traverses the AST, and triggers different parsers
- We trigger an HTML5 context parser for analysis! (green & blue nodes in the figure)
- We trigger a CSS context parser for analysis! (orange node, T/C)
- We trigger a URI parser for analysis! (red node, T/U)
Context-Sensitive XSS Filters
■ Based on the contextual analysis, precise filtering rules can be applied!
- T/H: we apply the filtering rules (i.e. the most basic HTML escaping) for an HTML context
- T/C: we apply the filtering rules for an HTML double-quoted attribute value context and CSS context
- T/U: we apply the filtering rules for an HTML double-quoted attribute value context and URI context
Template Parser & Walker
- Handling of branching logic
The parsing sequences of the AST
● R → H → T/C → H → T/H → H → H → T/U → H
● R → H → T/C → H → T/H → H → H
The end context of the HTML chunk before the branching node is copied to each branch as its start context for further contextual analysis.
[Figure: the AST, with the two branches under T/B labeled Branch A and Branch B]
...{{#if data}}
<a href="
{{else}}
<a style="
{{/if}}
{{ambiguousContext}}
[Figure: one branch of T/B ends in <a href=" and the other in <a style=", so the output node that follows (T/?) has an ambiguous context: CSS or URI?]
Template Parser & Walker
- Ambiguous context after branching
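The branching rule and the ambiguity check can be sketched with a toy walker. Everything here is an assumption made for illustration: the node shapes, the four toy contexts (DATA, TAG, URI_DQ, CSS_DQ) and the scan function are far simpler than the real HTML5 context parser.

```javascript
// Toy context scanner: tracks only data, "inside a tag", and two
// double-quoted attribute values (href and style).
function scan(html, start) {
  let ctx = start;
  let i = 0;
  while (i < html.length) {
    if (ctx === 'DATA') {
      if (html[i] === '<') ctx = 'TAG';
      i += 1;
    } else if (ctx === 'TAG') {
      if (html.startsWith('href="', i)) { ctx = 'URI_DQ'; i += 6; }
      else if (html.startsWith('style="', i)) { ctx = 'CSS_DQ'; i += 7; }
      else { if (html[i] === '>') ctx = 'DATA'; i += 1; }
    } else {
      // Inside a double-quoted attribute value: only " ends it.
      if (html[i] === '"') ctx = 'TAG';
      i += 1;
    }
  }
  return ctx;
}

// Walks a list of nodes, recording the context of every output expression.
// Each branch starts from the context reached just before the branching node.
function walk(nodes, start, found) {
  let ctx = start;
  for (const node of nodes) {
    if (node.type === 'html') {
      ctx = scan(node.text, ctx);
    } else if (node.type === 'output') {
      found[node.name] = ctx;
    } else if (node.type === 'branch') {
      const ends = node.branches.map((branch) => walk(branch, ctx, found));
      if (new Set(ends).size > 1) {
        throw new Error('Ambiguous context after branching: ' + ends.join(' or '));
      }
      ctx = ends[0];
    }
  }
  return ctx;
}

// The sample template from the Template Parser & Walker slide.
const ast = [
  { type: 'html', text: '<div style="' },
  { type: 'output', name: 'cssContext' },
  { type: 'html', text: '">' },
  { type: 'output', name: 'htmlContext' },
  { type: 'html', text: '</div>' },
  { type: 'branch', branches: [
    [
      { type: 'html', text: '<a href="' },
      { type: 'output', name: 'uriContext' },
      { type: 'html', text: '">link</a>' }
    ],
    [{ type: 'html', text: '<div>Data not found</div>' }]
  ] }
];

// The ambiguous template: the branches end in different contexts.
const ambiguous = [
  { type: 'branch', branches: [
    [{ type: 'html', text: '<a href="' }],
    [{ type: 'html', text: '<a style="' }]
  ] },
  { type: 'output', name: 'ambiguousContext' }
];

const found = {};
walk(ast, 'DATA', found);
console.log(found);
```

Calling walk(ambiguous, 'DATA', {}) throws in this sketch, which mirrors the point of the slide: an output expression after such a branch cannot be assigned a single filter.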
parent template content
<input {{>sub-tmpl}}
sub-template content
style="{{output}}"
WITHOUT sub-template expansions, templates are analyzed separately: is {{output}} in an HTML context (T/H)?
WITH sub-template expansions, templates are analyzed together: {{output}} follows <input style=" and is in a CSS context (T/C)!!
[Figure: the parent template AST, the sub-template AST, and the AST with sub-template expansions]
Template Parser & Walker
- Sub-template Expansion
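A minimal sketch of the expansion step, assuming partials are plain strings keyed by name. expandPartials is a hypothetical helper working on template text; the real analyzer works on the Handlebars AST rather than with a regular expression.

```javascript
// Inline every {{>name}} reference so that parent and sub-template
// can be analyzed as one piece of HTML.
function expandPartials(template, partials, depth = 0) {
  if (depth > 10) {
    throw new Error('Partial nesting too deep (possible cycle)');
  }
  return template.replace(/\{\{>\s*([\w-]+)\s*\}\}/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(partials, name)) {
      throw new Error('Unknown partial: ' + name);
    }
    return expandPartials(partials[name], partials, depth + 1);
  });
}

const partials = { 'sub-tmpl': 'style="{{output}}"' };
const expanded = expandPartials('<input {{>sub-tmpl}}>', partials);

// {{output}} now visibly sits inside a double-quoted style attribute.
console.log(expanded);
```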
Design
Contextual Parsing
Given a piece of HTML, find out which portions of it are in
executable context. E.g.
■ <html> <script> ... </script> </html>
■ <a href=javascript:... >
■ <img src=x onerror=... >
Contextual Parsing
- Problem Definition
Day 1: "Parsing Html The Cthulhu Way"
Quoted from Coding Horror, http://blog.codinghorror.com/parsing-html-the-cthulhu-way/
# pull out data between <script> tags
($script_data) = $html =~ /<script>(.*?)<\/script>/gis;
Day 2: Search npmjs, and pick the first one.
Less horrible, until you see it...
HTML 5 seems like fun
<a<b<c>
<! comment !> <? comment >
</d id=e/>
<f g = h > , <f g=<h>, or <f g=h> ?
<script> what
<!-- this
<script> actually
</script> means
--> ?
</script>
view-source:https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/sample1.html
Dilemma - Reuse or Build?
■ Programming Language
■ Speed
■ Compliance with HTML5 Specification
■ Ease of maintenance
Day 20: I need a high speed racing car
■ Libraries that are compliant with the HTML5 specification:
● Google's CTemplate and Closure Template
Day 20: Use a library that's compliant
■ Server side: requires binding C with nodejs
■ Client side?
■ Can't extend easily for our use case (templating)
Day 21: https://html.spec.whatwg.org/ and http://www.w3.org/TR/html5/
■ Get a coffee machine and tons of coffee beans
■ Read the section "The HTML syntax" (sect 12 of the WHATWG spec and sect 8 of the W3C spec)
HTML := TOKEN | TEXT | TAG
TAG := TAGNAME + TAGATTR*
Day 50: HTML Grammar?
if tag name == "Script" { alert }
• Erroneous HTML will always be accepted.
Day 50: HTML := ANY*
Day 99: Flows can be visualized
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/svg/everything.svg
Key Observation #1
Section 12.2.4 - HTML Tokenization
1. HTML is tokenized into DATA (TEXT), TAG, and ATTR tokens using
the flow.
2. Special tokens are defined as RAWTEXT, RCDATA, and
SCRIPT.
Section 12.2.5 - HTML Tree Construction
1. Describes how the tokenizer switches from DATA to RAWTEXT / RCDATA / SCRIPT
Key Observation #2
Describing flows can be cumbersome. But there are patterns.
• The token state changes only when seeing
• WHITESPACE
• < , /, > (for tag)
• & (for html entity)
• ', " , = (for attribute)
• !, - , ? (for comment)
• A-Z, a-z (valid start characters for a tag name)
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/svg/tag.svg
Finite State Machine
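A table-driven sketch of such a finite state machine, reporting the state reached after each character. It is a heavily reduced subset assumed for illustration: start tags and attributes only (no end tags, comments, character references or self-closing syntax), and the state names are shortened versions of the tokenizer states in the specification.

```javascript
// Map each character to the small set of classes that can change the state.
function classify(ch) {
  if (/[\t\n\f\r ]/.test(ch)) return 'WS';
  if (ch === '<') return 'LT';
  if (ch === '>') return 'GT';
  if (ch === '=') return 'EQ';
  if (ch === '"') return 'DQ';
  if (ch === "'") return 'SQ';
  if (/[A-Za-z]/.test(ch)) return 'ALPHA';
  return 'OTHER';
}

// One transition table: state -> character class -> next state.
const TRANSITIONS = {
  DATA: { LT: 'TAG_OPEN', default: 'DATA' },
  TAG_OPEN: { ALPHA: 'TAG_NAME', default: 'DATA' },
  TAG_NAME: { WS: 'BEFORE_ATTR_NAME', GT: 'DATA', default: 'TAG_NAME' },
  BEFORE_ATTR_NAME: { WS: 'BEFORE_ATTR_NAME', GT: 'DATA', default: 'ATTR_NAME' },
  ATTR_NAME: { WS: 'AFTER_ATTR_NAME', EQ: 'BEFORE_ATTR_VALUE', GT: 'DATA', default: 'ATTR_NAME' },
  AFTER_ATTR_NAME: { WS: 'AFTER_ATTR_NAME', EQ: 'BEFORE_ATTR_VALUE', GT: 'DATA', default: 'ATTR_NAME' },
  BEFORE_ATTR_VALUE: {
    WS: 'BEFORE_ATTR_VALUE', DQ: 'ATTR_VALUE_DQ', SQ: 'ATTR_VALUE_SQ',
    GT: 'DATA', default: 'ATTR_VALUE_UNQUOTED'
  },
  ATTR_VALUE_DQ: { DQ: 'AFTER_ATTR_VALUE', default: 'ATTR_VALUE_DQ' },
  ATTR_VALUE_SQ: { SQ: 'AFTER_ATTR_VALUE', default: 'ATTR_VALUE_SQ' },
  ATTR_VALUE_UNQUOTED: { WS: 'BEFORE_ATTR_NAME', GT: 'DATA', default: 'ATTR_VALUE_UNQUOTED' },
  AFTER_ATTR_VALUE: { WS: 'BEFORE_ATTR_NAME', GT: 'DATA', default: 'ATTR_NAME' }
};

function next(state, cls) {
  const row = TRANSITIONS[state];
  return Object.prototype.hasOwnProperty.call(row, cls) ? row[cls] : row.default;
}

// Returns the state reached after consuming each character.
function statesOf(html) {
  const states = [];
  let state = 'DATA';
  for (const ch of html) {
    const cls = classify(ch);
    if (state === 'TAG_OPEN' && cls !== 'ALPHA') {
      // Reconsumption: "<" was not a tag start, so the current
      // character is processed again in the DATA state.
      state = next('DATA', cls);
    } else {
      state = next(state, cls);
    }
    states.push(state);
  }
  return states;
}

console.log(statesOf('<f g=<h>'));
```

In this sketch the second `<` of `<f g=<h>` is reported as part of an unquoted attribute value, not as the start of a new tag.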
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/visual.html
Key Observation #3
One state transition table can cover the normal cases.
Special cases:
1. a<b< : the algorithm asks for reconsumption of b when b is
followed by a non-tag-related element.
2. Tag matching is required in RAWTEXT (noframes, xmp,
style, iframe, noembed, noscript), RCDATA (textarea,
title), SCRIPT, and PLAINTEXT.
- Thus no tag nesting is allowed in RAWTEXT / RCDATA / SCRIPT
- e.g., <textarea><script><textarea>
Quick and dirty solution: Use 3 state transition tables altogether.
Formal solution: expand state space to N^3.
https://github.com/yahoo/context-parser/blob/master/src/html5-state-machine.js
Exercise: Parse the following.
<script> what
<!-- this
<script> actually
</script> means
--> ?
</script>
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/demo.html
Hint: use state diagram of <script> tag flow
https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/svg/script.svg
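One way to work through the exercise is a reduced model of the script-data sub-states. The sketch below is an assumption-laden simplification (three states instead of the specification's dash and less-than-sign sub-states, so edge cases such as `<!-->` are not modeled); scriptContentEnd is a hypothetical helper, compared against the naive regex approach from Day 1.

```javascript
// Returns the index of the </script that really closes the element,
// given the index just after the opening <script> tag; -1 if none.
function scriptContentEnd(html, start) {
  const lower = html.toLowerCase();
  // A tag name ends at whitespace, "/" or ">".
  const endsTagName = (ch) => ch !== undefined && /[\s\/>]/.test(ch);
  let state = 'SCRIPT_DATA';
  let i = start;
  while (i < lower.length) {
    const closes = lower.startsWith('</script', i) && endsTagName(lower[i + 8]);
    if (state === 'SCRIPT_DATA') {
      if (lower.startsWith('<!--', i)) { state = 'ESCAPED'; i += 4; continue; }
      if (closes) return i;
    } else if (state === 'ESCAPED') {
      if (lower.startsWith('-->', i)) { state = 'SCRIPT_DATA'; i += 3; continue; }
      if (lower.startsWith('<script', i) && endsTagName(lower[i + 7])) {
        state = 'DOUBLE_ESCAPED'; i += 7; continue;
      }
      if (closes) return i;
    } else {
      // DOUBLE_ESCAPED: a </script> here only undoes the inner <script>.
      if (lower.startsWith('-->', i)) { state = 'SCRIPT_DATA'; i += 3; continue; }
      if (closes) { state = 'ESCAPED'; i += 8; continue; }
    }
    i += 1;
  }
  return -1;
}

const sample = '<script> what\n<!-- this\n<script> actually\n</script> means\n--> ?\n</script>';

// Naive approach: stop at the first </script>.
const naive = /<script>([\s\S]*?)<\/script>/i.exec(sample)[1];

// State-machine approach: the first </script> is inside the escaped section.
const end = scriptContentEnd(sample, '<script>'.length);
const content = sample.slice('<script>'.length, end);

console.log(JSON.stringify(naive));
console.log(JSON.stringify(content));
```

Under this model the script content runs up to the last `</script>`, whereas the regular expression stops too early.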
HTML5 Context Parser
- design principle
- Standard Compliance
- Cross browser compatibility.
- Speed and Efficiency.
HTML5 Context Parser
- standard compliance
- Our implementation is based on the WHATWG specification
- HTML5 compliance
- Language
- Context as output
Other parsers: Gumbo (DOM tree as an output), html5lib (Python implementation)
Command line version of our context parser: reports the state (aka context) of each character
Figure from Overview of HTML 5 Parsing Model: https://html.spec.whatwg.org/multipage/syntax.html#overview-of-the-parsing-model
HTML5 Context Parser
- speed & efficiency
Lightweight & Efficient
- State transitions reduction
- (e.g., omit 16 doctype transitions, i.e., 23% of all states)
- No tree/DOM construction
Figure from Overview of HTML 5 Parsing Model: https://html.spec.whatwg.org/multipage/syntax.html#overview-of-the-parsing-model
Standard-compliant Parser is NOT Enough
■ Purpose: Add filters based on the determined context
■ Problem: Context inferred by browsers ≠ Context inferred by our parsers
■ Worst case: filters voided; XSS
Some Browser-specific Quirks
<a href="..." <script>{{inScriptInSafari5/Data}}</a>
<!--[if IE]><script>{{inScriptInIE/Comment}}</script><![endif]-->
<textarea><!-- </textarea>{{inRawText/DataHTML5}}--></textarea>
Compatibility issues introduced by HTML 5 (e.g., in IE7)
<div><!-- Comment1 --!> {{inComment/DataHTML5}} --></div>
<div id=`{{inGraveAccentQuotedinIE/UnquotedAttrVal}}`></div>
etc...
etc...
Auto HTML Canonicalization
■ Comparisons
● Prior work: manual corrections, or no warning at all
● Our work: automatically rewrite HTML to clear parse errors
■ Goals
● Ensure the parsing result is aligned across browsers/parsers
● Decisions: honor the HTML 5 standard; secure-by-default
● Hence, contextual filtering can work accordingly
Design
XSS Filters
Security Model
Assumption
  Templates: Trusted; Data from output expressions: Untrusted
Security Goals for filters
  Context-aware            Filterings are specific to the determined context
  Data Self-contained      Untrusted data cannot break out from its context
  Non-executable Data      Untrusted data cannot be executed as script
  Preserve Trusted Code    Trusted code and logics should be preserved
p.s. Go Template and Closure have similar security models
Go Template Security Model: http://golang.org/pkg/html/template/#hdr-Security_Model
Closure Security Model: http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#problem_definition
Image from: https://www.flickr.com/photos/ravenshoegroup/5692831233/ (CC BY 2.0)
XSS in More Details
Differentiations stem from the DESIGN…
Prior Work Assumption
Untrusted variables are assumed to be non-empty.
In reality, have you ever thought that an empty variable could break security?
#1: Data self-contained; Trusted Code Preserved
- Sample: Output Markup in Unquoted Attribute Value
- <input value={{email}} name=email>
When data is empty, the resulting HTML after filtering and data binding:
+ <input value= name=email> by Closure/Go
+ <input value=� name=email> by our work
- Browser/HTML interprets name=email as the value of the value attribute
- the trusted structure is broken: the name=email attribute is lost, which surprises developers
- legitimate use of document.querySelector('[name=email]').value throws an error
- To mitigate, our filter inserts U+FFFD (REPLACEMENT CHARACTER, which HTML 5 parsers substitute for NULL) when the data is empty
- good faith: developers still have a chance to validate the value (e.g., email)
- preserved developers’ logics (if not help quoting it)
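A minimal sketch of a filter for the unquoted attribute value context, showing the empty-value case. unquotedAttrFilter and its character set are assumptions for illustration and do not reproduce the actual xss-filters implementation.

```javascript
// Hypothetical filter for an unquoted attribute value.
function unquotedAttrFilter(value) {
  const s = String(value);
  // An empty value would let the next attribute be swallowed as the value,
  // so a placeholder character (U+FFFD) is emitted instead.
  if (s === '') return '\uFFFD';
  // Encode characters that could end the value or confuse the tokenizer.
  return s.replace(/[\t\n\f\r "'`=<>]/g, (ch) => '&#' + ch.charCodeAt(0) + ';');
}

// Stand-in for binding data into <input value={{email}} name=email>.
function bind(email) {
  return '<input value=' + unquotedAttrFilter(email) + ' name=email>';
}

console.log(bind(''));
console.log(bind('a b onfocus=alert(1)'));
```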
#2: More Context-sensitive and Efficient
Are existing filters really designed for the era of contextual escaping?
[Figure: state transitions in the DATA state (e.g., <div>↑</div>)]
Output Markup                          Prior Work                    Our Work
in Data Context                        Apply the same HTML Filter    Apply yd Filter
                                       (i.e., encode &<>"'`)         (i.e., encode < only)
in Double-quoted Attr Value Context    Apply the same HTML Filter    Apply yavd Filter
                                       (i.e., encode &<>"'`)         (i.e., encode " only)
<input type="hidden" value="{{email}}" name="email">{{email}}
■ Insight: Context Parser accurately determined the contexts
● Why over-encode? Let’s embrace just-sufficient encoding
● Runtime performance > 5x faster, as a result
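Just-sufficient encoding for these two contexts can be sketched as follows. The filter names yd and yavd come from the slides, but the bodies below are simplified assumptions, not the xss-filters source.

```javascript
// Data context: only "<" can start a tag, so only "<" is encoded.
function yd(value) {
  return String(value).replace(/</g, '&lt;');
}

// Double-quoted attribute value: only '"' can end the value.
function yavd(value) {
  return String(value).replace(/"/g, '&quot;');
}

// Stand-in for binding data into the sample template above.
function render(email) {
  return '<input type="hidden" value="' + yavd(email) + '" name="email">' + yd(email);
}

const out = render('a"<b>&');
console.log(out);
```

Each filter touches a single character, which is one plausible reason a context-specific filter can run faster than a six-character blind escaper.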
#3: Life is worthless without love <3
Double-encoding Issue
● Input filtering has likely been applied in an existing website: <3 becomes &lt;3
● Output filtering encodes it again: it finally becomes &amp;lt;3, which is rendered as the literal text &lt;3 (as in the diagram)
Can we omit encoding the & character?
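The double-encoding effect, and what changes when & is left alone, can be sketched like this. inputFilter, blindEscape and yd are hypothetical simplifications written for this illustration.

```javascript
// Input filtering already applied when the data was stored.
function inputFilter(value) {
  return String(value).replace(/</g, '&lt;');
}

// Blind output filter: encodes & as well, so entities get encoded again.
function blindEscape(value) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;', '`': '&#96;' };
  return String(value).replace(/[&<>"'`]/g, (ch) => map[ch]);
}

// Data-context filter that leaves & untouched.
function yd(value) {
  return String(value).replace(/</g, '&lt;');
}

const stored = inputFilter('<3');
const doubleEncoded = blindEscape(stored);
const graceful = yd(stored);

console.log(stored, doubleEncoded, graceful);
```

A browser shows the double-encoded value as the text &lt;3, while the gracefully filtered value is shown as <3.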
Graceful Output and Input Filtering
- A brave attempt not to encode &
■ & cannot lead to script executions, as defined by HTML 5
“JS includes” such as <br size="&{alert(1)}"> became history (IE5)
■ HTML decoding is required in case of blacklisting,
e.g., to filter javascript&colon;alert(1):
● prior work tests the raw value, and &colon; becomes &amp;colon;
● we HTML-decode first, then test the resulting value
● Our decoder: correct (&gta → >a), fastest (using FSM, trie)
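The decode-then-test idea can be sketched as below. The decoder handles only numeric references and a handful of named ones, the protocol list is an assumed example, and semicolon-less legacy references such as &gta are not covered; it is not the FSM/trie decoder described above.

```javascript
// A few named character references, for illustration only.
const NAMED = { colon: ':', Tab: '\t', NewLine: '\n', amp: '&', lt: '<', gt: '>', quot: '"' };

// Simplified HTML decoder for references that end with ";".
function decodeEntities(s) {
  return s.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code > 0 && code <= 0x10FFFF ? String.fromCodePoint(code) : '\uFFFD';
    }
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

// Decode first, drop control characters and spaces, then test the protocol.
function hasDangerousProtocol(value) {
  const decoded = decodeEntities(String(value));
  const compact = decoded.replace(/[\u0000-\u0020]/g, '');
  return /^(?:javascript|vbscript|data):/i.test(compact);
}

const raw = 'javascript&colon;alert(1)';
console.log(/^javascript:/i.test(raw));   // testing the raw value misses it
console.log(hasDangerousProtocol(raw));   // testing the decoded value catches it
```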
Evaluations
& Comparisons
Evaluations
- Contextual Analysis on a Yahoo! website
● 90.9% of output markups are automatically secured with contextual filtering
Location of output expressions (aka Context)     No. of findings   Notes
Simple HTML Contexts (e.g., data, dqAttr ...)    52 (~10.5%)       Secured by default handlebars.
ATTRIBUTE_VALUE_UNQUOTED state                   5 (~1.0%)         Secured also by secure-handlebars.
ATTRIBUTE_VALUE states + URI                     280 (~56.8%)      Secured also by secure-handlebars.
ATTRIBUTE_VALUE states + CSS                     111 (~22.5%)      Secured also by secure-handlebars.
ATTRIBUTE_VALUE states + JS (e.g., onclick)      1 (~0.2%)         Dangerous context. Manual review is required.
SCRIPT state (i.e., <script>)                    37 (~7.5%)        Dangerous context. Manual review is required.
ATTRIBUTE_NAME state                             4 (~0.8%)         Dangerous context. Manual review is required.
RAW_TEXT state (e.g., <style>)                   3 (~0.6%)         Dangerous context. Manual review is required.
Evaluations
- Performance
■ Offline/one-off overhead (for contextual analysis)
● It takes 63s to pre-process and canonicalize 512 templates (i.e. 0.35MB/sec).
■ Negligible runtime overhead (for filtering only)
● for contexts defendable by the default filter, ours is >5x faster
● to secure other contexts, by design, the least number of characters is encoded
Ref: https://github.com/yahoo/xss-filters/blob/benchmarks/tests/benchmarks/compare-default.js#L9-L22
Framework Comparisons
                                        Yahoo Secure  Google      Google   Facebook  Ember.js
                                        Handlebars    Angular.js  Closure  React
Contextual Escaping Supported
  HTML Contexts                         ✓             ✓           ✓        ✓         ✓
  URI Contexts                          ✓             ✓           ✓        no        ✓
  CSS Contexts                          ✓             ✓ / no [1]  ✓        no        no
  JS Contexts                           no            no          ✓        no        no
Important Features
  Auto HTML Canonicalization            ✓             no          no       no        no
  Auto Sub-template Analysis            ✓             ✓ [2]       ✓ [3]    ✓ [2]     ✓ [2]
  Secure Filters for > 90% of Browser   ✓             no          ✓        no        no
  Market Share (incl. IE 7+, Safari 5+,
  FF & Chrome)
[1] AngularJS does not apply contextual filtering rules on style attribute.
[2] AngularJS, React and EmberJS restrict the sub-template in HTML Data context only.
[3] Google Closure requires manual annotation for sub-template analysis.
Future Work / Rich HTML Sanitization
■ When developers don’t know how to sanitize...
● they resort to SafeString/dangerouslySetInnerHTML
safeHtml = Purifier.purify(untrustedRichHtml);
■ No need to sanitize individual fields
■ Usable on client side or server side
■ Whitelist-based approach
See the html-purify github repo
Image from https://www.flickr.com/photos/tnarik/3416160916 (CC BY 2.0)
Conclusion: Building A Safer Internet for All
Automatic contextual escaping made easy
■ Efficient HTML5-compliant parser w/auto corrections
■ Automatically applies contextual, just-sufficient, and faster escaping
■ Effortless adoption w/express-secure-handlebars
■ Open-sourced at github.com/yahoo and npmjs.com
Portal: https://yahoo.github.io/secure-handlebars
Thank you!
Nera, Adon, Albert
{neraliu, adon, albertyu}@yahoo-inc.com
Twitter: @neraliu, @adonatwork, @yukinying
We’d like to acknowledge
the support and help from:
- Stuart Larsen
- Alaa Mubaied
- Aditya Mahendrakar
- Eric Ferraiuolo
- Christopher Harrell
- Christopher Rohlf
- Jeremy Ruppel
Bug Bounty Program Contributors
● https://github.com/yahoo/secure-handlebars/blob/master/CONTRIBUTORS.md
● https://github.com/yahoo/xss-filters/blob/master/CONTRIBUTORS.md
Appendix
Deployability of Secure-Handlebars
- secure-handlebars & express-secure-handlebars
Hassle-free server-side adoption
- To switch from the express-handlebars npm package to express-secure-handlebars:
- 2 LOC changes: (1) the dependency in package.json, (2) require(...)
Besides, client-side use with secure-handlebars
- The contextual analyzer can preprocess templates during the build process (at server side)
- Handlebars pre-compiles the rewritten templates
- Filters registered at client side allow handlebars to filter data at the data binding stage.
Secure Handlebars
- Design Principles
■ Work as a Preprocessor
● Parse the template and build an Abstract Syntax Tree (AST)
● Walk through every branch, triggering different parsers for contextual analysis
● Insert filter markups into each {{outputExpression}} based on its context
● Produce a rewritten template, compatible w/handlebars (unlike ember.js)
■ Facilitate Seamless Upgrade
● Existing template logic must all be preserved
+ <a href="{{{yavd (yubl (yufull url))}}}">{{{yd url}}}</a>
● Handlebars applies the filters (registered as helpers) when the template is rendered.
○ {{{ }}} - disables the default blind escaping.
○ yufull - encodeURI() with IPvFuture support
○ yubl - disables dangerous protocols such as javascript:
○ yavd - html-escapes the double-quote character (" → &quot;)
○ yd - html-escapes the less-than character (< → &lt;)
● Contextual Analyzer adds filter markups specific to output contexts
- <a href="{{url}}">{{url}}</a>
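The rewriting step can be sketched as a mapping from a determined context to a filter chain. The context names, the FILTERS_FOR table and the rewriteTemplate helper are assumptions for illustration; only the filter names (yd, yavd, yubl, yufull) and the output format come from the slide.

```javascript
// Hypothetical mapping from output context to the filters to apply,
// listed outermost first.
const FILTERS_FOR = {
  DATA: ['yd'],
  ATTR_DQ: ['yavd'],
  ATTR_DQ_URI: ['yavd', 'yubl', 'yufull']
};

// Builds the filter markup for one output expression.
function filterMarkup(name, context) {
  if (!Object.prototype.hasOwnProperty.call(FILTERS_FOR, context)) {
    throw new Error('No automatic filter for context ' + context + '; manual review needed');
  }
  // Wrap from the innermost filter outwards: (yavd (yubl (yufull url)))
  const nested = FILTERS_FOR[context].reduceRight(
    (inner, filter) => '(' + filter + ' ' + inner + ')',
    name
  );
  // Drop the outermost parentheses and disable the default escaping with {{{ }}}.
  return '{{{' + nested.slice(1, -1) + '}}}';
}

// Rewrites each {{name}} using the contexts found by the analyzer,
// given here in order of appearance.
function rewriteTemplate(template, contextsInOrder) {
  let index = 0;
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, name) =>
    filterMarkup(name, contextsInOrder[index++])
  );
}

const rewritten = rewriteTemplate('<a href="{{url}}">{{url}}</a>', ['ATTR_DQ_URI', 'DATA']);
console.log(rewritten);
```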
Rewrite template before Handlebars
CSS Context Parser
- design principles
■ Same considerations as HTML5 Context Parser
● Standard compliance.
● Cross-browser compatibility.
■ Design Goal
- All browsers MUST parse the CSS with the same contextual result.
See the standalone CSS Parser github repo
CSS Context Parser
- strict mode
■ Approach:
● Rewrite the CSS grammar into a stricter grammar.
● The original grammar allows escape characters (i.e. {6digits}); the stricter grammar only allows a known set of characters (i.e. [a-zA-Z0-9]) and special characters (i.e. :, ;).
● It is unusual to use escape characters in a CSS template.
// this is a valid syntax, but our parser would reject it!
<div style="color:{{output}}">...</div>
Why are we reluctant to support auto JS Context filtering? Static vs. Dynamic
■ What XSS filters should we apply?
A single-quoted JS string filter? A double-quoted URI attribute filter?
● Static approaches (incl. related work) can only apply the former
● Warn & check manually; avoid a false sense of security
<script>
var html = '<a href="{{untrustedUrl}}"><b>link</b></a>...';
document.write(html);
</script>

Strategies for Landing an Oracle DBA Job as a Fresher
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Efficient Context-sensitive Output Escaping for Javascript Template Engines

  • 1. Efficient Context-aware Output Escaping for JavaScript Template Engines PRESENTED BY Nera Liu, Adonis Fung, and Albert Yu Paranoids Labs, Yahoo! SEPT 24, 2015
  • 2. Problem Statement: How to defend against XSS in JavaScript template engines using contextual analysis? Outline: Background, Related Work & Implementation > Design > Evaluation > Conclusion
  • 3. Background, Related Work & Implementation
  • 4. What is Cross-Site Scripting (XSS)? Given no proper output filtering: <h1>Hello <?php echo $_GET['name']; ?></h1> A typical attack vector arriving through XXX in the query string at victim.com/?name=XXX: "'><script>alert(1)</script> The HTML of victim.com ends up being: <h1>Hello "'><script>alert(1)</script></h1>
  • 5. Cross-Site Scripting (XSS) & OWASP Top 10 ■ Ranked No. 3 in the OWASP Top 10 Web Application Security Risks (2013) ■ Root Cause ● Untrusted inputs are executed as scripts under a victim’s origin/domain. ■ Consequences ● Cookie theft and leakage of private user data. ● Full control of the web content, including defacement. Screen-captured from https://www.owasp.org/index.php/Top_10_2013-A3-Cross-Site_Scripting_(XSS)
  • 6. How to defend against XSS? - Filtering at the Front Gate (Image from Rob, On guard, 2007, flickr.com, License: Creative Commons)
  • 7. How to defend against XSS? - Systems are getting more complicated. Consider the internal data flow of your web application… ● with databases ● with APIs ● with browsers ● … all interconnected with each other. How would you design filtering rules that work for both APIs and databases? (Image from 呉 松本, Pipes! Pipes! Pipes!, 2009, flickr.com, License: Creative Commons)
  • 8. Input Filtering - Limitations. Fundamental limitations: - NO universal filtering rule is both flexible and secure, e.g., filtering for <a href="..."> ≠ filtering for <div>...</div> - It is impossible to settle at the front gate - how data will be further mangled, - or to predict how it will be output in the resultant HTML - As a result, applications are subject to XSS attacks and over-filtering issues
  • 9. How to defend against XSS? - Output Filtering in Template Engines. The industry is shifting from input filtering to output filtering. ■ Template Engines ● Handlebars, DustJS - Escape & < > " ' ` into &amp; &lt; &gt; &quot; &#39; &#96; - {{untrustedData}} is escaped by default. (Image from Tom Page, CRW_1978, 2008, flickr.com, License: Creative Commons)
  • 10. Are your web applications safe now? Not yet! (Image from john, Secure, 2009, flickr.com, License: Creative Commons)
  • 11. Most Template Engines are still vulnerable! - Blindly escaping. A template is typically written like so: <a href={{url}}>{{data}}</a> Blind escaping (& < > " ' `) does not stop XSS when {{url}} is untrusted user input (assumed hereafter): - {{url}} is javascript:alert(1), or - {{url}} is # onclick=alert(1) → Solution: Context-Aware Output Escaping (aka contextual escaping)
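To make the two attack strings on this slide concrete, here is a minimal sketch of a blind escaper applied to the template `<a href={{url}}>{{data}}</a>`. The names `blindEscape` and `renderLink` are hypothetical and stand in for what a template engine does internally; this is not Handlebars' actual code.

```javascript
// Sketch of a "blind" escaper: it encodes & < > " ' ` no matter where the
// value ends up in the HTML. (Assumed helper, not the Handlebars source.)
var ENTITY_MAP = {
  '&': '&amp;', '<': '&lt;', '>': '&gt;',
  '"': '&quot;', "'": '&#39;', '`': '&#96;'
};

function blindEscape(value) {
  return String(value).replace(/[&<>"'`]/g, function (ch) {
    return ENTITY_MAP[ch];
  });
}

// Hypothetical renderer for the template <a href={{url}}>{{data}}</a>.
function renderLink(url, data) {
  return '<a href=' + blindEscape(url) + '>' + blindEscape(data) + '</a>';
}

// Neither payload contains any of the six escaped characters,
// so both pass through the escaper untouched.
var scriptUrl = renderLink('javascript:alert(1)', 'click me');
var extraAttr = renderLink('# onclick=alert(1)', 'click me');

console.log(scriptUrl);
console.log(extraAttr);
```

Both payloads survive because the danger comes from the context (a URI, and an unquoted attribute value), not from the characters the blind escaper looks for.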
  • 12. Related Work - Template Engines vs Contextual Escaping. Automatic Contextual Escaping: Google Closure, Google Go Template [4]. Partial Automatic Contextual Escaping: Ember.js [1], Facebook React [2], Google Angular.js [3]. No Contextual Escaping: Handlebars, LinkedIn Dust.js (making use of the blindly-escaping filter). Notes: [1] Ember.js does not apply contextual filtering rules in <style>, <script> and style attributes. [2] Facebook React does not apply contextual filtering rules in <style>, <script>, style attributes and URI contexts. [3] AngularJS does not apply contextual filtering rules in style attributes. [4] Google Go Template is not a JavaScript Template Engine.
  • 13. Secure Handlebars - Software Architecture. [Architecture diagram with two stages, (1) offline and (2) online. Components shown: Handlebars Template; Contextual Analyzer (Handlebars Template Parser, Handlebars Template AST, AST Walker); Handlebars Context Parser; HTML5 Parser (w/auto HTML canonicalization); CSS Parser; Handlebars Template w/filter markups; Pre-compiler; Template Spec.; Runtime Compiler; Contextual XSS Filters (registered as helpers/callbacks); Data (possibly untrusted); HTML.] Our solution (comprised of the blue boxes) rewrites templates before Handlebars.
  • 14. Demonstration - Handlebars vs. Secure Handlebars ■ Handlebars with default escaping. ■ Secure-Handlebars with contextual escaping. Demo videos: original handlebars, secure handlebars
  • 15. Express Secure Handlebars: var express = require('express'); // Simply replace the original express-handlebars with express-secure-handlebars; our implementation preprocesses the template(s) before passing them to the original Handlebars compiler. // var exphbs = require('express-handlebars'); var exphbs = require('express-secure-handlebars');
  • 17. What are the ingredients? ● Template Parser & Walker ○ for extracting template markups ● Standard-Compliant Context Parsers ○ for analyzing output contexts ○ for auto-correcting browser quirks ● Context-sensitive XSS Filters ○ for applying contextual filtering rules to defend against XSS! (Image from Andrea Goh, baking ingredients, 2012, flickr.com, License: Creative Commons)
  • 19. Template Parser & Walker ■ Extract template markups and build an AST for further contextual analysis. Template: <div style="{{cssContext}}">{{htmlContext}}</div> {{#if data}} <a href="{{uriContext}}">link</a> {{else}} <div>Data not found</div> {{/if}} [AST figure with nodes R, H, T/C, T/H, T/B, T/U.] Legend: R: Root, H: HTML context, T/C: template output in CSS context, T/H: template output in HTML context, T/U: template output in URI context, T/B: a branching node in the template
  • 20. Context Parsers (HTML, CSS etc.) ■ The template walker traverses the AST and triggers different parsers: an HTML5 context parser (green & blue nodes), a CSS context parser (orange node), and a URI parser (red node). [Same AST figure and legend as slide 19.]
  • 21. Context-Sensitive XSS Filters ■ Based on the contextual analysis, precise filtering rules can be applied: - for an HTML context, the filtering rules for HTML (i.e., the most basic HTML escaping); - for a style attribute, the filtering rules for an HTML double-quoted attribute value context and a CSS context; - for an href attribute, the filtering rules for an HTML double-quoted attribute value context and a URI context. [Same AST figure and legend as slide 19.]
  • 22. Template Parser & Walker - Handling of branching logic. The parsing sequences of the AST, one per branch (Branch A and Branch B in the figure): ● R → H → T/C → H → T/H → H → H → T/U → H ● R → H → T/C → H → T/H → H → H The end context of the HTML chunk preceding the branching node is copied to each branch as its start context for further contextual analysis. [Same AST figure and legend as slide 19.]
  • 23. Template Parser & Walker - Ambiguous context after branching. Template: ...{{#if data}} <a href=" {{else}} <a style=" {{/if}} {{ambiguousContext}} The two branches under T/B end in different contexts (<a href=" vs. <a style="), so the output expression that follows (T/?) has an ambiguous context: CSS or URI?
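The branching rule from slides 22 and 23 can be sketched as a small check: every branch starts from the same context, and the contexts in which the branches end must agree before analysis continues. The function name `mergeBranchContexts` and the string labels are assumptions for illustration; throwing an error is one plausible way to handle ambiguity, not necessarily what secure-handlebars does.

```javascript
// Hypothetical sketch: given the end context of each branch of an
// {{#if}}...{{else}}...{{/if}} block, return the single context in which
// analysis continues, or reject the template when the branches disagree.
function mergeBranchContexts(endContexts) {
  var first = endContexts[0];
  for (var i = 1; i < endContexts.length; i++) {
    if (endContexts[i] !== first) {
      throw new Error(
        'Ambiguous context after branching: ' + endContexts.join(' vs ')
      );
    }
  }
  return first;
}

// Slide 19's template: both branches close their tags and end in HTML data.
var afterSlide19Branch = mergeBranchContexts(['HTML', 'HTML']);

// Slide 23's template: one branch ends inside href=", the other inside style=".
var ambiguityMessage = null;
try {
  mergeBranchContexts(['URI', 'CSS']);
} catch (e) {
  ambiguityMessage = e.message;
}

console.log(afterSlide19Branch);
console.log(ambiguityMessage);
```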
  • 24. Template Parser & Walker - Sub-template Expansion. Parent template content: <input {{>sub-tmpl}} Sub-template content: style="{{output}}" WITHOUT sub-template expansion, templates are analyzed separately, and {{output}} appears to be in an HTML context (T/H). WITH sub-template expansion, templates are analyzed together, and {{output}} is found to be in a CSS context (T/C). [Figure: parent template AST, sub-template AST, and AST with sub-template expansions.]
  • 26. Contextual Parsing - Problem Definition. Given a piece of HTML, find out which portions of it are in an executable context, e.g., ■ <html> <script> ... </script> </html> ■ <a href=javascript:... > ■ <img src=x onerror=... >
  • 27. Day 1: "Parsing Html The Cthulhu Way" Quoted from Coding Horror, http://blog.codinghorror.com/parsing-html-the-cthulhu-way/ 27 # pull out data between <script> tags ($script_data) = $html =~ /<script>(.*?)</script>/gis;
  • 28. Day 2: Search npmjs, and pick the first one. 28 Less horrible, until you see it...
  • 29. HTML 5 seems like fun <a<b<c> <! comment !> <? comment > </d id=e/> <f g = h > , <f g=<h>, or <f g=h> ? <script> what <!-- this <script> actually </script> means --> ? </script> 29 view-source:https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/sample1.html
  • 30. 30
  • 31. Dilemma - Reuse or Build? ■ Programming Language ■ Speed ■ Compliance with the HTML5 Specification ■ Ease of maintenance
  • 32. Day 20: I need a high speed racing car ■ Libraries that are compliant with the HTML5 specification: ● Google's CTemplate and Closure Template
  • 33. Day 20: Use a library that's compliant ■ Server side: binding C with Node.js ■ Client side? ■ Can't be extended easily for our use case (templating)
  • 34. Day 21: Read https://html.spec.whatwg.org/ and http://www.w3.org/TR/html5/ ■ Get a coffee machine and tons of coffee beans ■ Read the section "The HTML syntax" (section 12 of the WHATWG spec, section 8 of the W3C spec)
  • 35. Day 50: HTML Grammar? HTML := TOKEN | TEXT | TAG; TAG := TAGNAME + TAGATTR*; if tag name == "Script" { alert }
  • 36. Day 50: HTML := ANY* • Erroneous HTML will always be accepted.
  • 37. Day 99: Flows can be visualized 37 https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/svg/everything.svg
  • 38. Key Observation #1. Section 12.2.4 - HTML Tokenization: 1. HTML is tokenized as DATA (TEXT), TAG, and ATTR by following the flow. 2. Special states are defined for RAWTEXT, RCDATA, and SCRIPT. Section 12.2.5 - HTML Tree Construction: 1. describes how the tokenizer switches from DATA to RAWTEXT / RCDATA / SCRIPT
  • 39. Key Observation #2. Describing flows can be cumbersome, but there are patterns: the tokenizer state changes only on seeing • WHITESPACE • < , /, > (for tags) • & (for HTML entities) • ', " , = (for attributes) • !, - , ? (for comments) • A-Z, a-z (valid start characters for a tag name)
  • 43. Key Observation #3. One state transition table can cover the normal cases. Special cases: 1. For a<b<, the algorithm asks for reconsumption of b when b is followed by a non-tag-related element. 2. Tag matching is required in RAWTEXT (noframes, xmp, style, iframe, noembed, noscript), RCDATA (textarea, title), SCRIPT, and PLAINTEXT. - Thus no tag nesting is allowed in RAWTEXT / RCDATA / SCRIPT, e.g., <textarea><script><textarea> Quick and dirty solution: use 3 state transition tables together. Formal solution: expand the state space to N^3.
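The observations above (a handful of significant characters, one transition table for the normal cases) can be illustrated with a toy table-driven tokenizer. This is a heavily reduced sketch, assuming only nine states and eight character classes; the state names are loosely modeled on the HTML tokenizer, and end tags, comments, character references, and the RAWTEXT / RCDATA / SCRIPT special cases are deliberately left out. It is not the context-parser implementation.

```javascript
// Map each input character to one of a few classes (Key Observation #2).
function classify(ch) {
  if (ch === '<' || ch === '>' || ch === '=' || ch === '"' || ch === "'") return ch;
  if (/\s/.test(ch)) return 'WS';
  if (/[A-Za-z]/.test(ch)) return 'ALPHA';
  return 'OTHER';
}

// TABLE[state][class] gives the next state. A missing entry falls back to
// the row's `default`, and then to "stay in the current state".
var TABLE = {
  DATA:              { '<': 'TAG_OPEN' },
  TAG_OPEN:          { ALPHA: 'TAG_NAME', '<': 'TAG_OPEN', default: 'DATA' },
  TAG_NAME:          { WS: 'BEFORE_ATTR_NAME', '>': 'DATA' },
  BEFORE_ATTR_NAME:  { WS: 'BEFORE_ATTR_NAME', '>': 'DATA', default: 'ATTR_NAME' },
  ATTR_NAME:         { WS: 'AFTER_ATTR_NAME', '=': 'BEFORE_ATTR_VALUE', '>': 'DATA' },
  AFTER_ATTR_NAME:   { WS: 'AFTER_ATTR_NAME', '=': 'BEFORE_ATTR_VALUE', '>': 'DATA', default: 'ATTR_NAME' },
  BEFORE_ATTR_VALUE: { WS: 'BEFORE_ATTR_VALUE', '"': 'ATTR_VALUE_DQ', "'": 'ATTR_VALUE_SQ', '>': 'DATA', default: 'ATTR_VALUE_UQ' },
  ATTR_VALUE_DQ:     { '"': 'BEFORE_ATTR_NAME' },
  ATTR_VALUE_SQ:     { "'": 'BEFORE_ATTR_NAME' },
  ATTR_VALUE_UQ:     { WS: 'BEFORE_ATTR_NAME', '>': 'DATA' }
};

// Return the state reached after consuming `html`, i.e. the context in which
// an output expression placed right after it would be emitted.
function endContext(html) {
  var state = 'DATA';
  for (var i = 0; i < html.length; i++) {
    var row = TABLE[state];
    state = row[classify(html[i])] || row.default || state;
  }
  return state;
}

console.log(endContext('<div>'));           // text between tags
console.log(endContext('<a href="'));       // inside a double-quoted value
console.log(endContext('<input value='));   // just before an unquoted value
console.log(endContext('<a<b<c'));          // '<' inside a tag name stays in the name
```

Because the table is plain data, supporting more of the specification in this sketch would mean adding rows and classes rather than new control flow, which is the appeal of the table-driven design the slide describes.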
  • 45. Exercise: Parse the following. 45 <script> what <!-- this <script> actually </script> means --> ? </script> https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/demo.html
  • 46. 46 Hint: use state diagram of <script> tag flow https://rawgit.com/yukinying/appsecusa-2015-jsdemo/master/svg/script.svg
  • 47. HTML5 Context Parser - design principle - Standard compliance. - Cross-browser compatibility. - Speed and efficiency. The command-line version of our context parser reports the state (aka context) of each character. QR code points to the repo of the standalone CSS Parser
  • 48. HTML5 Context Parser - standard compliance - Our implementation is based on the WHATWG specification - Points of comparison: HTML5 compliance, language, context as output - Gumbo: outputs a DOM tree - html5lib: Python implementation. QR code points to the repo of the standalone Context Parser. Figure from Overview of HTML 5 Parsing Model: https://html.spec.whatwg.org/multipage/syntax.html#overview-of-the-parsing-model
  • 49. HTML5 Context Parser - speed & efficiency Lightweight & Efficient - State transitions reduction - (e.g., omit 16 doctype transitions, i.e., 23% of all states) - No tree/DOM construction QR code points to standalone Context Parser github repo Figure from Overview of HTML 5 Parsing Model: https://html.spec.whatwg.org/multipage/syntax.html#overview-of-the-parsing-model 49
  • 50. Standard-compliant Parser is NOT Enough ■ Purpose: Add filters based on the determined context ■ Problem: Context inferred by browsers ≠ Context inferred by our parsers ■ Worst case: filters voided; XSS
  • 51. Some Browser-specific Quirks <a href="..." <script>{{inScriptInSafari5/Data}}</a> <!--[if IE]><script>{{inScriptInIE/Comment}}</script><![endif]--> <textarea><!-- </textarea>{{inRawText/DataHTML5}}--></textarea> Compatibility issues by HTML 5 (e.g., in IE7) <div><!-- Comment1 --!> {{inComment/DataHTML5}} --></div> <div id=`{{inGraceAccentQuotedinIE/UnquotedAttrVal}}`></div> etc.
  • 52. Auto HTML Canonicalization ■ Comparisons ● Prior work: manual corrections, or no warning at all ● Our work: automatically rewrite HTML to clear parse errors ■ Goals ● Ensure the parsing experience is aligned across browsers/parsers ● Decisions: honor the HTML 5 standard; secure-by-default ● Hence, contextual filtering can work accordingly
  • 54. Security Model. Assumption: Templates are trusted; data from output expressions is untrusted. Context-aware filters are specific to the determined context. Security goals for filters: - Data self-contained: untrusted data cannot break out of its context. - Non-executable data: untrusted data cannot be executed as script. - Preserve trusted code: trusted code and logic should be preserved. P.S. Go Template and Closure have similar security models. Go Template Security Model: http://golang.org/pkg/html/template/#hdr-Security_Model Closure Security Model: http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#problem_definition
  • 55. XSS in More Details. Differentiations stem from the DESIGN… Image from: https://www.flickr.com/photos/ravenshoegroup/5692831233/ (CC BY 2.0)
  • 56. Prior Work Assumption. The untrusted variable is assumed to be non-empty. In reality, have you ever thought an empty variable could break security? Go Template Security Model: http://golang.org/pkg/html/template/#hdr-Security_Model Closure Security Model: http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#problem_definition
  • 57. #1: Data self-contained; Trusted Code Preserved - Sample: Output Markup in Unquoted Attribute Value. Template: <input value={{email}} name=email> When the data is empty, the resulting HTML after filtering and data binding is: <input value= name=email> by Closure/Go; <input value=� name=email> by our work. - With an empty value, the browser/HTML parser interprets “name=email” as the attribute value - The trusted structure is broken: the reference to the email field's value is lost, which surprises developers - A legitimate use of document.querySelector('[name=email]').value throws an error - To mitigate, our filter inserts U+FFFD (the replacement character that HTML parsers substitute for NULL) when the data is empty - Good faith: developers still have a chance to validate the value (e.g., email) - Developers' logic is preserved (even though the filter does not add the quotes for them)
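The empty-value case on this slide can be sketched as follows. `unquotedAttrValueFilter` and `renderInput` are hypothetical names, and the set of characters encoded for non-empty values is an assumption chosen for illustration; only the U+FFFD substitution for empty data comes from the slide.

```javascript
// Hypothetical filter for an output expression in an unquoted attribute value.
function unquotedAttrValueFilter(value) {
  var s = String(value);
  if (s === '') {
    // An empty value would let the next attribute be swallowed as the value,
    // so emit U+FFFD to keep the attribute self-contained.
    return '\uFFFD';
  }
  // Assumed character set: whitespace, quotes, backtick, =, < and > would end
  // or confuse an unquoted value, so encode them as decimal references.
  return s.replace(/[\t\n\f\r "'`=<>]/g, function (ch) {
    return '&#' + ch.charCodeAt(0) + ';';
  });
}

// Renders the slide's template: <input value={{email}} name=email>
function renderInput(email) {
  return '<input value=' + unquotedAttrValueFilter(email) + ' name=email>';
}

console.log(renderInput(''));
console.log(renderInput('x onclick=alert(1)'));
```

With the placeholder in place, `name=email` is still parsed as its own attribute, so a selector such as `[name=email]` keeps matching the element.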
  • 58. 58 State transition in DATA state (e.g., <div>↑</div>) Are existing filters really designed for the era of contextual escaping?
  • 59. #2: More Context-sensitive and Efficient. Example: <input type="hidden" value="{{email}}" name="email">{{email}}
      Output markup | Prior work | Our work
      in Data Context | Apply the same HTML filter (i.e., encode & < > " ' `) | Apply yd filter (i.e., encode < only)
      in Double-quoted Attr Value Context | Apply the same HTML filter (i.e., encode & < > " ' `) | Apply yavd filter (i.e., encode " only)
    ■ Insight: the Context Parser accurately determines the contexts ● Why over-encode? Let’s embrace just-sufficient encoding ● Runtime performance is > 5x faster as a result
    Go Template Security Model: http://golang.org/pkg/html/template/#hdr-Security_Model Closure Security Model: http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html#problem_definition
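A rough sketch of just-sufficient encoding for the two contexts in this slide's example. The filter names `yd` and `yavd` come from the slides, but the one-line bodies below are simplified assumptions and should not be read as the xss-filters source.

```javascript
// Data context: only '<' can open a tag, so only '<' is encoded (assumed body).
function yd(s) {
  return String(s).replace(/</g, '&lt;');
}

// Double-quoted attribute value: only '"' can close the value (assumed body).
function yavd(s) {
  return String(s).replace(/"/g, '&quot;');
}

// Renders <input type="hidden" value="{{email}}" name="email">{{email}}
function renderHiddenInput(email) {
  return '<input type="hidden" value="' + yavd(email) + '" name="email">' + yd(email);
}

var sample = 'Tom & "Jerry" <3';
console.log(yd(sample));
console.log(yavd(sample));
console.log(renderHiddenInput('"><script>'));
```

Each filter performs a single replacement pass over one character instead of six, which is consistent with the speed-up the slide reports, though the figure itself comes from the authors' benchmark rather than from this sketch.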
  • 60. #3: Life is worthless without love <3. Double-encoding issue ● Input filtering is likely already applied in an existing website: <3 becomes &lt;3 ● Output filtering encodes it again: it finally becomes &amp;lt;3, rendered as in the diagram. Can we omit encoding the & character?
  • 61. Graceful Output and Input Filtering - A brave attempt not to encode & ■ & cannot lead to script execution, as defined by HTML 5. “JS includes” such as <br size="&{alert(1)}"> became history (IE5) ■ HTML decoding is required when blacklisting, e.g., to filter javascript&colon;alert(1): ● prior work tests the raw value, and &colon; becomes &amp;colon; ● we HTML-decode first, then test the resulting value ● Our decoder: correct (&gta → >a) and fastest (using an FSM and a trie)
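The decode-then-test order can be sketched as below. The entity table is a tiny hand-picked subset and the decoder only handles references that end with a semicolon, so it does not reproduce the `&gta → >a` behaviour the slide mentions; `htmlDecode` and `hasDangerousProtocol` are hypothetical names, and the protocol list is an assumption.

```javascript
// Small illustrative subset; a real decoder covers every named reference.
var NAMED = { colon: ':', amp: '&', lt: '<', gt: '>', quot: '"', Tab: '\t', NewLine: '\n' };

function fromCodePointSafe(cp) {
  // Out-of-range references become U+FFFD instead of throwing a RangeError.
  return cp > 0x10FFFF ? '\uFFFD' : String.fromCodePoint(cp);
}

// Decode &#xHH; &#DD; and the named references listed in NAMED.
function htmlDecode(s) {
  return s.replace(/&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z]+));/g,
    function (match, hex, dec, name) {
      if (hex) return fromCodePointSafe(parseInt(hex, 16));
      if (dec) return fromCodePointSafe(parseInt(dec, 10));
      return Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : match;
    });
}

// Decode first, drop tab/newline characters and leading control characters
// (which browsers ignore when reading the scheme), then test the blacklist.
function hasDangerousProtocol(rawAttrValue) {
  var decoded = htmlDecode(rawAttrValue)
    .replace(/[\t\n\r]/g, '')
    .replace(/^[\u0000-\u0020]+/, '');
  return /^(?:javascript|vbscript|data):/i.test(decoded);
}

var payload = 'javascript&colon;alert(1)';
console.log(/^javascript:/i.test(payload));   // testing the raw value misses it
console.log(hasDangerousProtocol(payload));   // decoding first catches it
```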
  • 63. Evaluations - Contextual Analysis on a Yahoo! website ● 90.9% of output markups are automatically secured with contextual filtering
      Location of output expressions (aka context) | No. of findings | Notes
      Simple HTML contexts (e.g., data, dqAttr ...) | 52 (~10.5%) | Secured by default Handlebars.
      ATTRIBUTE_VALUE_UNQUOTED state | 5 (~1.0%) | Secured also by secure-handlebars.
      ATTRIBUTE_VALUE states + URI | 280 (~56.8%) | Secured also by secure-handlebars.
      ATTRIBUTE_VALUE states + CSS | 111 (~22.5%) | Secured also by secure-handlebars.
      ATTRIBUTE_VALUE states + JS (e.g., onclick) | 1 (~0.2%) | Manual review is required. Dangerous contexts.
      SCRIPT state (i.e., <script>) | 37 (~7.5%) | Manual review is required. Dangerous contexts.
      ATTRIBUTE_NAME state | 4 (~0.8%) | Manual review is required. Dangerous contexts.
      RAW_TEXT state (e.g., <style>) | 3 (~0.6%) | Manual review is required. Dangerous contexts.
  • 64. Evaluations - Performance ■ Offline/one-off overhead (for contextual analysis) ● It takes 63s to pre-process and canonicalize 512 templates (i.e., 0.35MB/sec). ■ Negligible runtime overhead (for filtering only) ● For contexts defendable by the default filter, ours is >5x faster ● To secure other contexts, our filters by design encode the least number of characters. Ref: https://github.com/yahoo/xss-filters/blob/benchmarks/tests/benchmarks/compare-default.js#L9-L22
  • 65. Framework Comparisons
      Feature | Yahoo Secure Handlebars | Google Angular.js | Google Closure | Facebook React | Ember.js
      Contextual Escaping Supported:
      HTML Contexts | ✓ | ✓ | ✓ | ✓ | ✓
      URI Contexts | ✓ | ✓ | ✓ | no | ✓
      CSS Contexts | ✓ | ✓ / no [1] | ✓ | no | no
      JS Contexts | no | no | ✓ | no | no
      Important Features:
      Auto HTML Canonicalization | ✓ | no | no | no | no
      Auto Sub-template Analysis | ✓ | ✓ [2] | ✓ [3] | ✓ [2] | ✓ [2]
      Secure Filters for > 90% of Browser Market Share (incl. IE 7+, Safari 5+, FF & Chrome) | ✓ | no | ✓ | no | no
    Notes: [1] AngularJS does not apply contextual filtering rules on the style attribute. [2] AngularJS, React and EmberJS restrict sub-templates to the HTML Data context only. [3] Google Closure requires manual annotation for sub-template analysis.
  • 66. Future Work / Rich HTML Sanitization ■ When developers don’t know how to sanitize... ● they resort to SafeString/dangerouslySetInnerHTML. Instead: safeHtml = Purifier.purify(untrustedRichHtml); ■ No need to sanitize individual fields ■ Usable on client side or server side ■ Whitelist-based approach. QR code points to html-purify github repo. Image from https://www.flickr.com/photos/tnarik/3416160916 (CC BY 2.0)
  • 67. Conclusion: Building A Safer Internet for All. Automatic contextual escaping made easy ■ Efficient HTML5-compliant parser w/auto corrections ■ Automatically apply contextual, just-sufficient, and faster escaping ■ Effortless adoption w/express-secure-handlebars ■ Open-sourced at github.com/yahoo and npmjs.com Portal: https://yahoo.github.io/secure-handlebars
  • 68. Thank you! Nera, Adon, Albert {neraliu, adon, albertyu}@yahoo-inc.com Twitter: @neraliu, @adonatwork, @yukinying We’d like to acknowledge the support and help from: - Stuart Larsen - Alaa Mubaied - Aditya Mahendrakar - Eric Ferraiuolo - Christopher Harrell - Christopher Rohlf - Jeremy Ruppel Bug Bounty Program Contributors ● https://github.com/yahoo/secure-handlebars/blob/master/CONTRIBUTORS.md ● https://github.com/yahoo/xss-filters/blob/master/CONTRIBUTORS.md
  • 70. Deployability of Secure-Handlebars - secure-handlebars & express-secure-handlebars. Hassle-free server-side adoption - To switch from express-handlebars to the express-secure-handlebars npm package: - 2 LOC changes: (1) the dependency in package.json, (2) the require(...) call. Besides, client-side use with secure-handlebars: - The contextual analyzer can preprocess templates during the build process (at server side) - Handlebars pre-compiles the rewritten templates - Filters registered at client side allow Handlebars to filter data at the data-binding stage.
  • 71. Secure Handlebars - Design Principles ■ Work as a Preprocessor ● Parse the template and build an Abstract Syntax Tree (AST) ● Walk through every branch, triggering different parsers for contextual analysis ● Insert filter markups into each {{outputExpression}} based on its context ● Produce a rewritten template, compatible w/handlebars (unlike ember.js) ■ Facilitate Seamless Upgrade ● Existing template logic must all be preserved. QR code points to the secure-handlebars github repo
  • 72. Rewrite template before Handlebars ● Contextual Analyzer adds filter markups specific to output contexts. Before: <a href="{{url}}">{{url}}</a> After: <a href="{{{yavd (yubl (yufull url))}}}">{{{yd url}}}</a> ● Handlebars applies the filters (aka helpers) during compilation. ○ {{{ }}} - disable the default blindly-escaping. ○ yufull - encodeURI() with IPvFuture support ○ yubl - disable dangerous protocols such as javascript: ○ yavd - html-escape double-quote character (" → &quot;) ○ yd - html-escape less-than character (< → &lt;)
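The rewritten template composes several filters per output expression. Below is a minimal sketch of that composition, assuming simplified stand-ins for the four filters named on the slide: `yufull` here is plain `encodeURI()` without the IPvFuture handling, and the `x-` prefix used by `yubl` to neutralize a dangerous scheme is an assumption of this sketch.

```javascript
// Simplified stand-ins for the filters named on the slide (assumed bodies).
function yufull(s) {
  return encodeURI(String(s));
}

function yubl(s) {
  // Assumed neutralization: prefix the scheme so it is no longer executable.
  return /^\s*(?:javascript|vbscript|data):/i.test(s) ? 'x-' + s : s;
}

function yavd(s) {
  return String(s).replace(/"/g, '&quot;');
}

function yd(s) {
  return String(s).replace(/</g, '&lt;');
}

// Equivalent of <a href="{{{yavd (yubl (yufull url))}}}">{{{yd url}}}</a>:
// the innermost filter runs first, matching the nested helper syntax.
function renderLink(url) {
  return '<a href="' + yavd(yubl(yufull(url))) + '">' + yd(url) + '</a>';
}

console.log(renderLink('javascript:alert(1)'));
console.log(renderLink('https://example.com/a b'));
```

The same value is filtered differently in its two positions: the href occurrence gets the URI and attribute filters, while the text occurrence only needs the data-context filter.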
  • 73. CSS Context Parser - design principles ■ Same considerations as the HTML5 Context Parser ● Standard compliance. ● Cross-browser compatibility. ■ Design Goal - All browsers MUST parse the CSS with the same contextual result. QR code points to the standalone CSS Parser github repo
  • 74. CSS Context Parser - strict mode ■ Approach: ● Rewrite the CSS grammar into a stricter grammar. ● The original grammar allows escape chars (i.e. {6digits}); the stricter grammar only allows a known set of chars (i.e. [a-zA-Z0-9]) and special chars (i.e. :, ;). ● It is unusual to use escape chars in a CSS template. // this is a valid syntax, but our parser would reject it! <div style="color:{{output}}">...</div> QR code points to the standalone CSS Parser github repo
  • 75. Why are we reluctant to support auto JS Context filtering? Static vs. Dynamic. <script> var html = '<a href="{{untrustedUrl}}"><b>link</b></a>...'; document.write(html); </script> ■ What XSS filters should we apply? The single-quoted JS string filter? The double-quoted URI attribute filter? ● A static (incl. related) approach can only apply the former ● Warn & check manually; avoid a false sense of security