Protecting JavaScript source code using obfuscation - OWASP Europe Tour 2013 Lisbon

Protecting JavaScript source code using obfuscation
Facts and Fiction
Pedro Fortuna, Co-Founder and CTO
AuditMark
OWASP Europe Tour 2013
Lisbon - June 21st, 2013

2
Outline
• Code obfuscation concepts
• Code obfuscation metrics
• Practical examples

3
PART 1 – OBFUSCATION CONCEPTS
What is code obfuscation?

4
Code Obfuscation
Obfuscation “transforms a program into a form that is more difficult for an adversary to understand or change than the original code” [1],
where more difficult means it “requires more human time, more money, or more computing power to analyze than the original program.”
[1] Collberg, C., and Nagra, J., “Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection,” Addison-Wesley Professional, 2010.

5
Code Obfuscation
• Lowers the code quality in terms of readability and maintainability
• Delays program understanding
– Time required to reverse it > program’s useful lifetime
– Resources needed to reverse it > value obtained from reversing it
• Delays program modification
– Cost of reversing it > cost of developing it from scratch

6
Obfuscation != Encryption
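[Diagram: source code from the web application passes through an encryption algorithm (with an encryption key), producing non-executable encrypted code; a decryption algorithm (with a decryption key) must turn it back into executable JavaScript source code before the JS engine can run it.]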
• This is a common misconception
• Encrypted code is not executable by the browser or JS Engine
• A decryption process is always needed

7
Obfuscation != Encryption
[Diagram: source code from the web application passes through an obfuscation engine, producing obfuscated source code that the JS engine runs directly.]
• Obfuscated JavaScript is still valid, ready-to-execute code
• It does not require a symmetric deobfuscation function

8
JavaScript Obfuscation Example #1
HTML5 Canvas example from mozilla.org
• Being JavaScript, this code is delivered to the browser as clear
text, and as such, it can be captured by anyone

9
JavaScript Obfuscation Example #1
• This is the obfuscated version of the code.
• It can still be captured by anyone, but it is much harder to
grasp and to change.

What is it good for?
Good
• Prevent code theft and reuse
– E.g. stop a competitor from using your code as a quick start for building their own
• Protect Intellectual Property
– Hide algorithms
– Hide data
– DRM (e.g. Watermarks)
• Enforce license agreements
– e.g. domain-lock the code
• As an extra security layer
– Makes client-side vulnerabilities harder to find
• Test the strength of security controls
(IDS/IPS/WAFs/web filters)
Evil
• Test the strength of security controls
(IDS/IPS/WAFs/web filters)
• Hide malicious code
• Make it look like harmless code

11
Why not rely on the Server?
• Question often raised: why not move security-sensitive code to the server and have JS request it whenever needed?
• Sometimes you can... and you should!
• But there are plenty of situations where you can’t:
– You may not have a server
• Widgets
• Mobile Apps
• Standalone, offline-playable games
• Windows 8 Apps made with WinJS
– You may not want to have a server
• Doing the computations on a server may not be cost-effective (you have to guarantee 100% uptime and maintain support teams)
• Latency

PART 2 – OBFUSCATION METRICS
Code obfuscation metrics

13
Measuring Obfuscation
• Potency
• Resilience
• Stealthiness
• Execution Cost
• Maintainability
Next:
• We’ll present each metric using simple examples
• This is intentional, to make the metrics easier to understand
• However, these examples do not show the full extent of what you can obtain by combining a large set of different obfuscation transformations.

14
Obfuscation Potency
Generates confusion
• Measure of the confusion that a certain obfuscation adds
• Or “how much harder it gets to read and understand the new form when compared with the original”
• To the left you can see a simple example of a factorial function
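The slide’s code did not survive extraction. A commented factorial function of the kind it describes might look like this (our own version, not necessarily the one on the slide):

```javascript
// Computes n! iteratively for a non-negative integer n.
function factorial(n) {
  // Start from the multiplicative identity.
  let result = 1;
  // Multiply in every integer from 2 up to n.
  for (let i = 2; i <= n; i++) {
    result *= i;
  }
  return result;
}

console.log(factorial(5)); // 120
```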

15
Obfuscation Potency
Generates confusion
Rename all + Comment removal
• Now, to the right, you see the result of renaming every symbol to a mix of lowercase and uppercase O’s. Function and variable names are very useful for understanding code. Not only have we lost them, but the new names are easily confused with one another.
• Comments were also removed, and they too are important for understanding a program.
• So we can definitely say that the obfuscation introduced a certain degree of confusion: it has added some potency.
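For illustration, this is roughly what rename-all plus comment removal does to a simple iterative factorial. It is a hand-made sketch, not the output of any specific obfuscator: the behavior is identical, only the identifiers and the comments are gone.

```javascript
// Same logic as a plain iterative factorial; every identifier is now a mix
// of lowercase "o" and uppercase "O" (this comment is for the reader only).
function OoOo(oOOo) {
  let ooOO = 1;
  for (let oOoo = 2; oOoo <= oOOo; oOoo++) {
    ooOO *= oOoo;
  }
  return ooOO;
}

console.log(OoOo(5)); // 120
```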

16
Obfuscation Potency
Generates confusion
Whitespace removal
• Now, below, you can see the result of removing whitespace from the code. It becomes slightly more confusing, so we can say it is slightly more potent than the previous example.

17
Obfuscation Resilience
Resistance to deobfuscation techniques, be they manual or automatic
• Represents the measure of the resistance that a certain obfuscation offers to deobfuscation techniques
• Or “how hard it is to revert it to the original form”
• To the left you can see the same example function as before

18
• On the right you can see the result of applying the rename_all obfuscation.
• This is an example of an obfuscation that is 100% resilient: assuming you don’t have access to the original source code, it’s impossible to tell what the original names were.
• The comment removal obfuscation is also 100% resilient, as you can’t possibly know whether the original form had any comments, let alone recover them.

19
String splitting
• At the bottom, you see the result after applying string splitting.
• It is definitely more potent than the previous version, but if you look carefully, you can see that it’s not hard to revert it to the previous form.
• So we can say that this version does not really add much resilience compared with the previous form.
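A toy version of both directions: a splitter that turns a literal into concatenated fragments, and a “deobfuscator” that simply folds adjacent literals back together, which is why this transformation adds so little resilience on its own. The fixed chunk size and the regex-based folding are simplifications of our own (real tools split at random points and work on a parsed AST, not raw text); the folder only handles double-quoted literals without escapes.

```javascript
// String splitting: break a literal into "+"-concatenated fragments.
// Fixed-size chunks keep the example deterministic.
function splitString(str, chunk = 2) {
  const parts = [];
  for (let i = 0; i < str.length; i += chunk) {
    parts.push(JSON.stringify(str.slice(i, i + chunk)));
  }
  return parts.join(" + ");
}

// Reverting it: repeatedly merge two adjacent double-quoted literals joined by "+".
function foldStringConcats(code) {
  const pair = /"([^"\\]*)"\s*\+\s*"([^"\\]*)"/;
  let prev;
  do {
    prev = code;
    code = code.replace(pair, '"$1$2"');
  } while (code !== prev);
  return code;
}

const obfuscated = `var msg = ${splitString("Hello world")};`;
console.log(obfuscated); // var msg = "He" + "ll" + "o " + "wo" + "rl" + "d";
console.log(foldStringConcats(obfuscated)); // var msg = "Hello world";
```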

20
Static Code Analysis for defeating obfuscation
• One way of attacking obfuscation is using a Static Code Analyser, which:
1. Parses the code
2. Transforms it to fulfill a purpose
– Usually to make it simpler => better performance
– Simpler code also serves the reverse-engineering purpose
• Example simplifications
– Constant propagation, constant folding
– Remove (some) dead code
• And most importantly, it is automatic!
Constant propagation:
x = 10;
y = 7 - x / 2;
becomes
x = 10;
y = 7 - 10 / 2;
Constant folding:
N = 12 + 4 - 2;
becomes
N = 14;
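A toy version of the two simplifications above, to show why they are easy to automate. Assumptions: the input is a straight-line list of assignments already parsed into a small hand-rolled expression tree (no parser, no branches or loops, only + - * /). Real analysers such as Google Closure Compiler work on full JavaScript ASTs and are far more thorough.

```javascript
// Expression nodes: numeric literal, variable reference, binary operation.
const num = (value) => ({ type: "num", value });
const ref = (name) => ({ type: "var", name });
const bin = (op, left, right) => ({ type: "bin", op, left, right });

const OPS = {
  "+": (a, b) => a + b,
  "-": (a, b) => a - b,
  "*": (a, b) => a * b,
  "/": (a, b) => a / b,
};

// env maps variable names to values known to be constant at this point.
function simplify(expr, env) {
  switch (expr.type) {
    case "num":
      return expr;
    case "var":
      // Constant propagation: substitute a known value for the variable.
      return env.has(expr.name) ? num(env.get(expr.name)) : expr;
    case "bin": {
      const left = simplify(expr.left, env);
      const right = simplify(expr.right, env);
      // Constant folding: both operands are literals, so compute the result now.
      if (left.type === "num" && right.type === "num") {
        return num(OPS[expr.op](left.value, right.value));
      }
      return bin(expr.op, left, right);
    }
    default:
      throw new Error(`unknown node type: ${expr.type}`);
  }
}

// Simplifies each assignment in order, remembering which targets became constants.
function simplifyProgram(statements) {
  const env = new Map();
  return statements.map(({ target, expr }) => {
    const simplified = simplify(expr, env);
    if (simplified.type === "num") env.set(target, simplified.value);
    else env.delete(target);
    return { target, expr: simplified };
  });
}

function show(expr) {
  if (expr.type === "num") return String(expr.value);
  if (expr.type === "var") return expr.name;
  return `(${show(expr.left)} ${expr.op} ${show(expr.right)})`;
}

// x = 10; y = 7 - x / 2; N = 12 + 4 - 2;
const program = [
  { target: "x", expr: num(10) },
  { target: "y", expr: bin("-", num(7), bin("/", ref("x"), num(2))) },
  { target: "N", expr: bin("-", bin("+", num(12), num(4)), num(2)) },
];

for (const { target, expr } of simplifyProgram(program)) {
  console.log(`${target} = ${show(expr)};`); // x = 10; then y = 2; then N = 14;
}
```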

21
• We used Google Closure Compiler, a Static Code Analyser, to simplify the code.
• The result is on the right: as you can see, it is much easier to read.

22
• If we compare the code on the right with the original code (on the left) we
can see that they are not far apart.
• So the potency of the obfuscation is only apparent. The real potency or the
potency we should consider is the one that you get after using automated
ways of reversing the code.
• This does not mean that the string-splitting obfuscation is useless. It has to
be combined with other obfuscations that provide more resilience.

23
Dynamic Code Analysis for defeating obfuscation
• Another way of attacking obfuscation
• Analysis performed by executing the code
– Retrieves the control flow graph (CFG) of the code executed
– Retrieves the values of some expressions
• How it can be used to defeat obfuscation
– Understand (one instance of) the program execution
• You can focus on parts that you are sure are executed
– Retrieve the values of some expressions
• Aids code simplification
• Find the needle in the haystack => e.g. retrieve an encryption key
– Bypasses dead code
– Not very good for automatic reversal of obfuscation
• May not “see” all useful code
• If you need to make sure the code will remain 100% functional, you cannot use this technique
– Gather knowledge for manual reverse engineering
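A minimal sketch of the “needle in the haystack” use: instead of reading the obfuscated code, run it with the interesting function wrapped so that every call is recorded. The obfuscated snippet, `decrypt` and the key are all hypothetical, and `decrypt` is a placeholder, not real cryptography.

```javascript
// Stand-in for the page's decryption routine (hypothetical; NOT real crypto).
function decrypt(key, data) {
  return data.split("").reverse().join("");
}

// Hypothetical obfuscated code: the key is assembled at runtime from char
// codes, so it never appears as a string literal in the source.
function obfuscatedInit(decryptFn) {
  const k = [0x73, 0x33, 0x63, 0x72, 0x33, 0x74]
    .map((c) => String.fromCharCode(c))
    .join("");
  return decryptFn(k, "daolyap");
}

// Dynamic analysis: wrap the function of interest and record its arguments.
function traceCalls(fn, log) {
  return (...args) => {
    log.push(args);
    return fn(...args);
  };
}

const calls = [];
const result = obfuscatedInit(traceCalls(decrypt, calls));
console.log(calls[0][0]); // s3cr3t
console.log(result); // payload
```

As the slide warns, this only reveals what actually ran: a key built on a branch that was not taken in this execution would not show up in the log.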

24
Obfuscation Stealthiness
• How hard is it to spot?
– Or “how hard is it to spot the changes performed by the obfuscation”
– Or “how successful the obfuscation was in making the obfuscated targets look like other parts of the code”
• An obfuscation is more stealthy if it avoids common telltale indicators
– eval()
– unescape()
– Large blocks of meaningless text
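A naive scanner for the telltale indicators listed above, as a sketch of what a stealthy obfuscation has to get past. The 200-character threshold for “large block of meaningless text” is an arbitrary illustrative choice, not taken from any real tool.

```javascript
// Flags eval(), unescape() and very long whitespace-free runs of text.
function telltaleIndicators(source, maxTokenLength = 200) {
  const findings = [];
  if (/\beval\s*\(/.test(source)) findings.push("eval()");
  if (/\bunescape\s*\(/.test(source)) findings.push("unescape()");
  // Longest run of non-whitespace characters; encoded payloads tend to be huge.
  const longest = source
    .split(/\s+/)
    .reduce((max, token) => Math.max(max, token.length), 0);
  if (longest > maxTokenLength) findings.push(`${longest}-character token`);
  return findings;
}

console.log(telltaleIndicators('eval(unescape("%61%6c%65%72%74%28%31%29"))'));
// [ 'eval()', 'unescape()' ]
console.log(telltaleIndicators("function add(a, b) { return a + b; }"));
// []
```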

25
Obfuscation Execution Cost
• Impact on performance
– Runs per second
– FPS (e.g. games)
– Obfuscation usually does not have a positive impact on performance, but it does not necessarily have a negative one either. It depends on the mix of transformations chosen and on the nature of the original source code.
• E.g. renaming symbols => same execution cost
• Impact on loading times
– Time before execution starts
– Usually a function of file size
– Obfuscation tends to increase file size, but some obfuscation transformations make it smaller.
• E.g. renaming symbols again (short names replace long ones)
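Runs per second can be estimated with a simple timing loop. A rough sketch, assuming a `performance.now()` clock (available in browsers and in Node.js 16+); single measurements are noisy, so compare the original and obfuscated builds on the same device, several times. The two functions below are hypothetical stand-ins for an original and a renamed build.

```javascript
// Calls fn repeatedly for about durationMs and returns runs per second.
function runsPerSecond(fn, durationMs = 1000) {
  if (!(durationMs > 0)) throw new RangeError("durationMs must be positive");
  let runs = 0;
  let elapsed = 0;
  const start = performance.now();
  while (elapsed < durationMs) {
    fn();
    runs++;
    elapsed = performance.now() - start;
  }
  return (runs / elapsed) * 1000;
}

// Equivalent code before and after renaming; their figures should be comparable.
const original = (n) => { let r = 1; for (let i = 2; i <= n; i++) r *= i; return r; };
const renamed = (oO) => { let Oo = 1; for (let oo = 2; oo <= oO; oo++) Oo *= oo; return Oo; };

console.log("original:", Math.round(runsPerSecond(() => original(20), 200)));
console.log("renamed: ", Math.round(runsPerSecond(() => renamed(20), 200)));
```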

26
Obfuscation & Maintainability
Effect on maintainability = 1 / potency (after static code analysis)
Lower maintainability => mitigates code theft and reuse
This is one of the most important concepts around obfuscation

PART 3 – JAVASCRIPT OBFUSCATION PRACTICAL EXAMPLES
Practical examples

28
Compression/Minification vs Obfuscation
• This first example aims to clarify one of the most common misconceptions around obfuscation: many people do not clearly understand the difference between compressing or minifying code and obfuscating it.
• This code is a portion of an MD5 function in JavaScript.

29
• This is a compressed
version of it
• It really seems to be more
potent. No doubt about it.

30
• But look: it has an eval() in it. Not very stealthy.
• The eval() is needed because the JavaScript has been encoded, and the result of decoding it must be evaluated at runtime.
• When encoding is used, there is always a decoding function somewhere.
• The real question is: is it resilient?

31
A simple trick will do it
eval(
(function(....))
);
document.write('<textarea>' +
(function(...)) +
'</textarea>');
• By replacing the eval() with a document.write() (just one way to do it), you get access to the decoded source.
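The same trick can be scripted. This sketch assumes a packer-style payload whose whole program is a single outer `eval(...)` call, and uses `atob` (available in browsers and in Node.js 16+) as a stand-in for a real packer’s decoder. Note that it still runs the decoder, so only use it on code you are willing to execute, for example in a sandbox.

```javascript
// Swap the outer eval( ... ) for a capture function that returns the decoded
// source instead of executing it: the scripted equivalent of replacing eval()
// with document.write().
function unpack(packedSource) {
  let decoded = null;
  const patched = packedSource.replace(/^\s*eval\s*\(/, "capture(");
  if (patched === packedSource) throw new Error("no outer eval( found");
  // The decoder itself still runs here.
  new Function("capture", patched)((code) => {
    decoded = code;
  });
  return decoded;
}

// Toy packed payload: the base64 text of "alert(1)", decoded at runtime.
const packed = 'eval(atob("YWxlcnQoMSk="))';
console.log(unpack(packed)); // alert(1)
```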

32
Original source | Reverse-engineered result
• The reverse-engineered code is on the right. If you compare it with the original source code, you can see that it’s pretty much the same code.
• To many this isn’t surprising, but a lot of people use JavaScript compressors or minifiers with the intention of protecting their code.
• This is a perfect example of a code transformation that is very potent but has almost no resilience.
• Compressor/minifier tools do not aim at protecting the code. Their sole purpose is to make it smaller and faster to load.

33
Non-alphanumeric Obfuscation
• First JavaScript version proposed by Yosuke Hasegawa (on sla.ckers.org, June 2009)
• Encoding method that uses strictly non-alphanumeric symbols
• Example: alert(1) (obfuscated version below)

34
How is that possible?
• Using type coercion and browser quirks
• We can obtain alphanumeric characters indirectly
+[] -> 0
+!+[] -> 1
+!+[]+!+[] -> 2          Easy to get any number
+"1" -> 1                Type coercion to number
""+1 -> "1"              Type coercion to string
How to get letters?
+"a" -> NaN
+"a"+"" -> "NaN"
(+"a"+"")[0] -> "N"
Ok, but now without alphanumerics in the index:
(+"a"+"")[+[]] -> "N"
How to get an "a"?
![] -> false
![]+"" -> "false"
(![]+"")[1] -> "a"
(![]+"")[+!+[]] -> "a"
(+(![]+"")[+!+[]]+"")[+[]] -> "N"
eval((![]+"")[+!+[]]+"lert(1)");
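The coercion tricks above can be turned into a tiny encoder. A sketch of our own, not Hasegawa’s tool: it only knows the letters contained in "false", "true", "undefined" and "NaN", so characters such as "(", "1" or "o" would need further tricks (e.g. strings obtained from objects or functions) that this sketch does not implement.

```javascript
// Strings reachable without letters or digits:
//   ![]+[]     -> "false"      !![]+[]    -> "true"
//   [][[]]+[]  -> "undefined"  +[![]]+[]  -> "NaN"
const SOURCES = [
  ["![]+[]", "false"],
  ["!![]+[]", "true"],
  ["[][[]]+[]", "undefined"],
  ["+[![]]+[]", "NaN"],
];

// +[] is 0 and each +!+[] adds 1, so any non-negative integer is reachable.
function encodeNumber(n) {
  if (!Number.isInteger(n) || n < 0) {
    throw new RangeError("n must be a non-negative integer");
  }
  return n === 0 ? "+[]" : "+!+[]".repeat(n);
}

// Encodes one character by indexing into one of the strings above.
function encodeChar(ch) {
  for (const [expr, value] of SOURCES) {
    const index = value.indexOf(ch);
    if (index !== -1) return `(${expr})[${encodeNumber(index)}]`;
  }
  throw new Error(`no non-alphanumeric source for "${ch}" in this sketch`);
}

function encodeString(str) {
  return [...str].map(encodeChar).join("+");
}

const encoded = encodeString("alert");
console.log(/[A-Za-z0-9]/.test(encoded)); // false
console.log(eval(encoded)); // alert
```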

36
Wait... What about the eval?
• “eval” uses alphanumeric characters!
• eval() is not the only way to eval()!
• There are four or five other methods
• Examples
– Function("alert(1)")()
– Array.constructor("alert(1)")()
– []["sort"]["constructor"]("alert(1)")()
• Subscript notation
• Strings (we already know how to convert them)
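Why these routes work: every function object inherits a `constructor` property that points to the global `Function` constructor, so `Array.constructor` and `[]["sort"]["constructor"]` are simply `Function` again. A quick check (using `"return 6 * 7"` instead of `alert(1)`, since `alert` only exists in browsers):

```javascript
const routes = {
  "Function": Function,
  "Array.constructor": Array.constructor,
  '[]["sort"]["constructor"]': []["sort"]["constructor"],
};

for (const [name, ctor] of Object.entries(routes)) {
  // Each route builds a new function from a string and calls it.
  console.log(name, ctor === Function, ctor("return 6 * 7")());
}
```

Unlike a direct `eval()`, code built with `Function` runs in the global scope, so it cannot see the caller’s local variables.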

37
Let me see that again!

38
Non-alphanumeric Obfuscation
• 100% potent
• 0% stealthy (when you see it, you know someone is trying to hide something)
• High execution cost
– eval is a bit slower
– But worst of all, the file is much larger => slower loading times
• May not work in all browsers
• What about resilience?
– Unfortunately, not much (you can get a parser to simplify it back to the original source)
• Good for bypassing filters (e.g. WAFs)

39
Original source code | Deadcode injection | Deadcode injection + Rename local
Can you spot where the dead code is?

40
Original source code | Deadcode injection | Deadcode injection + Rename local

41
Deadcode injection
• Deadcode insertion is a natural way of adding confusion to source code, thus increasing the potency of the obfuscation.
• Being dead code, the injected code is never executed, so it has little impact on execution cost (only the predicates guarding it are evaluated).
• Would a Static Code Analyser remove this particular dead code?
• No, because it relies on opaque predicates
– Not removable using Static Code Analysis
– Predicates similar to ones found in the original source (++stealthiness)
• Randomly injected (++potency)
• Increases complexity of control flow (++potency)
• Dummy statements created out of the code’s own statements (++potency & ++stealthiness)
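The slides do not show the predicates themselves. As a sketch of the idea, here is a textbook opaque predicate guarding injected dead code in a factorial loop: for any integer k, k × (k + 1) is a product of two consecutive integers and therefore always even. This particular predicate is widely known, so a real obfuscator would favour less recognisable ones, in line with the point about predicates that resemble the original code.

```javascript
// Dead-code injection guarded by an opaque predicate (illustrative sketch).
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) {
    const k = i & 0xff; // small non-negative integer, so k * (k + 1) is exact
    if ((k * (k + 1)) % 2 === 0) {
      result *= i; // the real work: this branch is always taken
    } else {
      result = result * 3 - i; // dead code: plausible-looking, never executed
    }
  }
  return result;
}

console.log(factorial(5)); // 120
```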

42
All Together Now
HTML5 Canvas example from mozilla.org
• Up to now we have mostly
seen no more than two or
three obfuscation
transformations working
together.
• Let’s go back to the first
example and see what
happens when we mix a
larger number of code
obfuscation transformations
together.

43
All Together Now
• remove comments
• dot notation
• rename local
• member enumeration
• literal hooking :low
• deadcode injection
• string splitting :high
• function reordering
• function outlining
• literal duplicates
• expiration date "2199-12-31 00:00:00"

44
All Together Now
• As you can see, you get a heavily obfuscated result.
• We intentionally didn’t use any encoding-based obfuscation in this example, so that you can see the effect of these transformations together. Also, you are not seeing the whole code here.
• For the record, not all encoding transformations are easily reversed. We could, for instance, use a domain-lock encoding, which needs to get the correct information from the browser to decode properly.
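A toy illustration of the domain-lock idea: the payload is XOR-ed with a key derived from the host name, so it only decodes to the intended code on the intended domain. The scheme, `xorWithKey` and `decodeForHost` are hypothetical and do not describe any particular product.

```javascript
// XOR each character with the key; XOR is its own inverse, so the same
// function both encodes and decodes.
function xorWithKey(text, key) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) ^ key.charCodeAt(i % key.length));
  }
  return out;
}

// Build time: encode the payload for the licensed domain.
const payload = "console.log('running on the licensed domain')";
const encoded = xorWithKey(payload, "example.com");

// Run time: in a browser, hostname would come from location.hostname.
function decodeForHost(encodedText, hostname) {
  return xorWithKey(encodedText, hostname);
}

console.log(decodeForHost(encoded, "example.com") === payload); // true
console.log(decodeForHost(encoded, "copycat.example") === payload); // false
```

A host name is not a secret, so an attacker who knows the target domain can simply supply it; this only helps if the key-derivation code is itself well hidden by the other transformations.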

45
After Closure Compiler
• And this is the result after running the code through Google Closure Compiler.
• It didn’t improve readability much, because the obfuscation transformations offered a good degree of resilience.

46
Conclusion
• People often judge obfuscation based on its (apparent) potency
• It is resilience and the “real” potency that matter
– Potency that you get after applying automated tools to reverse it
• Evaluating resilience is not trivial
– Looking at simpler examples, it may be relatively easy to tell “with the naked eye” which of two obfuscations is more resilient
– But when comparing complicated obfuscated versions that use many code transformations, it’s not easy to say which version is more resilient.
– This is a job for JavaScript obfuscators
• They should offer not only potency, but also resilience
• They should make an effort to explain to their users what is best to protect their code
• They should avoid offering options that may reduce resilience

47
Conclusion
• Don’t forget execution cost
– And where the code is executed. A smartphone usually has fewer resources than a desktop computer, so obfuscation should be tuned to the platform where the code runs.
• Obfuscation can be very effective as a way to prevent code theft and reuse, by
– Making it a real pain to understand the code
– Making it a real pain to change the code successfully
– Significantly lowering the value an attacker can obtain from reversing the code

Contact Information
Pedro Fortuna
Owner & Co-Founder & CTO
pedro.fortuna@auditmark.com
Phone: +351 917331552
Porto - Headquarters
Edifício Central da UPTEC
Rua Alfredo Allen, 455
4200-135 Porto, Portugal
Lisbon office
Startup Lisboa
Rua da prata, 121 5A
1100-415 Lisboa, Portugal
