SlideShare une entreprise Scribd logo
1  sur  38
Télécharger pour lire hors ligne
!!!! Unicode 
Application Security Forum - 2014 
Western Switzerland 
! 
5-6 novembre 2014 
Y-Parc / Yverdon-les-Bains 
Hacks 
Nicolas 
Seriot 
November 
6th, 
2014 
h<p://unicode-­‐wall-­‐of-­‐shame.com
• full 
presentaFon 
at 
SoGShake 
• 10 
min. 
/ 
38 
slides 
–> 
15.8 
s. 
/ 
slide 
• an 
arFcle 
is 
coming…
BCD 
EBCDIC 
1963: 
ASCII 
Baudot 
Code 
1990’s: 
8 
bit 
encodings 
ISO/IEC 
8859-­‐1 
(LaFn 
1) 
ISO/IEC 
8859-­‐6 
(Arabic)
The 
Unicode 
ConsorFum
glyphs 
☃ 
text 
rendering 
engine 
NSLayoutManager 
fonts 
Times New Roman.ttf 
codepoints 
U+2603 SNOWMAN 
Unicode 
Standard algorithms 
normalizaFon collaFon casing 
binary 
representaFon 
E2 98 83 
(UTF-­‐8)
Visual 
SimilariFes 
AΑ А ᗅ ᗋ ᴀ A 
www.google.com 
– 
U+0067 LATIN SMALL LETTER G! 
www.ɡooɡle.com 
– 
U+0261 LATIN SMALL LETTER SCRIPT G 
৪ – U+09EA BENGALI DIGIT FOUR! 
୨ – U+0B68 ORIYA DIGIT TWO
Country 
Flags 
U+1F1E6 + U+1F1E7 ! 
U+1F1E8 + U+1F1F3 #  ! 
U+1F1E9 + U+1F1EA &! 
U+1F1EA + U+1F1F8 '! 
U+1F1EB + U+1F1F7 (! 
U+1F1EC + U+1F1E7 )! 
U+1F1EE + U+1F1F9 *! 
U+1F1EF + U+1F1F5 +! 
U+1F1F0 + U+1F1F7 ,! 
U+1F1F7 + U+1F1FA -! 
U+1F1FA + U+1F1F8 .
Bi-­‐direc4nal 
Text 
# U+202E RIGHT-TO-LEFT OVERRIDE! 
# double click a .jpg, open an .exe ! 
$ python3 -c "print('su202Egpj.exe')"! 
sexe.jpg
glyphs 
☃ 
text 
rendering 
engine 
NSLayoutManager 
fonts 
Times New Roman.ttf 
codepoints 
U+2603 SNOWMAN 
Unicode 
Standard algorithms 
normaliza4on colla4on casing 
binary 
representa4on 
E2 98 83 
(UTF-­‐8)
$ gdb Twitter ! 
! 
(gdb) r! 
Starting program: /Applications/Twitter.app/Contents/MacOS/Twitter ! 
! 
Program received signal EXC_BAD_ACCESS, Could not access memory.! 
Reason: KERN_INVALID_ADDRESS at address: 0x00000001084e8008! 
0x00007fff9432ead2 in vDSP_sveD ()! 
! 
(gdb) bt! 
#0 0x00007fff9432ead2 in vDSP_sveD ()! 
#1 0x00007fff934594fe in TStorageRange::SetStorageSubRange ()! 
#2 0x00007fff93457d5c in TRun::TRun ()! 
#3 0x00007fff934579ee in CTGlyphRun::CloneRange ()! 
#4 0x00007fff93466764 in TLine::SetLevelRange ()! 
#5 0x00007fff93467e2c in TLine::SetTrailingWhitespaceLevel ()! 
#6 0x00007fff93467d58 in TRunReorder::ReorderRuns ()! 
#7 0x00007fff93467bfe in TTypesetter::FinishLineFill ()! 
#8 0x00007fff934858ae in TFramesetter::FrameInRect ()! 
#9 0x00007fff93485110 in TFramesetter::CreateFrame ()! 
#10 0x00007fff93484af2 in CTFramesetterCreateFrame ()! 
...
OS 
X 
Finder 
$ echo -e "xFFxFE" > x.txt # UTF-16LE BOM! 
$ xattr -w com.apple.TextEncoding "utf-16le" x.txt! 
$ qlmanage -p x.txt # or QuickLook with Finder 
[ERROR] An uncaught exception was raised outside of any generator: *** -[NSConcreteTextStorage attribute:atIndex:longestEffectiveRange:inRange:]: Range or index out of 
bounds! 
2014-10-24 10:53:08.474 qlmanage[5268:11f] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[NSConcreteTextStorage 
attribute:atIndex:longestEffectiveRange:inRange:]: Range or index out of bounds'! 
*** First throw call stack:! 
(!! 
0 CoreFoundation 0x00007fff89ebe25c __exceptionPreprocess + 172! 
! 1 libobjc.A.dylib 0x00007fff87934e75 objc_exception_throw + 43! 
! 2 CoreFoundation 0x00007fff89ebe10c +[NSException raise:format:] + 204! 
! 3 AppKit 0x00007fff81a83a7a -[NSConcreteTextStorage attribute:atIndex:longestEffectiveRange:inRange:] + 118! 
! 4 AppKit 0x00007fff81951ded -[NSMutableAttributedString(NSMutableAttributedStringKitAdditions) fixGlyphInfoAttributeInRange:] + 204! 
! 5 AppKit 0x00007fff81951cd8 -[NSMutableAttributedString(NSMutableAttributedStringKitAdditions) fixAttributesInRange:] + 39! 
! 6 AppKit 0x00007fff81a838e1 -[NSTextStorage processEditing] + 109! 
! 7 AppKit 0x00007fff81a7f742 -[NSTextStorage endEditing] + 110! 
! 8 AppKit 0x00007fff81c5db4f _NSReadAttributedStringFromURLOrData + 14525! 
! 9 AppKit 0x00007fff81c5e3a5 -[NSAttributedString(NSAttributedStringKitAdditions) initWithURL:options:documentAttributes:
glyphs 
☃ 
text 
rendering 
engine 
NSLayoutManager 
fonts 
Times New Roman.ttf 
codepoints 
U+2603 SNOWMAN 
Unicode 
Standard algorithms 
normaliza4on colla4on casing 
binary 
representa4on 
E2 98 83 
(UTF-­‐8)
Weird 
Code 
Points 
May 
Bypass 
Filters 
• Non-­‐characters: 
eg. 
U+FFFE, U+FFFF, U+1FFFE, U+10FFFF 
Unassigned 
code 
points: 
eg. 
U+2073! 
• Must 
not 
be 
deleted 
(as 
allowed 
by 
Unicode 
< 
5.2 
C7) 
but 
replaced 
with 
U+FFFD REPLACEMENT CHARACTER. 
<a href=“javauFEFFscript:alert("XSS")>
Non-­‐Characters 
and 
OS 
X 
Bash 
/ 
HFS+ 
$ mkdir /tmp/test! 
$ cd /tmp/test! 
$ touch `printf "axefxbbxbfb"`! 
# or "auFFFEb".encode('utf-8')! 
# which is a non-character! 
$ ls a*! 
a?b! 
$ touch ab! 
$ ls a* ! 
a?b! 
# where did ab go?!
Regex 
$ python3! 
>>> import re! 
>>> reg = re.compile("d") ! 
>>> gen = ( chr(c) for c in range(0, 0xFFFF) if re.match(reg, chr(c)) )! 
>>> print(''.join(gen))! 
0123456789۰۱۲۳٤٥٦۷۸۹۰۱۲۳۴۵۶۷۸۹߉߈߇߆߅߄߃߂߁߀०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦ 
୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖ 
໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘ 
᧙᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉᪐᪑᪒᪓᪔᪕᪖᪗᪘᪙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩ 
꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹0123456789! 
>>> reg = re.compile("d", re.ASCII)
Regex 
$ jsc! 
>>> /a.c/.test('abc')! 
true! 
>>> /a.c/.test('ac')! 
false! 
>>> /a....c/.test('ac')! 
true
glyphs 
☃ 
text 
rendering 
engine 
NSLayoutManager 
fonts 
Times New Roman.ttf 
codepoints 
U+2603 SNOWMAN 
Unicode 
Standard algorithms 
normaliza3on colla3on casing 
binary 
representa3on 
E2 98 83 
(UTF-­‐8)
é 
Normaliza3on TR#15 
U+00E9 
① 
U+2460 
CompaCanonical 
decomposi&on &bility 
decomposi&on 
NFD NFKD 1 
é 
U+0065 
◌́ 
U+0301 
é 
U+00E9 
1 
U+0031 
Canonical 
composi&on 
U+0031 
NFC NFKC 
◌́ 
é 
U+0065 
① 
① 
U+2460 
e 
U+0065 
U+0301 
U+2460 
(most 
common)
NFC 
doesn’t 
Always 
Compose 
שּׁ HEBREW LETTER! 
U+FB2C 
SHIN WITH DAGESH 
AND SHIN DOT 
◌ּ HEBREW LETTER! 
U+05BC 
SHIN 
ש HEBREW LETTER! 
U+05E9 
◌ׁ HEBREW LETTER! 
SHIN WITH DAGESH 
AND SHIN DOT U+05C1 
SHIN DOT 
NFC(U+FB2C) 
+ + 
buffer 
overflow
NFKD 
Expands 
Up 
to 
18x 
صلى الله عليه وسلمU+FDFA 
ARABIC 
LIGATURE! 
SALLALLAHOU 
ALAYHE 
WASALLAM 
>>> import unicodedata 
! 
>>> s = 'uFDFA' 
>>> len(s) 
1 
! 
>>> s_nfkd = unicodedata.normalize('NFKD', s) 
>>> s_nfkd.encode('unicode-escape') 
b'u0635u0644u0649 u0627u0644u0644u0647 u0639 
u0644u064au0647 u0648u0633u0644u0645' 
>>> len(s_nfkd) 
18 
buffer 
overflow
NFK* 
May 
Bypass 
Filters 
' FULLWIDTH 
U+FF07 
APOSTROPHE 
NFK*(U+FF07) 
‘ APOSTROPHE! 
U+0027 
SQL 
injec3on
hYps://labs.spo3fy.com/2013/06/18/crea3ve-­‐usernames/
glyphs 
☃ 
text 
rendering 
engine 
NSLayoutManager 
fonts 
Times New Roman.ttf 
codepoints 
U+2603 SNOWMAN 
Unicode 
Standard algorithms 
normaliza3on colla3on casing 
binary 
representa3on 
E2 98 83 
(UTF-­‐8)
Unicode 
Colla3on 
Algorithm 
– 
TR#10 
(UTS) 
German Swedish 
Åkersberga 1 2 Alingsås 
Alingsås 2 4 Oskarshamn 
Äpplebo 3 7 Ufng 
Oskarshamn 4 6 Üheld 
Östersund 5 8 Zwickau 
Üheld 6 1 Åkersberga 
Ufng 7 3 Äpplebo 
Zwickau 8 5 Östersund 
(Steven 
R. 
Loomis, 
Mark 
Davis) 
• Text 
comparison 
café < cafe ? 
cafe < café ? 
• Usage 
dependent 
German 
dic,onary: 
öf 
< 
of 
German 
phonebook: 
of 
< 
öf 
• Unstable 
over 
6me 
Sorted 
lists 
should 
be 
versioned
glyphs 
☃ 
text 
rendering 
engine 
NSLayoutManager 
fonts 
Times New Roman.ttf 
codepoints 
U+2603 SNOWMAN 
Unicode 
Standard algorithms 
normaliza,on colla,on casing 
binary 
representa,on 
E2 98 83 
(UTF-­‐8)
Case 
Folding 
hHp://www.unicode.org/Public/UNIDATA/CaseFolding.txt 
# The data supports both implementations that require simple case foldings! 
# (where string lengths don't change), and implementations that allow full case folding! 
# (where string lengths may grow). Note that where they can be supported, the! 
# full case foldings are superior: for example, they allow "MASSE" and "Maße" to match.
Case 
Conversion 
I 
U+0049 
İ 
U+0130 
ı 
U+0131 
i 
U+0069 
I 
U+0049 
i 
U+0069 
◌̇ 
U+0307 
◌̇ 
U+0307 
İ 
U+0130 
◌̇ 
U+0307 
Posix 
Locale 
Turkish 
Locale
Case 
Conversion 
– 
Locale 
NSString *s = [NSString stringWithFormat:@"istambul"]; 
! 
NSLocale *locale = [NSLocale localeWithLocaleIdentifier:@"tr_TR"]; 
! 
NSString *s2 = [s uppercaseStringWithLocale:locale]; 
! 
// İSTAMBUL ✅
Python 
3 
• ❌ 
Colla,on: 
s,ll 
compare 
codepoints 
>>> 'café' < 'caff' 
False 
• ❌ 
Case 
Conversion 
restricted 
to 
1:1 
case 
mappings 
>>> 'ß'.upper() 
'ß'! 
• ❌ 
Case 
conversion 
ignores 
locale 
❌ 
Addi,onaly, 
locale 
is 
global 
>>> import locale 
>>> locale.setlocale(locale.LC_ALL, 'tr_TR') 
>>> s = "istanbul" 
>>> s.upper() 
'ISTANBUL'
glyphs 
☃ 
text 
rendering 
engine 
NSLayoutManager 
fonts 
Times New Roman.ttf 
codepoints 
U+2603 SNOWMAN 
Unicode 
Standard algorithms 
normaliza,on colla,on casing 
binary 
representa,on 
E2 98 83 
(UTF-­‐8)
0x0800 
0xFFFF 
0x0100000 
0x10FFFF 
UTF-­‐8 
Bits Hex Min Hex Max Byte Sequence in Binary! 
1 7 00000000 0000007f 0vvvvvvv! 
2 11 00000080 000007FF 110vvvvv 10vvvvvv! 
3 16 00000800 0000FFFF 1110vvvv 10vvvvvv 10vvvvvv! 
4 21 00010000 001FFFFF 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv! 
5 26 00200000 03FFFFFF 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv! 
6 31 04000000 7FFFFFFF 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 
Malformed 
UTF-­‐8 
sequences 
include: 
-­‐ 
overlong 
encoding, 
0x1 
on 
2 
bytes 
11000000 10000001 0xC0 0x41 
-­‐ 
unexpected 
con,nua,on 
byte 
11000000 00000000 0xC0 0x00
UTF-­‐16 
0x0000 
0xD800 
0xE000 
0xFFFF 
0x0000 
0x010000 
00xx1F0FFFFFFF 
Bits Hex Min Hex Max Byte Sequence in Binary! 
2 16 00000000 0000FFFF vvvvvvvv vvvvvvvv! 
4 21 00010000 001FFFFF 110110ww wwwwwwww 110111ww wwwwwwww! 
! 
www.. is (vvv.. - 0x10000) to map a 20 bits value 
Malformed 
sequences 
include 
unpaired 
surrogates 
such 
as: 
-­‐ 
110110ww wwwwwwww 
not 
followed 
by 
110111ww wwwwwwww 
-­‐ 
110111ww wwwwwwww 
not 
preceded 
by 
110110ww wwwwwwww
Wide 
Characters 
• Unicode 
code 
points 
were 
first 
defined 
on 
16 
bits 
(UCS-­‐2) 
• and 
now 
Java 
char 
/ 
Objec,ve-­‐C 
unichar 
are 
16 
bits 
• code 
points 
> 
0xFFFF 
defined 
as 
a 
pair 
of 
16 
bits 
values 
• sizeof(wchar_t) 
is 
generally 
16 
bits 
on 
Windows, 
32 
bits 
on 
Linux
Objec,ve-­‐C 
/ 
Cocoa 
NSString *s1 = @"abc"; 
NSString *s2 = @"U0001F600bc"; 
! 
NSLog(@"s1 %@", s1); // s1 abc 
NSLog(@"s2 %@", s2); // s2 bc 
! 
NSLog(@"s1[0] -> %C", [s1 characterAtIndex:0]); // s1[0] -> a 
NSLog(@"s2[0] -> %C", [s2 characterAtIndex:0]); 
// nothing printed because 
// s2 = [0xD83D, 0xDE00], and U+D83D is a high surrogate 
// and NSLog() ignores nil strings
HFS+ 
Apple 
Technical 
Q&A 
QA1173
HFS+ 
# what you write…! 
$ echo ü; echo ü | xxd! 
ü! 
0000000: c3bc 0a # NFC! 
! 
# is not what you read! 
$ touch ü; ls; ls | xxd! 
ü! 
0000000: 75cc 880a # NFD 
# watch your Finder go nuts!!!! 
$ cd; touch `printf "x41xe9"` 
# NFC("Aé")! 
$ open .! 
# fixed in OS X 10.10
Conclusion 
• Unicode 
is 
cool. 
Unicode 
is 
hard. 
Unicode 
is 
ubiquitous. 
• How 
well 
do 
you 
know 
your 
framework 
of 
choice? 
• Everything 
dealing 
with 
Unicode 
is 
a 
bug 
nest. 
• Under-­‐studied 
topic. 
Tons 
of 
low-­‐hanging 
fruits. 
• See 
Chris 
Weber’s 
hHp://websec.github.io/unicode-­‐security-­‐guide/ 
« 
Unicode 
is 
just 
too 
complex 
to 
ever 
be 
secure. 
» 
– 
Bruce 
Schneier, 
2000

Contenu connexe

Tendances

3 scanning-ger paoctes-pub
3  scanning-ger paoctes-pub3  scanning-ger paoctes-pub
3 scanning-ger paoctes-pub
Cassio Ramos
 
us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-La...
us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-La...us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-La...
us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-La...
sonjeku1
 

Tendances (20)

Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
Information Theft: Wireless Router Shareport for Phun and profit - Hero Suhar...
 
Code Red Security
Code Red SecurityCode Red Security
Code Red Security
 
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
 
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
 
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
App secforum2014 andrivet-cplusplus11-metaprogramming_applied_to_software_obf...
 
Specializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network StackSpecializing the Data Path - Hooking into the Linux Network Stack
Specializing the Data Path - Hooking into the Linux Network Stack
 
DMVPN
DMVPNDMVPN
DMVPN
 
True stories on the analysis of network activity using Python
True stories on the analysis of network activity using PythonTrue stories on the analysis of network activity using Python
True stories on the analysis of network activity using Python
 
3 scanning-ger paoctes-pub
3  scanning-ger paoctes-pub3  scanning-ger paoctes-pub
3 scanning-ger paoctes-pub
 
IPsec Basics: AH and ESP Explained
IPsec Basics: AH and ESP ExplainedIPsec Basics: AH and ESP Explained
IPsec Basics: AH and ESP Explained
 
ハイパフォーマンスブラウザネットワーキング2
ハイパフォーマンスブラウザネットワーキング2ハイパフォーマンスブラウザネットワーキング2
ハイパフォーマンスブラウザネットワーキング2
 
2 netcat enum-pub
2 netcat enum-pub2 netcat enum-pub
2 netcat enum-pub
 
Beginner's Guide to the nmap Scripting Engine - Redspin Engineer, David Shaw
Beginner's Guide to the nmap Scripting Engine - Redspin Engineer, David ShawBeginner's Guide to the nmap Scripting Engine - Redspin Engineer, David Shaw
Beginner's Guide to the nmap Scripting Engine - Redspin Engineer, David Shaw
 
us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-La...
us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-La...us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-La...
us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-La...
 
Dynamic Port Scanning
Dynamic Port ScanningDynamic Port Scanning
Dynamic Port Scanning
 
Vm ware fuzzing - defcon russia 20
Vm ware fuzzing  - defcon russia 20Vm ware fuzzing  - defcon russia 20
Vm ware fuzzing - defcon russia 20
 
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)
[嵌入式系統] MCS-51 實驗 - 使用 IAR (2)
 
managing your network environment
managing your network environmentmanaging your network environment
managing your network environment
 
Detectando Hardware keylogger
Detectando Hardware keyloggerDetectando Hardware keylogger
Detectando Hardware keylogger
 
Staging driver sins
Staging driver sinsStaging driver sins
Staging driver sins
 

Similaire à 20141106 asfws unicode_hacks

Exploring the x64
Exploring the x64Exploring the x64
Exploring the x64
FFRI, Inc.
 
Im trying to run make qemu-nox In a putty terminal but it.pdf
Im trying to run  make qemu-nox  In a putty terminal but it.pdfIm trying to run  make qemu-nox  In a putty terminal but it.pdf
Im trying to run make qemu-nox In a putty terminal but it.pdf
maheshkumar12354
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Anne Nicolas
 
Vectorization on x86: all you need to know
Vectorization on x86: all you need to knowVectorization on x86: all you need to know
Vectorization on x86: all you need to know
Roberto Agostino Vitillo
 

Similaire à 20141106 asfws unicode_hacks (20)

Writing Metasploit Plugins
Writing Metasploit PluginsWriting Metasploit Plugins
Writing Metasploit Plugins
 
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
[FT-11][suhorng] “Poor Man's” Undergraduate Compilers
 
Exploring the x64
Exploring the x64Exploring the x64
Exploring the x64
 
Im trying to run make qemu-nox In a putty terminal but it.pdf
Im trying to run  make qemu-nox  In a putty terminal but it.pdfIm trying to run  make qemu-nox  In a putty terminal but it.pdf
Im trying to run make qemu-nox In a putty terminal but it.pdf
 
Kernel Recipes 2013 - Deciphering Oopsies
Kernel Recipes 2013 - Deciphering OopsiesKernel Recipes 2013 - Deciphering Oopsies
Kernel Recipes 2013 - Deciphering Oopsies
 
PostgreSQL as seen by Rubyists (Kaigi on Rails 2022)
PostgreSQL as seen by Rubyists (Kaigi on Rails 2022)PostgreSQL as seen by Rubyists (Kaigi on Rails 2022)
PostgreSQL as seen by Rubyists (Kaigi on Rails 2022)
 
Getting Started with Raspberry Pi - DCC 2013.1
Getting Started with Raspberry Pi - DCC 2013.1Getting Started with Raspberry Pi - DCC 2013.1
Getting Started with Raspberry Pi - DCC 2013.1
 
Continuation Passing Style and Macros in Clojure - Jan 2012
Continuation Passing Style and Macros in Clojure - Jan 2012Continuation Passing Style and Macros in Clojure - Jan 2012
Continuation Passing Style and Macros in Clojure - Jan 2012
 
Journey of Bsdconv
Journey of BsdconvJourney of Bsdconv
Journey of Bsdconv
 
Rapport
RapportRapport
Rapport
 
The Dark Side of Programming Languages
The Dark Side of Programming LanguagesThe Dark Side of Programming Languages
The Dark Side of Programming Languages
 
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary dataKernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
Kernel Recipes 2019 - GNU poke, an extensible editor for structured binary data
 
Static analysis and writing C/C++ of high quality code for embedded systems
Static analysis and writing C/C++ of high quality code for embedded systemsStatic analysis and writing C/C++ of high quality code for embedded systems
Static analysis and writing C/C++ of high quality code for embedded systems
 
プログラム実行の話と
OSとメモリの挙動の話
プログラム実行の話と
OSとメモリの挙動の話プログラム実行の話と
OSとメモリの挙動の話
プログラム実行の話と
OSとメモリの挙動の話
 
Manipulators in c++
Manipulators in c++Manipulators in c++
Manipulators in c++
 
04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)04 - I love my OS, he protects me (sometimes, in specific circumstances)
04 - I love my OS, he protects me (sometimes, in specific circumstances)
 
Make ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEKMake ARM Shellcode Great Again - HITB2018PEK
Make ARM Shellcode Great Again - HITB2018PEK
 
Vectorization on x86: all you need to know
Vectorization on x86: all you need to knowVectorization on x86: all you need to know
Vectorization on x86: all you need to know
 
Debugging Ruby Systems
Debugging Ruby SystemsDebugging Ruby Systems
Debugging Ruby Systems
 
DEF CON 27 - SMEA - adventures in smart buttplug penetration testing
DEF CON 27 - SMEA - adventures in smart buttplug penetration testingDEF CON 27 - SMEA - adventures in smart buttplug penetration testing
DEF CON 27 - SMEA - adventures in smart buttplug penetration testing
 

Plus de Cyber Security Alliance

Asfws 2014 slides why .net needs ma-cs and other serial(-ization) tales_v2.0
Asfws 2014 slides why .net needs ma-cs and other serial(-ization) tales_v2.0Asfws 2014 slides why .net needs ma-cs and other serial(-ization) tales_v2.0
Asfws 2014 slides why .net needs ma-cs and other serial(-ization) tales_v2.0
Cyber Security Alliance
 

Plus de Cyber Security Alliance (20)

Bug Bounty @ Swisscom
Bug Bounty @ SwisscomBug Bounty @ Swisscom
Bug Bounty @ Swisscom
 
Robots are among us, but who takes responsibility?
Robots are among us, but who takes responsibility?Robots are among us, but who takes responsibility?
Robots are among us, but who takes responsibility?
 
iOS malware: what's the risk and how to reduce it
iOS malware: what's the risk and how to reduce itiOS malware: what's the risk and how to reduce it
iOS malware: what's the risk and how to reduce it
 
Why huntung IoC fails at protecting against targeted attacks
Why huntung IoC fails at protecting against targeted attacksWhy huntung IoC fails at protecting against targeted attacks
Why huntung IoC fails at protecting against targeted attacks
 
Corporations - the new victims of targeted ransomware
Corporations - the new victims of targeted ransomwareCorporations - the new victims of targeted ransomware
Corporations - the new victims of targeted ransomware
 
Blockchain for Beginners
Blockchain for Beginners Blockchain for Beginners
Blockchain for Beginners
 
Le pentest pour les nuls #cybsec16
Le pentest pour les nuls #cybsec16Le pentest pour les nuls #cybsec16
Le pentest pour les nuls #cybsec16
 
Introducing Man in the Contacts attack to trick encrypted messaging apps
Introducing Man in the Contacts attack to trick encrypted messaging appsIntroducing Man in the Contacts attack to trick encrypted messaging apps
Introducing Man in the Contacts attack to trick encrypted messaging apps
 
Understanding the fundamentals of attacks
Understanding the fundamentals of attacksUnderstanding the fundamentals of attacks
Understanding the fundamentals of attacks
 
Rump : iOS patch diffing
Rump : iOS patch diffingRump : iOS patch diffing
Rump : iOS patch diffing
 
An easy way into your sap systems v3.0
An easy way into your sap systems v3.0An easy way into your sap systems v3.0
An easy way into your sap systems v3.0
 
Easy public-private-keys-strong-authentication-using-u2 f
Easy public-private-keys-strong-authentication-using-u2 fEasy public-private-keys-strong-authentication-using-u2 f
Easy public-private-keys-strong-authentication-using-u2 f
 
Create a-strong-two-factors-authentication-device-for-less-than-chf-100
Create a-strong-two-factors-authentication-device-for-less-than-chf-100Create a-strong-two-factors-authentication-device-for-less-than-chf-100
Create a-strong-two-factors-authentication-device-for-less-than-chf-100
 
Warning Ahead: SecurityStorms are Brewing in Your JavaScript
Warning Ahead: SecurityStorms are Brewing in Your JavaScriptWarning Ahead: SecurityStorms are Brewing in Your JavaScript
Warning Ahead: SecurityStorms are Brewing in Your JavaScript
 
Rump attaque usb_caralinda_fabrice
Rump attaque usb_caralinda_fabriceRump attaque usb_caralinda_fabrice
Rump attaque usb_caralinda_fabrice
 
Operation emmental appsec
Operation emmental appsecOperation emmental appsec
Operation emmental appsec
 
Colt sp sec2014_appsec-nf-vfinal
Colt sp sec2014_appsec-nf-vfinalColt sp sec2014_appsec-nf-vfinal
Colt sp sec2014_appsec-nf-vfinal
 
Asfws2014 tproxy
Asfws2014 tproxyAsfws2014 tproxy
Asfws2014 tproxy
 
Asfws 2014 slides why .net needs ma-cs and other serial(-ization) tales_v2.0
Asfws 2014 slides why .net needs ma-cs and other serial(-ization) tales_v2.0Asfws 2014 slides why .net needs ma-cs and other serial(-ization) tales_v2.0
Asfws 2014 slides why .net needs ma-cs and other serial(-ization) tales_v2.0
 
Appsec rump reverse-i_os_machook
Appsec rump reverse-i_os_machookAppsec rump reverse-i_os_machook
Appsec rump reverse-i_os_machook
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 

20141106 asfws unicode_hacks

  • 1. !!!! Unicode Application Security Forum - 2014 Western Switzerland ! 5-6 novembre 2014 Y-Parc / Yverdon-les-Bains Hacks Nicolas Seriot November 6th, 2014 h<p://unicode-­‐wall-­‐of-­‐shame.com
  • 2. • full presentaFon at SoGShake • 10 min. / 38 slides –> 15.8 s. / slide • an arFcle is coming…
  • 3. BCD EBCDIC 1963: ASCII Baudot Code 1990’s: 8 bit encodings ISO/IEC 8859-­‐1 (LaFn 1) ISO/IEC 8859-­‐6 (Arabic)
  • 5.
  • 6. glyphs ☃ text rendering engine NSLayoutManager fonts Times New Roman.ttf codepoints U+2603 SNOWMAN Unicode Standard algorithms normalizaFon collaFon casing binary representaFon E2 98 83 (UTF-­‐8)
  • 7. Visual SimilariFes AΑ А ᗅ ᗋ ᴀ A www.google.com – U+0067 LATIN SMALL LETTER G! www.ɡooɡle.com – U+0261 LATIN SMALL LETTER SCRIPT G ৪ – U+09EA BENGALI DIGIT FOUR! ୨ – U+0B68 ORIYA DIGIT TWO
  • 8. Country Flags U+1F1E6 + U+1F1E7 ! U+1F1E8 + U+1F1F3 #  ! U+1F1E9 + U+1F1EA &! U+1F1EA + U+1F1F8 '! U+1F1EB + U+1F1F7 (! U+1F1EC + U+1F1E7 )! U+1F1EE + U+1F1F9 *! U+1F1EF + U+1F1F5 +! U+1F1F0 + U+1F1F7 ,! U+1F1F7 + U+1F1FA -! U+1F1FA + U+1F1F8 .
  • 9. Bi-­‐direc4nal Text # U+202E RIGHT-TO-LEFT OVERRIDE! # double click a .jpg, open an .exe ! $ python3 -c "print('su202Egpj.exe')"! sexe.jpg
  • 10. glyphs ☃ text rendering engine NSLayoutManager fonts Times New Roman.ttf codepoints U+2603 SNOWMAN Unicode Standard algorithms normaliza4on colla4on casing binary representa4on E2 98 83 (UTF-­‐8)
  • 11. $ gdb Twitter ! ! (gdb) r! Starting program: /Applications/Twitter.app/Contents/MacOS/Twitter ! ! Program received signal EXC_BAD_ACCESS, Could not access memory.! Reason: KERN_INVALID_ADDRESS at address: 0x00000001084e8008! 0x00007fff9432ead2 in vDSP_sveD ()! ! (gdb) bt! #0 0x00007fff9432ead2 in vDSP_sveD ()! #1 0x00007fff934594fe in TStorageRange::SetStorageSubRange ()! #2 0x00007fff93457d5c in TRun::TRun ()! #3 0x00007fff934579ee in CTGlyphRun::CloneRange ()! #4 0x00007fff93466764 in TLine::SetLevelRange ()! #5 0x00007fff93467e2c in TLine::SetTrailingWhitespaceLevel ()! #6 0x00007fff93467d58 in TRunReorder::ReorderRuns ()! #7 0x00007fff93467bfe in TTypesetter::FinishLineFill ()! #8 0x00007fff934858ae in TFramesetter::FrameInRect ()! #9 0x00007fff93485110 in TFramesetter::CreateFrame ()! #10 0x00007fff93484af2 in CTFramesetterCreateFrame ()! ...
  • 12. OS X Finder $ echo -e "xFFxFE" > x.txt # UTF-16LE BOM! $ xattr -w com.apple.TextEncoding "utf-16le" x.txt! $ qlmanage -p x.txt # or QuickLook with Finder [ERROR] An uncaught exception was raised outside of any generator: *** -[NSConcreteTextStorage attribute:atIndex:longestEffectiveRange:inRange:]: Range or index out of bounds! 2014-10-24 10:53:08.474 qlmanage[5268:11f] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[NSConcreteTextStorage attribute:atIndex:longestEffectiveRange:inRange:]: Range or index out of bounds'! *** First throw call stack:! (!! 0 CoreFoundation 0x00007fff89ebe25c __exceptionPreprocess + 172! ! 1 libobjc.A.dylib 0x00007fff87934e75 objc_exception_throw + 43! ! 2 CoreFoundation 0x00007fff89ebe10c +[NSException raise:format:] + 204! ! 3 AppKit 0x00007fff81a83a7a -[NSConcreteTextStorage attribute:atIndex:longestEffectiveRange:inRange:] + 118! ! 4 AppKit 0x00007fff81951ded -[NSMutableAttributedString(NSMutableAttributedStringKitAdditions) fixGlyphInfoAttributeInRange:] + 204! ! 5 AppKit 0x00007fff81951cd8 -[NSMutableAttributedString(NSMutableAttributedStringKitAdditions) fixAttributesInRange:] + 39! ! 6 AppKit 0x00007fff81a838e1 -[NSTextStorage processEditing] + 109! ! 7 AppKit 0x00007fff81a7f742 -[NSTextStorage endEditing] + 110! ! 8 AppKit 0x00007fff81c5db4f _NSReadAttributedStringFromURLOrData + 14525! ! 9 AppKit 0x00007fff81c5e3a5 -[NSAttributedString(NSAttributedStringKitAdditions) initWithURL:options:documentAttributes:
  • 13. glyphs ☃ text rendering engine NSLayoutManager fonts Times New Roman.ttf codepoints U+2603 SNOWMAN Unicode Standard algorithms normaliza4on colla4on casing binary representa4on E2 98 83 (UTF-­‐8)
  • 14. Weird Code Points May Bypass Filters • Non-­‐characters: eg. U+FFFE, U+FFFF, U+1FFFE, U+10FFFF Unassigned code points: eg. U+2073! • Must not be deleted (as allowed by Unicode < 5.2 C7) but replaced with U+FFFD REPLACEMENT CHARACTER. <a href=“javauFEFFscript:alert("XSS")>
  • 15. Non-­‐Characters and OS X Bash / HFS+ $ mkdir /tmp/test! $ cd /tmp/test! $ touch `printf "axefxbbxbfb"`! # or "auFFFEb".encode('utf-8')! # which is a non-character! $ ls a*! a?b! $ touch ab! $ ls a* ! a?b! # where did ab go?!
  • 16. Regex $ python3! >>> import re! >>> reg = re.compile("d") ! >>> gen = ( chr(c) for c in range(0, 0xFFFF) if re.match(reg, chr(c)) )! >>> print(''.join(gen))! 0123456789۰۱۲۳٤٥٦۷۸۹۰۱۲۳۴۵۶۷۸۹߉߈߇߆߅߄߃߂߁߀०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦ ୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖ ໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘ ᧙᪀᪁᪂᪃᪄᪅᪆᪇᪈᪉᪐᪑᪒᪓᪔᪕᪖᪗᪘᪙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩ ꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꧐꧑꧒꧓꧔꧕꧖꧗꧘꧙꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙꯰꯱꯲꯳꯴꯵꯶꯷꯸꯹0123456789! >>> reg = re.compile("d", re.ASCII)
  • 17. Regex $ jsc! >>> /a.c/.test('abc')! true! >>> /a.c/.test('ac')! false! >>> /a....c/.test('ac')! true
  • 18. glyphs ☃ text rendering engine NSLayoutManager fonts Times New Roman.ttf codepoints U+2603 SNOWMAN Unicode Standard algorithms normaliza3on colla3on casing binary representa3on E2 98 83 (UTF-­‐8)
  • 19. é Normaliza3on TR#15 U+00E9 ① U+2460 CompaCanonical decomposi&on &bility decomposi&on NFD NFKD 1 é U+0065 ◌́ U+0301 é U+00E9 1 U+0031 Canonical composi&on U+0031 NFC NFKC ◌́ é U+0065 ① ① U+2460 e U+0065 U+0301 U+2460 (most common)
  • 20. NFC doesn’t Always Compose שּׁ HEBREW LETTER! U+FB2C SHIN WITH DAGESH AND SHIN DOT ◌ּ HEBREW LETTER! U+05BC SHIN ש HEBREW LETTER! U+05E9 ◌ׁ HEBREW LETTER! SHIN WITH DAGESH AND SHIN DOT U+05C1 SHIN DOT NFC(U+FB2C) + + buffer overflow
  • 21. NFKD Expands Up to 18x صلى الله عليه وسلمU+FDFA ARABIC LIGATURE! SALLALLAHOU ALAYHE WASALLAM >>> import unicodedata ! >>> s = 'uFDFA' >>> len(s) 1 ! >>> s_nfkd = unicodedata.normalize('NFKD', s) >>> s_nfkd.encode('unicode-escape') b'u0635u0644u0649 u0627u0644u0644u0647 u0639 u0644u064au0647 u0648u0633u0644u0645' >>> len(s_nfkd) 18 buffer overflow
  • 22. NFK* May Bypass Filters ' FULLWIDTH U+FF07 APOSTROPHE NFK*(U+FF07) ‘ APOSTROPHE! U+0027 SQL injec3on
  • 24. glyphs ☃ text rendering engine NSLayoutManager fonts Times New Roman.ttf codepoints U+2603 SNOWMAN Unicode Standard algorithms normaliza3on colla3on casing binary representa3on E2 98 83 (UTF-­‐8)
  • 25. Unicode Colla3on Algorithm – TR#10 (UTS) German Swedish Åkersberga 1 2 Alingsås Alingsås 2 4 Oskarshamn Äpplebo 3 7 Ufng Oskarshamn 4 6 Üheld Östersund 5 8 Zwickau Üheld 6 1 Åkersberga Ufng 7 3 Äpplebo Zwickau 8 5 Östersund (Steven R. Loomis, Mark Davis) • Text comparison café < cafe ? cafe < café ? • Usage dependent German dic,onary: öf < of German phonebook: of < öf • Unstable over 6me Sorted lists should be versioned
  • 26. glyphs ☃ text rendering engine NSLayoutManager fonts Times New Roman.ttf codepoints U+2603 SNOWMAN Unicode Standard algorithms normaliza,on colla,on casing binary representa,on E2 98 83 (UTF-­‐8)
  • 27. Case Folding hHp://www.unicode.org/Public/UNIDATA/CaseFolding.txt # The data supports both implementations that require simple case foldings! # (where string lengths don't change), and implementations that allow full case folding! # (where string lengths may grow). Note that where they can be supported, the! # full case foldings are superior: for example, they allow "MASSE" and "Maße" to match.
  • 28. Case Conversion I U+0049 İ U+0130 ı U+0131 i U+0069 I U+0049 i U+0069 ◌̇ U+0307 ◌̇ U+0307 İ U+0130 ◌̇ U+0307 Posix Locale Turkish Locale
  • 29. Case Conversion – Locale NSString *s = [NSString stringWithFormat:@"istambul"]; ! NSLocale *locale = [NSLocale localeWithLocaleIdentifier:@"tr_TR"]; ! NSString *s2 = [s uppercaseStringWithLocale:locale]; ! // İSTAMBUL ✅
  • 30. Python 3 • ❌ Colla,on: s,ll compare codepoints >>> 'café' < 'caff' False • ❌ Case Conversion restricted to 1:1 case mappings >>> 'ß'.upper() 'ß'! • ❌ Case conversion ignores locale ❌ Addi,onaly, locale is global >>> import locale >>> locale.setlocale(locale.LC_ALL, 'tr_TR') >>> s = "istanbul" >>> s.upper() 'ISTANBUL'
  • 31. glyphs ☃ text rendering engine NSLayoutManager fonts Times New Roman.ttf codepoints U+2603 SNOWMAN Unicode Standard algorithms normaliza,on colla,on casing binary representa,on E2 98 83 (UTF-­‐8)
  • 32. 0x0800 0xFFFF 0x0100000 0x10FFFF UTF-­‐8 Bits Hex Min Hex Max Byte Sequence in Binary! 1 7 00000000 0000007f 0vvvvvvv! 2 11 00000080 000007FF 110vvvvv 10vvvvvv! 3 16 00000800 0000FFFF 1110vvvv 10vvvvvv 10vvvvvv! 4 21 00010000 001FFFFF 11110vvv 10vvvvvv 10vvvvvv 10vvvvvv! 5 26 00200000 03FFFFFF 111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv! 6 31 04000000 7FFFFFFF 1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv Malformed UTF-­‐8 sequences include: -­‐ overlong encoding, 0x1 on 2 bytes 11000000 10000001 0xC0 0x41 -­‐ unexpected con,nua,on byte 11000000 00000000 0xC0 0x00
  • 33. UTF-­‐16 0x0000 0xD800 0xE000 0xFFFF 0x0000 0x010000 00xx1F0FFFFFFF Bits Hex Min Hex Max Byte Sequence in Binary! 2 16 00000000 0000FFFF vvvvvvvv vvvvvvvv! 4 21 00010000 001FFFFF 110110ww wwwwwwww 110111ww wwwwwwww! ! www.. is (vvv.. - 0x10000) to map a 20 bits value Malformed sequences include unpaired surrogates such as: -­‐ 110110ww wwwwwwww not followed by 110111ww wwwwwwww -­‐ 110111ww wwwwwwww not preceded by 110110ww wwwwwwww
  • 34. Wide Characters • Unicode code points were first defined on 16 bits (UCS-­‐2) • and now Java char / Objec,ve-­‐C unichar are 16 bits • code points > 0xFFFF defined as a pair of 16 bits values • sizeof(wchar_t) is generally 16 bits on Windows, 32 bits on Linux
  • 35. Objec,ve-­‐C / Cocoa NSString *s1 = @"abc"; NSString *s2 = @"U0001F600bc"; ! NSLog(@"s1 %@", s1); // s1 abc NSLog(@"s2 %@", s2); // s2 bc ! NSLog(@"s1[0] -> %C", [s1 characterAtIndex:0]); // s1[0] -> a NSLog(@"s2[0] -> %C", [s2 characterAtIndex:0]); // nothing printed because // s2 = [0xD83D, 0xDE00], and U+D83D is a high surrogate // and NSLog() ignores nil strings
  • 36. HFS+ Apple Technical Q&A QA1173
  • 37. HFS+ # what you write…! $ echo ü; echo ü | xxd! ü! 0000000: c3bc 0a # NFC! ! # is not what you read! $ touch ü; ls; ls | xxd! ü! 0000000: 75cc 880a # NFD # watch your Finder go nuts!!!! $ cd; touch `printf "x41xe9"` # NFC("Aé")! $ open .! # fixed in OS X 10.10
  • 38. Conclusion • Unicode is cool. Unicode is hard. Unicode is ubiquitous. • How well do you know your framework of choice? • Everything dealing with Unicode is a bug nest. • Under-­‐studied topic. Tons of low-­‐hanging fruits. • See Chris Weber’s hHp://websec.github.io/unicode-­‐security-­‐guide/ « Unicode is just too complex to ever be secure. » – Bruce Schneier, 2000