1. Meltdown & Spectre
Mitigate the unmitigable
Based on
● Meltdown: Reading Kernel Memory from User Space
● Spectre Attacks: Exploiting Speculative Execution
3. ● What are they?
○ Vulnerabilities in most modern CPUs, caused by poorly understood interactions between speculative execution and its microarchitectural side effects (e.g. on the cache)
● When?
○ Discovered in June/July 2017; publicly disclosed on 3 January 2018 by Google’s Project Zero
● Effects?
○ Need to modify:
■ CPU Memory
■ OS Virtual memory handling
■ Compilers
■ Hypervisors
■ Browsers
○ Real solution: change the CPU design
They are considered the greatest hardware vulnerabilities in computer history
6. Basic memory model of a modern CPU
❑ Each CPU has 3 levels of cache* (L1 and L2 per core, L3 shared between cores):
❏ L1 with access latency ~5 cycles (1.2 ns)
❏ L2 with access latency ~10 cycles (4.2 ns)
❏ L3 with access latency ~40 cycles (20 ns)
❏ Main memory (DRAM): ~100 ns
*Figures for a modern Core i7 / Xeon server CPU
7. Fetch-decode-execute parallelism
❑ On a non-pipelined CPU, while an instruction is being processed at one stage, the other stages are idle
❑ On a pipelined CPU, all the stages work in parallel:
❏ While the 1st instruction is being decoded by the Decoder Unit, the 2nd instruction is being fetched by the Fetch Unit. It takes only 5 clock cycles to execute 2 instructions on a pipelined CPU.
Image source: https://stackpointer.io/hardware/how-pipelining-improves-cpu-performance/113/
8. Intel Skylake CPU Microarchitecture
❑ Simplified illustration of a single core of Intel’s Skylake microarchitecture
❑ Instructions are decoded into µOPs and executed out-of-order in the execution engine by individual execution units
Figure from Lipp et al., “Meltdown: Reading Kernel Memory from User Space”
https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-lipp.pdf
9. Out-of-order processing
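1 op per cycle:
❑ t = a + b
❑ u = c + d
❑ v = e + f
❑ w = v + g
❑ x = h + i
❑ y = j + k
6 clock cycles
2 op per cycle (naive pairing):
❑ t = a + b, u = c + d
❑ v = e + f, w = v + g
❑ x = h + i, y = j + k
3 clock cycles, but only if w = v + g could run in the same cycle as v = e + f, which it cannot: it needs v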
11. Out-of-order processing
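2 op per cycle, in order: w = v + g must wait for v, so the CPU stalls
❑ t = a + b, u = c + d
❑ v = e + f
❑ w = v + g, x = h + i
❑ y = j + k
4 clock cycles
2 op per cycle, out of order: the independent y = j + k moves up to fill the stall
❑ t = a + b, u = c + d
❑ v = e + f, y = j + k
❑ w = v + g, x = h + i
3 clock cycles, 2 op per cycle and out-of-order execution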
12. Fast in a straight line, not so good on corners
13. Branch prediction
❑ Pipelining, superscalar and out-of-order execution only help if you know which instructions are coming next
❑ Conditionals are a problem: we don’t know what to load into the pipeline until the IF condition is resolved
Branch prediction helps: let's guess
❑ If the guess is right, great
❑ If the guess is wrong, flush the pipeline
Branch Prediction Is Not A Solved Problem: Measurements, Opportunities, and Future Directions
https://arxiv.org/pdf/1906.08170.pdf
14. Branch prediction + Speculative execution
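❑ Cache misses cause long delays while the CPU waits for data (e.g. the data a branch condition depends on)
❑ Speculative execution:
❏ Execute instructions on the predicted path
❏ If the prediction was right - great!
❏ If the prediction was wrong - undo all architectural effects (registers, memory) of running in speculative mode and flush the pipeline; microarchitectural state such as the cache is not rolled back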
Branch Prediction Is Not A Solved Problem: Measurements, Opportunities, and Future Directions
https://arxiv.org/pdf/1906.08170.pdf
15. Example of branch prediction with speculation
1. C = A + B;
2. E = C + D;
3. G = E + F;
4. if (G == 0){
5. J = H + I;
6. L = J + K;
7. N = L + M;
8. }
Without speculation: can’t reorder anything (lines 1-3 form a dependency chain, and lines 5-7 must wait until G is known)
16. Example of branch prediction with speculation
1. C = A + B;
2. E = C + D;
3. G = E + F;
4. if (G == 0){
5. J = H + I;
6. L = J + K;
7. N = L + M;
8. }
With speculation
1. C = A + B; _J = H + I;
2. E = C + D; _L = _J + K;
3. G = E + F; _N = _L + M;
4. if (G == 0){
5. J =_J; L =_L; N=_N;
6. }
Only make results available if G equals 0
(Speculative listing: left column on ALU 1, right column on ALU 2)
17. Virtual memory
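❑ Virtual memory is divided into kernel space and user space
❑ The page table maps virtual pages to physical memory
❑ Kernel-space pages have a bit in their page-table entry marking them as supervisor-only (the U/S bit on x86)
❑ If user code tries to access kernel space, a trap (page fault) occurs
[Figure: kernel space and user space mapped to physical memory through the page table]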
19. Meltdown Abstract
Meltdown exploits side effects of out-of-order execution on
modern processors to read arbitrary kernel-memory locations,
including personal data and passwords.
...read the memory of other processes or virtual machines in the cloud
without any permissions or privileges.
CVE-2017-5754
20. Meltdown attack: side channel
1. C = A + B;
2. E = C + D;
3. G = E + F;
4. Z = G + Y;
5. if (Z == 0){
6. J = kernel_mem[addr];
7. L = J & 0x01;
8. N = L * 4096;
9. M = user_mem[N];
10. }
1. C = A + B; _J = kernel_mem[addr];
2. E = C + D; _L = _J & 0x01;
3. G = E + F; _N = _L * 4096;
4. Z = G + Y; _M = user_mem[_N];
5. if (Z == 0){
6. J =_J; L =_L; N =_N; M =_M;
7. }
8. t1 = get_time();
9. V = user_mem[0];
10. t2 = get_time();
11. delta = t2 - t1;
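(Speculative listing: left column on ALU 1, right column on ALU 2)
❑ if (Z == 0): the predictor thinks this is true, so the body runs speculatively
❑ J = kernel_mem[addr]: normally this should cause a segmentation fault, but not in the speculative case; the fault is only raised when the instruction retires, and instructions on a mispredicted path never retire
❑ As the branch is never actually taken, _J, _L and _N are never committed, so they are never revealed; but the load of user_mem[_N] has already brought that line into the cache
❑ Side channel magic here: if delta is < 10 cycles, user_mem[0] was cached, so the lowest bit of kernel_mem[addr] is 0 (assuming user_mem was flushed from the cache before the attack)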
22. Meltdown mitigation
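❑ Why do operating systems map kernel memory into the address space of every process?
❏ ...thereby making Meltdown possible...
❑ ...because of the cache of page translations called the TLB (translation lookaside buffer)
❏ The page table is stored in memory itself, so walking it costs a lot; keeping the kernel mapped avoids switching page tables (and losing cached translations) on every system call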
* Meltdown Main Paper
https://meltdownattack.com/meltdown.pdf
23. TLB
❑ A translation lookaside buffer (TLB) is a cache of virtual-to-physical address translations that reduces the time taken to access a memory location; it is part of the MMU
❑ If the page being accessed is not in the TLB, the MMU has to walk the page table to find where the physical page is
❏ Cost is around 300-600 cycles!
* Meltdown Main Paper
https://meltdownattack.com/meltdown.pdf
[Figure: kernel and user virtual address spaces translated to physical memory by the MMU, with the TLB caching recent translations]
24. KPTI - Kernel Page Table Isolation
❑ While user code runs, the page table maps only user-space memory (plus the minimal kernel entry/exit code)
❏ So Meltdown cannot “physically” reach kernel memory: it is not mapped
❑ Some additional code in the kernel to switch between the user-space and kernel-space page tables
25. KPTI - Kernel Page Table Isolation
❑ Complete mitigation of Meltdown
❑ Changing page tables requires flushing the TLB
❑ Big impact on performance (this happens on every system call!)
❑ In some cases > 50% slowdown
❑ Newer Intel CPUs can tag TLB entries with a Process Context ID (PCID)
❏ This allows flushing only part of the TLB: entries tagged with other PCIDs survive a page-table switch
27. Spectre Abstract
❏ Spectre Variant 1: Bounds Check Bypass
❏ Exploiting Conditional Branches
❏ CVE-2017-5753
❏ Spectre Variant 2: Branch Target Injection
❏ Exploiting Indirect Branches
❏ CVE-2017-5715
Spectre attacks involve inducing a victim to speculatively perform
operations that would not occur during correct program execution
and which leak the victim’s confidential information via a side channel to
the adversary
30. Spectre attack: Overview
❑ Setup phase: mistrain the CPU (e.g. its branch predictor)
❑ Speculative execution of instructions that leak sensitive information to a side channel
❑ Recovering the sensitive information from the side channel using:
❏ Flush + Reload
❏ Evict + Reload
Yarom, Y., and Falkner, K. - FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack - in USENIX Security Symposium (2014).
31. Variant 1: Bounds Check Bypass
1. array a = …; // size 400
2. array b = …; // size 2 * 4096, since i below can be 4096
3. offset = …; // user input
3. offset = …; // user input
4. if (offset < size(a)){
5. v = a[offset];
6. i = (v & 0x01) * 4096;
7. x = b[i];
8. }
32. Variant 2 : Poisoning Indirect Branches
❏ Indirect branch instructions can jump to more than two possible target addresses.
❏ x86 examples
❏ jmp eax: address in register
❏ jmp [eax]: address in memory location
❏ jmp dword ptr [0x12345678]: address in memory at a fixed location
❏ Address from the stack (“ret”)
❏ MIPS example
❏ jr $ra: address in register $ra
33. How can we use this?
❏ Find unsafe user code [unlikely]
❏ Use a JIT compiler that runs attacker-supplied code [highly likely]
❏ Google’s PoC uses BPF (the in-kernel packet filter with a JIT compiler) in the Linux kernel
❏ Microsoft PoC uses JavaScript
❏ Potentially any visited website can read memory of the browser process
37. Preventing Speculative Execution
Inserting speculative execution blocking instructions
❏ Degrades performance if used too extensively
❏ Use static analysis to find out optimum placement of blocking instructions
❏ Requires code recompilation
❏ Intel processors:
❏ add lfence after the bounds check, before the dependent load (prevents speculation past the check)
❏ ARM processors:
❏ use the proposed __builtin_load_no_speculate() compiler builtin
38. Conclusion
❏ Software isolation techniques are widely deployed
❏ A fundamental security assumption underpinning all of these is that the CPU will faithfully execute software, including its safety checks
❏ Speculative execution violates this assumption, allowing adversaries to determine the contents of memory and registers
❏ Trade-offs between security and performance