COMP9242 
Advanced Operating Systems 
S2/2014 Week 7: 
Microkernel Design 
@GernotHeiser 
COMP9242 S2/2014
Copyright Notice 
These slides are distributed under the Creative Commons 
Attribution 3.0 License 
• You are free: 
– to share—to copy, distribute and transmit the work 
– to remix—to adapt the work 
• under the following conditions: 
– Attribution: You must attribute the work (but not in any way that 
suggests that the author endorses you or your use of the work) as 
follows: 
• “Courtesy of Gernot Heiser, [Institution]”, where [Institution] is one of 
“UNSW” or “NICTA” 
The complete license text can be found at 
http://creativecommons.org/licenses/by/3.0/legalcode 
©2012 Gernot Heiser, UNSW/NICTA. Distributed under Creative Commons Attribution License 
Microkernel Principles: Minimality 
A concept is tolerated inside the microkernel 
only if moving it outside the kernel, i.e. 
permitting competing implementations, would 
prevent the implementation of the system’s 
required functionality. [SOSP’95] 
• Advantages of resulting small kernel: 
– Easy to implement, port? (limited by arch-specific micro-optimisations) 
– Easier to optimise 
– Hopefully enables a minimal trusted computing base (TCB): small attack surface, fewer failure modes 
– Easier debug, maybe even prove correct? 
• Challenges: 
– API design: generality despite small code base 
– Kernel design and implementation for high performance 
Consequence of Minimality: User-level Services 
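[Figure: monolithic kernel layers (VFS; IPC, file system; scheduler, virtual memory; device drivers, dispatcher) above the hardware, compared with a microkernel providing only IPC and virtual memory, everything else running in user mode] 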
• Kernel provides no services, only mechanisms 
• Kernel is policy-free 
– Policies limit (good for 90% of cases, disastrous for some) 
– “General” policies lead to bloat, inefficiency 
[Figure labels: Hardware; Application; Unix Server; File Server; Device Driver; Syscall; IPC; Kernel Mode] 
IPC performance is critical!
1993 “Microkernel” IPC Performance 
[Figure: IPC time [μs] (0–400) versus message length [B] (0–6000) for Mach and L4 on an i486 @ 50 MHz, with the raw copy cost for comparison. Annotations: 115 μs (Mach), 5 μs (L4)] 
Culprit: cache footprint [SOSP’95] 
L4 IPC Performance over 20 Years 
Name       Year  Processor               MHz    Cycles  μs 
Original   1993  i486                    50     250     5.00 
Original   1997  Pentium                 160    121     0.75 
L4/MIPS    1997  R4700                   100    86      0.86 
L4/Alpha   1997  21064                   433    45      0.10 
Hazelnut   2002  Pentium 4               1,400  2,000   1.38 
Pistachio  2005  Itanium                 1,500  36      0.02 
OKL4       2007  XScale 255              400    151     0.64 
NOVA       2010  i7 Bloomfield (32-bit)  2,660  288     0.11 
seL4       2013  i7 Haswell (32-bit)     3,400  301     0.09 
seL4       2013  ARM11                   532    188     0.35 
seL4       2013  Cortex A9               1,000  316     0.32 
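The μs column is the cycle count divided by the clock rate in MHz. A minimal sketch of that conversion, checked here against the first row only (250 cycles at 50 MHz); the helper name ipc_cost_us is an illustrative assumption, not part of any L4 API:

```c
#include <assert.h>

/* Convert an IPC cost in cycles to microseconds.
 * A clock of f MHz completes f cycles per microsecond. */
static double ipc_cost_us(double cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles / clock_mhz;
}
```

For the original i486 kernel, ipc_cost_us(250, 50) gives 5.0, matching the 5.00 μs entry.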
Minimality: Source Code Size 
Name         Architecture  C/C++ [kSLOC]  asm [kSLOC]  total [kSLOC] 
Original     i486          0              6.4          6.4 
L4/Alpha     Alpha         0              14.2         14.2 
L4/MIPS      MIPS64        6.0            4.5          10.5 
Hazelnut     x86           10.0           0.8          10.8 
Pistachio    x86           22.4           1.4          23.0 
L4-embedded  ARMv5         7.6            1.4          9.0 
OKL4 3.0     ARMv6         15.0           0.0          15.0 
Fiasco.OC    x86           36.2           1.1          37.6 
seL4         ARMv6         9.7            0.5          10.2 
L4 Family Tree 
[Figure: L4 family tree on a 1993–2013 timeline, with arrows for API inheritance and code inheritance. 
Kernels shown: L3→L4, “X”, Hazelnut, Pistachio (GMD/IBM/Karlsruhe); L4/MIPS, L4/Alpha, L4-embed., OKL4-μKernel, OKL4-Microvisor, seL4 (UNSW/NICTA); Fiasco, Fiasco.OC, Nova (Dresden); Codezero; P4 → PikeOS. 
Legend: Assembler, C++, C, Asm+C, Portable, Verified, Caps] 
L4 Deployments – in the Billions 
SiMKo 3 “Merkelphone” 
L4 Design and Implementation 
Implement. Tricks [SOSP’93] 
• Process kernel 
• Virtual TCB array 
• Lazy scheduling 
• Direct process switch 
• Non-preemptible 
• Non-portable 
• Non-standard calling 
convention 
• Assembler 
Design Decisions [SOSP’95] 
• Synchronous IPC 
• Rich message structure, 
arbitrary out-of-line messages 
• Zero-copy register messages 
• User-mode page-fault handlers 
• Threads as IPC destinations 
• IPC timeouts 
• Hierarchical IPC control 
• User-mode device drivers 
• Process hierarchy 
• Recursive address-space 
construction 
Objective: Minimise cache footprint and TLB misses 
DESIGN 
What Mechanisms? 
• Fundamentally, the microkernel must abstract 
– Physical memory: Address spaces 
– CPU: Threads 
– Interrupts/Exceptions 
• Unfettered access to any of these bypasses security 
– No further abstraction needed for devices 
• memory-mapping device registers and interrupt abstraction suffices 
• …but some generalised memory abstraction needed for I/O space 
• Above isolates execution units, hence microkernel must also provide 
– Communication (traditionally referred to as IPC) 
– Synchronization 
Memory: Policy-Free Address-Space Management 
Text Data BSS libc File Stack 
• Kernel provides empty address-space “shell” 
– page faults forwarded to server 
– server provides mapping 
– AS layout is server policy (not kernel) 
Page-fault 
server 
• Cost: 
– 1 round-trip IPC, plus mapping operation 
• mapping may be side effect of IPC 
• kernel may expose data structure 
• Kernel mechanism: forwarding page-fault exception 
• “External pagers” first appeared in Mach [Rashid et al, ’88] 
– … but were optional (and slow) – in L4 there’s no alternative 
[Figure labels: Map, Exception, Stack] 
Abstracting Memory: Address Spaces 
Map’d 
Page 
Unm. 
Page 
• Minimum address-space abstraction: empty slots for page mappings 
– paging server can fill with mappings 
• virtual address → physical address + permissions 
• Can be 
– page-table–like: array under full user control 
– TLB-like: cache for mappings which may vanish 
• Main design decision: is source of a mapping a page or a frame? 
– Frame: hardware-like 
– Page: recursive address spaces (original L4 model) 
Traditional L4: Recursive Address Spaces 
Unmap 
X 
Map Grant 
Mappings are 
page → page Magic initial AS to 
Initial Address Space 
Physical Memory 
anchor recursion 
(map of PM) 
Recursive Address Space Experience 
API complexity: Recursive address-space model 
• Conceptually elegant 
– trivially supports virtualization 
• Drawback: Complex mapping database 
– Kernel needs to track mapping relationship 
• Tear down dependent mappings on unmap 
– Mapping database problems: 
• accounts for 1/4–1/2 of kernel memory use 
• SMP coherence is performance bottleneck 
• NICTA’s L4-embedded, OKL4 removed MDB 
– Map frames rather than pages 
• need separate abstraction for frames / physical memory 
• subsystems no longer virtualizable (even in OKL4 cap model) 
• Properly addressed by seL4’s capability-based model 
– But have cap derivation tree, subject of on-going research 
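Why the mapping database is costly can be seen from a minimal sketch of what it has to track. This is an illustration under assumptions, not the data structure of any L4 kernel: the names mdb_new, mdb_map and mdb_unmap and the struct layout are hypothetical. Every map creates a child node, and unmap must visit and tear down everything derived from the page.

```c
#include <assert.h>
#include <stdlib.h>

/* One node per mapping; children are mappings derived from this one. */
struct mapping {
    int as_id;                      /* owning address space (hypothetical ID) */
    unsigned long vpage;            /* virtual page number in that space */
    struct mapping *parent;
    struct mapping *first_child;
    struct mapping *next_sibling;
};

static struct mapping *mdb_new(int as_id, unsigned long vpage,
                               struct mapping *parent)
{
    struct mapping *m = calloc(1, sizeof *m);
    assert(m != NULL);
    m->as_id = as_id;
    m->vpage = vpage;
    m->parent = parent;
    if (parent != NULL) {           /* link in as newest child */
        m->next_sibling = parent->first_child;
        parent->first_child = m;
    }
    return m;
}

/* Map: the source keeps its page; the destination becomes a child. */
static struct mapping *mdb_map(struct mapping *src, int dst_as,
                               unsigned long dst_vpage)
{
    return mdb_new(dst_as, dst_vpage, src);
}

/* Unmap: revoke every mapping derived from m, keeping m itself.
 * Returns the number of mappings torn down. */
static int mdb_unmap(struct mapping *m)
{
    int removed = 0;
    struct mapping *c = m->first_child;
    while (c != NULL) {
        struct mapping *next = c->next_sibling;
        removed += mdb_unmap(c) + 1;    /* descendants first, then c */
        free(c);
        c = next;
    }
    m->first_child = NULL;
    return removed;
}
```

The tree has one node per mapping in the whole system, which is consistent with the memory cost noted above, and on SMP every walk would need synchronisation.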
Abstracting Execution 
• Can abstract as: 
– kernel-scheduled threads 
• Forces (scheduling) policy into the kernel 
– vCPUs or scheduler activations 
• This essentially virtualizes the timer interrupt through upcall 
– Scheduler activations also upcall for exceptions, blocking etc 
• Multiple vCPUs only for real multiprocessing 
• Threads can be tied to address space or “migrating” 
IPC Cross- 
AS call 
– Implementation-wise not much of a difference 
• Tight integration/interdependence with IPC model! 
Abstracting Interrupts and Exceptions 
• Can abstract as: 
– Upcall to interrupt/exception handler 
• hardware-like diversion of execution 
• need to save enough state to continue interrupted execution 
– IPC message to handler from magic “hardware thread” 
• OS-like 
• needs separate handler thread ready to receive 
H/W 
“Thread” 
Exception IPC 
Handler 
Thread 
• Page fault tends to be special-cased for practical reason 
– Tends to require handling external to faulter 
• IPC message to page-fault server rather than exception handler 
– But also “self-paging” as in Nemesis [Hand ’99] or Barrelfish 
L4 Synchronous IPC 
Thread1 
Running Blocked 
Thread2 
Blocked Running 
Send (dest, msg) 
Wait (src, msg) 
….... 
Kernel 
copy 
Rendezvous 
model 
Kernel executes in sender’s context 
• copies memory data directly to 
receiver (single-copy) 
• leaves message registers unchanged 
during context switch (zero copy)
“Long” IPC 
Sender address space 
Kernel copy 
Receiver address space 
LONG IPC 
ABANDONED 
Page fault! 
• IPC page faults are nested exceptions ⇒ In-kernel concurrency 
– L4 executes with interrupts disabled for performance, no concurrency 
• Must invoke untrusted usermode page-fault handlers 
– potential for DOSing other thread 
• Timeouts to avoid DOS attacks 
– complexity 
Why have long IPC? 
• POSIX-style APIs 
write (fd, buf, nbytes) 
• Usually prefer shared buffers 
IPC Timeouts 
ABANDONED 
in seL4, OKL4 
Thread1 
Running Blocked 
Thread2 
Blocked Running 
Send (dest, msg) 
….... Wait (src, msg) 
Kernel 
copy 
Limit IPC 
blocking 
time 
Thread1 
Running Blocked 
Rcv(NIL_THRD, delay) 
….... 
Timed 
wait 
• No theory/heuristics for 
determining timeouts 
• Typically server reply 
with zero TO, else ∞ 
• Added complexity 
• Can do timed wait with 
timer syscall
Synchronous IPC Issues 
Thread1 
Running Blocked 
Initiate_IO(…,…) 
IO_Wait(…,…) 
Not 
generally 
possible 
Worker_Th 
Running Blocked 
IO_Th 
Blocked Running 
Unblock (IO_Th) ….... Call (IO,msg) 
Sync(Worker_Th) 
Sync(IO_Th) ….... 
• Sync IPC forces multi-threaded code! 
• Also poor choice for multi-core
Asynchronous Notifications 
….... 
Thread1 
Running Blocked 
Thread2 
Blocked Running 
w = Poll (…) 
w = Wait (…) 
Send (Thr_2, …) 
Send (Thr_2, …) 
• Delivers few bits (destructively) 
• Kernel only buffers single word 
• Maps well to interrupts, exceptions 
Server 
Client Driver Sync Async 
• Thread can wait for 
synchronous and 
asynchronous 
messages concurrently 
Sync IPC 
complemented 
with async 
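The notification semantics above (few bits, single buffered word, destructive delivery) can be sketched in a few lines. This is an assumption-laden illustration: struct notification, ntfn_send and ntfn_poll are hypothetical names, and blocking Wait is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical notification object: the kernel buffers a single word. */
struct notification {
    unsigned long word;
};

/* Send: OR the sender's bits into the word; repeated sends do not queue. */
static void ntfn_send(struct notification *n, unsigned long bits)
{
    assert(n != NULL);
    n->word |= bits;
}

/* Poll: return the accumulated bits and clear them (destructive read).
 * A result of 0 means nothing was pending. */
static unsigned long ntfn_poll(struct notification *n)
{
    assert(n != NULL);
    unsigned long w = n->word;
    n->word = 0;
    return w;
}
```

Two sends of the same bit before a poll are indistinguishable from one, which is why this maps well to interrupts but cannot carry message payloads.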
Is Synchronous IPC Redundant? 
Client 
Running Blocked 
Server 
Blocked Running 
Call (dest, msg) ReplyWait (src, msg) 
….... 
Control 
transfer 
2 IPC 
mechanisms: 
Minimality 
violation 
Sync IPC is useful intra-core: 
• fast control transfer 
• mimics migrating threads 
• enables scheduling context donation 
⇒ useful for real-time systems 
But presently no 
clean model!
Direct vs Indirect IPC Addressing 
• Direct: Queue senders/messages at receiver 
– Need unique thread IDs 
– Kernel guarantees identity of sender 
• useful for authentication 
• Indirect: Mailbox/port object 
– Just a user-level handle for the kernel-level queue 
– Extra object type – extra weight? 
– Communication partners are anonymous 
• Need separate mechanism for authentication 
Receiver 
Sender 
Sender 
Port 
Sender 
Sender 
Port 
Receiver 
Receiver 
IPC Destination Naming 
Client Server 
Thread IDs 
replaced by 
IPC “endpoints” 
Client Server 
Load 
balancer Workers 
Client Server 
All IPCs 
duplicated! 
Original L4 
addressed IPC 
to threads 
Client must do 
load balancing? 
RPC reply from 
wrong thread! 
• Inefficient designs 
• Poor information hiding 
• Covert channels [Shapiro ‘02] 
Interpose 
Access transparently? 
monitor 
(ports)
Endpoints 
Client Server 
Client Server 
IPC 
Send 
Rcv 
Sync EP 
• Sync EP queues senders/receivers 
• Does not buffer messages 
0x01 
0x10 
0x30 
Async EP 
• Async EP accumulates bits (e.g. 0x01, 0x10 and 0x30 combine to 0x31)
Other Design Issues 
IPC Control: “Clans & Chiefs” Process Hierarchy 
Chief 
Clan 
IPC outside clan 
re-directs to chief 
Create 
Hierarchical 
resource 
management 
• Inflexible, clumsy, 
inefficient hierarchies! 
• Fundamental problem: 
no rights delegation 
Hierarchies 
replaced by 
delegatable cap-based 
access 
control
IMPLEMENTATION 
Virtual TCB Array 
Thread ID 
VM TCB TCB 
Fast TCB & 
stack lookup 
TCB 
proper 
Kernel 
stack 
Trades cache for TLB footprint 
and virtual address space 
Not worthwhile on 
modern processors!
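The fast lookup amounts to address arithmetic on the thread number. A minimal sketch, assuming a region base, slot size and slot count chosen purely for illustration (VTCB_BASE, VTCB_SIZE and VTCB_MAX are hypothetical values, not those of any L4 kernel):

```c
#include <assert.h>
#include <stdint.h>

#define VTCB_BASE ((uintptr_t)0xE0000000u) /* assumed start of the TCB region */
#define VTCB_SIZE ((uintptr_t)1024u)       /* assumed bytes per TCB + kernel stack */
#define VTCB_MAX  4096u                    /* assumed number of thread slots */

/* Thread number -> virtual address of its TCB: one multiply-add,
 * no table lookup. Unused slots are simply left unmapped. */
static uintptr_t tcb_from_thread_no(uint32_t thread_no)
{
    assert(thread_no < VTCB_MAX);
    return VTCB_BASE + (uintptr_t)thread_no * VTCB_SIZE;
}
```

The price is that each TCB access may need its own TLB entry, which is the cache-for-TLB trade noted above.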
Process Kernel: Per-Thread Kernel Stack 
TCB 
TCB 
proper 
Kernel 
stack 
Get own 
TCB base 
by masking 
stack pointer 
• Not worthwhile on 
modern processors! 
• Stacks can dominate 
kernel memory use! 
• Reduces TLB footprint at cost 
of cache and kernel memory 
• Easier to deal with blocking 
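Getting the own TCB base by masking the stack pointer works when TCB and kernel stack share one naturally aligned, power-of-two sized block. A minimal sketch; the block size KTCB_SIZE and the function name are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define KTCB_SIZE ((uintptr_t)1024u)   /* assumed; must be a power of two */

/* The kernel stack lives in the same aligned block as the TCB, so
 * clearing the low bits of any in-stack address yields the TCB base. */
static uintptr_t own_tcb_base(uintptr_t kernel_sp)
{
    assert((KTCB_SIZE & (KTCB_SIZE - 1)) == 0);
    return kernel_sp & ~(KTCB_SIZE - 1);
}
```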
Scheduler Optimisation Tricks: “Lazy Scheduling” 
thread_t schedule() { 
    foreach (prio in priorities) { 
        foreach (thread in runQueue[prio]) { 
            if (isRunnable(thread)) 
                return thread; 
            else 
                schedDequeue(thread); 
        } 
    } 
    return idleThread; 
} 
• Frequent blocking/unblocking 
in IPC-based systems 
• Many ready-queue 
manipulations 
Problem: Unbounded 
scheduler execution time! 
Idea: leave blocked 
threads in ready 
queue, scheduler 
cleans up 
Client 
Call() 
Server 
Reply_Wait() 
BLOCKED BLOCKED 
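The pseudocode above can be made concrete as follows. This is a sketch under assumptions, not kernel code: the queue is a singly linked list with insertion at the head for brevity, index 0 is the highest priority, and all names (block, make_ready, the queued flag) are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PRIOS 4

struct thread {
    int runnable;            /* 0 while blocked in IPC */
    int queued;              /* 1 while linked into a run queue */
    struct thread *next;
};

static struct thread *run_queue[NUM_PRIOS];   /* index 0 = highest priority */
static struct thread idle_thread = { 1, 0, NULL };

/* Blocking only flips a flag; the run queue is not touched. */
static void block(struct thread *t)
{
    t->runnable = 0;
}

/* Unblocking re-links the thread only if a stale entry is not still there. */
static void make_ready(struct thread *t, int prio)
{
    assert(prio >= 0 && prio < NUM_PRIOS);
    t->runnable = 1;
    if (!t->queued) {
        t->next = run_queue[prio];
        run_queue[prio] = t;
        t->queued = 1;
    }
}

/* The scheduler cleans up: blocked threads found at the head are dequeued. */
static struct thread *schedule(void)
{
    for (int prio = 0; prio < NUM_PRIOS; prio++) {
        while (run_queue[prio] != NULL) {
            struct thread *t = run_queue[prio];
            if (t->runnable)
                return t;
            run_queue[prio] = t->next;   /* lazily drop the stale entry */
            t->queued = 0;
            t->next = NULL;
        }
    }
    return &idle_thread;
}
```

The inner while loop is where the unbounded execution time comes from: its length depends on how many blocked threads have piled up in the queues.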
Alternative: “Benno Scheduling” 
thread_t schedule() { 
    foreach (prio in priorities) { 
        foreach (thread in runQueue[prio]) { 
            if (isRunnable(thread)) 
                return thread; 
            else 
                schedDequeue(thread); 
        } 
    } 
    return idleThread; 
} 
• Frequent blocking/unblocking in IPC-based systems 
• Many ready-queue manipulations 
Idea: Lazy on unblocking instead of on blocking 
Only current thread needs fixing up at preemption time! 
[Figure: Client Call() to Server, Server Reply_Wait() back to Client]
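A minimal sketch of the idea, under stated assumptions: the ready queue holds only runnable threads, the running thread is not in it, and a thread woken by IPC is switched to directly without being enqueued. The names direct_switch, preempt and the FIFO queue layout are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_PRIOS 4

struct thread {
    int runnable;
    int prio;                /* 0 = highest priority */
    struct thread *next;
};

static struct thread *rq_head[NUM_PRIOS], *rq_tail[NUM_PRIOS];
static struct thread idle_thread = { 1, NUM_PRIOS - 1, NULL };
static struct thread *current = &idle_thread;

static void enqueue(struct thread *t)
{
    assert(t->runnable);     /* invariant: queue holds runnable threads only */
    t->next = NULL;
    if (rq_tail[t->prio] != NULL)
        rq_tail[t->prio]->next = t;
    else
        rq_head[t->prio] = t;
    rq_tail[t->prio] = t;
}

static struct thread *dequeue_highest(void)
{
    for (int p = 0; p < NUM_PRIOS; p++) {
        struct thread *t = rq_head[p];
        if (t != NULL) {
            rq_head[p] = t->next;
            if (rq_head[p] == NULL)
                rq_tail[p] = NULL;
            t->next = NULL;
            return t;
        }
    }
    return NULL;
}

/* IPC path: caller blocks, partner runs. No queue operation at all,
 * because the running thread was never in the queue. */
static void direct_switch(struct thread *to)
{
    current->runnable = 0;
    to->runnable = 1;
    current = to;
}

/* Preemption: only the current thread needs fixing up. */
static struct thread *preempt(void)
{
    if (current != &idle_thread && current->runnable)
        enqueue(current);
    struct thread *next = dequeue_highest();
    current = (next != NULL) ? next : &idle_thread;
    return current;
}
```

In this sketch the scheduler never meets a blocked thread, so picking the next thread no longer involves cleanup work.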
Scheduler Optimisation: “Direct Process Switch” 
• Frequent context switches in IPC-based systems 
• Many scheduler invocations 
Idea: Don’t invoke scheduler if you know who’ll be chosen 
• Sender was running ⇒ had highest prio 
• If receiver prio ≥ sender prio ⇒ run receiver 
• Arguably, sender should donate back if it’s a server replying to a Call() 
• Hence, always donate on Reply_Wait() 
Implication: Time slice donation – receiver runs on sender’s time slice 
Problem: 
• Accounting (RT systems) 
• Policy 
• No clean model yet! 
[Figure: Client Call() to Server, Server Reply_Wait() back to Client]
Speaking of Real Time… 
• Kernel runs with interrupts disabled 
– No concurrency control ⇒ simpler kernel 
• Easier reasoning about correctness 
• Better average-case performance 
• How about long-running system calls? 
– Use strategic preemption points: limited concurrency in kernel! 
while (!done) { 
    process_stuff(); 
    PSW.IRQ_disable=0;  /* preemption point: let pending interrupts in */ 
    PSW.IRQ_disable=1;  /* back to non-preemptible kernel execution */ 
} 
– (Original) Fiasco has fully preemptible kernel: lots of concurrency in kernel! 
• Like commercial microkernels (QNX, Green Hills INTEGRITY) 
Incremental Consistency 
[Figure: kernel entry (disable interrupts); a long operation split into O(1) operations, with a check for pending interrupts between them (abort & restart later if one is pending); kernel exit (enable interrupts)] 
No concurrency in (single-core) kernel! 
• Consistency 
• Restartability 
• Progress 
Good fit to event kernel!
Example: Destroying IPC Endpoint 
Actions: 
1. Disable EP cap (prevent new messages) 
2. while message queue not empty do 
3. remove head of queue (abort message) 
4. check for pending interrupts 
5. done 
Client1 
Server 
Client2 
IPC 
endpoint 
Message 
queue
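The five actions for destroying an endpoint can be sketched as a restartable loop. This is an illustration under assumptions, not seL4 code: irq_pending stands in for the hardware check, and the types and the name ep_destroy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct msg {
    struct msg *next;
};

struct endpoint {
    int cap_enabled;         /* 0 once deletion has started */
    struct msg *queue;       /* pending messages */
};

enum ep_result { EP_DONE, EP_PREEMPTED };

static int irq_pending;      /* stand-in for "check for pending interrupts" */

static enum ep_result ep_destroy(struct endpoint *ep)
{
    assert(ep != NULL);
    ep->cap_enabled = 0;               /* 1. no new messages; safe to repeat */
    while (ep->queue != NULL) {        /* 2. */
        struct msg *m = ep->queue;     /* 3. abort the head message */
        ep->queue = m->next;
        m->next = NULL;
        if (irq_pending)               /* 4. preemption point */
            return EP_PREEMPTED;       /* state is consistent; restart later */
    }
    return EP_DONE;                    /* 5. */
}
```

Each pass removes at least one message before it can be preempted, so a restarted call always makes progress, and the queue is in a consistent state at every exit.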
Difficult Example: Revoking IPC “Badge” 
State to keep across preemptions 
• Badge being removed 
• Point in queue where preempted 
• End of queue at time operation started 
• Thread performing revocation 
Need to squeeze into endpoint data structure! 
Client1 
Server 
Client1 
state 
Client2 Client2 
state 
Badge 
Removing 
orange 
badge
Synchronous IPC Implementation 
Simple send (e.g. as part of RPC-like “call”): 185 cycles on ARM11! 
1) Preamble 
– save minimal state, get args 
2) Identify destination 
– Cap lookup; get endpoint; check queue 
3) Get receiver TCB 
– Check receiver can still run 
– Check receiver priority is ≥ ours 
4) Mark sender blocked and enqueue 
– Create reply cap & insert in slot 
5) Switch to receiver 
– Leave message registers untouched 
– nuke reply cap 
6) Postamble (restore & return) 
“Direct process switch” without scheduler invocation! 
[Figure: sender goes from Running to Wait to receive; receiver goes from Wait to receive to Running] 
Fastpath Coding Tricks 
Common case: 0 
slow = 
cap_get_capType(en_c) != cap_endpoint_cap || 
!cap_endpoint_cap_get_capCanSend(en_c); 
if (slow) enter_slow_path(); 
• Reduces branch-prediction footprint 
• Avoids mispredicts, stalls & flushes 
• Uses ARM instruction predication 
• But: increases slow-path latency 
Common case: 1 
– should be minimal compared to basic slow-path cost 
Lazy FPU Switch 
• FPU context tends to be heavyweight 
– eg 512 bytes FPU state on x86 
• Only few apps use FPU (and those don’t do many syscalls) 
– saving and restoring FPU state on every context switch is wasteful! 
finit 
current 
Kernel 
FPU_owner 
FPU_locked 
Saved 
FPU state 
fld 
fcos 
fst 
finit 
fld 
sosh() 
Standard trick, 
not only for 
microkernels! 
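The FPU_owner / FPU_locked bookkeeping can be simulated in plain C. This is a sketch under assumptions: fpu_regs stands in for the hardware registers, use_fpu stands in for a user-mode FPU instruction that traps while the FPU is locked, and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FPU_STATE_SIZE 512   /* bytes; the x86 figure quoted above */

struct thread {
    unsigned char fpu_saved[FPU_STATE_SIZE];
};

static unsigned char fpu_regs[FPU_STATE_SIZE]; /* stand-in for the hardware FPU */
static struct thread *fpu_owner;  /* thread whose state is live in the FPU */
static int fpu_locked;            /* 1 = next FPU instruction traps */
static struct thread *current;
static int fpu_saves;             /* counts the expensive save operations */

/* Context switch: no FPU copy, just lock the FPU for non-owners. */
static void switch_to(struct thread *next)
{
    current = next;
    fpu_locked = (next != fpu_owner);
}

/* Trap handler: runs on the first FPU instruction after a switch. */
static void fpu_trap(void)
{
    assert(fpu_locked);
    if (fpu_owner != NULL) {
        memcpy(fpu_owner->fpu_saved, fpu_regs, FPU_STATE_SIZE);
        fpu_saves++;
    }
    memcpy(fpu_regs, current->fpu_saved, FPU_STATE_SIZE);
    fpu_owner = current;
    fpu_locked = 0;
}

/* Stand-in for a user-mode FPU instruction. */
static void use_fpu(unsigned char value)
{
    if (fpu_locked)
        fpu_trap();
    fpu_regs[0] = value;
}
```

Switching to a thread that never touches the FPU and back again costs no save or restore at all; the state is copied only when a different thread actually executes an FPU instruction.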
Other implementation tricks 
• Cache-friendly data structure layout, especially TCBs 
– data likely used together is on same cache line 
– helps best-case and worst-case performance 
• Kernel mappings locked in TLB (using superpages) 
– helps worst-case performance 
– helps establish invariants: page table never walked when in kernel 
Avoid RAM 
like the 
plague! 
Other Lessons Learned from 2nd Generation 
• Programming languages: 
– original i486 kernel [’95]: all assembler 
– UNSW MIPS and Alpha kernels [’96,’98]: half assembler, half C 
– Fiasco [TUD ’98], Pistachio [’02]: C++ with assembler “fast path” 
– seL4 [‘09], OKL4 [‘09]: all C 
C++ ABANDONED 
Assembler coding 
ABANDONED 
• Lessons: 
– C++ sux: code bloat, no real benefit 
– Changing calling conventions not worthwhile 
• Conversion cost in library stubs and when entering C in kernel 
• Reduced compiler optimization 
– Assembler unnecessary for performance 
• Can write C so compiler will produce near-optimal code 
• C entry from assembler cheap if calling conventions maintained 
• seL4 performance with C-only fastpath as good as other L4 kernels 
[Blackham & Heiser ‘12] 
LESSONS & PRINCIPLES 
L4 Design and Implementation 
Implement. Tricks [SOSP’93] 
• Process kernel 
• Virtual TCB array 
• Lazy scheduling 
• Direct process switch 
• Non-preemptible 
• Non-portable 
• Non-standard calling 
convention 
• Assembler 
Design Decisions [SOSP’95] 
• Synchronous IPC 
• Rich message structure, 
arbitrary out-of-line messages 
• Zero-copy register messages 
• User-mode page-fault handlers 
• Threads as IPC destinations 
• IPC timeouts 
• Hierarchical IPC control 
• User-mode device drivers 
• Process hierarchy 
• Recursive address-space 
construction 
seL4 Design Principles 
• Fully delegatable access control 
• All resource management is subject to user-defined policies 
– Applies to kernel resources too! 
• Suitable for formal verification 
– Requires small size, avoid complex constructs 
• Performance on par with best-performing L4 kernels 
– Prerequisite for real-world deployment! 
• Suitability for real-time use 
– Only partially achieved to date 
• on-going work… 
(Informal) Requirements for Formal Verification 
• Verification scales poorly ⇒ small size (LOC and API) 
• Conceptual complexity hurts ⇒ KISS 
• Global invariants are expensive ⇒ KISS 
• Concurrency difficult to reason about ⇒ single-threaded kernel 
Largely in line with traditional L4 approach! 
seL4 Fundamental Abstractions 
• Capabilities as opaque names and access tokens 
– All kernel operations are cap invocations (except Yield()) 
• IPC: 
– Synchronous (blocking) message passing plus asynchronous notification 
– Endpoint objects implemented as message queues 
• Send: get receiver TCB from endpoint or enqueue self 
• Receive: obtain sender’s TCB from endpoint or enqueue self 
• Other APIs: 
– Send()/Receive() to/from virtual kernel endpoint 
– Can interpose operations by substituting actual endpoint 
• Fully user-controlled memory management 
seL4’s main 
conceptual 
novelty! 
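The Send and Receive rules for endpoints listed above can be sketched as one queue that holds either waiting senders or waiting receivers. This is a simplified illustration, not the seL4 implementation: messages are a single word, the sender is simply unblocked on delivery, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum ep_state { EP_IDLE, EP_SENDERS, EP_RECEIVERS };

struct tcb {
    int id;
    int blocked;
    unsigned long msg;       /* single message register */
    struct tcb *next;
};

struct endpoint {
    enum ep_state state;     /* what kind of thread is queued, if any */
    struct tcb *head, *tail;
};

static void ep_enqueue(struct endpoint *ep, struct tcb *t)
{
    t->next = NULL;
    if (ep->tail != NULL)
        ep->tail->next = t;
    else
        ep->head = t;
    ep->tail = t;
}

static struct tcb *ep_dequeue(struct endpoint *ep)
{
    struct tcb *t = ep->head;
    assert(t != NULL);
    ep->head = t->next;
    if (ep->head == NULL) {
        ep->tail = NULL;
        ep->state = EP_IDLE;
    }
    t->next = NULL;
    return t;
}

/* Send: get receiver TCB from endpoint, or enqueue self.
 * Returns the receiver the message went to, or NULL if the sender blocked. */
static struct tcb *ep_send(struct endpoint *ep, struct tcb *sender)
{
    if (ep->state == EP_RECEIVERS) {
        struct tcb *rcv = ep_dequeue(ep);
        rcv->msg = sender->msg;
        rcv->blocked = 0;
        return rcv;
    }
    ep->state = EP_SENDERS;
    sender->blocked = 1;
    ep_enqueue(ep, sender);
    return NULL;
}

/* Receive: obtain sender's TCB from endpoint, or enqueue self. */
static struct tcb *ep_recv(struct endpoint *ep, struct tcb *receiver)
{
    if (ep->state == EP_SENDERS) {
        struct tcb *snd = ep_dequeue(ep);
        receiver->msg = snd->msg;
        snd->blocked = 0;
        return snd;
    }
    ep->state = EP_RECEIVERS;
    receiver->blocked = 1;
    ep_enqueue(ep, receiver);
    return NULL;
}
```

The endpoint never buffers a message: the word is copied only at the moment both partners are present.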
Remember: seL4 User-Level Memory Management 
Global Resource Manager 
RAM Kernel 
Data 
GRM 
Data 
Resource Manager 
RM 
Data 
Resource Manager 
RM 
Data 
Addr 
Space 
AS 
Addr 
Space 
Addr 
Space 
RM 
RM 
Data 
Resources fully 
delegated, allows 
autonomous 
operation 
Strong isolation, 
No shared kernel 
resources 
Delegation 
can be 
revoked 
Remaining Conceptual Issues in seL4 
IPC & Thread Model: 
• Is the “mostly synchronous + a bit of async” model appropriate? 
– forces kernel scheduling of user activities 
– forces multi-threaded userland 
Time management: 
• Present scheduling model is ad-hoc and insufficient 
– fixed-prio round-robin forces policy 
– not sufficient for some classes of real-time systems (time triggered) 
– no real support for hierarchical real-time scheduling 
– lack of an elegant resource management model for time 
About to be 
solved!
Lessons From 20 Years of L4 
• Minimality is excellent driver of design decisions 
– L4 kernels have become simpler over time 
– Policy-mechanism separation (user-mode page-fault handlers) 
– Device drivers really belong to user level 
– Minimality is key enabler for formal verification! 
• IPC speed still matters 
– But not everywhere, premature optimisation is wasteful 
– Compilers have got so much better 
– Verification does not compromise performance 
– Verification invariants can help improve speed! [Shi, OOPSLA’13] 
• Capabilities are the way to go 
• Details changed, but principles remained 
• Microkernels rock! (If done right!)
Where To Find More 
• UNSW Advanced Operating Systems Course 
http://www.cse.unsw.edu.au/~cs9242 
• NICTA Trustworthy Systems research 
http://trustworthy.systems 
• seL4 open-source portal 
http://sel4.systems 
• L4 Microkernel Headquarters 
http://l4hq.org 
• Gernot’s blog: 
http://microkerneldude.wordpress.com/ 
• Gernot’s research home page: 
http://ssrg.nicta.com.au/people/?cn=Gernot+Heiser 

More Related Content

What's hot

A tour of F9 microkernel and BitSec hypervisor
A tour of F9 microkernel and BitSec hypervisorA tour of F9 microkernel and BitSec hypervisor
A tour of F9 microkernel and BitSec hypervisorLouie Lu
 
Microkernel architecture
Microkernel architecture Microkernel architecture
Microkernel architecture RQK Khan
 
Analysis of Practicality and Performance Evaluation for Monolithic Kernel and...
Analysis of Practicality and Performance Evaluation for Monolithic Kernel and...Analysis of Practicality and Performance Evaluation for Monolithic Kernel and...
Analysis of Practicality and Performance Evaluation for Monolithic Kernel and...CSCJournals
 
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsF9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsNational Cheng Kung University
 

What's hot (20)

Memory, IPC and L4Re
Memory, IPC and L4ReMemory, IPC and L4Re
Memory, IPC and L4Re
 
L4 Microkernel :: Design Overview
L4 Microkernel :: Design OverviewL4 Microkernel :: Design Overview
L4 Microkernel :: Design Overview
 
A tour of F9 microkernel and BitSec hypervisor
A tour of F9 microkernel and BitSec hypervisorA tour of F9 microkernel and BitSec hypervisor
A tour of F9 microkernel and BitSec hypervisor
 
Microkernel architecture
Microkernel architecture Microkernel architecture
Microkernel architecture
 
Unix v6 Internals
Unix v6 InternalsUnix v6 Internals
Unix v6 Internals
 
Implement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVMImplement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVM
 
μ-Kernel Evolution
μ-Kernel Evolutionμ-Kernel Evolution
μ-Kernel Evolution
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
Construct an Efficient and Secure Microkernel for IoT
Construct an Efficient and Secure Microkernel for IoTConstruct an Efficient and Secure Microkernel for IoT
Construct an Efficient and Secure Microkernel for IoT
 
Embedded Hypervisor for ARM
Embedded Hypervisor for ARMEmbedded Hypervisor for ARM
Embedded Hypervisor for ARM
 
2. microkernel new
2. microkernel new2. microkernel new
2. microkernel new
 
olibc: Another C Library optimized for Embedded Linux
olibc: Another C Library optimized for Embedded Linuxolibc: Another C Library optimized for Embedded Linux
olibc: Another C Library optimized for Embedded Linux
 
Embedded Virtualization applied in Mobile Devices
Embedded Virtualization applied in Mobile DevicesEmbedded Virtualization applied in Mobile Devices
Embedded Virtualization applied in Mobile Devices
 
Hints for L4 Microkernel
Hints for L4 MicrokernelHints for L4 Microkernel
Hints for L4 Microkernel
 
Analysis of Practicality and Performance Evaluation for Monolithic Kernel and...
Analysis of Practicality and Performance Evaluation for Monolithic Kernel and...Analysis of Practicality and Performance Evaluation for Monolithic Kernel and...
Analysis of Practicality and Performance Evaluation for Monolithic Kernel and...
 
Faults inside System Software
Faults inside System SoftwareFaults inside System Software
Faults inside System Software
 
Kernel
KernelKernel
Kernel
 
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsF9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
 
Unikernels Introduction
Unikernels IntroductionUnikernels Introduction
Unikernels Introduction
 
Embedded Virtualization for Mobile Devices
Embedded Virtualization for Mobile DevicesEmbedded Virtualization for Mobile Devices
Embedded Virtualization for Mobile Devices
 

Microkernel design

  • 1. COMP9242 Advanced Operating Systems S2/2014 Week 7: Microkernel Design @GernotHeiser COMP9242 S2/2014
  • 2. Copyright Notice
      These slides are distributed under the Creative Commons Attribution 3.0 License
      • You are free:
        – to share: to copy, distribute and transmit the work
        – to remix: to adapt the work
      • under the following conditions:
        – Attribution: You must attribute the work (but not in any way that suggests that the author endorses you or your use of the work) as follows:
          • “Courtesy of Gernot Heiser, [Institution]”, where [Institution] is one of “UNSW” or “NICTA”
      The complete license text can be found at http://creativecommons.org/licenses/by/3.0/legalcode
      ©2012 Gernot Heiser, UNSW/NICTA. Distributed under Creative Commons Attribution License. COMP9242 S2/2014
  • 3. Microkernel Principles: Minimality
      A concept is tolerated inside the microkernel only if moving it outside the kernel, i.e. permitting competing implementations, would prevent the implementation of the system’s required functionality. [SOSP’95]
      • Advantages of resulting small kernel:
        – Easy to implement, port?
        – Easier to optimise
        – Hopefully enables a minimal trusted computing base (TCB)
        – Easier debug, maybe even prove correct?
      • Challenges:
        – API design: generality despite small code base
        – Kernel design and implementation for high performance
      Slide callouts: “Limited by arch-specific micro-optimisations”; “Small attack surface, fewer failure modes”
  • 4. Consequence of Minimality: User-level Services
      [Figure: two system structures above the hardware. Left: kernel-mode layers VFS; IPC, file system; scheduler, virtual memory; device drivers, dispatcher; entered from the application by syscall. Right: only IPC and virtual memory in kernel mode, with application, Unix server, file server and device driver in user mode, communicating by IPC.]
      • Kernel provides no services, only mechanisms
      • Kernel is policy-free
        – Policies limit (good for 90% of cases, disastrous for some)
        – “General” policies lead to bloat, inefficiency
      IPC performance is critical!
  • 5. 1993 “Microkernel” IPC Performance
      [Figure: IPC time [μs] (0 to 400) against message length [B] (0 to 6000) for Mach and L4 on an i486 @ 50 MHz, with a “raw copy” reference line. Annotated costs: 115 μs (Mach) and 5 μs (L4).]
      Culprit: Cache footprint [SOSP’95]
  • 6. L4 IPC Performance over 20 Years
      Name       Year  Processor               MHz    Cycles  μs
      Original   1993  i486                    50     250     5.00
      Original   1997  Pentium                 160    121     0.75
      L4/MIPS    1997  R4700                   100    86      0.86
      L4/Alpha   1997  21064                   433    45      0.10
      Hazelnut   2002  Pentium 4               1,400  2,000   1.38
      Pistachio  2005  Itanium                 1,500  36      0.02
      OKL4       2007  XScale 255              400    151     0.64
      NOVA       2010  i7 Bloomfield (32-bit)  2,660  288     0.11
      seL4       2013  i7 Haswell (32-bit)     3,400  301     0.09
      seL4       2013  ARM11                   532    188     0.35
      seL4       2013  Cortex A9               1,000  316     0.32
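The cycle and μs columns are related through the clock rate: one MHz is one cycle per microsecond, so a cost in cycles divided by the clock in MHz gives microseconds. A minimal sketch of that arithmetic follows; the function name `cycles_to_us` is made up for illustration, and the table's μs values may come from measurement rather than from this division, so not every row need match it exactly.

```c
#include <stdio.h>

/* Convert a cost in CPU cycles to microseconds.
 * clock_mhz is the clock rate in MHz, i.e. cycles per microsecond.
 * Hypothetical helper, not taken from the slides. */
static double cycles_to_us(double cycles, double clock_mhz)
{
    return cycles / clock_mhz;
}
```

For the first table row, `cycles_to_us(250, 50)` evaluates to 5.0, matching the 5.00 μs listed for the original L4 on the 50 MHz i486.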
  • 7. Minimality: Source Code Size
      Sizes in kSLOC
      Name         Architecture  C/C++  asm   total
      Original     i486          0      6.4   6.4
      L4/Alpha     Alpha         0      14.2  14.2
      L4/MIPS      MIPS64        6.0    4.5   10.5
      Hazelnut     x86           10.0   0.8   10.8
      Pistachio    x86           22.4   1.4   23.0
      L4-embedded  ARMv5         7.6    1.4   9.0
      OKL4 3.0     ARMv6         15.0   0.0   15.0
      Fiasco.OC    x86           36.2   1.1   37.6
      seL4         ARMv6         9.7    0.5   10.2
  • 8. L4 Family Tree
      [Figure: timeline 1993 to 2013 showing API inheritance and code inheritance between L3→L4, “X”, Hazelnut, Pistachio, L4/MIPS, L4/Alpha, L4-embed., OKL4-μKernel, OKL4-Microvisor, seL4, Codezero, Fiasco, Fiasco.OC, Nova and P4 → PikeOS. Group labels: GMD/IBM/Karlsruhe, UNSW/NICTA, Dresden. Marker: Caps. Legend: Assembler, C++, C, Asm+C, Portable, Verified.]
  • 9. L4 Deployments – in the Billions
      [Figure: deployed devices, including the SiMKo 3 “Merkelphone”]
  • 10. L4 Design and Implementation
      Implementation Tricks [SOSP’93]
      • Process kernel
      • Virtual TCB array
      • Lazy scheduling
      • Direct process switch
      • Non-preemptible
      • Non-portable
      • Non-standard calling convention
      • Assembler
      Design Decisions [SOSP’95]
      • Synchronous IPC
      • Rich message structure, arbitrary out-of-line messages
      • Zero-copy register messages
      • User-mode page-fault handlers
      • Threads as IPC destinations
      • IPC timeouts
      • Hierarchical IPC control
      • User-mode device drivers
      • Process hierarchy
      • Recursive address-space construction
      Objective: Minimise cache footprint and TLB misses
  • 11. DESIGN
  • 12. What Mechanisms?
      • Fundamentally, the microkernel must abstract
        – Physical memory: Address spaces
        – CPU: Threads
        – Interrupts/Exceptions
      • Unfettered access to any of these bypasses security
        – No further abstraction needed for devices
          • memory-mapping device registers and interrupt abstraction suffices
          • …but some generalised memory abstraction needed for I/O space
      • Above isolates execution units, hence microkernel must also provide
        – Communication (traditionally referred to as IPC)
        – Synchronization
  • 13. Memory: Policy-Free Address-Space Management
      [Figure: an address space with regions Text, Data, BSS, libc, File and Stack; a page-fault exception is forwarded to the page-fault server, which replies with a Map]
      • Kernel provides empty address-space “shell”
        – page faults forwarded to server
        – server provides mapping
        – AS layout is server policy (not kernel)
      • Cost:
        – 1 round-trip IPC, plus mapping operation
          • mapping may be side effect of IPC
          • kernel may expose data structure
      • Kernel mechanism: forwarding page-fault exception
      • “External pagers” first appeared in Mach [Rashid et al, ’88]
        – … but were optional (and slow)
        – in L4 there’s no alternative
  • 14. Abstracting Memory: Address Spaces
      [Figure: address space made of mapped pages and unmapped pages]
      • Minimum address-space abstraction: empty slots for page mappings
        – paging server can fill with mappings
          • virtual address → physical address + permissions
      • Can be
        – page-table–like: array under full user control
        – TLB-like: cache for mappings which may vanish
      • Main design decision: is source of a mapping a page or a frame?
        – Frame: hardware-like
        – Page: recursive address spaces (original L4 model)
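The “empty slots” idea can be made concrete with a small model: an address space is an array of slots, each either empty or holding a frame number plus permissions, and a failed translation stands for the page fault that would be forwarded to the pager. This is only a sketch under assumptions; the names (`aspace_t`, `as_map`, `as_translate`), the 16-slot size and the 4 KiB page size are all invented for illustration and do not correspond to any L4 API.

```c
#include <stdbool.h>
#include <stdint.h>

#define AS_SLOTS   16u   /* assumption: a tiny 16-page address space */
#define PAGE_SHIFT 12u   /* assumption: 4 KiB pages */

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

typedef struct {
    bool     valid;  /* false: empty slot, any access faults */
    uint32_t frame;  /* physical frame number */
    unsigned perms;  /* PERM_* bits */
} slot_t;

typedef struct { slot_t slot[AS_SLOTS]; } aspace_t;

/* Pager-side operation: fill an empty slot with a mapping.
 * Returns false if the slot is out of range or already filled. */
static bool as_map(aspace_t *as, uint32_t vpage, uint32_t frame, unsigned perms)
{
    if (vpage >= AS_SLOTS || as->slot[vpage].valid)
        return false;
    as->slot[vpage] = (slot_t){ .valid = true, .frame = frame, .perms = perms };
    return true;
}

/* Translate a virtual address. Returning false models the page fault
 * that the kernel would forward to the page-fault server. */
static bool as_translate(const aspace_t *as, uint32_t vaddr,
                         unsigned need, uint32_t *paddr)
{
    uint32_t vpage = vaddr >> PAGE_SHIFT;
    if (vpage >= AS_SLOTS)
        return false;
    const slot_t *s = &as->slot[vpage];
    if (!s->valid || (s->perms & need) != need)
        return false;
    *paddr = (s->frame << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1u));
    return true;
}
```

In this model the layout policy lives entirely in whoever calls `as_map`; the translation code has no opinion about which slots get filled.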
  • 15. Traditional L4: Recursive Address Spaces
      [Figure: address spaces stacked above the initial address space, which sits above physical memory; operations shown are Map, Grant and Unmap]
      Mappings are page → page
      Magic initial AS to anchor recursion (map of PM)
  • 16. Recursive Address Space Experience
      API complexity: Recursive address-space model
      • Conceptually elegant
        – trivially supports virtualization
      • Drawback: Complex mapping database
        – Kernel needs to track mapping relationship
          • Tear down dependent mappings on unmap
        – Mapping database problems:
          • accounts for 1/4–1/2 of kernel memory use
          • SMP coherence is performance bottleneck
      • NICTA’s L4-embedded, OKL4 removed MDB
        – Map frames rather than pages
          • need separate abstraction for frames / physical memory
          • subsystems no longer virtualizable (even in OKL4 cap model)
      • Properly addressed by seL4’s capability-based model
        – But have cap derivation tree, subject of on-going research
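To illustrate why the mapping database is costly, here is a minimal sketch of the bookkeeping it implies: every mapping records which mapping it was derived from, and an unmap has to find and revoke all descendants. The flat-array representation and the names `mdb_t`, `mdb_map` and `mdb_unmap` are assumptions made for this example, not the data structure of any actual L4 kernel.

```c
#include <stdbool.h>

#define MAX_NODES 32   /* assumption: small fixed-size database */

typedef struct {
    bool valid;
    int  parent;   /* index of the mapping this one was derived from, -1 for a root */
} mdb_node_t;

typedef struct {
    mdb_node_t node[MAX_NODES];
    int        used;
} mdb_t;

/* Record a new mapping derived from `parent`. Returns its index, or -1 on error. */
static int mdb_map(mdb_t *db, int parent)
{
    if (db->used >= MAX_NODES)
        return -1;
    if (parent >= 0 && (parent >= db->used || !db->node[parent].valid))
        return -1;
    db->node[db->used] = (mdb_node_t){ .valid = true, .parent = parent };
    return db->used++;
}

/* Unmap: revoke every mapping derived, directly or indirectly, from `idx`.
 * The mapping `idx` itself is kept. Returns the number of mappings torn down. */
static int mdb_unmap(mdb_t *db, int idx)
{
    bool doomed[MAX_NODES] = { false };
    int torn = 0;

    if (idx < 0 || idx >= db->used || !db->node[idx].valid)
        return 0;
    doomed[idx] = true;
    /* A child always has a larger index than its parent, so a single
     * forward pass visits every descendant after its ancestor. */
    for (int i = idx + 1; i < db->used; i++) {
        int p = db->node[i].parent;
        if (db->node[i].valid && p >= 0 && doomed[p]) {
            doomed[i] = true;
            db->node[i].valid = false;
            torn++;
        }
    }
    return torn;
}
```

Even in this toy form the kernel holds one record per mapping and must walk the derivation structure on every unmap, which hints at the memory and SMP costs listed on the slide.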
  • 17. Abstracting Execution
      • Can abstract as:
        – kernel-scheduled threads
          • Forces (scheduling) policy into the kernel
        – vCPUs or scheduler activations
          • This essentially virtualizes the timer interrupt through upcall
            – Scheduler activations also upcall for exceptions, blocking etc
          • Multiple vCPUs only for real multiprocessing
      • Threads can be tied to address space or “migrating”
        – Implementation-wise not much of a difference
          • Tight integration/interdependence with IPC model!
      [Figure: IPC between threads compared with a cross-AS call]
  • 18. Abstracting Interrupts and Exceptions
      • Can abstract as:
        – Upcall to interrupt/exception handler
          • hardware-like diversion of execution
          • need to save enough state to continue interrupted execution
        – IPC message to handler from magic “hardware thread”
          • OS-like
          • needs separate handler thread ready to receive
      [Figure: H/W “Thread” sends Exception IPC to Handler Thread]
      • Page fault tends to be special-cased for practical reason
        – Tends to require handling external to faulter
          • IPC message to page-fault server rather than exception handler
        – But also “self-paging” as in Nemesis [Hand ’99] or Barrelfish
  • 19. L4 Synchronous IPC
      [Figure: Thread1 goes from Running to Blocked at Send (dest, msg); Thread2 goes from Blocked to Running at Wait (src, msg); the kernel copies the message in between]
      Rendezvous model
      Kernel executes in sender’s context
      • copies memory data directly to receiver (single-copy)
      • leaves message registers unchanged during context switch (zero copy)
  • 20. “Long” IPC
      [Figure: kernel copy from sender address space to receiver address space hits a page fault. LONG IPC ABANDONED]
      • IPC page faults are nested exceptions ⇒ In-kernel concurrency
        – L4 executes with interrupts disabled for performance, no concurrency
      • Must invoke untrusted usermode page-fault handlers
        – potential for DoSing other thread
      • Timeouts to avoid DoS attacks
        – complexity
      Why have long IPC?
      • POSIX-style APIs: write (fd, buf, nbytes)
      • Usually prefer shared buffers
  • 21. IPC Timeouts
      Timeouts ABANDONED in seL4, OKL4
      [Figure: Thread1 blocks at Send (dest, msg) until Thread2 reaches Wait (src, msg) and the kernel copies the message; a timeout limits IPC blocking time. Timed wait: Thread1 blocks in Rcv(NIL_THRD, delay)]
      • No theory/heuristics for determining timeouts
      • Typically server reply with zero TO, else ∞
      • Added complexity
      • Can do timed wait with timer syscall
  • 22. Synchronous IPC Issues
      [Figure: a single Thread1 calling Initiate_IO(…,…) and later IO_Wait(…,…) is marked “Not generally possible”. Alternative with two threads: Worker_Th and IO_Th alternate between Running and Blocked using Call (IO, msg), Unblock (IO_Th), Sync(Worker_Th) and Sync(IO_Th)]
      • Sync IPC forces multi-threaded code!
      • Also poor choice for multi-core
  • 23. Asynchronous Notifications
      [Figure: Thread1 keeps running across two Send (Thr_2, …) operations; Thread2 picks the bits up with w = Poll (…) or w = Wait (…)]
      • Delivers few bits (destructively)
      • Kernel only buffers single word
      • Maps well to interrupts, exceptions
      • Thread can wait for synchronous and asynchronous messages concurrently
      [Figure: Client to Server is Sync; Driver to Server is Async]
      Sync IPC complemented with async
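The “few bits, single word, destructive” semantics can be sketched in a few lines: sending ORs bits into one buffered word, and polling returns the word and clears it. The names `notification_t`, `ntfn_send` and `ntfn_poll` are invented for this sketch; blocking `Wait` is not modelled, only the non-blocking poll.

```c
#include <stdint.h>

/* The kernel buffers exactly one word per notification object. */
typedef struct { uint32_t word; } notification_t;

/* Send never blocks: the sender's bits are ORed into the word.
 * Sending the same bit twice before it is collected is
 * indistinguishable from sending it once. */
static void ntfn_send(notification_t *n, uint32_t bits)
{
    n->word |= bits;
}

/* Poll: return the accumulated bits and clear them (destructive read).
 * Returns 0 if nothing is pending. */
static uint32_t ntfn_poll(notification_t *n)
{
    uint32_t w = n->word;
    n->word = 0;
    return w;
}
```

For example, three sends with 0x01, 0x10 and 0x30 leave 0x31 in the word, a single poll returns 0x31, and a second poll returns 0.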
  • 24. Is Synchronous IPC Redundant?
      [Figure: Client goes from Running to Blocked at Call (dest, msg); Server goes from Blocked to Running at ReplyWait (src, msg); control transfer between them]
      2 IPC mechanisms: Minimality violation
      Sync IPC is useful intra-core:
      • fast control transfer
      • mimics migrating threads
      • enables scheduling context donation
        ⇒ useful for real-time systems
      But presently no clean model!
  • 25. Direct vs Indirect IPC Addressing
      • Direct: Queue senders/messages at receiver
        – Need unique thread IDs
        – Kernel guarantees identity of sender
          • useful for authentication
      • Indirect: Mailbox/port object
        – Just a user-level handle for the kernel-level queue
        – Extra object type – extra weight?
        – Communication partners are anonymous
          • Need separate mechanism for authentication
      [Figure: direct addressing with several senders queued at one receiver; indirect addressing with senders and receivers attached to ports]
  • 26. IPC Destination Naming
      Original L4 addressed IPC to threads
      • Inefficient designs
      • Poor information hiding
      • Covert channels [Shapiro ‘02]
      [Figure: client/server scenarios with a load balancer and workers, annotated “Client must do load balancing?”, “RPC reply from wrong thread!”, “All IPCs duplicated!”, “Interpose transparently?” and “Access monitor (ports)”]
      Thread IDs replaced by IPC “endpoints”
  • 27. Endpoints
      [Figure: Client does Send on a Sync EP and Server does Rcv on it; signals 0x01, 0x10 and 0x30 arrive at an Async EP]
      • Sync EP queues senders/receivers
      • Does not buffer messages
      • Async EP accumulates bits
  • 28. Other Design Issues
      IPC Control: “Clans & Chiefs”
      [Figure: a clan with its chief; IPC outside clan re-directs to chief]
      Process Hierarchy
      [Figure: processes linked by Create; hierarchical resource management]
      • Inflexible, clumsy, inefficient hierarchies!
      • Fundamental problem: no rights delegation
      Hierarchies replaced by delegatable cap-based access control
  • 29. IMPLEMENTATION
  • 30. Virtual TCB Array
      [Figure: the thread ID indexes an array of TCBs in virtual memory; each entry holds the TCB proper and the kernel stack]
      Fast TCB & stack lookup
      Trades cache for TLB footprint and virtual address space
      Not worthwhile on modern processors!
  • 31. Process Kernel: Per-Thread Kernel Stack
      [Figure: each TCB holds the TCB proper and the kernel stack]
      Get own TCB base by masking stack pointer
      • Not worthwhile on modern processors!
      • Stacks can dominate kernel memory use!
      • Reduces TLB footprint at cost of cache and kernel memory
      • Easier to deal with blocking
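Both lookup tricks reduce to address arithmetic. The sketch below assumes that each TCB, including its kernel stack, occupies a power-of-two sized, size-aligned block; the 1 KiB `TCB_SIZE` and the function names are assumptions chosen for illustration, not values from any particular kernel.

```c
#include <stdint.h>

#define TCB_SIZE 1024u   /* assumption: power of two, TCBs aligned to their size */

/* Virtual TCB array: the thread number indexes directly into a
 * virtually contiguous array, so no lookup table is needed. */
static uintptr_t tcb_from_thread_no(uintptr_t array_base, unsigned thread_no)
{
    return array_base + (uintptr_t)thread_no * TCB_SIZE;
}

/* Per-thread kernel stack inside the TCB: because the stack lives in the
 * same aligned block, clearing the low bits of any stack pointer value
 * yields the base address of the current thread's TCB. */
static uintptr_t tcb_base_from_sp(uintptr_t sp)
{
    return sp & ~((uintptr_t)TCB_SIZE - 1u);
}
```

With these assumptions, a stack pointer of 0x80001234 masks down to a TCB base of 0x80001000, and thread number 3 in an array based at 0xE0000000 sits at 0xE0000C00.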
  • 32. Scheduler Optimisation Tricks: “Lazy Scheduling”
      • Frequent blocking/unblocking in IPC-based systems
      • Many ready-queue manipulations
      Idea: leave blocked threads in ready queue, scheduler cleans up
      [Figure: Client blocks in Call(), Server blocks in Reply_Wait(); both stay queued while BLOCKED]
      thread_t schedule() {
        foreach (prio in priorities) {
          foreach (thread in runQueue[prio]) {
            if (isRunnable(thread))
              return thread;
            else
              schedDequeue(thread);
          }
        }
        return idleThread;
      }
      Problem: Unbounded scheduler execution time!
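The slide's pseudocode can be turned into a small runnable model. In the sketch below, blocking only clears a flag and leaves the thread in its run queue, and `schedule()` removes stale entries as it walks past them. The data layout (singly linked per-priority queues, higher number means higher priority) and the helper names `sched_enqueue`, `thread_block` and `thread_unblock` are assumptions for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_PRIOS 4   /* assumption: higher number = higher priority */

typedef struct thread {
    bool runnable;
    bool queued;            /* currently linked into a run queue */
    int  prio;
    struct thread *next;
} thread_t;

static thread_t *run_queue[NUM_PRIOS];
static thread_t idle_thread = { .runnable = true };

static void sched_enqueue(thread_t *t)
{
    if (t->queued)
        return;             /* still queued from before it blocked */
    t->next = run_queue[t->prio];
    run_queue[t->prio] = t;
    t->queued = true;
}

/* Lazy: blocking touches no queue at all. */
static void thread_block(thread_t *t)   { t->runnable = false; }
static void thread_unblock(thread_t *t) { t->runnable = true; sched_enqueue(t); }

/* Walk priorities from highest to lowest; dequeue blocked threads found on
 * the way. The number of stale entries is not bounded by a constant, which
 * is the unbounded execution time the slide warns about. */
static thread_t *schedule(void)
{
    for (int prio = NUM_PRIOS - 1; prio >= 0; prio--) {
        while (run_queue[prio] != NULL) {
            thread_t *t = run_queue[prio];
            if (t->runnable)
                return t;
            run_queue[prio] = t->next;   /* clean up stale entry */
            t->next = NULL;
            t->queued = false;
        }
    }
    return &idle_thread;
}
```

A thread that blocks and unblocks again before the scheduler runs causes no queue manipulation at all, which is the saving the trick aims for in IPC-heavy workloads.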
  • 33. Alternative: “Benno Scheduling”
      • Frequent blocking/unblocking in IPC-based systems
      • Many ready-queue manipulations
      Idea: Lazy on unblocking instead of on blocking
      [Figure: Client in Call(), Server in Reply_Wait()]
      thread_t schedule() {
        foreach (prio in priorities) {
          foreach (thread in runQueue[prio]) {
            if (isRunnable(thread))
              return thread;
            else
              schedDequeue(thread);
          }
        }
        return idleThread;
      }
      Only current thread needs fixing up at preemption time!
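One way to read “lazy on unblocking” is the following invariant: the run queues hold only runnable threads, and the one thread that may be runnable without being queued is the current thread. A sketch under that reading follows; it is an interpretation, not the slide's code, and the names `ipc_direct_switch` and `preempt` as well as the queue layout are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_PRIOS 4   /* assumption: higher number = higher priority */

typedef struct thread {
    bool runnable;
    bool queued;
    int  prio;
    struct thread *next;
} thread_t;

static thread_t *run_queue[NUM_PRIOS];
static thread_t idle_thread = { .runnable = true };
static thread_t *current = &idle_thread;

/* Append at the tail so threads of equal priority take turns. */
static void enqueue(thread_t *t)
{
    thread_t **link = &run_queue[t->prio];
    while (*link != NULL)
        link = &(*link)->next;
    t->next = NULL;
    *link = t;
    t->queued = true;
}

/* Every queued thread is runnable, so the head of the highest
 * non-empty queue can be taken without any isRunnable() check. */
static thread_t *dequeue_highest(void)
{
    for (int prio = NUM_PRIOS - 1; prio >= 0; prio--) {
        thread_t *t = run_queue[prio];
        if (t != NULL) {
            run_queue[prio] = t->next;
            t->next = NULL;
            t->queued = false;
            return t;
        }
    }
    return &idle_thread;
}

/* IPC path: the current thread blocks and the receiver runs at once.
 * Neither thread is queued, so no queue is touched. */
static void ipc_direct_switch(thread_t *receiver)
{
    current->runnable = false;
    receiver->runnable = true;
    current = receiver;
}

/* Preemption: only the current thread needs fixing up. */
static void preempt(void)
{
    if (current != &idle_thread && current->runnable && !current->queued)
        enqueue(current);
    current = dequeue_highest();
}
```

In this model a Call()/Reply_Wait() round trip between a client and a server performs two direct switches and zero queue operations; the queue is only updated when the timer preempts whoever happens to be running.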
  • 34. Scheduler Optimisation: “Direct Process Switch”. • Frequent context switches in IPC-based systems • Many scheduler invocations. Idea: Don’t invoke scheduler if you know who’ll be chosen (client Call(), server Reply_Wait()). • Sender was running ⇒ had highest prio • If receiver prio ≥ sender prio ⇒ run receiver. Implication: Time slice donation – receiver runs on sender’s time slice • Arguably, sender should donate back if it’s a server replying to a Call() • Hence, always donate on Reply_Wait(). Problem: • Accounting (RT systems) • Policy • No clean model yet!
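The decision rule on this slide fits in one small function. The sketch below is an assumption-laden illustration: the names `ipc_op_t` and `direct_switch_target` are invented, larger `prio` values are taken to mean higher priority, and a `NULL` result stands for "fall back to the full scheduler".

```c
#include <assert.h>
#include <stddef.h>

typedef struct thread {
    int prio;                  /* larger value = higher priority (assumed) */
} thread_t;

typedef enum { IPC_CALL, IPC_REPLY_WAIT } ipc_op_t;

/* Returns the thread to switch to directly, or NULL if the scheduler
 * has to be invoked to pick the next thread. */
static thread_t *direct_switch_target(const thread_t *sender,
                                      thread_t *receiver, ipc_op_t op)
{
    assert(sender != NULL && receiver != NULL);
    /* The sender was running, so nothing runnable outranks it. A receiver of
     * equal or higher priority is therefore what the scheduler would pick. */
    if (receiver->prio >= sender->prio)
        return receiver;
    /* A server replying to a Call() always donates back to its client. */
    if (op == IPC_REPLY_WAIT)
        return receiver;
    return NULL;
}
```

The receiver then runs on the sender's time slice, which is exactly where the accounting and policy problems listed on the slide come from.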
  • 35. Speaking of Real Time… • Kernel runs with interrupts disabled – No concurrency control ⇒ simpler kernel • Easier reasoning about correctness • Better average-case performance • How about long-running system calls? – Use strategic preemption points: while (!done) { process_stuff(); PSW.IRQ_disable=0; PSW.IRQ_disable=1; } Limited concurrency in kernel! – (Original) Fiasco has fully preemptible kernel • Like commercial microkernels (QNX, Green Hills INTEGRITY) Lots of concurrency in kernel!
  • 36. Incremental Consistency. Figure: kernel entry (disable interrupts) → O(1) operation → long operation split into O(1) operations, with a check for pending interrupts between them (abort & restart later) → kernel exit (enable interrupts). No concurrency in (single-core) kernel! • Consistency • Restartability • Progress. Good fit to event kernel!
  • 37. Example: Destroying IPC Endpoint. Figure: Client1, Client2 and Server attached to an IPC endpoint with its message queue. Actions: 1. Disable EP cap (prevent new messages) 2. while message queue not empty do 3. remove head of queue (abort message) 4. check for pending interrupts 5. done
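The five actions above can be sketched as a restartable C function. The data structures (`msg_t`, `endpoint_t`), the `irq_pending` flag standing in for a hardware check, and the `OP_PREEMPTED` return code are hypothetical. The sketch shows the incremental-consistency pattern: each loop iteration is O(1), leaves the endpoint consistent, and makes progress, so the operation can be aborted at the preemption point and simply called again later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct msg {
    struct msg *next;
    bool aborted;
} msg_t;

typedef struct endpoint {
    bool cap_enabled;          /* false: no new messages can be queued */
    msg_t *queue_head;
} endpoint_t;

typedef enum { OP_DONE, OP_PREEMPTED } op_result_t;

static bool irq_pending;       /* stand-in for polling the interrupt controller */

static op_result_t endpoint_destroy(endpoint_t *ep)
{
    assert(ep != NULL);
    ep->cap_enabled = false;               /* 1. idempotent, so a restart is safe */
    while (ep->queue_head != NULL) {       /* 2. */
        msg_t *m = ep->queue_head;         /* 3. remove head, abort message */
        ep->queue_head = m->next;
        m->next = NULL;
        m->aborted = true;
        if (irq_pending)                   /* 4. preemption point */
            return OP_PREEMPTED;           /*    abort now, restart later */
    }
    return OP_DONE;                        /* 5. */
}
```

Because the queue itself records how far the operation got, no extra state has to be kept across the preemption; a caller just re-invokes `endpoint_destroy` until it returns `OP_DONE`.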
  • 38. Difficult Example: Revoking IPC “Badge”. Figure: Client1 state and Client2 state queued on the Server’s endpoint, each carrying a badge; the orange badge is being removed. State to keep across preemptions: • Badge being removed • Point in queue where preempted • End of queue at time operation started • Thread performing revocation. Need to squeeze into endpoint data structure!
  • 39. Synchronous IPC Implementation. Simple send (e.g. as part of RPC-like “call”): 185 cycles on ARM11! Thread states (sender / receiver): Running / Wait to receive before the switch, Wait to receive / Running after it. 1) Preamble – save minimal state, get args 2) Identify destination – Cap lookup; get endpoint; check queue 3) Get receiver TCB – Check receiver can still run – Check receiver priority is ≥ ours 4) Mark sender blocked and enqueue – Create reply cap & insert in slot 5) Switch to receiver – Leave message registers untouched – nuke reply cap 6) Postamble (restore & return). “Direct process switch” without scheduler invocation!
  • 40. Fastpath Coding Tricks. slow = cap_get_capType(en_c) != cap_endpoint_cap || !cap_endpoint_cap_get_capCanSend(en_c); if (slow) enter_slow_path(); (common case: the type comparison is 0 and capCanSend is 1). • Reduces branch-prediction footprint • Avoids mispredicts, stalls & flushes • Uses ARM instruction predication • But: increases slow-path latency – should be minimal compared to basic slow-path cost
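The idea of folding several fastpath checks into a single flag and a single branch can be illustrated as follows. This is a sketch, not seL4 code: `cap_t`, `cap_type_t` and both function names are hypothetical stand-ins for the accessors named on the slide, and the use of bitwise `|` (instead of the `||` shown on the slide) is an assumption made here to make the single-branch intent explicit in portable C.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { cap_null_cap, cap_endpoint_cap } cap_type_t;

typedef struct {
    cap_type_t type;
    bool can_send;
} cap_t;

/* Straightforward version: one conditional branch per check. */
static bool needs_slow_path_branchy(cap_t c)
{
    if (c.type != cap_endpoint_cap)
        return true;
    if (!c.can_send)
        return true;
    return false;
}

/* Combined version: both conditions are always evaluated and OR-ed into one
 * flag, leaving a single branch for the caller. The common case is flag == 0. */
static bool needs_slow_path(cap_t c)
{
    unsigned slow = (unsigned)(c.type != cap_endpoint_cap)
                  | (unsigned)!c.can_send;
    return slow != 0;
}
```

Both functions return the same result for every input; the second does a little extra work on the slow path in exchange for fewer branches on the fast path, which matches the trade-off stated on the slide.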
  • 41. Lazy FPU Switch • FPU context tends to be heavyweight – e.g. 512 bytes FPU state on x86 • Only few apps use FPU (and those don’t do many syscalls) – saving and restoring FPU state on every context switch is wasteful! Figure: the kernel tracks current, FPU_owner and FPU_locked alongside the saved FPU state, while threads issue FPU instructions (finit, fld, fcos, fst, finit, fld, sosh()). Standard trick, not only for microkernels!
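A lazy FPU switch can be simulated in plain C to show the bookkeeping. Everything here is a simplified assumption: a single `double` stands in for the FPU register file, `fpu_enabled` models the hardware bit that makes the first FPU instruction trap, and the `FPU_locked` flag from the slide is left out. The names `context_switch`, `fpu_trap` and `fpu_add` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct thread {
    double saved_fpu;          /* stand-in for the saved FPU state */
} thread_t;

static thread_t *current;      /* thread running on the CPU */
static thread_t *fpu_owner;    /* thread whose state is live in the FPU */
static bool fpu_enabled;       /* when false, the first FPU instruction traps */
static double fpu_reg;         /* simulated FPU register */
static int fpu_switches;       /* counts real save/restore operations */

/* A context switch does not touch FPU state; it only arms the trap
 * unless the incoming thread already owns the FPU. */
static void context_switch(thread_t *to)
{
    current = to;
    fpu_enabled = (fpu_owner == to);
}

/* Trap handler: runs on the first FPU use after a switch. */
static void fpu_trap(void)
{
    assert(!fpu_enabled);
    if (fpu_owner != current) {
        if (fpu_owner != NULL)
            fpu_owner->saved_fpu = fpu_reg;   /* save the old owner's state */
        fpu_reg = current->saved_fpu;         /* restore ours */
        fpu_owner = current;
        fpu_switches++;
    }
    fpu_enabled = true;
}

/* Simulated FPU instruction. */
static void fpu_add(double x)
{
    if (!fpu_enabled)
        fpu_trap();
    fpu_reg += x;
}
```

Switching to a thread that never uses the FPU and back again costs no save or restore in this model; state only moves when a different thread actually executes an FPU instruction.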
  • 42. Other implementation tricks • Cache-friendly data structure layout, especially TCBs – data likely used together is on same cache line – helps best-case and worst-case performance • Kernel mappings locked in TLB (using superpages) – helps worst-case performance – helps establish invariants: page table never walked when in kernel. Avoid RAM like the plague!
  • 43. Other Lessons Learned from 2nd Generation • Programming languages: – original i486 kernel [’95]: all assembler – UNSW MIPS and Alpha kernels [’96,’98]: half assembler, half C – Fiasco [TUD ’98], Pistachio [’02]: C++ with assembler “fast path” – seL4 [’09], OKL4 [’09]: all C • Lessons: – C++ sux: code bloat, no real benefit (C++ ABANDONED) – Changing calling conventions not worthwhile • Conversion cost in library stubs and when entering C in kernel • Reduced compiler optimization – Assembler unnecessary for performance (assembler coding ABANDONED) • Can write C so compiler will produce near-optimal code • C entry from assembler cheap if calling conventions maintained • seL4 performance with C-only fastpath as good as other L4 kernels [Blackham & Heiser ’12]
  • 44. LESSONS & PRINCIPLES
  • 45. L4 Design and Implementation. Implement. Tricks [SOSP’93]: • Process kernel • Virtual TCB array • Lazy scheduling • Direct process switch • Non-preemptible • Non-portable • Non-standard calling convention • Assembler. Design Decisions [SOSP’95]: • Synchronous IPC • Rich message structure, arbitrary out-of-line messages • Zero-copy register messages • User-mode page-fault handlers • Threads as IPC destinations • IPC timeouts • Hierarchical IPC control • User-mode device drivers • Process hierarchy • Recursive address-space construction
  • 46. seL4 Design Principles • Fully delegatable access control • All resource management is subject to user-defined policies – Applies to kernel resources too! • Suitable for formal verification – Requires small size, avoid complex constructs • Performance on par with best-performing L4 kernels – Prerequisite for real-world deployment! • Suitability for real-time use – Only partially achieved to date • on-going work…
  • 47. (Informal) Requirements for Formal Verification • Verification scales poorly ⇒ small size (LOC and API) • Conceptual complexity hurts ⇒ KISS • Global invariants are expensive ⇒ KISS • Concurrency difficult to reason about ⇒ single-threaded kernel. Largely in line with traditional L4 approach!
  • 48. seL4 Fundamental Abstractions • Capabilities as opaque names and access tokens – All kernel operations are cap invocations (except Yield()) • IPC: – Synchronous (blocking) message passing plus asynchronous notification – Endpoint objects implemented as message queues • Send: get receiver TCB from endpoint or enqueue self • Receive: obtain sender’s TCB from endpoint or enqueue self • Other APIs: – Send()/Receive() to/from virtual kernel endpoint – Can interpose operations by substituting actual endpoint • Fully user-controlled memory management (seL4’s main conceptual novelty!)
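The "endpoint as message queue" description can be sketched as a small rendezvous structure. This is a simplified model, not the seL4 implementation: the types `tcb_t` and `endpoint_t`, the three-state `ep_state_t`, the single-word message and the function names are all assumptions. An endpoint queues either waiting senders or waiting receivers, never both, so each operation either finds a partner at the head of the queue or enqueues the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EP_IDLE, EP_SENDERS, EP_RECEIVERS } ep_state_t;

typedef struct tcb {
    int msg;                   /* stand-in for the message registers */
    bool blocked;
    struct tcb *next;
} tcb_t;

typedef struct endpoint {
    ep_state_t state;          /* which kind of thread is queued, if any */
    tcb_t *head, *tail;
} endpoint_t;

static void ep_enqueue(endpoint_t *ep, tcb_t *t, ep_state_t state)
{
    t->blocked = true;
    t->next = NULL;
    if (ep->tail != NULL) ep->tail->next = t; else ep->head = t;
    ep->tail = t;
    ep->state = state;
}

static tcb_t *ep_dequeue(endpoint_t *ep)
{
    tcb_t *t = ep->head;
    assert(t != NULL);
    ep->head = t->next;
    if (ep->head == NULL) {
        ep->tail = NULL;
        ep->state = EP_IDLE;
    }
    t->blocked = false;
    return t;
}

/* Send: get receiver TCB from endpoint or enqueue self. */
static void ep_send(endpoint_t *ep, tcb_t *self)
{
    if (ep->state == EP_RECEIVERS)
        ep_dequeue(ep)->msg = self->msg;
    else
        ep_enqueue(ep, self, EP_SENDERS);
}

/* Receive: obtain sender's TCB from endpoint or enqueue self. */
static void ep_receive(endpoint_t *ep, tcb_t *self)
{
    if (ep->state == EP_SENDERS)
        self->msg = ep_dequeue(ep)->msg;
    else
        ep_enqueue(ep, self, EP_RECEIVERS);
}
```

Whichever side arrives first blocks on the endpoint; the second arrival completes the transfer and unblocks its partner, which is the synchronous (blocking) behaviour the slide describes.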
  • 49. Remember: seL4 User-Level Memory Management. Figure: RAM divided among the kernel, a Global Resource Manager (GRM) and Resource Managers (RM), each with its own data and address spaces (AS). • Resources fully delegated, allows autonomous operation • Strong isolation, no shared kernel resources • Delegation can be revoked
  • 50. Remaining Conceptual Issues in seL4. IPC & Thread Model: • Is the “mostly synchronous + a bit of async” model appropriate? – forces kernel scheduling of user activities – forces multi-threaded userland. Time management: • Present scheduling model is ad-hoc and insufficient – fixed-prio round-robin forces policy – not sufficient for some classes of real-time systems (time triggered) – no real support for hierarchical real-time scheduling – lack of an elegant resource management model for time. About to be solved!
  • 51. Lessons From 20 Years of L4 • Minimality is excellent driver of design decisions – L4 kernels have become simpler over time – Policy-mechanism separation (user-mode page-fault handlers) – Device drivers really belong to user level – Minimality is key enabler for formal verification! • IPC speed still matters – But not everywhere, premature optimisation is wasteful – Compilers have got so much better – Verification does not compromise performance – Verification invariants can help improve speed! [Shi, OOPSLA’13] • Capabilities are the way to go • Details changed, but principles remained • Microkernels rock! (If done right!)
  • 52. Where To Find More • UNSW Advanced Operating Systems Course: http://www.cse.unsw.edu.au/~cs9242 • NICTA Trustworthy Systems research: http://trustworthy.systems • seL4 open-source portal: http://sel4.systems • L4 Microkernel Headquarters: http://l4hq.org • Gernot’s blog: http://microkerneldude.wordpress.com/ • Gernot’s research home page: http://ssrg.nicta.com.au/people/?cn=Gernot+Heiser