Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
1. 1
Linux rumpkernel
a librarified monolithic kernel
Hajime Tazaki
IIJ Research Laboratory
March, 2018, AsiaBSDCon 2018
slide source
https://github.com/thehajime/asiabsdcon-1803/
2. 2
Intro
I'm going to talk about how great Linux is (sorry)
but whether it is Linux or xxxBSD doesn't matter
a re-composable, re-usable, flexible operating system kernel
should make everyone happy
3. 3
Who am I ?
Researcher at IIJ Research Laboratory
working for the Internet
4. 4
5. 5
The original Internet
packet switching network
a basis of the end-to-end principle
a basis of the largest platform
P. Baran, On Distributed Communications Networks, IEEE Transactions on Communications Systems, 1964
6. 6
Today's internet
not yesterday's Internet
various stakeholders / controlled system / security fast
refs:
https://justimagine.aurecongroup.com/solving-complex-problems-forget-what-you-currently-know/
https://kentforliberty.liberty.me/letting-government-control-you/
7. 7
Today's internet (cont'd)
it is hard to deliver a packet to the other end without any modification
ref: https://www.slideshare.net/obonaventure/innovation-is-back-in-the-transport-and-network-layers
8. 8
End of evolution/innovation ??
the Internet is mature enough (that we don't have to modify it)
we can create another universe
are we satisfied ?
people want to innovate but the system is not ready
9. 9
Questions
why do you want to extend your system ?
want to introduce a new idea (I have a great protocol)
want to refresh the design (socket API sucks)
want to optimize implementations (too slow for me)
want to secure the code (security fast)
14. 14
Ossification: middlebox (cont'd)
TCP segments processed by a NAT router
ref:
https://www.slideshare.net/obonaventure/innovation-is-back-in-the-transport-and-network-layers
15. 15
Ossification: middlebox (cont'd)
possible TCP segments processed by typical middlebox today
ref:
https://www.slideshare.net/obonaventure/innovation-is-back-in-the-transport-and-network-layers
16. 16
Ossification: host OS
The deployment of protocol extensions takes a long time
Standardized
WS,TS: 1992 (RFC1323)
SACK: 1996 (RFC2018)
OS
WS, TS: Win 2000/Linux(1999)
SACK: enabled by default in 1999 (Linux), 2004 (Win)
Fukuda, Kensuke. "An Analysis of Longitudinal TCP Passive Measurements (Short Paper)." Traffic Monitoring and Analysis 40: 29.
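The lag between standardization and OS deployment can be read off the slide with a little arithmetic. The sketch below only restates the years listed above; reading "Win 2000" as the calendar year 2000 is an assumption, and the dictionary and function names are illustrative.

```python
# Years as listed on the slide. "Win 2000" is read as the year 2000 (assumption).
STANDARDIZED = {"WS/TS": 1992, "SACK": 1996}
DEPLOYED = {
    ("WS/TS", "Linux"): 1999,
    ("WS/TS", "Windows"): 2000,
    ("SACK", "Linux"): 1999,
    ("SACK", "Windows"): 2004,
}


def deployment_lag():
    """Return years between standardization and OS deployment per (extension, OS)."""
    return {
        (ext, os_name): year - STANDARDIZED[ext]
        for (ext, os_name), year in DEPLOYED.items()
    }


if __name__ == "__main__":
    for (ext, os_name), lag in sorted(deployment_lag().items()):
        print(f"{ext:6} on {os_name:8}: {lag} years after standardization")
```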
17. 17
Ossification: host OS (cont'd)
updating the base kernel is not an easy task
Android still uses older kernels
container guests use the host kernel (for the network stack)
Android OS distribution with the base Linux kernel version
(taken Nov. 2017) https://developer.android.com/about/dashboards/index.html
18. 18
Design pattern
Multipath TCP (mptcp)
an extension to (traditional) TCP
multipath communication
RFC6824 (experimental)
application compatibility (unlike SCTP)
Good design ?
middlebox friendly => OK
unmodified application => OK
http://blog.multipath-tcp.org/blog/html/2015/12/25/commercial_usage_of_multipath_tcp.html
19. 19
Ossification: Google's answer
QUIC (Quick UDP Internet Connections)
a transport protocol over UDP
7% of Internet traffic *1
why UDP ?
middlebox friendly
the payload is encrypted, so middleboxes can't intercept it
why UDP (cont'd) ?
can be implemented in userspace
no need to upgrade host OS
*1 The QUIC Transport Protocol: Design and Internet-Scale Deployment, ACM SIGCOMM 2017
21. 21
If you face obstacles...
you would implement from scratch
in the name of specialization
lacks the maturity of a long OS history
more low-quality code
more wasted time (reinventing the wheel)
22. 22
Summary of problems
today's internet is not the original Internet
no more end-to-end
to make a breakthrough
become a part of a giant
or think differently ?
23. 23
Alternatives
Userspace stack
lwip (2002~)
Arrakis [OSDI '14]
IX [OSDI '14]
MegaPipe [OSDI '12]
mTCP [NSDI '14]
SandStorm [SIGCOMM '14]
uTCP [CCR '14]
FastSocket [ASPLOS '16]
SolarFlare (2007~?)
libuinet (2013~)
SeaStar (2014~)
Snabb Switch (2012~)
lightweight VM
MirageOS [ASPLOS '13]
OSv [USENIX '14]
ClickOS [NSDI '14]
Most of them lack feature-richness, or are one-shot ports without the latest feature updates
24. 24
Alternatives (cont'd)
MegaPipe [OSDI '12]
outperforms baseline Linux .. 582% (for short connections).
new API for applications (existing applications get no benefit for free)
mTCP [NSDI '14]
improves the performance ... by a factor of 25 compared to the latest Linux TCP stack
implemented with very limited TCP extensions
SandStorm [SIGCOMM '14]
our approach with the FreeBSD and Linux stacks ..., demonstrating 2-10x improvements
specialized (existing applications get no benefit for free)
Arrakis [OSDI '14]
improvements of 2-5x in latency and 9x in throughput .. to a well-tuned Linux implementation.
uses a simplified TCP/IP stack (lwip) (loses feature-rich extensions)
25. 25
Does speed matter ?
nope, it's just one metric of a system
improving numbers often sacrifices features/functions
As the old joke goes, writing a TCP/IP stack from scratch over the weekend is easy, but making it work on the real-world Internet is more difficult [1].
[1] Antti Kantee, Rump Kernels: No OS? No Problem!, USENIX ;login:, October 2014
26. 26
Our goal
Respect the implementation (and experience) of past decades
Accelerate the innovation of the network stack
discover new values through the past studies
28. 28
Anykernel
Anykernel: originally in NetBSD rump kernel
using the (unmodified) high-quality code base of a monolithic kernel
in different environments, in different shapes
by gluing in additional pieces
We define an anykernel to be an organization of kernel code which allows the kernel's unmodified drivers to be run in various configurations such as application libraries and microkernel style servers, and also as part of a monolithic kernel. -- Kantee 2012.
30. 30
Linux Kernel Library (LKL)
a library (liblkl.{so,a})
out-of-tree architecture (h/w-independent)
run Linux code in various ways
with a reusable library
h/w dependent layer
on Linux/Windows/FreeBSD/Android userspace, unikernel, on UEFI
network simulator (ns-3)
code
2.4KLoC (h/w independent)
6.6KLoC (h/w dep)
32. 32
1. host backend
environment dependent part
unify an interface across different platforms (rump-hypercall like)
device interface with Virtio
block device <=> disk image
networking <=> TAP, raw socket, DPDK, VDE
33. 33
2. CPU independent architecture
architecture (arch/lkl)
transparent architecture binding (as a CPU arch)
requires no modification to the rest of the implementation
thread information (struct thread_info)
irq, timer, syscall handler
access to the underlying layer via host_ops
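The idea of reaching the underlying layer only through a table of host operations can be sketched as follows. This is an illustration of the pattern, not LKL's actual structure: the `HostOps` fields (`print_msg`, `time_ns`, `mem_alloc`) and the `boot` function are hypothetical names chosen for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HostOps:
    """Hypothetical table of host services; field names are illustrative only."""
    print_msg: Callable[[str], None]
    time_ns: Callable[[], int]
    mem_alloc: Callable[[int], bytearray]


def posix_like_host(log: list) -> HostOps:
    """One possible backend: log to a list, use the monotonic clock and bytearray."""
    return HostOps(print_msg=log.append, time_ns=time.monotonic_ns, mem_alloc=bytearray)


def boot(ops: HostOps) -> int:
    """Stand-in for portable kernel code: it touches the host only through ops."""
    start = ops.time_ns()
    buf = ops.mem_alloc(4096)
    ops.print_msg(f"allocated {len(buf)} bytes")
    return ops.time_ns() - start
```

Swapping in another backend (a different clock, allocator, or console) then requires no change to `boot`, which is the property the slide attributes to the host_ops indirection.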
34. 34
3. Application interface
1. use exposed API (LKL syscall)
2. use host libc (LD_PRELOAD)
3. extend (alternative) libc
35. 35
API 1: use exposed API (LKL syscall)
call entry points of the LKL kernel
lkl_sys_open(), lkl_sys_socket()
almost the same as ordinary syscalls
return value and errno notification are different
can use LKL syscalls and host syscalls simultaneously
read an ext4 file with lkl_sys_read() => write it to the host (Windows) with write()
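The slide does not spell out how the return value and errno notification differ. A minimal sketch, assuming the raw Linux kernel convention (failure is reported as a negative errno in the return value, instead of libc's -1 plus a separate errno), might translate between the two styles like this. The helper names are hypothetical and nothing here calls LKL itself.

```python
import errno
import os


def to_libc_style(raw_ret: int):
    """Map a kernel-style return (negative errno on failure) to (ret, errno)."""
    if raw_ret < 0:
        return -1, -raw_ret
    return raw_ret, 0


def check(raw_ret: int) -> int:
    """Raise OSError for a kernel-style error code, else return the value."""
    ret, err = to_libc_style(raw_ret)
    if ret < 0:
        raise OSError(err, os.strerror(err))
    return ret
```

A caller mixing both kinds of syscalls would apply such a translation only to the kernel-style results and keep using errno as usual for host calls.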
36. 36
API 2: hijack host standard library
dynamically replace symbols of host syscalls (of libc)
LD_PRELOAD
socket() => lkl_sys_socket()
can use a host binary (executable) as-is
limitation of replaceable symbols
needs syscall translation on non-Linux hosts
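Launching an unmodified binary under such a hijack library comes down to setting LD_PRELOAD in its environment. The sketch below only builds that environment; the library path in the usage comment is a placeholder, and `preload_env` is a hypothetical helper, not part of LKL.

```python
import os


def preload_env(library_path: str, base_env=None) -> dict:
    """Return a copy of the environment with library_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon-separated list of preloaded objects.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


# Usage (not executed here; the path is a placeholder):
#   import subprocess
#   subprocess.run(["ping", "-c", "1", "192.0.2.1"],
#                  env=preload_env("/path/to/hijack-library.so"))
```

Only dynamically resolved libc symbols can be replaced this way, which is the "limitation of replaceable symbols" noted above: statically linked binaries and direct syscall instructions bypass the preload.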
37. 37
API 3: extend (alternative) libc
only call LKL syscalls, with our own libc
also introduced as a virtual CPU architecture
a program can link this instead of the host libc
can't access (underlying) host resources directly via these LKL syscalls
as a patch for musl libc
39. 39
NUSE (Network Stack in UserspacE)
What ?
install/use an alternate network stack (i.e., TCP/IP)
but it's full-fledged code (Linux)
host network stack isn't involved
Why ?
because the kernel is hard to touch
Android (long delivery time)
container (e.g., docker: shared by others)
41. 41
unikernels
What ?
OS instance w/ a single process
on bare-metal
on hypervisor
on userspace program
cross-compile with alt-libc
rumprun (by Antti Kantee)
frankenlibc (by Justin Cormack)
Why ?
small footprint
quick instantiation
- http://www.linux.com/news/enterprise/cloud-computing/751156-are-cloud-operating-systems-the-next-big-thing-
44. 44
Service Function Chain (SFC)
What ?
SFC by Unix pipe and LKL
NF in a shell command
ping.sh | nat.sh | pfilter.sh
Why ?
a chain w/ VMs is heavyweight
Unix pipe is useful enough (e.g., packet filter by grep)
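A single stage of such a pipe-based chain could look like the sketch below: read packets from an input stream, keep those that match a predicate, and pass them on. The 2-byte big-endian length prefix used to delimit packets is an assumption made for this example; the slides do not say how the scripts frame packets on the pipe.

```python
import io
import struct


def read_frames(stream):
    """Yield payloads from a stream of 2-byte length-prefixed frames (assumed framing)."""
    while True:
        header = stream.read(2)
        if len(header) < 2:
            return
        (length,) = struct.unpack("!H", header)
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated frame: stop rather than forward garbage
        yield payload


def write_frame(stream, payload: bytes) -> None:
    """Write one length-prefixed frame."""
    stream.write(struct.pack("!H", len(payload)) + payload)


def pfilter(src, dst, keep) -> int:
    """Copy frames from src to dst when keep(frame) is true; return how many passed."""
    passed = 0
    for frame in read_frames(src):
        if keep(frame):
            write_frame(dst, frame)
            passed += 1
    return passed
```

In a real stage `src` and `dst` would be `sys.stdin.buffer` and `sys.stdout.buffer`, so that stages compose with `|` in the shell; `io.BytesIO` objects work the same way for testing.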
46. What does ping look like ?
generate raw data to stdout
next program can receive from stdin
https://github.com/thehajime/blog/issues/3
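As a rough illustration of "generating raw data to stdout", the sketch below builds just the ICMP echo request part of a ping packet, using the standard Internet checksum. It is not the ping.sh from the talk, which may emit complete frames; the function names and the choice to stop at the ICMP layer are assumptions of this example.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum over data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def icmp_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload
```

Writing the result with `sys.stdout.buffer.write(icmp_echo_request(1, 1, b"hello"))` would hand the bytes to the next program on the pipe.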
53. 52
Your experiment (cont'd)
huge resources to conduct a test
unlikely to be reproducible
tons of configuration scripts
running on different machines/OSes
controlling them is troublesome
distributed debugger...
55. 54
Testing with Continuous Integration
Detected bugs (Linux net-next tree)
[net-next,v2] ipv6: Do not iterate over all interfaces when finding source address on specific interface. (v4.2-rc0, during VRF)
[v3] ipv6: Fix protocol resubmission (v4.1-rc7, expanded from v4 stack)
[net-next] ipv6: Check RTF_LOCAL on rt->rt6i_flags instead of rt->dst.flags (v4.1-rc1, during v6 improvement)
[net-next] xfrm6: Fix a offset value for network header in _decode_session6 (v3.19-rc7?, regression only in mip6)
63. LoC:
arch/lkl (LKL) < arch/lib (LibOS)
diff: the amount of stub code
commons
no modification to the original Linux code
description of kernel context (by POSIX thread)
outsourced resources (clock, memory, scheduler)
CPU independent architecture
diffs
LibOS: implements the higher-level API (timer, irq, kthread) with pthread
LKL: implements IRQ, kthread, timer with pthread in a lower layer
64. 59
LKL: current status
Sent RFC (Nov. 2015)
no update on LKML since then
have evolved a lot
fast syscall path
offload (csum, TSO/LRO)
CONFIG_SMP (WIP)
json config
qemu baremetal (unikernel)
on UEFI
https://github.com/lkl/linux