Introduction to eBPF and XDP

Taiwan Linux Kernel Hackers (2017-12-12)

  1. 1. Gary Lin, Software Engineer, SUSE Labs (glin@suse.com). Introduction to eBPF and XDP. Taiwan Linux Kernel Hackers
  2. 2. eBPF
  3. 3. BPF?
  4. 4. Berkeley Packet Filter
  5. 5. BPF No Red BPF Program
  6. 6. The BSD Packet Filter: A New Architecture for User-level Packet Capture December 19, 1992
  7. 7. SCO lawsuit, August 2003
  8. 8. BPF ASM
       ldh [12]
       jne #0x800, drop
       ldb [23]
       jneq #1, drop
       # get a random uint32 number
       ld rand
       mod #4
       jneq #1, drop
       ret #-1
       drop: ret #0
  9. 9. BPF Bytecode
       struct sock_filter code[] = {
           { 0x28,  0,  0, 0x0000000c },
           { 0x15,  0,  8, 0x000086dd },
           { 0x30,  0,  0, 0x00000014 },
           { 0x15,  2,  0, 0x00000084 },
           { 0x15,  1,  0, 0x00000006 },
           { 0x15,  0, 17, 0x00000011 },
           { 0x28,  0,  0, 0x00000036 },
           { 0x15, 14,  0, 0x00000016 },
           { 0x28,  0,  0, 0x00000038 },
           { 0x15, 12, 13, 0x00000016 },
           ...
       };
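To make the relationship between the assembly and the bytecode concrete, here is a minimal user-space sketch of a classic BPF interpreter. It is not the kernel's implementation: it handles only four opcodes, `struct cbpf_insn` and `cbpf_run` are hypothetical names, and the program is the ICMP filter from the BPF ASM slide with the random-sampling step left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the kernel's struct sock_filter, redefined here
 * (hypothetical name) so the sketch builds without Linux headers. */
struct cbpf_insn {
    uint16_t code;  /* opcode */
    uint8_t  jt;    /* instructions to skip when the test is true */
    uint8_t  jf;    /* instructions to skip when the test is false */
    uint32_t k;     /* immediate operand */
};

/* The only four classic BPF opcodes this sketch understands. */
enum {
    CBPF_LDH_ABS = 0x28,  /* A = 16-bit big-endian load from pkt[k] */
    CBPF_LDB_ABS = 0x30,  /* A = 8-bit load from pkt[k] */
    CBPF_JEQ_K   = 0x15,  /* pc += (A == k) ? jt : jf */
    CBPF_RET_K   = 0x06,  /* return k */
};

/* Accept (0xffffffff) IPv4 ICMP frames, drop (0) everything else. */
static const struct cbpf_insn icmp_filter[] = {
    { CBPF_LDH_ABS, 0, 0, 12 },           /* ldh [12]  (EtherType)   */
    { CBPF_JEQ_K,   0, 3, 0x0800 },       /* jne #0x800, drop        */
    { CBPF_LDB_ABS, 0, 0, 23 },           /* ldb [23]  (IP protocol) */
    { CBPF_JEQ_K,   0, 1, 1 },            /* jneq #1, drop           */
    { CBPF_RET_K,   0, 0, 0xffffffffu },  /* ret #-1                 */
    { CBPF_RET_K,   0, 0, 0 },            /* drop: ret #0            */
};

static uint32_t cbpf_run(const struct cbpf_insn *prog, size_t n,
                         const uint8_t *pkt, size_t len)
{
    uint32_t a = 0;  /* the accumulator register A */
    size_t pc = 0;

    assert(prog != NULL && pkt != NULL);
    while (pc < n) {
        const struct cbpf_insn *in = &prog[pc++];

        switch (in->code) {
        case CBPF_LDH_ABS:
            if ((size_t)in->k + 2 > len)
                return 0;  /* out-of-bounds load rejects the packet */
            a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
            break;
        case CBPF_LDB_ABS:
            if ((size_t)in->k + 1 > len)
                return 0;
            a = pkt[in->k];
            break;
        case CBPF_JEQ_K:
            pc += (a == in->k) ? in->jt : in->jf;
            break;
        case CBPF_RET_K:
            return in->k;
        default:
            return 0;  /* opcode not modelled in this sketch */
        }
    }
    return 0;  /* ran off the end without a ret */
}
```

Calling `cbpf_run(icmp_filter, 6, frame, frame_len)` on an Ethernet frame yields a non-zero value only when the frame carries IPv4 with protocol 1 (ICMP).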
  10. 10. Virtual Machine kind of
  11. 11. BPF JIT
  12. 12. # find arch -name bpf_jit*
        arch/sparc/net/bpf_jit_asm_64.S
        ...
        arch/arm/net/bpf_jit_32.c
        arch/arm/net/bpf_jit_32.h
        arch/arm64/net/bpf_jit_comp.c
        arch/arm64/net/bpf_jit.h
        arch/powerpc/net/bpf_jit_asm64.S
        arch/powerpc/net/bpf_jit_asm.S
        ...
        arch/s390/net/bpf_jit_comp.c
        arch/s390/net/bpf_jit.S
        ...
        arch/mips/net/bpf_jit_asm.S
        ...
        arch/x86/net/bpf_jit_comp.c
        arch/x86/net/bpf_jit.S
  13. 13. Stable and Fast!
  14. 14. Linux 3.15
  15. 15. Extended BPF https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bd4cf0ed331a275e9bf5a49e6d0fd55dffc551b8
  16. 16. From BPF to eBPF ● 2 32-bit registers → 10 64-bit registers ● New instructions: BPF_MOV, BPF_JNE, BPF_CALL, … ● Helper functions ● eBPF verifier (kernel/bpf/verifier.c): checks programs loaded from user space ● eBPF maps
  17. 17. BPF Calling Convention ● R0: return value from in-kernel functions, and exit value of the eBPF program ● R1 – R5: arguments from the eBPF program to in-kernel functions ● R6 – R9: callee-saved registers that in-kernel functions preserve ● R10: read-only frame pointer for stack access
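The register roles above can be illustrated with a toy user-space interpreter. This is only a sketch under stated assumptions: it models four eBPF opcodes, has no stack or helper calls, and `struct ebpf_insn`, `ebpf_run`, and `add_two` are hypothetical names (the struct mirrors the field layout of the kernel's `struct bpf_insn`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 8-byte layout of the kernel's struct bpf_insn; redefined
 * (hypothetical name) so the sketch builds without Linux headers. */
struct ebpf_insn {
    uint8_t code;         /* opcode */
    uint8_t dst_reg : 4;  /* destination register */
    uint8_t src_reg : 4;  /* source register */
    int16_t off;          /* signed offset (unused in this sketch) */
    int32_t imm;          /* signed immediate */
};

/* The only four eBPF opcodes this sketch understands. */
enum {
    EBPF_MOV64_IMM = 0xb7,  /* dst = imm (sign-extended) */
    EBPF_MOV64_REG = 0xbf,  /* dst = src */
    EBPF_ADD64_REG = 0x0f,  /* dst += src */
    EBPF_EXIT      = 0x95,  /* leave the program; R0 is the exit value */
};

/* R0 = R1 + 2, going through callee-saved R6. */
static const struct ebpf_insn add_two[] = {
    { EBPF_MOV64_REG, 6, 1, 0, 0 },  /* r6 = r1 (save the argument) */
    { EBPF_MOV64_IMM, 0, 0, 0, 2 },  /* r0 = 2                      */
    { EBPF_ADD64_REG, 0, 6, 0, 0 },  /* r0 += r6                    */
    { EBPF_EXIT,      0, 0, 0, 0 },  /* exit with r0                */
};

/* Returns 0 and stores R0 in *r0_out on EXIT, or -1 on any error. */
static int ebpf_run(const struct ebpf_insn *prog, size_t n,
                    uint64_t ctx, uint64_t *r0_out)
{
    uint64_t regs[11] = { 0 };  /* R0..R10; R10 stays 0 (no stack here) */
    size_t pc;

    assert(prog != NULL && r0_out != NULL);
    regs[1] = ctx;  /* R1 carries the first argument on entry */

    for (pc = 0; pc < n; pc++) {
        const struct ebpf_insn *in = &prog[pc];

        if (in->code == EBPF_EXIT) {
            *r0_out = regs[0];
            return 0;
        }
        if (in->dst_reg >= 10 || in->src_reg > 10)
            return -1;  /* R10 is read-only; R11 and up do not exist */

        switch (in->code) {
        case EBPF_MOV64_IMM:
            regs[in->dst_reg] = (uint64_t)(int64_t)in->imm;
            break;
        case EBPF_MOV64_REG:
            regs[in->dst_reg] = regs[in->src_reg];
            break;
        case EBPF_ADD64_REG:
            regs[in->dst_reg] += regs[in->src_reg];
            break;
        default:
            return -1;  /* opcode not modelled in this sketch */
        }
    }
    return -1;  /* fell off the end without EXIT */
}
```

Running `add_two` with a context value of 40 leaves 42 in R0, and any instruction that names R10 as its destination is refused.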
  18. 18. x86_64 Register Mapping ● R0 → rax ● R1 → rdi ● R2 → rsi ● R3 → rdx ● R4 → rcx ● R5 → r8 ● R6 → rbx ● R7 → r13 ● R8 → r14 ● R9 → r15 ● R10 → rbp
  19. 19. BPF Helper Functions ● /usr/include/linux/bpf.h – bpf_probe_read – bpf_ktime_get_ns – bpf_trace_printk – bpf_get_smp_processor_id – bpf_perf_event_output – ...
  20. 20. eBPF Verifier ● Instruction limit: 4096 ● Two-step verification – Directed acyclic graph check – Execution simulation
  21. 21. Directed Acyclic Graph Check ● Back-edge detection ● Unreachable instructions
  22. 22. Directed Acyclic Graph Check ● Back-edge detection ● Unreachable instructions PERMISSION DENIED
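The two rejections above can be sketched as a depth-first search over a toy instruction set. This is an illustration, not the kernel's check: the instruction kinds, return codes, and names (`toy_insn`, `toy_check_cfg`) are all hypothetical, and the recursion is used only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy control-flow kinds (hypothetical, not real eBPF opcodes). */
enum toy_kind { TOY_NEXT, TOY_JMP, TOY_COND, TOY_EXIT };

struct toy_insn {
    enum toy_kind kind;
    int off;  /* jump target is pc + 1 + off, as for eBPF jumps */
};

enum {
    CFG_OK          =  0,
    CFG_BACK_EDGE   = -1,  /* a loop was found */
    CFG_UNREACHABLE = -2,  /* some instruction can never run */
    CFG_BAD_JUMP    = -3,  /* jump or fall-through leaves the program */
};

#define TOY_MAX_INSNS 64

/* DFS colours: not seen yet, on the current path, fully explored. */
enum { WHITE = 0, GREY = 1, BLACK = 2 };

static int toy_visit(const struct toy_insn *p, int n, int pc,
                     unsigned char *state)
{
    int succ[2], nsucc = 0, i, rc;

    if (pc < 0 || pc >= n)
        return CFG_BAD_JUMP;
    if (state[pc] == GREY)
        return CFG_BACK_EDGE;  /* edge back into the current path */
    if (state[pc] == BLACK)
        return CFG_OK;         /* already explored via another path */

    state[pc] = GREY;
    if (p[pc].kind == TOY_NEXT || p[pc].kind == TOY_COND)
        succ[nsucc++] = pc + 1;              /* fall-through edge */
    if (p[pc].kind == TOY_JMP || p[pc].kind == TOY_COND)
        succ[nsucc++] = pc + 1 + p[pc].off;  /* jump edge */

    for (i = 0; i < nsucc; i++) {
        rc = toy_visit(p, n, succ[i], state);
        if (rc != CFG_OK)
            return rc;
    }
    state[pc] = BLACK;
    return CFG_OK;
}

static int toy_check_cfg(const struct toy_insn *p, int n)
{
    unsigned char state[TOY_MAX_INSNS];
    int i, rc;

    assert(p != NULL && n > 0 && n <= TOY_MAX_INSNS);
    memset(state, WHITE, sizeof(state));

    rc = toy_visit(p, n, 0, state);
    if (rc != CFG_OK)
        return rc;
    for (i = 0; i < n; i++)
        if (state[i] == WHITE)
            return CFG_UNREACHABLE;
    return CFG_OK;
}
```

A program whose jump targets an earlier instruction on the current path fails with `CFG_BACK_EDGE`, and one that jumps over an instruction nothing else reaches fails with `CFG_UNREACHABLE`.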
  23. 23. Execution Simulation ● Reading an uninitialized register ● Arithmetic on two valid pointers ● Loading or storing registers of invalid types ● Reading the stack before writing data to it
  24. 24. Execution Simulation ● Reading an uninitialized register ● Arithmetic on two valid pointers ● Loading or storing registers of invalid types ● Reading the stack before writing data to it PERMISSION DENIED
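The first of these checks, reading an uninitialized register, can be sketched by tracking one "initialized" bit per register while walking a straight-line program. The real verifier tracks much richer state (register types, value ranges, stack slots); the instruction set and the names `sim_insn` and `sim_check` below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy straight-line instruction set (hypothetical, not real eBPF). */
enum sim_op { SIM_MOV_IMM, SIM_MOV_REG, SIM_ADD_REG, SIM_EXIT };

struct sim_insn {
    enum sim_op op;
    int dst;  /* destination register index */
    int src;  /* source register index (ignored by MOV_IMM and EXIT) */
};

#define SIM_NREGS 11  /* R0..R10 */

/* Returns 0 if the program is accepted, -1 if it is rejected. */
static int sim_check(const struct sim_insn *prog, size_t n)
{
    bool init[SIM_NREGS] = { false };
    size_t pc;

    assert(prog != NULL);
    init[1]  = true;  /* R1 holds the context pointer on entry */
    init[10] = true;  /* R10 is the frame pointer */

    for (pc = 0; pc < n; pc++) {
        const struct sim_insn *in = &prog[pc];

        if (in->op == SIM_EXIT)
            return init[0] ? 0 : -1;  /* R0 must hold an exit value */

        if (in->dst < 0 || in->dst >= 10)
            return -1;  /* unknown register, or write to read-only R10 */
        if (in->op != SIM_MOV_IMM &&
            (in->src < 0 || in->src >= SIM_NREGS))
            return -1;

        switch (in->op) {
        case SIM_MOV_IMM:
            init[in->dst] = true;
            break;
        case SIM_MOV_REG:
            if (!init[in->src])
                return -1;  /* read of an uninitialized register */
            init[in->dst] = true;
            break;
        case SIM_ADD_REG:
            if (!init[in->dst] || !init[in->src])
                return -1;  /* both operands are read */
            break;
        default:
            return -1;
        }
    }
    return -1;  /* no EXIT instruction */
}
```

A program that copies R2 into R0 is rejected because R2 was never written, while copying R1 (set on entry) is accepted.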
  25. 25. Stable, Fast, and Secure!
  26. 26. eBPF Maps
  27. 27. eBPF Map Types ● Hash ● Array ● Tail Call Array ● Per-CPU Hash/Array ● Stack Trace ● cgroup Array ● LRU (per-CPU) Hash ● Longest-Prefix Matching Trie ● Array/Hash of Maps ● Net device Map ● Socket Map https://github.com/iovisor/bcc/blob/master/docs/kernel-versions.md#tables-aka-maps
  28. 28. eBPF Map Syscalls ● BPF_MAP_CREATE ● BPF_MAP_LOOKUP_ELEM ● BPF_MAP_UPDATE_ELEM ● BPF_MAP_DELETE_ELEM ● BPF_MAP_GET_NEXT_KEY ● BPF_MAP_GET_NEXT_ID ● BPF_MAP_GET_FD_BY_ID
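From user space, these commands all go through the single `bpf(2)` system call. The sketch below creates a hash map, stores one element, and reads it back. The wrapper names (`sys_bpf`, `map_create_hash`, `map_update`, `map_lookup`) are hypothetical; it assumes a Linux system with `<linux/bpf.h>` installed, and map creation may fail with an error such as `EPERM` when the caller lacks the required privileges.

```c
#include <assert.h>
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no bpf() wrapper, so call it through syscall(2).
 * Returns -1 with errno set on failure. */
static int sys_bpf(int cmd, union bpf_attr *attr)
{
    return (int)syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* BPF_MAP_CREATE: returns a map file descriptor, or -1 on error. */
static int map_create_hash(uint32_t key_size, uint32_t value_size,
                           uint32_t max_entries)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.map_type    = BPF_MAP_TYPE_HASH;
    attr.key_size    = key_size;
    attr.value_size  = value_size;
    attr.max_entries = max_entries;
    return sys_bpf(BPF_MAP_CREATE, &attr);
}

/* BPF_MAP_UPDATE_ELEM: create the entry or overwrite it (BPF_ANY). */
static int map_update(int fd, const void *key, const void *value)
{
    union bpf_attr attr;

    assert(key != NULL && value != NULL);
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key    = (uint64_t)(uintptr_t)key;
    attr.value  = (uint64_t)(uintptr_t)value;
    attr.flags  = BPF_ANY;
    return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr);
}

/* BPF_MAP_LOOKUP_ELEM: copies the stored value into *value. */
static int map_lookup(int fd, const void *key, void *value)
{
    union bpf_attr attr;

    assert(key != NULL && value != NULL);
    memset(&attr, 0, sizeof(attr));
    attr.map_fd = (uint32_t)fd;
    attr.key    = (uint64_t)(uintptr_t)key;
    attr.value  = (uint64_t)(uintptr_t)value;
    return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr);
}
```

A typical sequence is `fd = map_create_hash(4, 8, 16)`, then `map_update(fd, &key, &value)` and `map_lookup(fd, &key, &out)`, and finally `close(fd)`. An eBPF program in the kernel would reach the same map through helper functions rather than through this system call.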
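  29. 29. eBPF [diagram: a user program in userspace loads BPF bytecode into the kernel with BPF_PROG_LOAD and accesses the map with BPF_MAP_*; the BPF bytecode in the kernel accesses the same map]
  30. 30. [diagram: the eBPF kernel program ("as simple as possible") and the user program in userspace ("whatever you want") exchange data through an eBPF map]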
  31. 31. BTW
  32. 32. clang >= 3.7 with the bpf target $ clang -O2 -target bpf -c source.c -o code.o
  33. 33. eBPF Projects ● Networking tc, socket, XDP, cilium, ... ● System Tracing and Monitoring kprobe/uprobe/tracepoint/perf event/usdt ● Security LandLock LSM, seccomp ● System Error Handler Testing eBPF directed error injection
  34. 34. XDP
  35. 35. RX Packet Processing [diagram: NIC → Driver → Network Stack (kernel) → Network Program (userspace)]
  36. 36. DDoS
  37. 37. [diagram: NIC → Driver → Network Stack (kernel) → Network Program (userspace)]
  38. 38. netfilter [diagram: NIC → Driver → Network Stack, with DROP at netfilter (kernel) → Network Program (userspace)]
  39. 39. Traffic Control [diagram: NIC → Driver → TC ingress, with DROP → Network Stack (kernel) → Network Program (userspace)]
  40. 40. eXpress Data Path [diagram: NIC → Driver, where the eBPF program runs before skb alloc and can DROP or TX → Network Stack (kernel) → Network Program (userspace)]
  41. 41. XDP ● A high-performance, programmable network data path; eBPF programs are attached through netlink (IFLA_XDP) ● No specialized hardware ● No kernel bypass ● Works with the existing network stack ● Direct packet write
  42. 42. Generic XDP [diagram: NIC → Driver → Network Stack (generic XDP → tc ingress → netfilter ingress) in the kernel → Network Program (userspace)]
  43. 43. virtnet_poll [virtio_net]() {
          receive_buf [virtio_net]() {
            receive_mergeable [virtio_net]() {
              bpf_prog_run_xdp();  ---- Native XDP
              page_to_skb [virtio_net]() {
                __napi_alloc_skb() {
                  __build_skb();
                }
                skb_put();
              }
            }
            skb_gro_reset_offset();
            tcp4_gro_receive() {
              tcp_gro_receive();
            }
            netif_receive_skb_internal() {
              netif_receive_generic_xdp();  ---- generic XDP
              __netif_receive_skb() {
                __netif_receive_skb_core() {
                  sch_handle_ingress();  ---- TC ingress
                  nf_ingress();  ---- Netfilter Ingress
                  ip_rcv() {
                    nf_hook_slow() {  ---- Netfilter RAW Pre-routing
  44. 44.               ipv4_conntrack_defrag [nf_defrag_ipv4]();
                      ipv4_conntrack_in [nf_conntrack_ipv4]() {
                        nf_conntrack_in [nf_conntrack]() {
                          ipv4_get_l4proto [nf_conntrack_ipv4]();
                          __nf_ct_l4proto_find [nf_conntrack]();
                          tcp_error [nf_conntrack]() {
                            nf_ip_checksum();
                          }
                          nf_ct_get_tuple [nf_conntrack]() {
                            ipv4_pkt_to_tuple [nf_conntrack_ipv4]();
                            tcp_pkt_to_tuple [nf_conntrack]();
                          }
                          hash_conntrack_raw [nf_conntrack]();
                          __nf_conntrack_find_get [nf_conntrack]();
                          tcp_get_timeouts [nf_conntrack]();
                          tcp_packet [nf_conntrack]() {
                            tcp_in_window [nf_conntrack]() {
                              nf_ct_seq_offset [nf_conntrack]();
                              tcp_options.isra.11 [nf_conntrack]();
                            }
                            __nf_ct_refresh_acct [nf_conntrack]();
                          }
                        }
  45. 45.               }
                    }
                    ip_rcv_finish() {
                      tcp_v4_early_demux();
                      ip_route_input_noref();
                      ip_local_deliver() {  ---- routing decisions
                        nf_hook_slow() {  ---- Netfilter filter Input
                          ipt_do_table [ip_tables]();
                          ipv4_helper [nf_conntrack_ipv4]();
                          ipv4_confirm [nf_conntrack_ipv4]();
                        }
                        ip_local_deliver_finish() {
                          raw_local_deliver();
                          tcp_v4_rcv() {  ---- L4 Protocol Handler
                            tcp_filter() {
                              security_sock_rcv_skb();
                            }
                            tcp_prequeue();
                            tcp_v4_do_rcv() {
                              tcp_rcv_state_process() {
                                tcp_parse_options();
                                tcp_ack() {
                                  ...
  46. 46. #define KBUILD_MODNAME "foo" /* for some headers */
        #include <uapi/linux/bpf.h>
        ...
        SEC("xdp_prog")
        int xdp_prog(struct xdp_md *ctx)
        {
            void *data_end = (void *)(long)ctx->data_end;
            void *data = (void *)(long)ctx->data;
            struct ethhdr *eth = data;
            ...
            return XDP_DROP; /* the action */
        }
  47. 47. XDP Actions ● XDP_ABORTED Indicate eBPF program error (treat as XDP_DROP) ● XDP_DROP Drop the packet ● XDP_PASS Pass the packet up to the stack ● XDP_TX Transmit the packet out through the same NIC ● XDP_REDIRECT (4.14) Redirect the packet to another NIC or CPU
  48. 48. XDP Restrictions ● Memory model change in driver – One packet per memory page (memory waste) – ixgbe and i40e using refcnt instead of one packet per page ● No per-RX-queue XDP instance yet ● XDP_REDIRECT only supported by limited drivers ● eBPF program limitations
  49. 49. Current Status ● XDP Core: 4.8 ● Supported Drivers – mlx4: 4.8 – mlx5: 4.9 – nfp, qed, virtio_net: 4.10 – ixgbe, generic_xdp, thunderx: 4.12 – i40e: 4.13 – veth, tap: 4.14
  50. 50. XDP Benchmarks (mlx4) ● Generated using pktgen ● Single core – ip routing drop: ~3.6 Mpps – tc clsact using bpf: ~4.2 Mpps – XDP drop: 20 Mpps (< 10% cpu util) https://www.slideshare.net/IOVisor/express-data-path-linux-meetup-santa-clara-july-2016
  51. 51. XDP Benchmarks (virtio-net) ● Generated using pktgen ● Host: i7-4790 CPU @ 3.6 GHz ● Single core qemu guest – iptables drop (raw preroute): ~3.0 Mpps – tc clsact using bpf: ~3.0 Mpps – generic XDP drop: ~3.5 Mpps – native XDP drop: ~4.0 Mpps
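To put these rates in perspective, a packet rate converts directly into a time budget per packet. The helper names below are hypothetical, and the cycle figure is only a back-of-the-envelope estimate that assumes the guest core runs at the host's nominal 3.6 GHz.

```c
#include <assert.h>

/* Nanoseconds available per packet at a rate given in Mpps. */
static double ns_per_packet(double mpps)
{
    assert(mpps > 0.0);
    return 1000.0 / mpps;  /* 1 Mpps leaves 1000 ns per packet */
}

/* CPU cycles available per packet for a clock given in GHz.
 * One GHz is one cycle per nanosecond. */
static double cycles_per_packet(double ghz, double mpps)
{
    return ghz * ns_per_packet(mpps);
}
```

At the ~4.0 Mpps measured for native XDP drop, each packet gets 250 ns, or roughly 900 cycles at 3.6 GHz. The 20 Mpps mlx4 result leaves only 50 ns per packet.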
  52. 52. XDP Use Cases ● DDoS attack mitigation ● Load Balancing ● Tunnelling: packet header handling ● Network sampling and monitoring ● And more
  53. 53. Questions?
  54. 54. Thank You!
  55. 55. References ● BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/stable/bpf/ ● Dive into BPF: a list of reading material: https://qmonnet.github.io/whirl-offload/2016/09/01/dive-into-bpf/ ● Linux Socket Filtering aka Berkeley Packet Filter (BPF): Documentation/networking/filter.txt
  56. 56. Join Us at www.opensuse.org
  57. 57. License: This slide deck is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license. It can be shared and adapted for any purpose (even commercially) as long as Attribution is given and any derivative work is distributed under the same license. Details can be found at https://creativecommons.org/licenses/by-sa/4.0/ General Disclaimer: This document is not to be construed as a promise by any participating organisation to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. openSUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for openSUSE products remains at the sole discretion of openSUSE. Further, openSUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All openSUSE marks referenced in this presentation are trademarks or registered trademarks of SUSE LLC, in the United States and other countries. All third-party trademarks are the property of their respective owners. Credits: Template: Richard Brown (rbrown@opensuse.org). Design & Inspiration: openSUSE Design Team, http://opensuse.github.io/branding-guidelines/
