Linux Kernel - BPF / XDP
KossLab 유태희, 송태웅
What is BPF?
1. Berkeley Packet Filter, since 1992
2. Kernel infrastructure
   - Interpreter: in-kernel virtual machine
   - Hook points: in-kernel callback points
   - Map
   - Helper
What is BPF?
“Safe dynamic programs and tools”
"A technique for safely inserting code into the kernel at runtime"
BPF Infrastructure:
A strategy for safe code injection
1) Use BPF instructions instead of native machine code
2) Check for hazards in advance with the verifier
3) When an (existing) kernel function is needed, call it only through a helper function
BPF Infrastructure:
The underlying technology for safe code injection
Kernel += BPF interpreter (in-kernel virtual machine)
+ Verifier
+ BPF helper functions (leveraging kernel functions)
+ BPF syscall (prog/map: loading, attaching, etc.)
BPF Instruction set:
1) A "junior" x86 instruction set ('simplified x86')
(Note: PLUMgrid's x86 bytecode verifier attempt failed)
2) BPF = classic BPF:10% + x86:70% + arm64:25% + risc:5%
3) Fixed-size instruction encoding
(for high interpreter speed)
4) Simplified, so risks are easier to predict and prevent
(the verifier checks loops, memory access ranges, etc.)
5) Architecture-independent
BPF Instruction set:
immediate:32 offset:16 src:4 dst:4 opcode:8
$ cat include/uapi/linux/bpf.h
[...]
struct bpf_insn {
	__u8 code;       /* opcode */
	__u8 dst_reg:4;  /* dest register */
	__u8 src_reg:4;  /* source register */
	__s16 off;       /* signed offset */
	__s32 imm;       /* signed immediate constant */
};
[...]
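The fixed 64-bit layout above can be sketched in user space as follows. This is a minimal illustration, not kernel code: `struct insn`, `mov64_imm`, and `insn_raw` are hypothetical names, the struct simply mirrors `struct bpf_insn` from the excerpt, and the bit-field layout assumes a GCC/Clang-style compiler. The opcode value 0xb7 is `BPF_ALU64 | BPF_MOV | BPF_K`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of struct bpf_insn (hypothetical name: struct insn). */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* dest register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate constant */
};

/* Fixed-size encoding: every instruction is exactly 8 bytes. */
_Static_assert(sizeof(struct insn) == 8, "BPF instructions are 64 bits wide");

/* Build "dst = imm" (opcode 0xb7 = BPF_ALU64 | BPF_MOV | BPF_K). */
static struct insn mov64_imm(uint8_t dst, int32_t imm)
{
    assert(dst <= 10); /* eBPF has registers r0..r10 */
    struct insn i = { .code = 0xb7, .dst_reg = dst, .src_reg = 0,
                      .off = 0, .imm = imm };
    return i;
}

/* Return the raw 64-bit word that holds one instruction. */
static uint64_t insn_raw(struct insn i)
{
    uint64_t raw;
    memcpy(&raw, &i, sizeof(raw));
    return raw;
}
```

For example, `mov64_imm(2, 7)` yields one 8-byte instruction whose `code`, `dst_reg`, and `imm` fields are 0xb7, 2, and 7. Because every instruction has the same width, an interpreter can step through a program with plain pointer increments.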
BPF Instruction set:
immediate:32 offset:16 src:4 dst:4 opcode:8
opcode:8 = class:3 (low bits) + LD/ST fields:5 (mode:3, size:2)
           class:3 (low bits) + ALU/JMP fields:5 (operation:4, source:1)
eBPF: include/uapi/linux/bpf.h
cBPF: include/uapi/linux/bpf_common.h
LD/ST classes:
0x00 ~ 0x03
ALU/JMP classes:
0x04 ~ 0x07
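A small sketch of how the 8-bit opcode splits into a class and its fields. The masks follow the `BPF_CLASS`, `BPF_SIZE`, `BPF_MODE`, `BPF_OP`, and `BPF_SRC` macros in the two headers named above; the function names here are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction class: the low 3 bits of the opcode.
 * 0x00..0x03 are the load/store classes (LD, LDX, ST, STX),
 * 0x04..0x07 are the ALU/jump classes. */
static uint8_t opcode_class(uint8_t code)
{
    return code & 0x07;
}

static int is_load_store(uint8_t code)
{
    return opcode_class(code) <= 0x03;
}

/* LD/ST fields: size (bits 3..4) and mode (bits 5..7). */
static uint8_t ldst_size(uint8_t code)
{
    assert(is_load_store(code)); /* only meaningful for LD/ST classes */
    return code & 0x18;
}

static uint8_t ldst_mode(uint8_t code)
{
    assert(is_load_store(code));
    return code & 0xe0;
}

/* ALU/JMP fields: source bit (bit 3) and operation (bits 4..7). */
static uint8_t alu_jmp_src(uint8_t code)
{
    assert(!is_load_store(code)); /* only meaningful for ALU/JMP classes */
    return code & 0x08;
}

static uint8_t alu_jmp_op(uint8_t code)
{
    assert(!is_load_store(code));
    return code & 0xf0;
}
```

As a usage example, the opcode 0xdb (`BPF_STX | BPF_XADD | BPF_DW`) decodes to class 0x03, size 0x18, and mode 0xc0, while 0xb7 (`BPF_ALU64 | BPF_MOV | BPF_K`) decodes to class 0x07, operation 0xb0, and source 0x00.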
BPF Instruction set:
struct bpf_insn prog[] = {
	BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), /* r6 = r1 (save the context pointer) */
	BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
	BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), /* r2 = fp */
	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 (pointer to the key) */
	BPF_LD_MAP_FD(BPF_REG_1, map_fd), /* r1 = map */
	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), /* r0 = lookup(r1, r2) */
	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* if (r0 == 0) skip the next 2 insns */
	BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
	BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
	BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
	BPF_EXIT_INSN(),
};
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
BPF helper functions:
$ grep BPF_CALL
kernel/bpf/helpers.c:
BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
[...]
kernel/trace/bpf_trace.c:
BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
[...]
net/core/filter.c:
BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
[...]
BPF as a kernel subproject
“Safe dynamic programs and tools”
$ cat MAINTAINERS | grep -A 3 BPF
BPF (Safe dynamic programs and tools)
M: Alexei Starovoitov <ast@kernel.org>
M: Daniel Borkmann <daniel@iogearbox.net>
L: netdev@vger.kernel.org
[...]
$ cat MAINTAINERS | grep -A 27 BPF
BPF (Safe dynamic programs and tools)
[...]
F: arch/x86/net/bpf_jit*
[...]
F: kernel/bpf/
F: kernel/trace/bpf_trace.c
[...]
F: net/core/filter.c
F: net/sched/act_bpf.c
F: net/sched/cls_bpf.c
[...]
[...]
F: samples/bpf/
F: tools/bpf/
F: tools/lib/bpf/
F: tools/testing/selftests/bpf/
JIT (arch/x86/net/bpf_jit* and the other arch JITs), supported architectures:
x86, arm, arm64, sparc, s390, powerpc, mips
BPF core (kernel/bpf/):
syscall, interpreter, verifier, generic helpers, maps, ...
Hook-specific code (kernel/trace/bpf_trace.c, net/core/filter.c, net/sched/):
hook points, specific helpers, ...
For cBPF, ...
User-space side (samples/bpf/, tools/bpf/, tools/lib/bpf/, tools/testing/selftests/bpf/):
BPF loading (lib), bpf tool, test code, samples, ...
BPF Infrastructure:
Support for using BPF programs
1) Hook points: in-kernel callback points
2) Map: user-to-kernel shared memory
3) Calling kernel functions through helpers (leveraging)
4) Object pinning: /sys/fs/bpf/...
BPF Architecture:
[Figure: in user space, BPF Controller 1 and 2 (user apps using the BPF library libbpf for prog/map load, attach, control) and ip / tc (using the BPF library in iproute2) call the bpf() syscall. In kernel space, the BPF programs call kernel functions func() through helpers and share Map 1, Map 2, ... (shared memory) with user space.]
XDP
Is iptables fast enough?
Why is iptables slow?
Have you ever tuned your iptables policies?
XDP
(eXpress Data Path)
XDP == FAST PATH
[Figure: NORMAL PATH through the kernel network stack, between RX and TX: device driver (DD), L3 input, TC ingress, PREROUTING, ROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING, TC egress, L3 output, TCP/UDP (L4), APP (L7)]
[Figure: XDP FAST PATH: a BPF program at the device driver (DD) REDIRECTs packets from RX to TX without going through L3, L4, and APP (L7)]
Tutorial
Prerequisites
1. One build machine
2. One test machine (x86 recommended)
3. Kernel source code
4. clang + llvm (compiler)
5. bpftool (BPF program loader)
6. An iproute2 package with BPF support
Compiler: clang + llvm
Kernel source code: the bpf tree on git.kernel.org
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
BPF program loader: bpftool
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/tools/bpf/bpftool
XDP configuration tool: iproute2
https://git.kernel.org/pub/scm/linux/kernel/git/dborkman/iproute2.git
Examples: kernel source code and BPF sample code
samples/bpf
Example: analyzing sample code in the kernel source
samples/bpf (xdp_rxq_info_kern.c)
Hands-on: compiling a BPF program
Compile: samples/bpf
Load the program:
$ mount bpffs /sys/fs/bpf -t bpf
$ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
Inspect the program:
$ ls /sys/fs/bpf/
$ ./bpftool prog list
$ ./bpftool prog dump xlated id X
$ ./bpftool prog dump jited id X
Attach the XDP program:
$ ip link set dev lo xdp pin /sys/fs/bpf/xdp
Verify the XDP program attachment:
$ ip link show dev lo
Remove the XDP program:
$ ip link set dev lo xdp off
$ rm /sys/fs/bpf/xdp
iptables vs XDP
TEST NETWORK
PC2 (192.168.4.2) -> ICMP ($ ping) -> PC1 (192.168.4.1)
Dropping packets with iptables (DROP)
#PC2
$ ping 192.168.4.1
#PC1
$ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
[Figure: NORMAL PATH through the kernel network stack, with the point where iptables DROPs the packet marked]
Dropping packets with XDP (DROP)
Load and attach the XDP program:
$ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp
$ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
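The slides do not show the source of xdp_icmp.o. As a rough idea of the decision such a program makes, here is a user-space model of the logic, assuming it drops IPv4 ICMP and passes everything else. The function name `xdp_icmp_filter` and the constants are hypothetical; a real XDP program would read `ctx->data` and `ctx->data_end` from `struct xdp_md` and be compiled with clang for the BPF target. The enum values match the kernel's `enum xdp_action`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as the kernel's enum xdp_action. */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

#define ETH_HDR_LEN   14      /* Ethernet header length in bytes */
#define IPV4_MIN_HDR  20      /* minimal IPv4 header length in bytes */
#define ETH_TYPE_IPV4 0x0800  /* EtherType for IPv4 */
#define PROTO_ICMP    1       /* IPv4 protocol number for ICMP */

/* Decide the verdict for one frame occupying [data, data_end). */
static enum xdp_action xdp_icmp_filter(const uint8_t *data,
                                       const uint8_t *data_end)
{
    assert(data != NULL && data <= data_end);

    /* Bounds check before touching any header byte, as the verifier
     * would require in a real XDP program. */
    if ((size_t)(data_end - data) < ETH_HDR_LEN + IPV4_MIN_HDR)
        return XDP_PASS;

    /* EtherType is stored big-endian in bytes 12..13 of the frame. */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != ETH_TYPE_IPV4)
        return XDP_PASS;

    /* The protocol field is byte 9 of the IPv4 header. */
    const uint8_t *ip = data + ETH_HDR_LEN;
    return ip[9] == PROTO_ICMP ? XDP_DROP : XDP_PASS;
}
```

The bounds check comes first on purpose: every packet access in a BPF program has to be provably inside the packet before the load is accepted.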
[Figure: XDP GENERIC PATH: the same stack as the normal path, with the BPF program running on RX before TC ingress and DROPping the packet there]
BPF Tracing
iptables path vs XDP path
BPF Tracing:
iptables - DROP case
netif_receive_skb_internal() -> ipt_do_table() -> DROP (long time!!)
BPF Tracing:
XDP - DROP case
netif_receive_skb_internal() -> do_xdp_generic() -> DROP (short time!!)
BPF Tracing:
iptables vs XDP - DROP case
net/core/dev.c:
static int netif_receive_skb_internal(struct sk_buff *skb)
	(beginning point: BPF ATTACH!!)
net/core/dev.c:
int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
	(return point: BPF ATTACH!!)
net/ipv4/netfilter/ip_tables.c:
unsigned int ipt_do_table(struct sk_buff *skb, ...)
	(return point: BPF ATTACH!!)
SEC("kprobe/netif_receive_skb_internal")
int bpf_trace_receive_skb(struct pt_regs *ctx)
{
	/* Key: the skb pointer; value: the time the packet entered the stack */
	long skb_ptr = PT_REGS_PARM1(ctx);
	u64 start_time = bpf_ktime_get_ns();

	bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time,
			    BPF_ANY);
	return 0;
}
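SEC("kretprobe/do_xdp_generic")
int bpf_trace_xdp_drop(struct pt_regs *ctx)
{
	/* Note: argument registers may already be clobbered at function return */
	long skb_ptr = PT_REGS_PARM2(ctx);
	int action = PT_REGS_RC(ctx);

	if (action == XDP_DROP) {
		u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
		u64 cur_time = bpf_ktime_get_ns();

		if (!time) /* the verifier requires a NULL check before use */
			return 0;
		u64 delta = cur_time - *time; /* elapsed time since entry */
		*time = delta;
		...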
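SEC("kretprobe/ipt_do_table")
int bpf_trace_iptables_drop(struct pt_regs *ctx)
{
	/* Note: argument registers may already be clobbered at function return */
	long skb_ptr = PT_REGS_PARM1(ctx);
	int action = PT_REGS_RC(ctx);

	if (action == NF_DROP) {
		u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr);
		u64 cur_time = bpf_ktime_get_ns();

		if (!time) /* the verifier requires a NULL check before use */
			return 0;
		u64 delta = cur_time - *time; /* elapsed time since entry */
		*time = delta;
		...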
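The entry/return pairing used by these probes can be modeled in plain user-space C. The sketch below is only an illustration of the bookkeeping: a tiny fixed-size table stands in for the BPF hash map `tracing_map`, and `on_entry`, `on_drop_return`, `map_lookup`, and `SLOTS` are hypothetical names. Timestamps are passed in as arguments instead of coming from `bpf_ktime_get_ns()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64 /* hypothetical capacity of the model map */

struct slot { uint64_t key; uint64_t val; int used; };
static struct slot tracing_map[SLOTS];

/* Linear-probing lookup; optionally creates the entry if it is missing. */
static struct slot *map_lookup(uint64_t key, int create)
{
    for (unsigned int i = 0; i < SLOTS; i++) {
        struct slot *s = &tracing_map[(key + i) % SLOTS];
        if (s->used && s->key == key)
            return s;
        if (!s->used) {
            if (!create)
                return NULL;
            s->used = 1;
            s->key = key;
            s->val = 0;
            return s;
        }
    }
    return NULL; /* table full */
}

/* Entry probe: remember when this skb entered the receive path. */
static void on_entry(uint64_t skb_ptr, uint64_t now_ns)
{
    struct slot *s = map_lookup(skb_ptr, 1);
    if (s)
        s->val = now_ns;
}

/* Return probe on a DROP verdict: replace the start time by the elapsed
 * time. Returns 0 on success, -1 if no entry was recorded for this skb. */
static int on_drop_return(uint64_t skb_ptr, uint64_t now_ns,
                          uint64_t *delta_ns)
{
    assert(delta_ns != NULL);
    struct slot *s = map_lookup(skb_ptr, 0);
    if (!s)
        return -1;
    *delta_ns = now_ns - s->val;
    s->val = *delta_ns;
    return 0;
}
```

Calling `on_entry(skb, t0)` and later `on_drop_return(skb, t1, &d)` leaves `t1 - t0` in `d`, which is the quantity compared between the iptables and XDP drop paths.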
Ftrace Tracing
iptables path VS XDP path
iptables - DROP case:
$ cat /sys/kernel/debug/tracing/trace
netif_receive_skb_internal() {
  ktime_get_with_offset();
  __netif_receive_skb() {
    __netif_receive_skb_core() {
      ip_rcv() {
        pskb_trim_rcsum_slow();
        nf_hook_slow() {
          iptable_mangle_hook() {
            ipt_do_table() {
              __local_bh_enable_ip();
            }
          }
        }
        ip_rcv_finish() {
          udp_v4_early_demux();
          ip_route_input_noref() {
            ip_route_input_rcu() {
              ip_route_input_slow() {
                fib_table_lookup();
                fib_validate_source() {
                  __fib_validate_source() {
                    fib_table_lookup();
                  }
                }
              }
            }
          }
          ip_local_deliver() {
            nf_hook_slow() {
              iptable_mangle_hook() {
                ipt_do_table() {
                  __local_bh_enable_ip();
                }
              }
              iptable_filter_hook() {
                ipt_do_table() {
                  udp_mt();
                  __local_bh_enable_ip();
                }
              }
              kfree_skb()    <- DROP
XDP - DROP case:
netif_receive_skb_internal() {
  ktime_get_with_offset();
  do_xdp_generic() {
    pskb_expand_head() {
      __kmalloc_reserve.isra.48() {
        __kmalloc_node_track_caller() {
          kmalloc_slab();
          should_failslab();
        }
      }
      ksize();
      skb_free_head() {
        page_frag_free();
      }
      skb_headers_offset_update();
    }
    __bpf_prog_run32() {
      ___bpf_prog_run();
    }
    kfree_skb()    <- DROP
YOU WIN !!
“XDP is LOVE”
BPF internals
BPF Infrastructure:
1) Hook points: in-kernel callback points
2) LOAD, ATTACH, CALLBACK
3) Verifier / Interpreter / JIT
4) Map: user-to-kernel shared memory
5) Calling kernel functions through helpers (leveraging)
6) Object pinning: /sys/fs/bpf/…
...
Hook points: callback points
KERNEL SPACE
- XDP: at the L2 device driver
- tc: L3, just before / just after the device driver (DD)
- kprobe: function entry / return
- ...
Inside specific kernel functions:
if (has_bpf_prog)
	BPF_PROG_RUN();
		->bpf_func(ctx, insni);
bpf_func is either the BPF interpreter or JIT-compiled machine code.
BPF prog injection!! HOW?
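The hook-point pattern above, a check followed by an indirect call that lands either in the interpreter or in JIT-compiled code, can be sketched in user space like this. Everything here is a toy model under stated assumptions: `toy_insn`, `toy_prog`, the two-opcode instruction set, and `run_hook` are hypothetical, and a plain C function stands in for JITed machine code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_MOV_R0_IMM, TOY_EXIT }; /* hypothetical two-opcode ISA */

struct toy_insn { uint8_t code; int32_t imm; };

typedef unsigned int (*bpf_func_t)(const void *ctx,
                                   const struct toy_insn *insni);

struct toy_prog {
    bpf_func_t bpf_func;           /* interpreter entry or "JITed" code */
    const struct toy_insn *insnsi; /* bytecode, used by the interpreter */
};

/* Interpreter: walks the bytecode until it reaches TOY_EXIT. */
static unsigned int toy_interpreter(const void *ctx,
                                    const struct toy_insn *insni)
{
    unsigned int r0 = 0;

    (void)ctx;
    assert(insni != NULL);
    for (;; insni++) {
        switch (insni->code) {
        case TOY_MOV_R0_IMM:
            r0 = (unsigned int)insni->imm;
            break;
        case TOY_EXIT:
            return r0;
        default:
            return 0; /* unknown opcode: stop */
        }
    }
}

/* Stand-in for JIT output: native code that ignores the bytecode. */
static unsigned int toy_native(const void *ctx, const struct toy_insn *insni)
{
    (void)ctx;
    (void)insni;
    return 2;
}

/* The hook point: run the attached program if there is one. */
static unsigned int run_hook(const struct toy_prog *prog, const void *ctx,
                             unsigned int default_verdict)
{
    if (prog) /* has_bpf_prog */
        return prog->bpf_func(ctx, prog->insnsi);
    return default_verdict;
}
```

The caller never needs to know which runtime was selected: the hook only sees a function pointer with a fixed signature, which is why the choice between interpreter and JIT can be made once, at load time.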
From C source to a loaded BPF program:
[Figure: in user space, the C source (_kern.c) is compiled with clang / llc into a BPF program or BPF bytecode (BPF ELF). tc / ip (using the BPF library in iproute2) or a BPF controller (a user app using the BPF library libbpf: prog/map load, attach, control) passes it through the bpf() syscall into kernel space, where the BPF programs and Map 1 (shared memory) live.]
User-space loader steps:
1. ELF parsing
2. First relocation:
   1) map fd
   2) bpf-to-bpf call
3. Loading: BPF_PROG_LOAD (BPF prog injection!!)
HOW? Inside bpf(), the BPF LOAD process:
1. BPF prog / map alloc
2. Verifier (loops, memory access range)
3. Second relocation:
   1) map fd → map ptr
   2) helper ID → func addr
4. Select runtime:
   1) BPF interpreter func addr
   2) BPF func addr after JIT
return fd;
The hook point then calls whichever function was selected:
if (has_bpf_prog)
	BPF_PROG_RUN();
		->bpf_func(ctx, insni);
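Step 3.1 of the load process (map fd → map ptr) can be modeled roughly as below. This is a simplified user-space sketch, not the kernel's implementation: `struct map`, `fd_to_map`, `resolve_map_fds`, and `ld_imm64_ptr` are hypothetical names, and the fd table is a plain array. It relies on two facts about the encoding: a 64-bit immediate load (opcode 0x18, `BPF_LD | BPF_IMM | BPF_DW`) occupies two instruction slots, and the loader marks a map reference by setting `src_reg` to 1 (`BPF_PSEUDO_MAP_FD`) with the fd in `imm`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct insn {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};

#define OP_LD_IMM64   0x18 /* BPF_LD | BPF_IMM | BPF_DW */
#define PSEUDO_MAP_FD 1    /* src_reg marker: imm holds a map fd */

struct map { int id; }; /* stand-in for the kernel's map object */

static struct map *fd_to_map(struct map *const *fd_table, size_t nfds, int fd)
{
    return (fd >= 0 && (size_t)fd < nfds) ? fd_table[fd] : NULL;
}

/* Rewrite every "ld_imm64 with map fd" into "ld_imm64 with map pointer".
 * Returns 0 on success, -1 on a malformed program or an unknown fd. */
static int resolve_map_fds(struct insn *insns, size_t cnt,
                           struct map *const *fd_table, size_t nfds)
{
    assert(insns != NULL);
    for (size_t i = 0; i < cnt; i++) {
        if (insns[i].code != OP_LD_IMM64)
            continue;
        if (i + 1 >= cnt)
            return -1; /* the second slot is missing */
        if (insns[i].src_reg == PSEUDO_MAP_FD) {
            struct map *m = fd_to_map(fd_table, nfds, insns[i].imm);
            if (!m)
                return -1;
            uint64_t addr = (uint64_t)(uintptr_t)m;
            insns[i].imm = (int32_t)(uint32_t)addr;             /* low 32 */
            insns[i + 1].imm = (int32_t)(uint32_t)(addr >> 32); /* high 32 */
        }
        i++; /* skip the second slot of the ld_imm64 */
    }
    return 0;
}

/* Reassemble the 64-bit pointer from the two immediate halves. */
static struct map *ld_imm64_ptr(const struct insn *insn)
{
    uint64_t addr = (uint64_t)(uint32_t)insn[0].imm |
                    ((uint64_t)(uint32_t)insn[1].imm << 32);
    return (struct map *)(uintptr_t)addr;
}
```

The second kind of relocation in that step, helper ID → func addr, follows the same idea (an identifier in `imm` is replaced by something the runtime can call directly) and is not modeled here.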
Various ways to attach a BPF program:
- socket(), send() over AF_NETLINK
- bpf() syscall: BPF_PROG_ATTACH, BPF_RAW_TRACEPOINT_OPEN
- kprobe event id, ioctl(): PERF_EVENT_IOC_SET_BPF
...
BPF CALLBACK!!
[Figure: once attached, the BPF programs in kernel space are called back at their hook points]
Calling kernel functions through BPF helper functions (leveraging)!!
[Figure: the BPF programs call kernel functions func() through helpers]
User-to-kernel shared memory through BPF maps
[Figure: BPF Controller 1 and 2 (user apps using libbpf) and the BPF programs in the kernel share Map 1, Map 2, ... (shared memory)]
BPF Architecture:
[Figure: recap of the overall architecture: BPF controllers (libbpf: prog/map load, attach, control) and ip / tc (BPF library in iproute2) in user space; the bpf() syscall; BPF programs, helpers, and maps in kernel space]
XDP internals
XDP RETURN TYPE
XDP_ABORTED
XDP_DROP
XDP_PASS
XDP_TX
XDP_REDIRECT
[Figure: a BPF program in the network device driver returns XDP_DROP, XDP_PASS (up toward the APP), XDP_TX, or XDP_REDIRECT]
Generic XDP
vs
Driver XDP
[Figure: XDP GENERIC PATH: the BPF program runs after RX, before TC ingress; the rest of the stack (PREROUTING, ROUTING, INPUT, FORWARD, OUTPUT, POSTROUTING, TC egress, L3 output, TCP/UDP, APP) is unchanged]
[Figure: DRIVER XDP PATH: the BPF program runs at L2 in the driver; PASS continues up through L3, L4, and APP (L7), REDIRECT goes from RX straight to TX]
[Figure: Driver XDP vs Generic XDP, side by side: in both, RX -> BPF -> PASS or REDIRECT -> TX; in the generic case the BPF program sits at L3]
XDP data structures and SKB
[Figure: packet buffer layout: HEADROOM | MAC HEADER | IP HEADER | ... | TAIL / TAILROOM | END, annotated with xdp->data_hard_start, xdp->data_meta, xdp->data, and skb->data; the xdp_frame is also shown]
Permitted DATA ACCESS range
[Figure: the same buffer layout (xdp->data_hard_start, xdp->data_meta, xdp->data, HEADROOM, MAC HEADER, IP HEADER, TAIL / TAILROOM, END) with the range a BPF program may access highlighted]
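How the buffer pointers relate to each other can be sketched with a small user-space model. The struct and function names (`xdp_buf_model`, `headroom`, `access_ok`, `adjust_head`) are hypothetical; `adjust_head` is a deliberately simplified stand-in for the `bpf_xdp_adjust_head()` helper and ignores the extra limits the kernel applies (space reserved for the frame descriptor, minimum frame length, metadata handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the pointers an XDP program works with. Invariant:
 * data_hard_start <= data_meta <= data <= data_end. */
struct xdp_buf_model {
    uint8_t *data_hard_start; /* start of the buffer (headroom begins here) */
    uint8_t *data_meta;       /* start of optional metadata */
    uint8_t *data;            /* start of packet data (MAC header) */
    uint8_t *data_end;        /* one past the last packet byte */
};

/* Bytes available in front of the packet data. */
static size_t headroom(const struct xdp_buf_model *x)
{
    assert(x->data_hard_start <= x->data && x->data <= x->data_end);
    return (size_t)(x->data - x->data_hard_start);
}

/* Is [data + off, data + off + len) inside the packet? */
static int access_ok(const struct xdp_buf_model *x, size_t off, size_t len)
{
    size_t avail = (size_t)(x->data_end - x->data);
    return off <= avail && len <= avail - off;
}

/* Move the start of packet data by delta bytes: negative grows the packet
 * into the headroom, positive strips bytes from the front.
 * Returns 0 on success, -1 if the move would leave the buffer. */
static int adjust_head(struct xdp_buf_model *x, int delta)
{
    if (delta < 0 && (size_t)(-(long)delta) > headroom(x))
        return -1;
    if (delta > 0 && (size_t)delta > (size_t)(x->data_end - x->data))
        return -1;
    x->data += delta;
    x->data_meta = x->data; /* simplification: no metadata is kept */
    return 0;
}
```

For a 256-byte buffer with 64 bytes of headroom and a 100-byte packet, `access_ok(&x, 0, 14)` accepts the MAC header, `access_ok(&x, 90, 20)` rejects a read that would run past `data_end`, and `adjust_head(&x, -16)` succeeds while `adjust_head(&x, -100)` does not.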
XDP_REDIRECT analysis
[Figure: a BPF program below the APP, attached across eth0, eth1, eth2, eth3; XDP_TX and XDP_REDIRECT are shown, with XDP_REDIRECT going through the REDIRECT MAP]
XDP_REDIRECT via bpf_redirect()
About bpf_redirect()
XDP_REDIRECT - bulkTX
[Figure: bulk TX: the BPF program on RX REDIRECTs xdp_frames, which are queued through the map and transmitted together on TX]
DEVMAP
[Figure: the BPF program on RX calls bpf_redirect_map; the redirect info selects an entry in the DEVMAP, and xdp_frames are queued for the target device's TX]
Key | Value (Device)
0   | X
1   | X
2   | X
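The DEVMAP lookup plus bulk transmit idea can be sketched as a user-space model. All names (`struct dev`, `struct devmap`, `redirect_map`, `bulk_flush`) are hypothetical, and the bulk size of 16 is an assumption made for this illustration. A real XDP program would call the `bpf_redirect_map()` helper and leave queueing and flushing to the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define BULK_SIZE   16 /* assumed bulk queue length for this model */
#define MAP_ENTRIES 4  /* hypothetical DEVMAP size */

struct frame { int id; }; /* stand-in for an xdp_frame */

struct dev {
    int ifindex;
    const struct frame *q[BULK_SIZE]; /* per-device bulk queue */
    unsigned int queued;
    unsigned int tx_calls;  /* number of bulk transmits issued */
    unsigned int tx_frames; /* total frames handed to the driver */
};

struct devmap { struct dev *slot[MAP_ENTRIES]; }; /* key -> device */

/* Hand all queued frames to the driver in one call. */
static void bulk_flush(struct dev *d)
{
    assert(d != NULL);
    if (d->queued == 0)
        return;
    d->tx_calls++;
    d->tx_frames += d->queued;
    d->queued = 0;
}

/* Look up the target device by key and queue the frame for it.
 * Returns 0 on success, -1 if the key has no device. */
static int redirect_map(struct devmap *m, unsigned int key,
                        const struct frame *f)
{
    if (key >= MAP_ENTRIES || !m->slot[key])
        return -1;

    struct dev *d = m->slot[key];
    if (d->queued == BULK_SIZE) /* queue full: transmit before queueing */
        bulk_flush(d);
    d->q[d->queued++] = f;
    return 0;
}
```

Redirecting 20 frames to one device in this model produces one bulk transmit of 16 frames and leaves 4 queued until a final `bulk_flush`, which corresponds to flushing at the end of a receive poll. Batching amortizes the per-transmit cost over many frames.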
CPUMAP
[Figure: the BPF program on RX calls bpf_redirect_map; the redirect info selects an entry in the CPUMAP, and xdp_frames are queued to the target CPU, where processing continues in netif_receive_skb_core]
Key | Value (CPU)
0   | X
1   | X
2   | X
REDIRECT in GENERIC_XDP
BPFILTER
Additional Topics:
● memory model switching
  ○ /net/core/xdp.c
● page pool
  ○ /net/core/page_pool
● offload
● AF_XDP && XSK (XDP socket)
● helper functions
● Device Driver
Additional Topics:
● Verifier
  ○ CFG, DAG, register, memory check...
● Other types
  ○ TC, SOCKET FILTER, CGROUP
● BTF
  ○ ELFutils, clang -g, llc -mattr=dwarfris
● Tail call
  ○ related to bpf_prog_array
Additional Topics:
● Facebook's Katran
  ○ L4 load balancing
  ○ https://github.com/facebookincubator/katran
● Suricata
  ○ IPS/IDS engine
  ○ https://suricata-ids.org/
● Cilium
  ○ https://cilium.io/
● IOvisor bcc
  ○ https://www.iovisor.org/
● IR Decoding
  ○ https://lwn.net/Articles/759188/

Contenu connexe

Tendances

BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPThomas Graf
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Ray Jenkins
 
Security Monitoring with eBPF
Security Monitoring with eBPFSecurity Monitoring with eBPF
Security Monitoring with eBPFAlex Maestretti
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and moreBrendan Gregg
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_mapslcplcp1
 
Kernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondKernel Recipes 2017 - EBPF and XDP - Eric Leblond
Kernel Recipes 2017 - EBPF and XDP - Eric LeblondAnne Nicolas
 
ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!ebpf and IO Visor: The What, how, and what next!
ebpf and IO Visor: The What, how, and what next!Affan Syed
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracingViller Hsiao
 
Staring into the eBPF Abyss
Staring into the eBPF AbyssStaring into the eBPF Abyss
Staring into the eBPF AbyssSasha Goldshtein
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCKernel TLV
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Valeriy Kravchuk
 
Introduction to eBPF
Introduction to eBPFIntroduction to eBPF
Introduction to eBPFRogerColl2
 
Introduction to eBPF and XDP
Introduction to eBPF and XDPIntroduction to eBPF and XDP
Introduction to eBPF and XDPlcplcp1
 
Faster packet processing in Linux: XDP
Faster packet processing in Linux: XDPFaster packet processing in Linux: XDP
Faster packet processing in Linux: XDPDaniel T. Lee
 
Fun with Network Interfaces
Fun with Network InterfacesFun with Network Interfaces
Fun with Network InterfacesKernel TLV
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
Cilium - BPF & XDP for containers
 Cilium - BPF & XDP for containers Cilium - BPF & XDP for containers
Cilium - BPF & XDP for containersDocker, Inc.
 
Meet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingMeet cute-between-ebpf-and-tracing
Meet cute-between-ebpf-and-tracingViller Hsiao
 


Linux Kernel - BPF / XDP Fast Path

  • 1. Linux Kernel - BPF / XDP KossLab 유태희, 송태웅
  • 2. What is BPF? 1. Berkeley Packet Filter, since 1992 2. Kernel infrastructure
  • 3. What is BPF? 1. Berkeley Packet Filter, since 1992 2. Kernel infrastructure: a. Interpreter (in-kernel virtual machine) b. Hook points (in-kernel callback points) c. Maps d. Helpers
  • 4. What is BPF? “Safe dynamic programs and tools”: a technique for safely inserting code into the kernel at runtime
  • 5. BPF Infrastructure: a strategy for safe code injection 1) Use BPF instructions instead of native machine code 2) Check for hazards ahead of time with the verifier 3) When an (existing) kernel function is needed, call it only through a helper function
  • 6. BPF Infrastructure: a strategy for safe code injection 1) Use BPF instructions instead of native machine code
  • 7. BPF Infrastructure: a strategy for safe code injection 2) Check for hazards ahead of time with the verifier
  • 8. BPF Infrastructure: a strategy for safe code injection 3) When an (existing) kernel function is needed, call it only through a helper function
  • 9. BPF Infrastructure: the building blocks for safe code injection. Kernel += BPF interpreter (in-kernel virtual machine) + verifier + added BPF helper functions (leveraging kernel functions) + BPF syscall (prog/map: loading, attaching, etc.)
  • 10. BPF Instruction set: 1) A "junior" x86 instruction set ('simplified x86') (note: PLUMgrid's x86 bytecode verifier attempt failed) 2) BPF = classic BPF:10% + x86:70% + arm64:25% + risc:5% 3) Fixed-size instruction encoding (for high interpreter speed) 4) Simplified -> hazards are easier to anticipate and prevent (the verifier checks loops, memory access ranges, etc.) 5) Architecture-independent
  • 11. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8
        $ cat include/uapi/linux/bpf.h
        [...]
        struct bpf_insn {
                __u8    code;           /* opcode */
                __u8    dst_reg:4;      /* dest register */
                __u8    src_reg:4;      /* source register */
                __s16   off;            /* signed offset */
                __s32   imm;            /* signed immediate constant */
        };
        [...]
  • 12. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8. The 8-bit opcode splits into an instruction class plus class-specific fields. eBPF: include/uapi/linux/bpf.h; cBPF: include/uapi/linux/bpf_common.h
  • 13. BPF Instruction set: immediate:32 offset:16 src:4 dst:4 opcode:8. opcode = class (low 3 bits) + LD/ST fields or ALU/JMP fields (remaining bits). eBPF: include/uapi/linux/bpf.h; cBPF: include/uapi/linux/bpf_common.h. LD/ST classes: 0x00 ~ 0x03. ALU/JMP classes: 0x04 ~ 0x07
  • 15. BPF Instruction set:
        struct bpf_insn prog[] = {
                BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
                BPF_LD_ABS(BPF_B, ETH_HLEN + offsetof(struct iphdr, protocol) /* R0 = ip->proto */),
                BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -4), /* *(u32 *)(fp - 4) = r0 */
                BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
                BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), /* r2 = fp - 4 */
                BPF_LD_MAP_FD(BPF_REG_1, map_fd),
                BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
                BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
                BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
                BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
                BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
                BPF_EXIT_INSN(),
        };
        https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/tree/samples/bpf/sock_example.c
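The fixed 64-bit encoding and the class ranges from the instruction-set slides can be checked from user space. The following is a minimal sketch, not kernel code: `struct insn` is a hypothetical local mirror of `struct bpf_insn`, and `insn_class()` and `is_load_store()` are illustrative names. It assumes the class sits in the low three bits of the opcode, which is how `BPF_CLASS()` in include/uapi/linux/bpf_common.h extracts it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of struct bpf_insn (include/uapi/linux/bpf.h). */
struct insn {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* dest register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset */
    int32_t imm;         /* signed immediate constant */
};

/* Every instruction is a fixed 64 bits: 8 + 4 + 4 + 16 + 32. */
static_assert(sizeof(struct insn) == 8, "BPF instructions are 8 bytes");

/* Instruction class: the low 3 bits of the opcode (0x00 ~ 0x07). */
static inline unsigned insn_class(uint8_t code)
{
    return code & 0x07;
}

/* 0x00 ~ 0x03 are the LD/ST classes, 0x04 ~ 0x07 the ALU/JMP classes. */
static inline int is_load_store(uint8_t code)
{
    return insn_class(code) <= 0x03;
}
```

For instance, BPF_MOV64_REG encodes as opcode 0xbf (class 0x07, ALU/JMP side), while BPF_STX_MEM(BPF_W, ...) encodes as 0x63 (class 0x03, LD/ST side).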
  • 16. BPF helper functions:
        $ grep BPF_CALL
        kernel/bpf/helpers.c:
        BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
        BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
        [...]
        kernel/trace/bpf_trace.c:
        BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
        BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
        BPF_CALL_3(bpf_probe_write_user, void *, unsafe_ptr, const void *, src,
        BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
        [...]
        net/core/filter.c:
        BPF_CALL_1(bpf_skb_get_pay_offset, struct sk_buff *, skb)
        BPF_CALL_3(bpf_skb_get_nlattr, struct sk_buff *, skb, u32, a, u32, x)
        [...]
  • 17. BPF as a kernel subproject: “Safe dynamic programs and tools”
        $ cat MAINTAINERS | grep -A 3 BPF
        BPF (Safe dynamic programs and tools)
        M: Alexei Starovoitov <ast@kernel.org>
        M: Daniel Borkmann <daniel@iogearbox.net>
        L: netdev@vger.kernel.org
        [...]
  • 18. BPF as a kernel subproject: “Safe dynamic programs and tools”
        $ cat MAINTAINERS | grep -A 27 BPF
        BPF (Safe dynamic programs and tools)
        [...]
        F: arch/x86/net/bpf_jit*
        [...]
        F: kernel/bpf/
        F: kernel/trace/bpf_trace.c
        [...]
        F: net/core/filter.c
        F: net/sched/act_bpf.c
        F: net/sched/cls_bpf.c
        [...]
        [...]
        F: samples/bpf/
        F: tools/bpf/
        F: tools/lib/bpf/
        F: tools/testing/selftests/bpf/
  • 19. BPF as a kernel subproject (same MAINTAINERS listing as slide 18). Architectures with JIT support: x86, arm, arm64, sparc, s390, powerpc, mips
  • 20. BPF as a kernel subproject (same MAINTAINERS listing as slide 18). BPF core: syscall, interpreter, verifier, generic helpers, maps, ...
  • 21. BPF as a kernel subproject (same MAINTAINERS listing as slide 18). Hook points, specific helpers, ...; for cBPF, ...
  • 22. BPF as a kernel subproject (same MAINTAINERS listing as slide 18). BPF loading (lib), bpf tool, test code, samples, ...
  • 23. BPF Infrastructure: support for putting BPF programs to use 1) Hook points: in-kernel callback points 2) Maps: user-to-kernel shared memory 3) Leveraging kernel functions through helpers 4) Object pinning: /sys/fs/bpf/...
  • 24. BPF Architecture (diagram). User space: BPF Controller 1, BPF Controller 2, ... (user apps) and ip / tc (BPF library in iproute2). BPF library: libbpf (prog/map load, attach, control). Boundary: bpf() SYSCALL. Kernel space: BPF programs calling helper functions (func()), plus Map 1, Map 2, ... (shared memory)
  • 25. XDP
  • 30. XDP == FAST PATH
  • 31. NORMAL PATH (packet-path diagram; labels: RX, DD, L3 input, TC ingress, PREROUTING, ROUTING, FORWARD, INPUT, TCP/UDP (L4), APP (L7), OUTPUT, ROUTING, POSTROUTING, TC egress, L3 output, TX)
  • 35. Prerequisites: 1. One build machine 2. One test machine (x86 recommended) 3. Kernel source code 4. clang + llvm (compiler) 5. bpftool (BPF program loader) 6. An iproute2 package with BPF support
  • 37. Kernel source code: the bpf tree on git.kernel.org, https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
  • 40. samples/bpf examples: the kernel source code and the BPF sample code
  • 41. samples/bpf example (xdp_rxq_info_kern.c): analyzing sample code in the kernel source
  • 42. Compiling samples/bpf: hands-on BPF program compilation
  • 43. Loading the program:
        $ mount bpffs /sys/fs/bpf -t bpf
        $ bpftool prog load ./xdp_rxq_info_kern.o /sys/fs/bpf/xdp
  • 44. Inspecting the program:
        $ ls /sys/fs/bpf/
        $ ./bpftool prog list
        $ ./bpftool prog dump xlated id X
        $ ./bpftool prog dump jited id X
  • 45. Attaching the XDP program:
        $ ip link set dev lo xdp pin /sys/fs/bpf/xdp
  • 46. Checking the XDP attachment:
        $ ip link show dev lo
  • 47. Detaching the XDP program:
        $ ip link set dev lo xdp off
        $ rm /sys/fs/bpf/xdp
  • 51. # PC2
        $ ping 192.168.4.1
        # PC1
        $ iptables -A INPUT -s 192.168.4.2 -d 192.168.4.1 -p icmp -j DROP
  • 52. NORMAL PATH (same packet-path diagram as slide 31)
  • 53. NORMAL PATH (same packet-path diagram, with a DROP marker added)
  • 55. Loading and attaching the XDP program:
        $ ./bpftool prog load ./xdp_icmp.o /sys/fs/bpf/xdp_icmp
        $ ip link set dev lo xdp pin /sys/fs/bpf/xdp_icmp
  • 56. XDP GENERIC PATH (packet-path diagram: same labels as the NORMAL PATH diagram, with BPF in place of L3 input and a DROP marker)
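The source of xdp_icmp.o is not shown in the slides. As a rough illustration of the kind of decision such a program might make, the sketch below parses an Ethernet/IPv4 frame in plain user-space C and returns a verdict. The function name `icmp_verdict()` and the `VERDICT_*` constants are hypothetical; their values mirror `XDP_DROP` (1) and `XDP_PASS` (2) from include/uapi/linux/bpf.h. A real XDP program would work on `ctx->data` and `ctx->data_end` instead of a buffer and a length, and needs the same kind of bounds check to get past the verifier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VERDICT_DROP = 1, VERDICT_PASS = 2 }; /* mirror XDP_DROP / XDP_PASS */

#define ETH_HDR_LEN   14     /* dst MAC + src MAC + EtherType */
#define ETHERTYPE_IP4 0x0800
#define IP4_MIN_HDR   20
#define IP4_PROTO_OFF 9      /* offset of the protocol byte in the IPv4 header */
#define PROTO_ICMP    1

/* Drop IPv4 ICMP frames, pass everything else (including short frames). */
static int icmp_verdict(const uint8_t *data, size_t len)
{
    assert(data != NULL);

    /* Bounds check before touching any header byte. */
    if (len < ETH_HDR_LEN + IP4_MIN_HDR)
        return VERDICT_PASS;

    /* EtherType is big-endian at bytes 12..13 of the Ethernet header. */
    uint16_t ethertype = (uint16_t)((data[12] << 8) | data[13]);
    if (ethertype != ETHERTYPE_IP4)
        return VERDICT_PASS;

    if (data[ETH_HDR_LEN + IP4_PROTO_OFF] == PROTO_ICMP)
        return VERDICT_DROP;

    return VERDICT_PASS;
}
```

Passing short or non-IPv4 frames through unchanged keeps the filter from affecting traffic it does not understand; only the ICMP case from the ping test is dropped.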
  • 59. BPF Tracing: iptables - DROP case. netif_receive_skb_internal() ~~ ipt_do_table() -> DROP: long time!
  • 61. BPF Tracing: XDP - DROP case. netif_receive_skb_internal() ~~ do_xdp_generic() -> DROP: short time!
  • 62. BPF Tracing: iptables vs XDP - DROP case. netif_receive_skb_internal() ~~ do_xdp_generic() -> DROP: short time! netif_receive_skb_internal() ~~ ipt_do_table() -> DROP: long time!
  • 63. BPF Tracing: iptables vs XDP - DROP case (DROP markers on the diagram)
        net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb)
        net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
        net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...)
  • 65. BPF Tracing: iptables vs XDP - DROP case. BPF attach points on the three functions above: one at the beginning point and two at return points
  • 66. net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF BPF BPFSEC("kprobe/netif_receive_skb_internal") int bpf_trace_receive_skb(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM1(ctx); u64 start_time = bpf_ktime_get_ns(); bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY); return 0; }
  • 67. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kprobe/netif_receive_skb_internal") int bpf_trace_receive_skb(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM1(ctx); u64 start_time = bpf_ktime_get_ns(); bpf_map_update_elem(&tracing_map, &skb_ptr, &start_time, BPF_ANY); return 0; }
  • 68. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kretprobe/do_xdp_generic") int bpf_trace_xdp_drop(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM2(ctx); int action = PT_REGS_RC(ctx); if (action == XDP_DROP) { u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr); u64 cur_time = bpf_ktime_get_ns(); u64 delta = cur_time - tr->time; *time = delta; ...
  • 69. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kretprobe/do_xdp_generic") int bpf_trace_xdp_drop(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM2(ctx); int action = PT_REGS_RC(ctx); if (action == XDP_DROP) { u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr); u64 cur_time = bpf_ktime_get_ns(); u64 delta = cur_time - tr->time; *time = delta; ...
  • 70. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kretprobe/ipt_do_table") int bpf_trace_iptables_drop(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM1(ctx); int action = PT_REGS_RC(ctx); if (action == NF_DROP) { u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr); u64 cur_time = bpf_ktime_get_ns(); u64 delta = cur_time - tr->time; *time = delta; ...
  • 71. BPF BPF net/core/dev.c: static int netif_receive_skb_internal(struct sk_buff *skb) net/core/dev.c: int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb) net/ipv4/netfilter/ip_tables.c: unsigned int ipt_do_table(struct sk_buff *skb, ...) BPF Tracing: iptables vs XDP - DROP case BPF SEC("kretprobe/ipt_do_table") int bpf_trace_iptables_drop(struct pt_regs *ctx) { long skb_ptr = PT_REGS_PARM1(ctx); int action = PT_REGS_RC(ctx); if (action == NF_DROP) { u64 *time = bpf_map_lookup_elem(&tracing_map, &skb_ptr); u64 cur_time = bpf_ktime_get_ns(); u64 delta = cur_time - tr->time; *time = delta; ...
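The entry/return pairing used by these probes can be exercised outside the kernel. The sketch below is a user-space stand-in, not a BPF program: a small fixed-size table plays the role of `tracing_map`, and the hypothetical functions `on_entry()` and `on_return()` mirror the kprobe and kretprobe handlers. Timestamps are passed in explicitly instead of being read with `bpf_ktime_get_ns()`, and the table size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64 /* arbitrary capacity for this sketch */

struct slot { uint64_t key; uint64_t value_ns; int used; };
static struct slot table[SLOTS];

/* Linear probing: return the slot holding key, or a free slot if create is set. */
static struct slot *find(uint64_t key, int create)
{
    for (unsigned i = 0; i < SLOTS; i++) {
        struct slot *s = &table[(key + i) % SLOTS];
        if (s->used && s->key == key)
            return s;
        if (!s->used)
            return create ? s : NULL;
    }
    return NULL; /* table full */
}

/* Entry handler: store the start timestamp under the skb pointer value. */
static int on_entry(uint64_t skb_ptr, uint64_t now_ns)
{
    struct slot *s = find(skb_ptr, 1);
    if (!s)
        return -1;
    s->key = skb_ptr;
    s->value_ns = now_ns;
    s->used = 1;
    return 0;
}

/* Return handler: look up the start time, compute the delta, store it back. */
static int on_return(uint64_t skb_ptr, uint64_t now_ns, uint64_t *delta_ns)
{
    assert(delta_ns != NULL);
    struct slot *s = find(skb_ptr, 0);
    if (!s)
        return -1; /* same situation as a NULL map lookup in the probes */
    *delta_ns = now_ns - s->value_ns;
    s->value_ns = *delta_ns; /* mirrors "*time = delta" in the slides */
    return 0;
}
```

Keying by the skb pointer lets the return handler find the timestamp written at entry without any other shared state, which is the same reason the slides use `skb_ptr` as the map key.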
  • 77. $ cat /sys/kernel/debug/tracing/trace
    iptables path:
      netif_receive_skb_internal() {
        ktime_get_with_offset();
        __netif_receive_skb() {
          __netif_receive_skb_core() {
            ip_rcv() {
              pskb_trim_rcsum_slow();
              nf_hook_slow() {
                iptable_mangle_hook() {
                  ipt_do_table() {
                    __local_bh_enable_ip();
                  }
                }
              }
              ip_rcv_finish() {
                udp_v4_early_demux();
                ip_route_input_noref() {
                  ip_route_input_rcu() {
                    ip_route_input_slow() {
                      fib_table_lookup();
                      fib_validate_source() {
                        __fib_validate_source() {
                          fib_table_lookup();
                        }
                      }
                    }
                  }
                }
                ip_local_deliver() {
                  nf_hook_slow() {
                    iptable_mangle_hook() {
                      ipt_do_table() {
                        __local_bh_enable_ip();
                      }
                    }
                    iptable_filter_hook() {
                      ipt_do_table() {
                        udp_mt();
                        __local_bh_enable_ip();
                      }
                    }
                    kfree_skb()    <- DROP
    XDP (generic) path:
      netif_receive_skb_internal() {
        ktime_get_with_offset();
        do_xdp_generic() {
          pskb_expand_head() {
            __kmalloc_reserve.isra.48() {
              __kmalloc_node_track_caller() {
                kmalloc_slab();
                should_failslab();
              }
            }
            ksize();
            skb_free_head() {
              page_frag_free();
            }
            skb_headers_offset_update();
          }
          __bpf_prog_run32() {
            ___bpf_prog_run();
          }
          kfree_skb()    <- DROP
    YOU WIN !! "XDP is LOVE"
  • 79. BPF Infrastructure:
    1) Hook points: in-kernel callback points
    2) LOAD, ATTACH, CALLBACK
    3) Verifier / Interpreter / JIT
    4) Map: user-to-kernel shared memory
    5) Leveraging kernel functions through helper calls
    6) Object pinning: /sys/fs/bpf/…
    ...
  • 83. Hook points: callback points (KERNEL SPACE)
    XDP: at the L2 device driver
    tc: the points just before / just after L3 and the DD (device driver)
    kprobe: function entry / return
    . . .
    BPF prog injection !! Inside a specific kernel function:
      if (has_bpf_prog)
          BPF_PROG_RUN();  ->bpf_func(ctx, insnsi);
    bpf_func is either the BPF interpreter or JIT-compiled machine code.
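The hook pattern on this slide, "if a program is attached, call it through `bpf_func`", can be modeled in a few lines of ordinary C. This is a user-space sketch, not kernel code: the `toy_` types and functions are hypothetical, and only the field names `bpf_func` and `insnsi` are taken from the slide. The action values match the kernel's `XDP_DROP` (1) and `XDP_PASS` (2).

```c
#include <stddef.h>

/* Same numeric values as XDP_DROP and XDP_PASS. */
enum toy_action { TOY_DROP = 1, TOY_PASS = 2 };

/* Placeholder for an instruction; the model never decodes it. */
struct toy_insn { unsigned char code; };

/* A loaded program: an entry point plus the instructions it was built from.
 * bpf_func would be the interpreter or the JIT-compiled image. */
struct toy_prog {
    unsigned int (*bpf_func)(const void *ctx, const struct toy_insn *insnsi);
    const struct toy_insn *insnsi;
};

/* A hook point owns an optional program pointer. */
struct toy_dev { const struct toy_prog *prog; };

/* The callback site inside "a specific kernel function". */
unsigned int toy_hook(const struct toy_dev *dev, const void *ctx)
{
    if (dev->prog)  /* has_bpf_prog */
        return dev->prog->bpf_func(ctx, dev->prog->insnsi);
    return TOY_PASS; /* nothing attached: the packet continues as usual */
}

/* Example program body that drops everything. */
unsigned int toy_drop_all(const void *ctx, const struct toy_insn *insnsi)
{
    (void)ctx;
    (void)insnsi;
    return TOY_DROP;
}
```

Because the call goes through a function pointer, the hook does not need to know whether the program runs interpreted or as native code.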
  • 84. Hook points: callback points (KERNEL SPACE)
    XDP: at the L2 device driver
    tc: the points just before / just after L3 and the DD (device driver)
    kprobe: function entry / return
    . . .
    BPF prog injection !! HOW ?
  • 88. User space → bpf() SYSCALL → KERNEL SPACE
    C source (_kern.c) → compiled with clang / llc → BPF program or BPF bytecode (BPF ELF)
    Loaders: tc, ip (BPF library in iproute2)
      1. ELF parsing
      2. First relocation: 1) map fd 2) bpf-to-bpf call
      3. Loading: BPF_PROG_LOAD
    Kernel side: Map 1 (shared memory); BPF prog injection !!
  • 90. User space → bpf() SYSCALL → KERNEL SPACE
    C source (_kern.c) → compiled with clang / llc → BPF program or BPF bytecode (BPF ELF)
    Loaders: tc, ip (BPF library in iproute2)
      1. ELF parsing
      2. First relocation: 1) map fd 2) bpf-to-bpf call
      3. Loading
    BPF Controller (user app): BPF library libbpf for prog/map load, attach, control . . .
    Kernel side: Map 1 (shared memory); BPF prog injection !!
    HOW ? in bpf()
  • 94. User space → bpf() SYSCALL → KERNEL SPACE
    C source (_kern.c) → compiled with clang / llc → BPF program or BPF bytecode (BPF ELF)
    Loaders: tc, ip (BPF library in iproute2): 1. ELF parsing 2. First relocation: 1) map fd 2) bpf-to-bpf call 3. Loading
    BPF Controller (user app): BPF library libbpf for prog/map load, attach, control . . .
    BPF LOAD process:
      1. BPF prog / map alloc
      2. Verifier (loops, memory access ranges)
      3. Second relocation: 1) map fd → map ptr 2) helper ID → func addr
      4. Select runtime: 1) BPF interpreter func addr 2) BPF func addr after JIT
      return fd;
    At the hook point:
      if (has_bpf_prog)
          BPF_PROG_RUN();  ->bpf_func(ctx, insnsi);
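Step 3 of the load process (map fd → map ptr, helper ID → func addr) is easiest to see on a toy instruction format. The sketch below is a plain C model under stated assumptions: the opcodes, the `resolved` union and the two lookup tables are all hypothetical simplifications, and the real kernel encodes these fixups inside `struct bpf_insn` rather than in a separate field.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes: the *_FD and *_ID forms exist only before the fixup. */
enum toy_op { TOY_LD_MAP_FD, TOY_LD_MAP_PTR, TOY_CALL_ID, TOY_CALL_ADDR, TOY_EXIT };

struct toy_map { int id; };
typedef uint64_t (*toy_helper_fn)(void);

struct toy_insn {
    enum toy_op op;
    int imm;                  /* map fd or helper ID, as written by the loader */
    union {
        struct toy_map *map;  /* valid when op == TOY_LD_MAP_PTR */
        toy_helper_fn fn;     /* valid when op == TOY_CALL_ADDR */
    } resolved;
};

/* Stand-in helper so the helper table has something to point at. */
uint64_t toy_ktime_get_ns(void) { return 0; }

/* Rewrite every fd and helper ID into a direct pointer.
 * Returns 0 on success, -1 if an instruction refers to an unknown fd or ID. */
int toy_fixup(struct toy_insn *insns, size_t n,
              struct toy_map *const *fd_table, size_t nfds,
              const toy_helper_fn *helpers, size_t nhelpers)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_insn *in = &insns[i];

        switch (in->op) {
        case TOY_LD_MAP_FD:
            if (in->imm < 0 || (size_t)in->imm >= nfds || fd_table[in->imm] == NULL)
                return -1;
            in->resolved.map = fd_table[in->imm];   /* map fd -> map ptr */
            in->op = TOY_LD_MAP_PTR;
            break;
        case TOY_CALL_ID:
            if (in->imm < 0 || (size_t)in->imm >= nhelpers || helpers[in->imm] == NULL)
                return -1;
            in->resolved.fn = helpers[in->imm];     /* helper ID -> func addr */
            in->op = TOY_CALL_ADDR;
            break;
        default:
            break;                                   /* nothing to relocate */
        }
    }
    return 0;
}
```

After this pass the program no longer depends on the loading process's file descriptor table, so it can run from any context that reaches the hook.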
  • 95. User space → bpf() SYSCALL → KERNEL SPACE
    C source (_kern.c) → compiled with clang / llc → BPF program or BPF bytecode (BPF ELF)
    Loaders: tc, ip (BPF library in iproute2): 1. ELF parsing 2. First relocation: 1) map fd 2) bpf-to-bpf call 3. Loading
    BPF Controller (user app): BPF library libbpf for prog/map load, attach, control . . .
    Kernel side: BPF programs; Map 1, Map 2 (shared memory) . . .
    Various BPF ATTACH methods:
      - socket(), send() with AF_NETLINK
      - bpf() syscall: BPF_PROG_ATTACH, BPF_RAW_TRACEPOINT_OPEN
      - kprobe event id, ioctl() PERF_EVENT_IOC_SET_BPF
      ...
  • 96. User space: tc, ip (BPF library in iproute2); BPF Controller (user app) with BPF library libbpf for prog/map load, attach, control . . .
    bpf() SYSCALL
    KERNEL SPACE: BPF programs; Map 1, Map 2 (shared memory) . . .
    BPF CALLBACK !!
  • 97. User space: tc, ip (BPF library in iproute2); BPF Controller (user app) with BPF library libbpf for prog/map load, attach, control . . .
    bpf() SYSCALL
    KERNEL SPACE: BPF programs; Map 1, Map 2 (shared memory) . . .
    BPF func(): Helper → kernel func(), func(), func()
    Leveraging kernel functions by calling them through BPF helper functions !!
  • 98. User space: tc, ip (BPF library in iproute2); BPF Controller 1 (user app) with BPF library libbpf for prog/map load, attach, control; BPF Controller 2 (user app)
    bpf() SYSCALL
    KERNEL SPACE: BPF programs; Map 1, Map 2 (shared memory) . . .
    BPF func(): Helper → kernel func(), func(), func()
    Memory shared between user space and the kernel through BPF maps
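To make the "map as shared memory" idea concrete, here is a small user-space model of an array map in plain C. It is only a sketch: `toy_array_map` and the three functions are hypothetical names. The two access styles it shows do correspond to real behavior: on the kernel side `bpf_map_lookup_elem()` hands the program a pointer to the value, while a user-space lookup through the `bpf()` syscall copies the value out.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 4

/* Toy array map: fixed number of u64 slots indexed by a u32 key. */
struct toy_array_map {
    uint64_t values[TOY_MAX_ENTRIES];
};

/* Kernel-side style lookup: pointer to the value, or NULL if out of range. */
uint64_t *toy_lookup(struct toy_array_map *m, uint32_t key)
{
    return key < TOY_MAX_ENTRIES ? &m->values[key] : NULL;
}

/* "BPF program" side: bump a per-key counter in place. */
void toy_prog_count(struct toy_array_map *m, uint32_t key)
{
    uint64_t *v = toy_lookup(m, key);

    if (v)          /* the NULL check a real program must also perform */
        (*v)++;
}

/* "Controller" side: copy the current value out, as a syscall lookup would. */
int toy_user_read(struct toy_array_map *m, uint32_t key, uint64_t *out)
{
    uint64_t *v = toy_lookup(m, key);

    if (!v)
        return -1;
    *out = *v;
    return 0;
}
```

Any number of controllers can read the same map, which is why the slide shows two user apps next to one set of maps.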
  • 99. BPF Architecture:
    User space: tc, ip (BPF library in iproute2); BPF Controller 1 (user app) with BPF library libbpf for prog/map load, attach, control; BPF Controller 2 (user app) . . .
    bpf() SYSCALL
    KERNEL SPACE: BPF programs; Map 1, Map 2 (shared memory) . . .
    BPF func(): Helper → kernel func(), func(), func()
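  • 104. XDP GENERIC PATH [diagram: RX and TX packet paths through DD, L3, L4 (TCP/UDP) and L7 (APP), with the BPF hook on RX ahead of TC ingress. Labeled stages: TC ingress, PREROUTING, ROUTING, FORWARD, INPUT, OUTPUT, ROUTING, POSTROUTING, TC egress, L3 output]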
  • 108. Driver XDP vs Generic XDP [diagram: two panels, each with RX feeding a BPF program whose outcomes are PASS, TX and REDIRECT; one panel also shows L3]
  • 116. XDP_REDIRECT [diagram: BPF, APP, eth0, eth1, eth2, eth3, XDP_TX, REDIRECT MAP]
  • 128. DEVMAP
  • 129. DEVMAP [diagram: RX → BPF → bpf_redirect_map → redirect info → DEVMAP → REDIRECT → TX, with rows of xdp_frame shown next to the map]
    DEVMAP table:
      Key | Value (Device)
      0   | X
      1   | X
      2   | X
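The DEVMAP slide boils down to "look the key up in a key → device table, record the target in the redirect info, return a redirect verdict". A minimal model in plain C follows. It is a sketch, not the kernel helper: `toy_redirect_map` and its explicit `fallback` argument are assumptions made for this example, while the action values follow the kernel's `enum xdp_action` ordering.

```c
#include <stddef.h>

/* Same ordering as the kernel's enum xdp_action. */
enum toy_xdp_action {
    TOY_XDP_ABORTED = 0,
    TOY_XDP_DROP,
    TOY_XDP_PASS,
    TOY_XDP_TX,
    TOY_XDP_REDIRECT
};

#define TOY_DEVMAP_SIZE 4

/* Key -> device table; ifindex 0 marks an empty slot. */
struct toy_devmap {
    int ifindex[TOY_DEVMAP_SIZE];
};

/* Where the frame should go once the program has returned. */
struct toy_redirect_info {
    int tgt_ifindex;
};

/* On a hit, remember the target device and ask for a redirect.
 * On a miss, return the caller-supplied fallback action. */
int toy_redirect_map(const struct toy_devmap *map, unsigned int key,
                     struct toy_redirect_info *ri, int fallback)
{
    if (key >= TOY_DEVMAP_SIZE || map->ifindex[key] == 0)
        return fallback;
    ri->tgt_ifindex = map->ifindex[key];
    return TOY_XDP_REDIRECT;
}
```

Keeping the target in a separate redirect-info record is what lets the actual transmit happen after the program has finished, outside the program's own code.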
  • 130. CPUMAP
  • 131. CPUMAP [diagram: RX → BPF → bpf_redirect_map → redirect info → CPUMAP → REDIRECT → ???, with rows of xdp_frame shown next to the map]
    CPUMAP table:
      Key | Value (CPU)
      0   | X
      1   | X
      2   | X
  • 132. CPUMAP [diagram: RX → BPF → bpf_redirect_map → redirect info → CPUMAP → REDIRECT → netif_receive_skb_core, with rows of xdp_frame shown next to the map]
    CPUMAP table:
      Key | Value (CPU)
      0   | X
      1   | X
      2   | X
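The CPUMAP slides answer "where does a frame go after a CPU redirect?": it is queued for the target CPU, and code running there hands it to the normal receive path. The following plain C model illustrates that hand-off with a small ring queue. It is a single-threaded sketch under stated assumptions: all `toy_` names are hypothetical, the queue size is arbitrary, and the `delivered` counter merely stands in for building an skb and calling `netif_receive_skb_core`.

```c
#include <stddef.h>

#define TOY_RING_SIZE 8   /* power of two so index wrap-around stays consistent */

struct toy_frame { int id; };

/* Per-target-CPU queue of frames waiting to enter the network stack. */
struct toy_cpu_queue {
    struct toy_frame *ring[TOY_RING_SIZE];
    unsigned int head;       /* next slot to write (RX CPU side) */
    unsigned int tail;       /* next slot to read (target CPU side) */
    unsigned int delivered;  /* frames handed to the stack so far */
};

/* RX CPU side: queue a redirected frame, or report that the ring is full. */
int toy_cpumap_enqueue(struct toy_cpu_queue *q, struct toy_frame *f)
{
    if (q->head - q->tail == TOY_RING_SIZE)
        return -1;
    q->ring[q->head % TOY_RING_SIZE] = f;
    q->head++;
    return 0;
}

/* Stand-in for "build an skb and pass it to the receive path". */
static void toy_stack_receive(struct toy_cpu_queue *q, struct toy_frame *f)
{
    (void)f;
    q->delivered++;
}

/* Target CPU side: drain everything queued so far; returns frames processed. */
unsigned int toy_cpumap_drain(struct toy_cpu_queue *q)
{
    unsigned int n = 0;

    while (q->tail != q->head) {
        struct toy_frame *f = q->ring[q->tail % TOY_RING_SIZE];

        q->tail++;
        toy_stack_receive(q, f);
        n++;
    }
    return n;
}
```

The RX CPU only pays for the enqueue; the heavier stack processing is moved to the CPU chosen by the map key.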
  • 135. Additional Topics:
    ● memory model switching
      ○ /net/core/xdp.c
    ● page pool
      ○ /net/core/page_pool
    ● offload
    ● AF_XDP && XSK (XDP SOCKET)
    ● helper functions
    ● Device Driver
  • 136. Additional Topics:
    ● Verifier
      ○ CFG, DAG, register, memory check...
    ● Other types
      ○ TC, SOCKET FILTER, CGROUP
    ● BTF
      ○ ELFutils, clang -g, llc -mattr=dwarfris
    ● Tail call
      ○ related to bpf_prog_array
  • 137. Additional Topics:
    ● FACEBOOK's Katran
      ○ L4 Load-balancing
      ○ https://github.com/facebookincubator/katran
    ● Suricata
      ○ IPS/IDS engine
      ○ https://suricata-ids.org/
    ● Cilium
      ○ https://cilium.io/
    ● IOvisor bcc
      ○ https://www.iovisor.org/
    ● IR Decoding
      ○ https://lwn.net/Articles/759188/