Presented by: Antonin Bas & Jianjun Shen, VMware
Presented at All Things Open 2020
Abstract: For the non-initiated, Kubernetes (K8s) networking can be a bit like dark magic. Many clusters have requirements beyond what the default network plugin, kubenet, can provide and require the use of a third-party Container Network Interface (CNI) plugin. But what exactly is the role of these plugins, how do they differ from each other and how does the choice of one affect your cluster?
In this talk, Antonin and Jianjun will describe how a group of developers was able to build a CNI plugin - an open source project called Antrea - from scratch and bring it to production in a matter of months. This velocity was achieved by leveraging existing open-source technologies extensively: Open vSwitch, a well-established programmable virtual switch for the data plane, and the K8s libraries for the control plane. Antonin and Jianjun will explain the responsibilities of a CNI plugin in the context of K8s and will walk the audience through the steps required to create one. They will show how Antrea integrates with the rest of the cloud-native ecosystem (e.g. dashboards such as Octant and Prometheus) to provide insight into the network and ensure that K8s networking is not just dark magic anymore.
How to build a Kubernetes networking solution from scratch with Open vSwitch and Project Antrea
How to Build a Kubernetes Networking Solution from Scratch
Antonin Bas, Jianjun Shen
Project Antrea maintainers @VMware
ATO, October 2020
2. Agenda
2
Container and K8s networking
Building a K8s network plugin with Open vSwitch
Introducing Project Antrea
More visibility into K8s networks with Project Antrea
Q&A
3. 3
Basics of Container Networking
Network Namespace
• Isolated network
environment provided by
Linux kernel
Interconnect
• A simple way:
veth devices & Linux bridge
Communication across
hosts
• Network address translation
and port mapping
Docker bridge network on Linux
[Diagram: container1 (netns ns1, eth0 10.10.0.11/24) and container2 (netns ns2, eth0 10.10.0.12/24) attach via veth1/veth2 to docker0 (Linux bridge, 10.10.0.1/24) in the root netns of the Docker host; outbound traffic is SNATed out ens0 (172.1.1.11/16).]
4. 4
Kubernetes is an open-source
platform for automating
deployment, scaling, and
operations of application
containers across clusters of hosts,
providing container-centric
infrastructure.
What is Kubernetes?
5. 5
Kubernetes Components
K8s Cluster consists of
Master(s) and Nodes
K8s Master Components
• API Server
• Scheduler
• Controller Manager
• etcd
K8s Node Components
• kubelet
• kube-proxy
• Container Runtime
[Diagram: the kubectl CLI talks to the K8s Master(s) (API Server, Scheduler, Controller Manager, Key-Value Store/etcd, dashboard), which manage the K8s Nodes, each running kubelet, a container runtime, and kube-proxy.]
6. 6
Kubernetes Pod
"Pods are the smallest
deployable units of computing
that you can create and
manage in Kubernetes"
A Pod comprises a group of
one or more containers that
shares an IP address and a
network namespace.
[Diagram: a Pod with IP 10.24.0.2 (from Pod CIDR 10.24.0.0/16); the 'pause' container owns the IP stack, and the nginx (tcp/80), mgmt (tcp/22), and logging (udp/514) containers share it, communicating over IPC; external IP traffic reaches the shared address.]
7. 7
Kubernetes Namespace
“Namespaces are a way to
divide cluster resources
between multiple users”
“Namespaces provide a
scope for names”
Namespace level access
control is supported.
Namespace: foo
Base URI: /api/v1/namespaces/foo
'redis-master' Pod:
/api/v1/namespaces/foo/pods/redis-master
'redis' Service:
/api/v1/namespaces/foo/services/redis
Namespace: bar
Base URI: /api/v1/namespaces/bar
'redis-master' Pod:
/api/v1/namespaces/bar/pods/redis-master
'redis' Service:
/api/v1/namespaces/bar/services/redis
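The URI mapping above follows from the Namespace being a plain API object itself; a minimal manifest (illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: foo   # objects created in it get URIs under /api/v1/namespaces/foo/...
```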
8. 8
Kubernetes Service
"An abstract way to expose an
application running on a set of
Pods as a network service"
Serves multiple functions:
• Service Discovery / DNS
• East/West load balancing in the
Cluster (Type: ClusterIP)
• External load balancing for L4
TCP/UDP (Type: LoadBalancer)
• External access to the Service through the Nodes' IPs (Type: NodePort)
[Diagram: Web Front-End Pods (10.24.2.7) reach the Redis Pods (10.24.0.5) through the Redis Service ClusterIP 172.30.0.24.]
▶ kubectl describe svc redis
Name:                 redis
Namespace:            default
Selector:             app=redis
Type:                 LoadBalancer
IP:                   172.30.0.24
LoadBalancer Ingress: 134.247.200.20
Port:                 <unnamed> 6379/TCP
Endpoints:            10.24.0.5:6379,10.24.2.7:6379

DNS: redis.<ns>.cluster.local → 172.30.0.24 (ClusterIP)
DNS: redis.external.com → 134.247.200.20 (ExternalIP)
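The describe output above corresponds to a Service manifest along these lines (a sketch reconstructed from the output; label values assumed):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: redis          # Pods with this label become the Endpoints
  ports:
  - protocol: TCP
    port: 6379          # ClusterIP port
    targetPort: 6379    # port on the Redis Pods
```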
9. 9
Kubernetes NetworkPolicy
“A specification of how
groups of Pods are allowed
to communicate with each
other and other network
endpoints“
The NetworkPolicy is applied to the Pods whose labels match its podSelector
[Diagram: Web Front-End Pods (10.24.2.7) and Redis Pods (10.24.0.5) behind the Redis Service ClusterIP 172.30.0.24; the policy gates traffic between them.]
▶ kubectl describe netpol web-front-redis
Name: web-front-redis
Namespace: default
Spec:
PodSelector: app=redis
Allowing ingress traffic:
To Port: 6379/TCP
From:
PodSelector: app=web-front-end
Policy Types: Ingress
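The web-front-redis policy above corresponds to a manifest along these lines (a sketch reconstructed from the describe output):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-front-redis
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: redis              # the policy applies to the Redis Pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web-front-end  # only the web front-end may connect
    ports:
    - protocol: TCP
      port: 6379
```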
10. 10
Kubernetes Cluster Networking
Three communication patterns must be enabled
• Pod-to-Pod
• Pod-to-Service
• External-to-Service
11. 12
What is a Kubernetes CNI Network Plugin responsible for?
Pod Network Connectivity
• Plumbing eth0 (network interface) into the Pod network
• IP Address Management (IPAM)
E-W Service Load Balancing (optional)
• Make traffic available to upstream kube-proxy, or
• Implement native Service load balancing – VIP DNAT
NetworkPolicy Enforcement (optional)
• Enforcing Kubernetes NetworkPolicy
Traffic Shaping Support (experimental)
12. 13
kubenet
Relies on cloud network to
route traffic between Nodes
• Typically works with a Cloud
Provider implementation that
adds routes to the cloud router.
• Supported on AWS, Azure, GCP.
No NetworkPolicy support
Out-of-box Kubernetes network plugin
[Diagram: on each Node, Pods attach via veth pairs to cbr0 (Linux bridge; 10.10.1.1/24 on Node 1, 10.10.2.1/24 on Node 2); the Nodes (172.1.1.11, 172.1.2.22) sit on the cloud network fabric.]

Cloud router route table:
Destination    Target
10.10.1.0/24   172.1.1.11
10.10.2.0/24   172.1.2.22
13. 14
kube-proxy
Implements distributed load-
balancing for Services of
ClusterIP and NodePort
types
Supports: IPTables, IPVS,
and user space proxy modes
E-W Service Load-Balancing
Picture from: https://kubernetes.io/docs/concepts/services-networking/service
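In IPTables mode, the NAT rules kube-proxy programs for a Service like the redis example look roughly like this (simplified sketch; the chain names are illustrative, real ones carry hashed suffixes):

```
# simplified `iptables-save -t nat` excerpt on a Node (chain names illustrative)
-A KUBE-SERVICES -d 172.30.0.24/32 -p tcp --dport 6379 -j KUBE-SVC-REDIS
# pick one endpoint at random for each new connection
-A KUBE-SVC-REDIS -m statistic --mode random --probability 0.5 -j KUBE-SEP-1
-A KUBE-SVC-REDIS -j KUBE-SEP-2
-A KUBE-SEP-1 -p tcp -j DNAT --to-destination 10.24.0.5:6379
-A KUBE-SEP-2 -p tcp -j DNAT --to-destination 10.24.2.7:6379
```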
14. 15
Container Network Interface (CNI)
Where does the CNI fit in the Pod’s lifecycle?
[Diagram: on each K8s Node, kubelet drives the Container Runtime (e.g. containerd) through CRI, and the runtime invokes the Network Plugin through CNI to attach the Pod to the Pod network.]

1. User creates Pod spec
2. Pod is scheduled on a Node
3. CRI call
4. Run Pod
5. CNI call
6. Add to Pod network
15. 18
What is Open vSwitch (OVS)?
And why use it for K8s networking?
A high-performance programmable virtual switch
• Connects to VMs (tap) and containers (veth)
Linux foundation project, very active
Portable: Works out of the box on all Linux distributions and supports Windows
Programmability: Supports many protocols, build your own forwarding pipeline
High-performance
• DPDK, AF_XDP
• Hardware offload available across multiple vendors
Rich feature set:
• Multi-layers – L2 to L4
• Advanced CLI tools
• Statistics, QoS
• Packet tracing
16. 19
Configuring Pod networking with OVS step-by-step
# from environment variables
CNI_COMMAND=ADD
CNI_CONTAINERID=79ba130ac32e1c621e0e10ea10e3e8b7c0b101932f309ead54ee93fdf1795768
CNI_NETNS=/proc/1125/ns/net
CNI_IFNAME=eth0
CNI_ARGS="K8S_POD_NAMESPACE=default;K8S_POD_NAME=nginx-66b6c48dd5-skx7z;K8S_POD_INFRA_CONTAINER_ID=79ba130ac32e1c621e0e10ea10e3e8b7c0b101932f309ead54ee93fdf1795768"
CNI_PATH=/opt/cni/path

# from stdin
{
  "cniVersion": "0.3.0",
  "name": "antrea",
  "type": "antrea",
  "dns": {},
  "ipam": {
    "type": "host-local",
    "subnet": "10.10.1.0/24",
    "gateway": "10.10.1.1"
  }
}
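The parameters above are everything a CNI plugin receives. A minimal sketch of the ADD path in shell (illustrative only, not Antrea's actual implementation; a real plugin would plumb the veth pair and program OVS as shown on the following slides):

```shell
# Minimal sketch of a CNI plugin's ADD handler. The runtime invokes the plugin
# binary with parameters in environment variables and the network configuration
# as JSON on stdin; the plugin must print a result JSON back on stdout.

cni_add() {
    netns="$CNI_NETNS"            # network namespace of the new Pod
    ifname="${CNI_IFNAME:-eth0}"  # interface name to create inside the Pod
    conf="$(cat)"                 # network configuration JSON from stdin
    : "$conf"                     # a real plugin parses this (e.g. the "ipam" section)

    # A real plugin would now: create a veth pair, move one end into $netns,
    # run the IPAM plugin for an address, and attach the host end to the bridge.
    # Here we only emit the result JSON the runtime expects back:
    printf '{"cniVersion":"0.3.0","interfaces":[{"name":"%s","sandbox":"%s"}]}\n' \
        "$ifname" "$netns"
}

# A real plugin binary dispatches on $CNI_COMMAND, e.g.:
# case "$CNI_COMMAND" in ADD) cni_add ;; DEL) cni_del ;; esac
```

The plugin is a one-shot executable: the runtime runs it once per Pod network attachment and reads its stdout, which is why the whole contract fits in environment variables plus stdin/stdout.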
17. 20
Connecting the Pod to the OVS bridge
[Diagram: K8s Node root netns with the OVS bridge br-int and uplink ens0; the K8s Pod nginx-66b6c48dd5-skx7z (netns /proc/1125/ns/net) holds the nginx container and lo.]

ovs-vsctl add-br br-int
18. 21
Connecting the Pod to the OVS bridge
nsenter -t 1125 -n bash
> ip link add eth0 type veth peer name veth1
19. 22
Connecting the Pod to the OVS bridge
nsenter -t 1125 -n bash
> ip link add eth0 type veth peer name veth1
> ip link set veth1 netns 1
20. 23
Connecting the Pod to the OVS bridge
nsenter -t 1125 -n bash
> ip link add eth0 type veth peer name veth1
> ip link set veth1 netns 1
> ip link set eth0 mtu <MTU>
> ip addr add 10.10.1.2/24 dev eth0
> ip route add default via 10.10.1.1 dev eth0
> ip link set dev eth0 up
> exit
21. 24
Connecting the Pod to the OVS bridge
nsenter -t 1125 -n bash
> ip link add eth0 type veth peer name veth1
> ip link set veth1 netns 1
> ip link set eth0 mtu <MTU>
> ip addr add 10.10.1.2/24 dev eth0
> ip route add default via 10.10.1.1 dev eth0
> ip link set dev eth0 up
> exit
ovs-vsctl add-port br-int veth1

ovs-vsctl show
    Bridge br-int
        …
        Port veth1
            Interface veth1
        …
    ovs_version: "2.14.0"
22. 25
Intra-Node Pod-to-Pod traffic
By default OVS behaves like
a regular L2 Linux bridge
A network plugin using OVS
can provide additional
security by preventing IP /
ARP spoofing
[Diagram: PodA (10.10.1.2) and PodB (10.10.1.3) attach via veth1/veth2 to br-int in the Node's root netns; ens0 is the uplink.]

ovs-ofctl add-flow br-int "table=0,priority=200,arp,in_port=veth1,arp_spa=10.10.1.2,arp_sha=<MAC>,actions=goto_table:10"
ovs-ofctl add-flow br-int "table=0,priority=200,ip,in_port=veth1,nw_src=10.10.1.2,dl_src=<MAC>,actions=goto_table:10"
ovs-ofctl add-flow br-int "table=0,priority=0,actions=drop"
ovs-ofctl add-flow br-int "table=10,priority=0,actions=NORMAL"
23. 26
Inter-Node Pod-to-Pod traffic
The default gateway for Pod1A is 10.10.1.1, which is assigned to the OVS bridge (internal port gw0)
All traffic that's not destined to a local Pod will be forwarded to gw0. Then what?
→ Build an overlay network
[Diagram: Node 1 (172.1.1.11) and Node 2 (172.1.2.22) each run br-int with a gw0 port (10.10.1.1/24 and 10.10.2.1/24 respectively) and Pods in per-Node subnets; how does Pod traffic cross the fabric between Nodes?]

Pod route table:
Destination    Target
10.10.1.0/24   (on-link)
*              10.10.1.1
25. 28
Inter-Node Pod-to-Pod traffic
Each Node has its own Pod
subnet
Broadcast domain is limited to a
single Node
New flows for inter-Node traffic
Each Node’s Pod subnet is read
from K8s API
Building an overlay network with OVS
[Diagram: as before, but each Node's br-int now has a tun0 tunnel port; Node 1 (172.1.1.11, Pod subnet 10.10.1.0/24) and Node 2 (172.1.2.22, Pod subnet 10.10.2.0/24) exchange Pod traffic over the tunnel across the cloud / physical network fabric.]
# on Node 1
ovs-ofctl add-flow br-int "table=10,priority=200,ip,nw_dst=10.10.2.0/24,actions=dec_ttl,set_field:172.1.2.22->tun_dst,output:tun0"
ovs-ofctl add-flow br-int "table=10,priority=200,ip,in_port=tun0,nw_dst=10.10.1.11,actions=mod_dl_dst:<MAC_POD1A>,mod_dl_src:<MAC_GW0>,output:veth1"
ovs-ofctl add-flow br-int "table=10,priority=200,ip,in_port=tun0,nw_dst=10.10.1.12,actions=mod_dl_dst:<MAC_POD1B>,mod_dl_src:<MAC_GW0>,output:veth2"
26. 30
K8s Networking with Open vSwitch
L2 switching for local Pod-to-
Pod traffic
Overlay network for Inter-
Node traffic
SNAT for Pod-to-external
traffic
OVS programmability
supports implementing the
entire K8s network model
Recap
[Diagram: Node 1 and Node 2 (VMs), each with an OVS bridge connecting Pods via veth pairs plus gw0 and tun0 ports; intra-node pod-to-pod traffic is switched locally, inter-node pod-to-pod traffic goes through tun0, and pod-to-external traffic is SNATed out the NIC to the cloud network fabric.]
27. 31
Kubernetes CNI Plugins
Antrea: Dataplane: Open vSwitch; Network modes: overlay (Geneve, VXLAN, GRE, STT) or no-encapsulation; NetworkPolicy: Open vSwitch, with centralized policy computation; Windows support: Open vSwitch Windows.
Calico: Dataplane: BIRD (BGP), IPTables, eBPF (since v3.16.0); Network modes: overlay (IPIP, VXLAN) or BGP routing; NetworkPolicy: IPTables or eBPF; Windows support: BGP, Virtual Filtering Platform.
Cilium: Dataplane: eBPF; Network modes: overlay (Geneve, VXLAN) or no-encapsulation; NetworkPolicy: eBPF; Windows support: N/A.
Flannel: Dataplane: Linux bridge; Network modes: overlay (VXLAN) or no-encapsulation; NetworkPolicy: N/A; Windows support: win-bridge or win-overlay.
26 “third party” plugins listed at: https://github.com/containernetworking/cni, besides the “core plugins” maintained
by the CNI project.
CNI plugins for specific cloud / IaaS platform:
28. 32
Project Antrea is an open source CNI
network plugin for Kubernetes based
on Open vSwitch, providing:
• Pod network connectivity
• NetworkPolicy enforcement
• Service load balancing
https://antrea.io
@ProjectAntrea
https://github.com/vmware-tanzu/antrea
Kubernetes Slack – #antrea
29. 33
Antrea is a community-driven project
focusing on
• simplifying usability & diagnostics,
• adapting to any cloud and network topology,
• providing comprehensive security policies, and
• improving scaling & performance
for container networking in Kubernetes.
https://antrea.io
@ProjectAntrea
https://github.com/vmware-tanzu/antrea
Kubernetes Slack – #antrea
782 GitHub Stars · 136 GitHub Forks · 42 Contributors
Runs on: Private Cloud, Public Cloud, Edge, Linux, Windows
30. 34
Open vSwitch provides a flexible and performant data plane.
Project Antrea Architecture
[Diagram: the Master Node runs kube-api and the antrea controller; kubectl and CRDs (e.g. NetworkPolicy) enter through the control plane. Each Worker Node runs kubelet, kube-proxy (IPTables), and the antrea agent, which programs the OVS data plane; Pods attach via veth pairs through CNI, and Nodes interconnect through Gateway and Tunnel ports.]
Antrea Agent
• Manages Pod network interfaces and OVS
bridge.
• Implements overlay network, NetworkPolicies,
and Service load balancing with OVS.
Antrea Controller
• Computes NetworkPolicies and publishes the
results to Antrea Agents.
• High performance channel to Agents based on
the K8s apiserver lib.
Built with K8s technologies
• Leverages K8s and K8s solutions for API, control
plane, deployment, UI and CLI.
• Antrea Controller and Agent are based on K8s
controller and apiserver libs.
kubectl apply -f https://github.com/vmware-tanzu/antrea/releases/download/v0.10.1/antrea.yml
35. 39
Antrea in the cloud-native ecosystem
Providing visibility into the network
• Prometheus metrics exported from Agents & Controller
• Octant plugin to monitor components and trace packets
• ELK stack to visualize flow maps for the cluster network
36. 40
Demo Video 3
K8s Network Visibility with Antrea
https://youtu.be/qzTeUaePJRo
37. 43
Network Plugins implement the CNI and provide L2/L3 connectivity in K8s clusters
Open vSwitch can implement the full K8s network model with a unified data plane
Project Antrea: a production-grade Network Plugin built in < 1 year
OVS as the data plane
K8s libraries for a highly-scalable control plane
Integrations with cloud-native ecosystem tools to provide visibility into the network
Suggest new integrations to us on GitHub!
Conclusion
38. 44
Come help us continually improve
Kubernetes Networking!
Kubernetes Slack
#antrea
Community Meeting, Mondays @ 9PM PT
Zoom Link
https://github.com/vmware-tanzu/antrea
• Good first issues
• Help us improve our documentation
• Propose new features
• File Bugs
projectantrea-announce
projectantrea
projectantrea-dev
(Google Groups)
@ProjectAntrea
https://antrea.io
• Documentation
• Blogs