4. Containers vs. Pods
• Containers in the same pod communicate via loopback
• Multiple containers in a Pod; single network namespace.
• Containers are NOT bridged inside a Pod.
[Diagram: a single Pod containing containers C1, C2 and C3 behind one shared eth0 interface.]
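A minimal sketch of that shared network namespace (Pod name, images and command are illustrative, not from the slides): the second container reaches the first over 127.0.0.1, with no bridging involved.
apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo     # illustrative
spec:
  containers:
  - name: web
    image: nginx              # serves on port 80
  - name: poller
    image: busybox            # hypothetical sidecar
    # same network namespace as "web", so loopback reaches nginx
    command: ["sh", "-c", "while true; do wget -qO- http://127.0.0.1:80 > /dev/null; sleep 5; done"]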
5. Containers vs. Pods Analogy
[Diagram: hydrogen and oxygen atoms combine into an H2O molecule; molecules combine further into larger ones such as C12H22O11.]
https://speakerdeck.com/thockin/kubernetes-understanding-pods-vs-containers
- Process ~ Particle
- Container ~ Atom
- Pod ~ Molecule
- Application ~ Combination of molecules
6. K8s Networking Manifesto
• all containers can communicate with all
other containers without NAT
• all nodes can communicate with all
containers (and vice-versa) without NAT
• the IP that a container sees itself as is the
same IP that others see it as
7. Minimal network setup!
• IP per pod
• K8s is ready to deploy Pods after install
• No L2 Network, Network Port, Subnet,
FloatingIP, Security Group, Router, Firewall,
DHCP…
• In general, user doesn’t have to draw network
diagrams
9. Think about Services, not Pods
• Pods are grouped by label
• Pods are automatically managed
• Sets of pods provide a service
• Service IP/port is load-balanced to pods
• Service names auto-registered in DNS
10. The service architecture
[Diagram: a Deployment manages a ReplicaSet, which manages Pods with IP1, IP2, IP3; an Endpoints object tracks those Pod IPs; the Service (clusterIP, externalIP) is registered in DNS; the load balancer reads the Endpoints; the client resolves the Service in DNS and is load-balanced to one of the Pods; an Auto-scaler adjusts the replica count.]
11. ReplicationController, ReplicaSet
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
• Pets vs. Cattle
• Don’t deploy just one pod
• Deploy something to
manage pods for you
• Label your Pods
12. Fundamentals: Pods are ephemeral
• A Pod can be killed at any time
• A new Pod may be created at any time
• No Pod migration!
– Port/VIF/IP address doesn’t need to move
– Even pods in a StatefulSet change address
13. Deployment
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80
• ReplicationController came first
• ReplicaSet has a more expressive selector (sketch below)
• Deployment enables declarative updates for ReplicaSets
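A hedged illustration of that more expressive selector (the extra label value is an assumption, not from the talk): a ReplicaSet accepts set-based matchExpressions, which a ReplicationController's equality-only selector cannot express.
apiVersion: extensions/v1beta1   # API group of the talk's era; newer clusters use apps/v1
kind: ReplicaSet
metadata:
  name: my-nginx
spec:
  replicas: 2
  selector:
    matchExpressions:
    - key: run
      operator: In
      values: [my-nginx, my-nginx-canary]   # set-based match; the -canary value is illustrative
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx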
14. Service – L4 load balancing
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  labels:
    run: my-nginx
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    run: my-nginx
• Adds pods to an Endpoints object
– …if a selector is defined; otherwise you manage the Endpoints object some other way
• Supports TCP and UDP, and liveness probes
• East-West (pod-to-pod) using a “clusterIP”
• North-South using NodePort, ExternalIP, or LoadBalancer (as specified in the manifest; see the variant below)
• Type LoadBalancer behavior depends on the implementation (and varies by hosting cloud)
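For the North-South case, a hedged variant of the manifest above (the nodePort value is illustrative); the exposure type is declared in the same Service spec:
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
spec:
  type: NodePort        # or LoadBalancer / externalIPs, depending on the environment
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30001     # illustrative; must fall inside the cluster's NodePort range
  selector:
    run: my-nginx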
15. NodePort and ExternalIP use ClusterIP
[Diagram: Service1 → clusterIP:port, NodePort 30001, Endpoints1; Endpoints1 → A:p1, B:p2. Every node (10.0.0.2, 10.0.0.3, 10.0.0.4) accepts traffic on NodePort 30001 and on externalIP:port, translates it to clusterIP:port, and from there to one of the endpoints A:p1 or B:p2; pods A and B each run on one of the nodes.]
17. North-South load-balancing with NodePort
[Diagram: a client reaches a node’s NodePort 30001; nodes 10.0.0.2, 10.0.0.3, 10.0.0.4; endpoints A:p1 and B:p2; pods A and B.]
First, SNAT+DNAT: the client IP is SNATed to the node IP, and nodeIP:NodePort is DNATed to clusterIP:port.
Then, DNAT: clusterIP:port is DNATed to a service pod’s IP:targetPort.
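A worked example of those two steps (all addresses illustrative):
1. A client at 192.0.2.7 connects to node 10.0.0.2 on port 30001.
2. SNAT+DNAT: the source becomes the node IP 10.0.0.2, and the destination 10.0.0.2:30001 becomes the clusterIP:port, say 10.96.0.10:80.
3. DNAT: 10.96.0.10:80 becomes a chosen endpoint, e.g. pod B at 10.244.1.5:80.
4. The reply returns through 10.0.0.2 because of the SNAT in step 2.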
18. DNS records for Services
• With clusterIP:
– creates DNS A and SRV records for Service.Namespace, pointing at clusterIP/port
• Without clusterIP (headless):
– With selectors:
• manages Endpoints and creates DNS A records for each Pod IP
– Without selectors:
• With ExternalName: creates a DNS CNAME record
• Without ExternalName: expects someone else to manage the Endpoints object, and creates DNS A records for each IP
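A hedged sketch of the headless-with-selectors case (the name is illustrative): clusterIP: None makes DNS return one A record per ready Pod IP instead of a single clusterIP.
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-headless   # illustrative
spec:
  clusterIP: None           # headless
  ports:
  - port: 80
  selector:
    run: my-nginx
With the default cluster domain, a client would look this up as my-nginx-headless.default.svc.cluster.local.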
20. Canonical workflow
1. Service records are registered in DNS
2. Client pod queries DNS for service
3. Client sends service request
4. kube-proxy (or another SDN implementation) L4 load-balances it to one of the Endpoints.
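Concretely (service and namespace names taken from the earlier manifests, default cluster domain assumed): a client Pod resolves my-nginx.default.svc.cluster.local (or just my-nginx via the DNS search path), connects to port 80 on the returned clusterIP, and kube-proxy DNATs that connection to one of the endpoint Pods.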
21. Overview Diagram
[Diagram: the same service architecture as slide 10, annotated with numbered steps (1-7) of the canonical workflow.]
22. Ingress – L7 routing
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test
  annotations:
    ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        backend:
          serviceName: s1
          servicePort: 80
      - path: /bar
        backend:
          serviceName: s2
          servicePort: 80
• Route different URL paths to
different backend services
• Different Ingress controllers
implement different feature
subsets
• DNS behavior depends on the
controller
23. Namespaces
• A scope for names and labels
• Mechanism to attach authorization and
policy
– Namespaces can map to organizations or
projects
• Scope for quotas
– Total quotas as well as per-resource limits are
defined per namespace
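A hedged sketch of such a per-namespace quota (names and limits are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota     # illustrative
  namespace: team-a    # illustrative
spec:
  hard:
    pods: "20"
    services: "10"
    requests.cpu: "8"
    requests.memory: 16Gi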
24. NetworkPolicy – Security
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379
• Anyone can query DNS for services in any
namespace
• By default pods receive traffic from
anywhere
• Pods selected by a NetworkPolicy allow in
only what’s explicitly allowed
– E.g. Pods with label “role:db” should allow TCP
to port 6379 from pods with label “role:frontend”
in namespaces with label “project:myproject”
• Only ingress rules in v1.7
– Egress, QoS and other rules in progress
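Being selected by any NetworkPolicy is what flips a Pod from default-allow to default-deny; a minimal sketch that makes the deny explicit (the policy name is illustrative, namespace assumed to be default):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-default-deny   # illustrative
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  ingress: []              # no rules: selected Pods accept no ingress traffic
Policies are additive, so the test-network-policy shown earlier then re-admits only frontend traffic on TCP 6379.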
25. Comparison to OpenStack
• Less networking
• No DHCP
• No Metadata Service or Proxy
• Service is the central concept
• Similar to Heat, but more opinionated
• No IPSec VPN, port mirroring, QoS,
service chaining…
26. Other network-related topics
• Number of interfaces (or IPs) per pod
• Federation/Ubernetes and Ubernetes Lite
• Service Meshes (Istio)
28. Kubernetes Admin Challenges
• What’s the right combination of controllers?
• How to keep users informed of the features
you support.
• Underlay design
• VMs vs. Bare Metal
• How many clusters/deployments?
• Connectivity across environments and clouds
30. Kuryr motivation
• It’s hard to connect VMs, bare-metal and
containers
• Overlay² (overlay on top of overlay) for containers in VMs
• Smooth transition to cloud-native and
micro-services
32. Neutron-port-per-Pod, nested case
[Diagram: each VM has a Neutron trunk port; VLAN subinterfaces (eth0, eth0.10, eth0.20, eth0.30) are its child ports, attaching the nested Pods to networks A, B, C and D.]
Each Pod gets a separate port-level firewall, like any VM in Neutron.
33. Kuryr-Kubernetes Macvlan
[Diagram: Pods nested inside VMs, attached via macvlan.]
Pods get a MAC and IP directly on the VM’s network; the VM’s and the nested Pods’ MACs/IPs all share the VM’s single Neutron port. Simple, but no NetworkPolicy support.
34. Kuryr today supports
• Kubernetes native networking
• Pod gets a Neutron port
– Or macvlan per Pod
• Single tenant
• Full connectivity (default)
• K8s ClusterIP Services (Neutron LBaaS)
• Bare metal and Pod-in-VM
40. Dragonflow’s pluggable DB and Pub-Sub
• DB
– Etcd
– Redis
– Zookeeper
– RAMcloud
– Cassandra
• Pub-Sub
– Redis
– ZeroMQ (Neutron)
– Etcd
41. Dragonflow Pipeline
Installed in every OVS instance. [Pipeline diagram, stages summarized:]
• Traffic Classification – outgoing traffic from local ports is classified and tagged
• Ingress Port Security (ARP spoofing, SG, …)
• Service tables: ARP, DHCP
• L2 Lookup, L3 Lookup (DVR)
• Egress Port Security, Egress Processing (NAT)
• Ingress Processing (NAT, BUM)
• Egress – dispatches outgoing traffic to external nodes or local ports
• Ingress – dispatches incoming traffic from external nodes to local ports
• Mostly fully proactive; some flows (Security Groups, …) are reactive to the controller
42. Dragonflow recent features
• Pike
– BGP dynamic routing
– Service Function Chaining
– IPv6
– Trunk ports
– Distributed SNAT
K8s networking defined itself in contrast to Docker’s “host-private” networking that forced mapping node ports to container ports. K8s NodePort Service type inherits from Docker’s thinking.
From the deployer’s perspective Services matter more than Pods.
Let there be pods!
Now we get to the networking.
NodePort is a remnant from Docker’s early host-private network model, which relied heavily on NAT and mapping ports between nodes and containers.
The SNAT forces the reply back through the node that received the request.
DNS is an add-on. You don’t have to enable it, but it’s strongly recommended.
Now we get to the networking.
This model works both for K8s on OpenStack and for K8s on Neutron (shared network with OpenStack).
Needs a special CNI driver. Leverages OpenStack Neutron trunk ports.
<10k Lines-of-code
Mirantis tests from Dec. 16… And people don’t run clusters beyond a few hundred nodes. DF addresses greater scale, but is also great for small/medium clusters (fewer components that can break).
Talk about how easy drivers are to add – 100-200 lines of code