2. Need for Overlay Networks
Logical Network (aka “Overlay” Network)
§ Network Virtualization (SDN)
§ Abstracts the virtualized environment from the physical topology
§ Constructs Layer 2 tunnels across the physical
infrastructure
§ Tunnels provide connectivity between physical
and virtual end-points
Physical Network (aka “Underlay” Network)
§ Transparent to the overlay technology
§ Allows the building of L3 infrastructure – No L2
§ The physical network provides the bandwidth and scale for the communication
§ Removes the scaling constraints of the physical network from the virtual network
[Figure: overlay networks built on top of the physical infrastructure]
3. Introducing VXLAN (RFC 7348)
Virtual eXtensible LAN (VXLAN, RFC 7348)
§ IETF framework proposal, co-authored by Arista, Broadcom, Cisco, Citrix, Red Hat & VMware
Provides Layer 2 “Overlay Networks” on top of a Layer 3
network
§ “MAC in IP” Encapsulation
§ Layer 2 multi-point tunneling over IP/UDP
VXLAN Tunnel End-Points (VTEPs) perform encapsulation/decapsulation
§ In Software e.g. Hypervisor vSwitch
§ In Hardware e.g. Leaf Switches
Enables Layer 2 interconnection across Layer 3 boundaries
§ Transparent to the physical IP network
§ Provides Layer 2 scale across the Layer 3 IP fabric
§ Abstracts the virtual connectivity from the physical IP infrastructure
§ e.g. enables vMotion, L2 clusters, etc. across standards-based IP fabrics
[Figure: VM-1 (10.10.10.1/24), VM-2 (20.20.20.1/24), VM-3 (10.10.10.2/24) and VM-4 (20.20.20.2/24) on ESX hosts, with a NAS (20.20.20.3/24) and a load balancer (10.10.10.3/24), share Layer 2 (e.g. for VM mobility, storage access, clustering) across Layer 3 subnets A and B]
4. VXLAN Terminology
VXLAN Tunnel End-Point (VTEP)
§ Performs VXLAN encapsulation and decapsulation of the native frame
§ Adds the appropriate VXLAN header
§ Can be implemented in a software virtual switch or on a physical switch
VXLAN Tunnel Interface (VTI)
§ An IP interface whose address is used as the source IP address for encapsulated VXLAN traffic
§ The same address is the destination IP address for VXLAN-encapsulated traffic sent to this VTEP
Virtual Network Identifier (VNI)
§ A 24-bit field added within the VXLAN header
§ Identifies the Layer 2 segment of the encapsulated Ethernet
frame
VXLAN Header
§ The outer IP/UDP and VXLAN headers added by the VTEP
§ Uses a UDP source port based on a hash of the inner frame
to create entropy for ECMP
[Figure: software and hardware VTEPs with IP addresses x.x.x.x, y.y.y.y and z.z.z.z (VTI-A, VTI-B, VTI-C) form a logical Layer 2 network for VNI n.n; traffic carries a VXLAN + IP/UDP header with SRC IP VTI-A and DST IP VTI-C]
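The ECMP entropy described in the terminology above can be sketched in a few lines of Python. This is a minimal sketch, assuming CRC-32 over the inner destination MAC, source MAC and EtherType as the hash; real VTEPs use vendor-specific hash functions and field selections, and `vxlan_source_port` is a hypothetical name. RFC 7348 recommends deriving the source port from a hash of the inner frame and keeping it in the dynamic/private range 49152-65535.

```python
import zlib

VXLAN_DST_PORT = 4789                 # IANA-assigned VXLAN UDP destination port
EPHEMERAL_BASE = 49152                # start of the dynamic/private port range
EPHEMERAL_SIZE = 65536 - EPHEMERAL_BASE


def vxlan_source_port(inner_dst_mac: bytes, inner_src_mac: bytes,
                      inner_ethertype: int) -> int:
    """Map inner Ethernet header fields to an outer UDP source port.

    CRC-32 is only a stand-in hash (an assumption of this sketch); real VTEPs
    use their own, often ASIC-specific, hash and may add inner IP/L4 fields.
    """
    key = inner_dst_mac + inner_src_mac + inner_ethertype.to_bytes(2, "big")
    # Fold the 32-bit hash into the 16384 ports of 49152-65535
    return EPHEMERAL_BASE + zlib.crc32(key) % EPHEMERAL_SIZE


if __name__ == "__main__":
    flow_a = (bytes.fromhex("00aa00000002"), bytes.fromhex("00aa00000001"), 0x0800)
    flow_b = (bytes.fromhex("00aa00000003"), bytes.fromhex("00aa00000001"), 0x0800)
    print(vxlan_source_port(*flow_a), vxlan_source_port(*flow_b), VXLAN_DST_PORT)
```

Because the port is a pure function of the inner headers, every packet of a flow follows the same underlay path (no reordering), while different flows usually land on different ports and therefore spread across the ECMP paths of a core that never looks inside the tunnel.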
5. VXLAN Encapsulated Frame Format
§ The outer Ethernet header uses the local VTEP MAC and the default router MAC (14 bytes, plus 4 bytes for an optional 802.1Q header)
§ The outer source/destination IP addresses are those of the local/remote VTI (20 bytes)
§ UDP header, with the source port derived from a hash of the inner Ethernet header and the IANA-assigned destination port 4789 (8 bytes)
• Allows ECMP load-balancing across a network core that is VXLAN-unaware
§ VXLAN header carrying a 24-bit VNI, scaling to 16 million Layer 2 domains or “virtual wires” (8 bytes)
[Figure: VXLAN (MAC-in-IP) encapsulation: outer Ethernet header (source/destination MAC, optional 802.1Q) | outer IP header (source/destination IP) | UDP header | VXLAN header (VNI, 24 bits) | payload: the original Ethernet frame (source/destination MAC, optional 802.1Q, original Ethernet payload including any IP headers) | FCS]
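The 8-byte VXLAN header and the overhead figures in the bullets above can be checked with a short Python sketch. It assumes an IPv4 underlay and follows the RFC 7348 header layout (8 bits of flags with the I bit set, 24 reserved bits, the 24-bit VNI, 8 reserved bits); the helper names are hypothetical.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid (RFC 7348)

# Bytes added in front of the original Ethernet frame (IPv4 underlay assumed)
OUTER_ETH, OUTER_DOT1Q, OUTER_IPV4, OUTER_UDP, VXLAN_HDR = 14, 4, 20, 8, 8


def pack_vxlan_header(vni: int) -> bytes:
    """Build the VXLAN header: flags | 24 reserved bits | 24-bit VNI | 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set: VNI is not valid")
    return vni_word >> 8


def encap_overhead(outer_dot1q: bool = False) -> int:
    """Total bytes added by VXLAN encapsulation."""
    return (OUTER_ETH + (OUTER_DOT1Q if outer_dot1q else 0)
            + OUTER_IPV4 + OUTER_UDP + VXLAN_HDR)


if __name__ == "__main__":
    hdr = pack_vxlan_header(1010)
    print(hdr.hex(), unpack_vni(hdr), encap_overhead(), 2 ** 24)
    # -> 080000000003f200 1010 50 16777216
```

With an IPv4 underlay the encapsulation adds 50 bytes (54 with an outer 802.1Q tag): a 1500-byte inner IP packet becomes a 1550-byte outer IP packet (20 + 8 + 8 + 14 + 1500), so the underlay MTU has to allow for it.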
6. VXLAN Overlay Networks
Fixed configuration: active-active Layer 3 design for scale, using well-known management tools/protocols
Flexible VTEP edge: mobile and agile, for flexible provisioning via Cloud Management Platforms (CMP)
VXLAN overlay architecture: configuration flexibility at the edge, with transparency and fixed configuration in the IP fabric
[Figure: VTEPs at the edge of the IP fabric carrying VXLAN VNI 10 and VXLAN VNI 20]
7. VXLAN VTEP within the Hypervisor vSwitch
§ VXLAN encapsulation/decapsulation performed by the vSwitch
• Encapsulation performed prior to packet hitting the “physical interface”
• Physical network is unaware of the encapsulated content
- Sees only the outer IP headers
§ External routing via decapsulation on the software switch
- Based on VNI to VLAN mapping
[Figure: virtual switch VTEPs (10.10.1.4, 10.10.1.5, 10.10.1.6) over the physical infrastructure (128.218.10.x, 128.218.11.x). Locally switched traffic needs no encap/decap; the vSwitch encapsulates/decapsulates VXLAN traffic between hosts; a software router, acting as a software VTEP with VNI to VLAN translation, is responsible for external routing]
8. Switch-based VXLAN Gateway Architecture
[Figure: a spine/leaf switch acting as a VTEP, with VTI 1 (10.10.1.1, UDP 4789) and VNIs 200, 2000 and 20000, mapped to VLANs 100, 200, 300, 400 and 500 on Ethernet ports and port channels facing local devices]
11. VXLAN Control Plane Options
§ SDN Controller or Controller-less
§ The VXLAN control plane is used for MAC learning and packet flooding
• Mechanism to discover hosts residing behind remote VTEPs
• Mechanism to discover VTEPs and their VNI membership
• Mechanism to forward broadcast and multicast traffic within the Layer 2 segment (VNI)
IP Multicast Control Plane
• VTEPs join the IP multicast group(s) associated with their VNI(s)
• Unknown unicasts forwarded to VTEPs in the VNI via IP multicast
• Support for third-party VTEP(s)
• Flood and learn; requires IP multicast support, so limited deployments
HeadEnd Replication (HER)
• BUM traffic replicated to each remote VTEP in the VNI
• Replication carried out on the ingress VTEP
• Support for third-party VTEP(s)
• MAC learning still via flood and learn, but no requirement for IP multicast
HER with Controller
• Locally learnt MACs and VNI bindings published to the controller
• Controller dynamically distributes state to remote VTEPs
• Support for third-party VTEP(s)
• Dynamic MAC distribution, automated flood-list provisioning
• HA cluster support for resiliency
eVPN Model
• BGP used to distribute local MAC to IP bindings between VTEPs
• Broadcast traffic handled via the IP multicast or HER models
• Dynamic MAC distribution and VNI learning; configuration can be BGP-intensive
• Support for third-party VTEP(s)
12. VXLAN BUM Forwarding and Learning…
§ The RFC Model
• Remote VM MAC ↔ VTEP association learnt via IP multicast
• A VTEP with a given VNI joins the associated (*,G) group
• Broadcast, unknown unicast and multicast (BUM) traffic for a VNI is sent to the IP multicast group
• The local VTEP “learns” the MAC to remote VTEP IP binding
• Once the binding is learnt, traffic is unicast via standard Layer 3 routing
[Figure: VTEP-A hosts VM1 (VNI 10), VM2 (VNI 20) and VM3 (VNI 30); VTEP-B hosts VM4, VM5, VM6 and VTEP-C hosts VM7, VM8, VM9, in VNIs 10, 20 and 30 respectively; one multicast (*,G) tree per VNI (10, 20, 30)]
Requires an IP Multicast Enabled Physical Network!
Note: Arista supports single (*,G) group + HER. All other platforms use HER
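The flood-and-learn behaviour of the RFC model can be sketched as a per-VTEP table keyed by (VNI, MAC). This is a toy Python model, not a vendor implementation; the class name, VTEP addresses and multicast group addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FloodAndLearnVtep:
    """Toy model of multicast flood-and-learn on one VTEP."""
    ip: str
    vni_to_group: dict[int, str]  # VNI -> IP multicast group joined as (*,G)
    mac_table: dict[tuple[int, str], str] = field(default_factory=dict)

    def on_decap(self, vni: int, inner_src_mac: str, outer_src_ip: str) -> None:
        """Learn the remote MAC -> remote VTEP IP binding from a received frame."""
        self.mac_table[(vni, inner_src_mac)] = outer_src_ip

    def outer_dst_ip(self, vni: int, inner_dst_mac: str) -> str:
        """Known unicast goes to the bound VTEP; everything else floods to the group.

        Broadcast/multicast MACs never appear as source MACs, so they are never
        learnt and always take the multicast path, as unknown unicasts do.
        """
        return self.mac_table.get((vni, inner_dst_mac), self.vni_to_group[vni])


if __name__ == "__main__":
    vtep_a = FloodAndLearnVtep("10.0.0.1", {10: "239.1.1.10", 20: "239.1.1.20"})
    print(vtep_a.outer_dst_ip(10, "mac-vm4"))   # unknown: flooded to VNI 10's group
    vtep_a.on_decap(10, "mac-vm4", "10.0.0.2")  # VM4's reply arrives from VTEP-B
    print(vtep_a.outer_dst_ip(10, "mac-vm4"))   # now unicast to VTEP-B
```

Keying the table by VNI as well as MAC keeps each Layer 2 segment separate, so the same MAC may legitimately appear in two VNIs behind different VTEPs.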
13. VXLAN Head End Replication
§ BUM traffic is received locally on a VTEP
§ The VTEP creates a separate unicast frame for each VTEP in the flood list of the specific VNI
§ The VTEP flood list is manually configured on each VTEP for each VNI
• Flood list on VTEP-1: VNI 2000 → VTEP-3, VTEP-4
• Flood list on VTEP-3: VNI 2000 → VTEP-1, VTEP-4
• Flood list on VTEP-4: VNI 2000 → VTEP-1, VTEP-3
§ The receiving VTEP learns the inner MAC and maps it to the outer source IP (the remote VTEP)
[Figure: VTEP-1 sends a separate unicast on the wire to each VTEP in VNI 2000 (unicast to VTEP-3, unicast to VTEP-4), steps 1 to 4]
Eliminates the need for an IP Multicast Enabled Physical Network!
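Head end replication with manually configured flood lists can be sketched as below, using the VNI 2000 lists for VTEP-1, VTEP-3 and VTEP-4 from the slide. The function names are hypothetical, and the consistency check is an illustrative addition: because the lists are maintained by hand, a missing peer silently stops BUM traffic from reaching that VTEP.

```python
# Flood lists from the slide, as configured on each VTEP: VTEP -> VNI -> peers
FLOOD_LISTS: dict[str, dict[int, list[str]]] = {
    "VTEP-1": {2000: ["VTEP-3", "VTEP-4"]},
    "VTEP-3": {2000: ["VTEP-1", "VTEP-4"]},
    "VTEP-4": {2000: ["VTEP-1", "VTEP-3"]},
}


def head_end_replicate(local: str, vni: int, frame: bytes) -> list[tuple[str, bytes]]:
    """Ingress replication: one (destination VTEP, frame) unicast copy per peer."""
    return [(peer, frame) for peer in FLOOD_LISTS[local].get(vni, [])]


def flood_list_gaps(vni: int) -> dict[str, list[str]]:
    """Report VTEPs whose flood list for `vni` misses another member of the VNI."""
    members = {vtep for vtep, per_vni in FLOOD_LISTS.items() if vni in per_vni}
    gaps: dict[str, list[str]] = {}
    for vtep in sorted(members):
        missing = members - {vtep} - set(FLOOD_LISTS[vtep][vni])
        if missing:
            gaps[vtep] = sorted(missing)
    return gaps


if __name__ == "__main__":
    copies = head_end_replicate("VTEP-1", 2000, b"<broadcast frame>")
    print([peer for peer, _ in copies])  # -> ['VTEP-3', 'VTEP-4']
    print(flood_list_gaps(2000))         # -> {}
```

The cost of HER is visible in the list comprehension: the ingress VTEP sends one copy per peer, so BUM load on its uplinks grows with the number of VTEPs in the VNI.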
14. VLAN to VNI Mapping
§ The VLAN to VNI mapping of a VTI is only locally significant
• The local 802.1Q VLAN tag is stripped prior to VXLAN encapsulation
• Allows a single VLAN tag to be mapped to different VNIs on different switches
• Provides VLAN translation across a VNI and scale beyond the traditional 4K VLANs
[Figure: VLAN translation between VTEPs: one VTEP (source IP 1.1.1.2/32, Eth 2) maps VLAN 500 → VNI 2000 and another (Eth 2, Eth 3) maps VLAN 20 → VNI 2000, joining both VLANs in one overlay network. Scaling beyond 4K VLANs: each POD reuses VLANs 1-3K (POD significant), mapped to DC-wide VNI ranges 5K-8K, 9K-12K and 12K-15K]
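Because the mapping is only locally significant, VLAN translation falls out of two independent lookups, one on each VTEP. A minimal Python sketch, assuming hypothetical VTEP names `vtep-a`/`vtep-b` and a hypothetical per-POD VNI offset scheme chosen to match the slide's ranges:

```python
# Hypothetical local tables: VLAN 500 on one VTEP and VLAN 20 on another share VNI 2000
VLAN_TO_VNI = {
    "vtep-a": {500: 2000},
    "vtep-b": {20: 2000},
}


def ingress_vni(vtep: str, vlan: int) -> int:
    """The local 802.1Q tag is stripped; only the VNI travels in the VXLAN header."""
    return VLAN_TO_VNI[vtep][vlan]


def egress_vlan(vtep: str, vni: int) -> int:
    """Re-tag using the receiving VTEP's own, locally significant mapping."""
    for vlan, mapped_vni in VLAN_TO_VNI[vtep].items():
        if mapped_vni == vni:
            return vlan
    raise KeyError(f"VNI {vni} not mapped on {vtep}")


# Hypothetical POD scheme: VLANs 1-3000 reused in every POD, offset into a
# DC-wide VNI range per POD (5K-8K, 9K-12K, 12K-15K)
POD_VNI_BASE = {1: 5000, 2: 9000, 3: 12000}


def pod_vni(pod: int, vlan: int) -> int:
    if not 1 <= vlan <= 3000:
        raise ValueError("POD VLANs are 1-3000 in this sketch")
    return POD_VNI_BASE[pod] + vlan


if __name__ == "__main__":
    print(egress_vlan("vtep-b", ingress_vni("vtep-a", 500)))  # -> 20
    print(pod_vni(1, 100), pod_vni(2, 100))                   # -> 5100 9100
```

The same VLAN ID (100 here) lands in different VNIs in different PODs, which is how the number of Layer 2 segments grows past the 4K limit of a single 802.1Q domain.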
15. SDN Controllers for VXLAN
CVX + NSX
• Centralized database of the physical infrastructure collected on CVX
• CVX state (MACs, VNIs, HW VTEPs) shared with NSX
• Centralized provisioning and control via the NSX controller
• Solution for scalable, dynamic DCs with HW to SW VTEP automation
• Offers advantages within an ESXi estate
CVX + Nuage
• Centralized database of the physical infrastructure collected on CVX
• CVX state (MACs, VNIs, HW VTEPs) shared with the VSC
• Centralized provisioning and control via the VSC controller
• Solution for scalable, dynamic DCs with HW to SW VTEP automation
• Targeted at a Xen, KVM estate
CVX + OpenStack
• Centralized database of the physical infrastructure collected on CVX
• ML2 plugin for communication between CVX and OpenStack
• Provisioning of the physical infrastructure from OpenStack
• Solution for small to medium DCs with VTEP automation
• Targeted at a Xen, KVM estate
[Figure: CloudVision eXchange (CVX) connects to NSX and to the Nuage VSC via OVSDB, and to OpenStack via the ML2 plugin]
17. VXLAN Bridging
§ Provides layer 2 connectivity (P2P, P2M) over the layer 3 spine/leaf network
§ Allows any-to-any Layer 2 connectivity between DCs, racks, servers, devices and VMs
§ Layer 2 connectivity provided by VXLAN encapsulation at the leaf nodes (the VXLAN VTEPs)
[Figure: leaf VTEPs extend Subnet/VLAN A and Subnet/VLAN B across the spine, each as a Layer 2 VXLAN VNI]
18. VXLAN Bridging Operation
§ Standard local switching via the VLAN configuration on the VTEP
§ Extend the Layer 2 domain by mapping the VLAN ID to the VXLAN VNI
§ VLAN to VNI mapping is only locally significant; the VLAN tag is not carried in the VXLAN frame
§ Traffic for a host learnt on the remote VTEP is VXLAN-encapsulated by the local VTEP and routed to the remote VTEP
[Figure: Serv-1 (MAC-1) sends 802.1Q VLAN 10 to Leaf-1 (VTEP 2.2.2.1, Eth-1), which maps VLAN 10 → VNI 1010 and sends the inner Ethernet frame (MAC-1 → MAC-2) in VNI 1010 across the L3 backbone (Eth-49) to Leaf-2 (VTEP 2.2.2.2), which maps VNI 1010 → VLAN 20 and delivers it as 802.1Q VLAN 20 to Serv-2 (MAC-2); both servers share one Layer 2 domain (e.g. 193.10.10.0/24)]
19. VXLAN Bridging – Resiliency with Dual-homing
§ For host resiliency, a single logical VTEP can be created across the active/active dual-homing domain
§ Providing active-active VXLAN encapsulation and decapsulation across the two physical switches
[Figure: in Rack-1, Leaf-11 and Leaf-12 form an active/active dual-homed pair presenting one logical VTEP (shared VTI) to Serv-1 (MAC-1, VLAN 10 → VNI 1010); in Rack-2, Leaf-21 and Leaf-22 do the same for Serv-2 (MAC-2, VNI 1010 → VLAN 20); the logical VTEPs (2.2.2.1, 2.2.2.2) exchange the inner Ethernet frame in VNI 1010 across the L3 backbone]
20. VXLAN Bridging – VLAN to VNI Service Mapping
VLAN to VNI mapping
• One-to-one mapping between the VLAN ID and the VNI
• Mapping is only locally significant
• The VLAN ID is not carried in the VXLAN-encapsulated frame
• Allows VLAN translation between remote VTEPs
Port + VLAN to VNI mapping
• Traffic mapped to a VNI based on the combination of the ingress port and its VLAN ID
• The VLAN ID is not carried in the VXLAN-encapsulated frame
• Allows overlapping VLANs within a single VTEP to be mapped to different VNIs
S-VLAN to VNI mapping (switchport mode dot1q-tunnel)
• The outer S-Tag is mapped to a single VNI
• Inner C-Tags are transported within that single VNI
• The inner VLAN IDs are carried in the VXLAN-encapsulated frame
• Ability to transport all customer VLANs across a single VXLAN point-to-point link
[Figure: on Leaf-1, VLAN 10 → VNI 1010 and VLAN 20 → VNI 1020; Eth-1 VLAN 10 → VNI 1010 and Eth-2 VLAN 10 → VNI 1020; S-VLAN 30 on Eth-1, carrying C-tags 10 and 20, → VNI 1030]
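The three service-mapping models differ only in the lookup key and in whether a VLAN ID survives inside the encapsulated frame. A Python sketch using the slide's Leaf-1 examples as hypothetical tables; treating the S-tag as stripped in the dot1q-tunnel case is an assumption of this sketch.

```python
from typing import Optional

# Hypothetical Leaf-1 tables, one per service-mapping model on the slide
LEAF1_VLAN_TO_VNI = {10: 1010, 20: 1020}
LEAF1_PORT_VLAN_TO_VNI = {("Eth-1", 10): 1010, ("Eth-2", 10): 1020}
LEAF1_S_VLAN_TO_VNI = {30: 1030}  # dot1q-tunnel port: outer S-tag selects the VNI

# Each function returns (VNI, VLAN ID carried inside the VXLAN frame or None)


def map_vlan(vlan: int) -> tuple[int, Optional[int]]:
    """VLAN to VNI: the tag is stripped before encapsulation."""
    return LEAF1_VLAN_TO_VNI[vlan], None


def map_port_vlan(port: str, vlan: int) -> tuple[int, Optional[int]]:
    """Port + VLAN to VNI: the same VLAN on different ports can use different VNIs."""
    return LEAF1_PORT_VLAN_TO_VNI[(port, vlan)], None


def map_s_vlan(s_vlan: int, c_vlan: int) -> tuple[int, Optional[int]]:
    """S-VLAN to VNI: the S-tag picks the VNI (assumed stripped); the C-tag travels inside."""
    return LEAF1_S_VLAN_TO_VNI[s_vlan], c_vlan


if __name__ == "__main__":
    print(map_vlan(20))                # -> (1020, None)
    print(map_port_vlan("Eth-2", 10))  # -> (1020, None)
    print(map_s_vlan(30, 20))          # -> (1030, 20)
```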
21. VXLAN Bridging – STP Behavior
§ STP BPDUs are not transported across the VXLAN tunnel
§ This creates separate STP domains on the local ports of each VTEP
[Figure: Leaf-1 and Leaf-2 (VTEPs 2.2.2.1 and 2.2.2.2) extend 802.1Q VLAN 10 between Serv-1 and Serv-2 over VNI 1010 (VLAN 10 → VNI 1010 → VLAN 10) across the L3 backbone; Leaf-1 is the root bridge (cost 0) of Spanning Tree Domain 1 and Leaf-2 is the root bridge (cost 0) of Spanning Tree Domain 2]
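The STP behaviour amounts to a filter at the VTEP's ingress. A toy Python sketch, assuming BPDUs are recognised by the IEEE 802.1D bridge group destination MAC 01:80:C2:00:00:00; the action names are hypothetical.

```python
# IEEE 802.1D bridge group address: destination MAC of STP BPDUs
STP_GROUP_MAC = bytes.fromhex("0180c2000000")


def vtep_ingress_action(frame: bytes) -> str:
    """Decide what a VTEP does with a frame arriving on a local port (toy model).

    BPDUs stay with the local spanning-tree process and are never VXLAN
    encapsulated, so each VTEP's local ports form their own STP domain.
    """
    if frame[:6] == STP_GROUP_MAC:
        return "local-stp"
    return "vxlan-bridge"


if __name__ == "__main__":
    bpdu = STP_GROUP_MAC + bytes.fromhex("001122334455") + bytes(40)
    data = bytes.fromhex("ffffffffffff001122334455") + b"\x08\x00"
    print(vtep_ingress_action(bpdu), vtep_ingress_action(data))
```

A consequence of this design is that a loop spanning two VTEPs (for example a rogue link between racks) is not seen by either STP domain, so loop protection has to come from elsewhere.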
22. VXLAN Bridging – Quality of Service
§ A standard ingress policy is used to define the DSCP of the outer frame
§ The trusted or untrusted configuration of the ingress interface is used to derive the outer CoS/DSCP value
§ Any rewrite action is applied only to the inner frame, NOT the outer frame
§ The outer CoS value is derived from the traffic class (TC) map
[Figure, example 1: DSCP trusted interface (Leaf-1 VTEP, Eth-1 → Eth-49): ingress DSCP CS1 (8); DSCP to TC mapping CS1 → TC 0; outer DSCP CS1 (8), inner DSCP CS1 (8)]
[Figure, example 2: DSCP untrusted interface with rewrite (Leaf-1 VTEP, Eth-1 → Eth-49): ingress DSCP CS4 (32); default interface CoS = CS3 (24); DSCP to TC mapping CS3 → TC 3; TC to DSCP rewrite TC 3 → AF21 (18); outer DSCP CS3 (24), inner DSCP AF21 (18)]
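The two QoS examples can be reproduced with a small Python model of the marking logic. It is a sketch under stated assumptions (trusted ports classify on the packet's DSCP, untrusted ports on the interface default, the outer header takes the classifying DSCP, rewrites touch only the inner frame), not a description of any switch's QoS pipeline; the outer CoS lookup from the traffic class is left out.

```python
# DSCP code points used on the slide
CS1, CS3, CS4, AF21 = 8, 24, 32, 18


def derive_dscp(ingress_dscp: int, trusted: bool, default_dscp: int,
                dscp_to_tc: dict[int, int],
                tc_rewrite: dict[int, int]) -> tuple[int, int, int]:
    """Return (traffic class, outer DSCP, inner DSCP) for an encapsulated frame.

    Assumptions of this sketch: a trusted port classifies on the packet DSCP,
    an untrusted port on the interface default; the outer header carries the
    classifying DSCP; a TC-to-DSCP rewrite changes only the inner frame.
    """
    classify_dscp = ingress_dscp if trusted else default_dscp
    tc = dscp_to_tc[classify_dscp]
    outer = classify_dscp
    inner = tc_rewrite.get(tc, ingress_dscp)  # no rewrite: inner DSCP unchanged
    return tc, outer, inner


if __name__ == "__main__":
    # Example 1: trusted port, CS1 -> TC 0, no rewrite
    print(derive_dscp(CS1, True, CS3, {CS1: 0}, {}))           # -> (0, 8, 8)
    # Example 2: untrusted port, default CS3 -> TC 3, rewrite TC 3 -> AF21
    print(derive_dscp(CS4, False, CS3, {CS3: 3}, {3: AF21}))   # -> (3, 24, 18)
```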
23. VXLAN Bridging – Use Case 1
Interconnect islands within the DC or across geographically dispersed sites
• Provides VM workload mobility within a DC and between DCs
• Workload migration, VM bursting (e.g. hybrid cloud), business continuity across DCs
[Figure: VTEPs join 802.1Q Layer 2 domains over VNIs in two ways: DCI providing Layer 2 connectivity between geographically dispersed sites, and a server-migration POD interconnect providing connectivity between a DC’s PODs]
24. VXLAN Bridging – Use Case 2
VXLAN as a Layer 2 Service within a Leaf Spine
• Interconnect dispersed subnets with Layer 3 to 7 services (NFV service chaining)
• Providing a logical multi-tiered network regardless of physical location
[Figure: server leafs and a services leaf (tenant L3 node; NFV firewall and load-balancer services) interconnected through spine-attached VTEPs by VNIs 1010, 1020 and 1030, forming the tenant's logical Layer 2 connectivity]
26. VXLAN Routing
VXLAN Bridging Model
§ Routing achieved via a centralized node
§ Requires a dedicated routing node within the leaf-spine fabric
§ Sub-optimal forwarding due to traffic tromboning
VXLAN Routing Model
§ Routing achieved at the leaf-layer VTEP nodes
§ No additional external routing nodes required
§ Optimized routing that reduces traffic tromboning
§ Not supported by MPLS VLL/VPLS
[Figure: left, the bridging model, where a dedicated L3 node routes between VNI 1010 and VNI 1030 (dedicated router, sub-optimal forwarding); right, the routing model, where server leaf VTEPs route directly between VNI 1010 and VNI 1020 (routing at the leaf, providing optimal forwarding)]
27. What is VXLAN Routing?
§ An SVI is configured on the VXLAN-enabled VLAN
§ The SVI can be placed in a non-default VRF to support overlapping IPs and multi-tenancy
§ Note: VXLAN routing support is required on the platform even when the next-hop host(s) are local
[Figure: Serv-1 (10.10.10.100, GW 10.10.10.1) sends 802.1Q VLAN 10 to Leaf-1 (VTEP 2.2.2.1), where SVI VLAN 10 (10.10.10.1) routes it to SVI VLAN 20 (10.10.20.1) and it is VXLAN-encapsulated in VNI 1020; Leaf-2 (VTEP 2.2.2.2) VXLAN-bridges it to Serv-2 (10.10.20.100, GW 10.10.20.1) on 802.1Q VLAN 20]
28. VXLAN Routing - Operation
1. SVI VLAN 10 is the gateway for Serv-1. It routes the packet into subnet 10.10.20.0, resulting in a source MAC of MAC-4 and a destination MAC of MAC-2
2. VTEP-1 has learnt the destination MAC (MAC-2) via remote VTEP-2 (2.2.2.2), so it VXLAN-encapsulates the frame with a destination IP of 2.2.2.2
3. VTEP-2 maps VNI 1020 to VLAN 20. The MAC lookup of MAC-2 points to Eth-6, so the VXLAN header is removed and the frame forwarded to Serv-2
4. The packet is forwarded to Serv-2, tagged according to the local VLAN to VNI mapping
[Figure: Serv-1 (10.10.10.100, MAC-1, GW 10.10.10.1) sends to Serv-2 (10.10.20.100, MAC-2, GW 10.10.20.1). VTEP-1 (2.2.2.1) has SVI VLAN 10 (10.10.10.1, MAC-3) and SVI VLAN 20 (10.10.20.1, MAC-4). The frame arrives on VLAN 10 (MAC-1 → MAC-3), is routed into VLAN 20 (MAC-4 → MAC-2), VXLAN-encapsulated in VNI 1020 from 2.2.2.1 to 2.2.2.2, and VXLAN-bridged by VTEP-2 (2.2.2.2, VNI 1020 → VLAN 20) to Serv-2 on 802.1Q VLAN 20]
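The four steps can be traced with a toy Python model of the two VTEPs. The tables reproduce the slide's example values, but the table layout, the string-prefix subnet match and the function names are hypothetical, and details such as the TTL decrement are left out.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    vlan: Optional[int]  # None once the 802.1Q tag is stripped for encapsulation
    src_ip: str
    dst_ip: str


# Hypothetical tables reproducing the slide example
VTEP1_IP = "2.2.2.1"
SVI_MAC = {10: "MAC-3", 20: "MAC-4"}               # SVI VLAN 10 / SVI VLAN 20 on VTEP-1
SVI_PREFIX = {10: "10.10.10.", 20: "10.10.20."}    # toy /24 match by string prefix
ARP = {"10.10.20.100": "MAC-2"}
VTEP1_VLAN_TO_VNI = {20: 1020}
VTEP1_REMOTE_MACS = {(1020, "MAC-2"): "2.2.2.2"}   # learnt MAC -> remote VTEP
VTEP2_VNI_TO_VLAN = {1020: 20}
VTEP2_MAC_TABLE = {(20, "MAC-2"): "Eth-6"}


def vtep1_route_and_encap(f: Frame) -> tuple[str, str, int, Frame]:
    """Steps 1-2: route at the SVI, then VXLAN-encapsulate toward the remote VTEP."""
    if f.vlan is None or f.dst_mac != SVI_MAC[f.vlan]:
        raise ValueError("frame must be addressed to the SVI gateway MAC")
    out_vlan = next(v for v, p in SVI_PREFIX.items() if f.dst_ip.startswith(p))
    routed = replace(f, src_mac=SVI_MAC[out_vlan], dst_mac=ARP[f.dst_ip], vlan=None)
    vni = VTEP1_VLAN_TO_VNI[out_vlan]
    return VTEP1_IP, VTEP1_REMOTE_MACS[(vni, routed.dst_mac)], vni, routed


def vtep2_decap_and_bridge(vni: int, inner: Frame) -> tuple[str, Frame]:
    """Steps 3-4: map the VNI to the local VLAN, look up the MAC, re-tag and forward."""
    vlan = VTEP2_VNI_TO_VLAN[vni]
    port = VTEP2_MAC_TABLE[(vlan, inner.dst_mac)]
    return port, replace(inner, vlan=vlan)


if __name__ == "__main__":
    pkt = Frame("MAC-1", "MAC-3", 10, "10.10.10.100", "10.10.20.100")
    outer_src, outer_dst, vni, inner = vtep1_route_and_encap(pkt)
    port, delivered = vtep2_decap_and_bridge(vni, inner)
    print(outer_src, outer_dst, vni, port, delivered)
```

Note that VTEP-2 only bridges: all routing happens on VTEP-1, which is why the receiving leaf needs just the VNI 1020 to VLAN 20 mapping and a MAC table entry.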
29. VXLAN Routing – Forwarding Models for the Trident2 Platform
VXLAN Routing – Route and VXLAN encapsulate
Local host to a remote host (VLAN 10 → VLAN 20 → VNI 1020)
§ Single re-circulation required
• 1st pass of ASIC to route the frame
• 2nd pass of ASIC for VXLAN encapsulation
VXLAN Routing – VXLAN decapsulate and route
Remote host routed to a local host; the switch is the DFG for the remote host (VNI 1020 → VLAN 20 → VLAN 10)
§ Single re-circulation required
• 1st pass of ASIC for VXLAN decapsulation
• 2nd pass of ASIC to route the inner frame
VXLAN Routing – VXLAN decapsulate, route and VXLAN encapsulate
The switch is the DFG for two remote hosts on different subnets (VNI 1010 → VLAN 10 → VLAN 20 → VNI 1020)
§ Two re-circulations required
• 1st pass of ASIC for VXLAN decapsulation
• 2nd pass of ASIC to route the inner frame
• 3rd pass of ASIC for VXLAN encapsulation
30. VXLAN Routing – Forwarding Models
Intel Fulcrum Alta platforms
• All VXLAN routing functionality is achieved in a single pass
• No need for recirculation ports
Broadcom Trident2, Tomahawk
• VXLAN routing functionality is achieved with a mix of single and double passes
• Recirculation ports needed
Broadcom Trident2+, ARAD, Jericho platforms
• All VXLAN routing functionality is achieved in a single pass
• No need for recirculation ports
31. VXLAN Summary
§ Open standard RFC 7348 – multivendor support on software or hardware
§ L2 extension over L3 network
• More reliable and scalable than L2-only QinQ, TRILL and PBB
§ L2-over-L3 services at switching TCO rather than MPLS router TCO
§ VXLAN VTEPs at hosts, VMs, spine/leaf switches and load balancers: flexibility for users and service providers
§ Preference for hardware-based VXLAN for performance
§ Use cases
• L2 extension over L3 routing network. MPLS not needed.
• Data Center Interconnect (DCI) for active-active DC
• Multi-tenant service chaining in hosted DCs