VXLAN is a UDP-based tunneling protocol that carries Layer 2 frames between tunnel endpoints (VTEPs) over a Layer 3 "underlay" network, and its 24-bit VXLAN Network Identifier (VNI) allows up to 16 million virtual networks. One challenge with deploying VXLAN is that, by default, it relies on multicast in the underlay to deliver Broadcast, Unknown unicast, and Multicast (BUM) packets, and multicast is often unavailable in customer networks. An alternative is the Service Node approach, in which dedicated nodes or processes are responsible for flooding BUM packets throughout the network.
This removes the need for multicast and greatly simplifies network configuration. However, it does require a scalable and highly available Service Node implementation.
3. Number of end hosts
Number of networks
Bandwidth requirements
4. This is a problem for traditional data center networks
5. • L2 Access with L3 Aggregation
• Wasted capacity: STP blocks ports to prevent loops
• VLAN exhaustion: only 4K VLANs, because the 802.1Q tag has a 12-bit VLAN ID
• ToR scalability: hardware MAC tables must scale with the number of endpoints
Traditional Data Centers
9. • MAC over UDP/IP overlay
• Re-uses existing IP core (L3 ECMP, No STP)
• Reduces pressure on ToR L2 tables
• Supports 16M (2^24) virtual networks via the 24-bit VNI, versus 4K VLANs
• Maintains L2 bridging semantics
VXLAN
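To make the 4K-versus-16M comparison concrete: RFC 7348 defines an 8-byte VXLAN header whose VNI field is 24 bits wide, while an 802.1Q tag has only a 12-bit VLAN ID. The following is a minimal Python sketch that packs and parses that header; vxlan_header and parse_vni are illustrative names, not functions from any library.

```python
import struct

VXLAN_FLAG_I = 0x08          # "I" flag: the VNI field is valid (RFC 7348)
MAX_VNI = (1 << 24) - 1      # VNI is a 24-bit field
MAX_VLAN_ID = (1 << 12) - 1  # 802.1Q VLAN ID is a 12-bit field


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI {vni} does not fit in 24 bits")
    # First 32-bit word: 8 flag bits followed by 24 reserved bits.
    # Second 32-bit word: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Return the VNI carried in an 8-byte VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set: VNI is not valid")
    return vni_word >> 8


if __name__ == "__main__":
    print("802.1Q VLAN IDs:", MAX_VLAN_ID + 1)  # 4096
    print("VXLAN VNIs:", MAX_VNI + 1)           # 16777216
    print(vxlan_header(1).hex())                # 0800000000000100
```

The VNI is shifted left by 8 bits because it occupies the top 24 bits of the second word; the low byte is reserved and must be zero.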
13. • Broadcast, Unknown unicast, and Multicast (BUM) packets (e.g. ARP, DHCP, IP multicast) are flooded to all VTEPs for the given VNI
• Two mechanisms are used:
  • Multicast
    • A multicast group address and VNI are configured for each VXLAN segment
    • VTEPs send IGMP joins/leaves as VMs spin up/down
    • The broadcast domain is implemented using multicast
  • Service Node:
    • Uses a “central” service node to maintain the mapping of VNIs to VTEP IPs
Broadcast, Unknown and Multicast Packets
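With a service node, BUM delivery becomes head-end replication: the node looks up every VTEP registered for the frame's VNI and unicasts one encapsulated copy to each, skipping the VTEP the frame came from. This is a minimal single-process sketch, assuming IPv4 VTEP addresses and the IANA VXLAN port 4789; ServiceNodeTable, replicate, and send_copies are hypothetical names, and a real service node would also need the registration protocol and high availability mentioned above.

```python
from __future__ import annotations

import socket
import struct
from collections import defaultdict

VXLAN_PORT = 4789  # IANA-assigned VXLAN UDP port (assumed; Linux defaults differ)


class ServiceNodeTable:
    """Hypothetical in-memory VNI -> VTEP mapping held by a service node."""

    def __init__(self) -> None:
        self._vteps: defaultdict[int, set[str]] = defaultdict(set)

    def register(self, vni: int, vtep_ip: str) -> None:
        """Record that vtep_ip hosts at least one endpoint in vni."""
        self._vteps[vni].add(vtep_ip)

    def unregister(self, vni: int, vtep_ip: str) -> None:
        """Forget vtep_ip for vni (no-op if it was never registered)."""
        self._vteps[vni].discard(vtep_ip)

    def flood_targets(self, vni: int, source_vtep: str) -> list[str]:
        """Every VTEP in vni except the one the BUM frame came from."""
        return sorted(self._vteps.get(vni, set()) - {source_vtep})

    def replicate(
        self, vni: int, source_vtep: str, frame: bytes
    ) -> list[tuple[tuple[str, int], bytes]]:
        """Head-end replication: one VXLAN-encapsulated copy of frame per target."""
        header = struct.pack("!II", 0x08 << 24, vni << 8)  # I flag set, 24-bit VNI
        packet = header + frame
        return [((ip, VXLAN_PORT), packet) for ip in self.flood_targets(vni, source_vtep)]


def send_copies(sock: socket.socket, copies: list[tuple[tuple[str, int], bytes]]) -> None:
    """Unicast each copy over UDP; the kernel adds the outer IP/UDP headers."""
    for addr, packet in copies:
        sock.sendto(packet, addr)
```

For example, after registering 10.0.0.1, 10.0.0.2 and 10.0.0.3 for VNI 1, a broadcast arriving from 10.0.0.1 produces two copies, addressed to 10.0.0.2 and 10.0.0.3 on port 4789.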
28. • Multi-process Python program (multiprocessing module)
• Runs on every hypervisor
• Shares state using a Distributed Cache
  • Facebook's mcrouter: a memcached protocol router (5B requests/second at peak!)
• Listens for new VTEP registrations
  • Forwards new mappings to the Distributed Cache
• Listens for Broadcast, Unknown, and Multicast packets
  • Floods them to all VTEPs in the Virtual Network
VXLAN Distributed Service Node
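Memcached has no native set type, so the VNI-to-VTEP mapping has to be encoded in a plain string value. One append-only encoding, in the spirit of the "Maintaining a set in Memcached" reference, records each change as a "+ip " or "-ip " token and lets readers replay the tokens in order. This sketch assumes a hypothetical key scheme (vni:<id>) and uses FakeMemcache, an illustrative in-memory stand-in for a real memcached/mcrouter client; only the add/append semantics it mimics are standard memcached commands.

```python
from __future__ import annotations


class FakeMemcache:
    """In-memory stand-in for a memcached/mcrouter client (illustrative only)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def add(self, key: str, value: str) -> bool:
        # Like memcached "add": store only if the key does not already exist.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def append(self, key: str, value: str) -> bool:
        # Like memcached "append": fail if the key does not exist.
        if key not in self._data:
            return False
        self._data[key] += value
        return True

    def get(self, key: str) -> str | None:
        return self._data.get(key)


def vni_key(vni: int) -> str:
    return f"vni:{vni}"  # hypothetical key naming scheme


def add_vtep(cache: FakeMemcache, vni: int, vtep_ip: str) -> None:
    """Record vtep_ip as a member of vni's VTEP set."""
    key, token = vni_key(vni), f"+{vtep_ip} "
    # Append to an existing set; if there is none, create it. If another
    # writer created it between our two calls, "add" fails and we append again.
    if not cache.append(key, token) and not cache.add(key, token):
        cache.append(key, token)


def remove_vtep(cache: FakeMemcache, vni: int, vtep_ip: str) -> None:
    """Record a removal; appending to a missing key is a harmless no-op."""
    cache.append(vni_key(vni), f"-{vtep_ip} ")


def vtep_members(cache: FakeMemcache, vni: int) -> set[str]:
    """Replay the +/- tokens in order to rebuild the current set."""
    members: set[str] = set()
    for token in (cache.get(vni_key(vni)) or "").split():
        op, ip = token[0], token[1:]
        if op == "+":
            members.add(ip)
        elif op == "-":
            members.discard(ip)
    return members
```

Because the value only grows, a real deployment would need to compact it periodically (for example, rewriting it with a memcached cas operation); that step is omitted here.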
32. ip link add vxlan1 type vxlan id 1 remote 169.254.1.1 dev eth0
ip addr add 172.16.1.1/24 dev vxlan1
ip link set dev vxlan1 mtu 1450
ip link set dev vxlan1 up
Creating VXLAN interfaces
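The 1450-byte MTU follows from the encapsulation overhead. Each inner frame gains an outer IPv4 header, a UDP header, and a VXLAN header, and the inner Ethernet header itself also counts against the underlay's 1500-byte IP MTU. Here is a small calculation, assuming no IP options or extension headers and no inner 802.1Q tag; vxlan_mtu is an illustrative name.

```python
OUTER_IPV4_HEADER = 20      # outer IPv4 header, no options
OUTER_IPV6_HEADER = 40      # outer IPv6 header, no extension headers
OUTER_UDP_HEADER = 8
VXLAN_HEADER = 8            # RFC 7348
INNER_ETHERNET_HEADER = 14  # encapsulated frame's MAC header, no 802.1Q tag


def vxlan_mtu(underlay_mtu: int = 1500, ipv6_underlay: bool = False) -> int:
    """Largest inner IP packet that fits in one underlay packet without fragmentation."""
    outer_ip = OUTER_IPV6_HEADER if ipv6_underlay else OUTER_IPV4_HEADER
    overhead = outer_ip + OUTER_UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET_HEADER
    return underlay_mtu - overhead


if __name__ == "__main__":
    print(vxlan_mtu())                    # 1450, the value set on vxlan1
    print(vxlan_mtu(ipv6_underlay=True))  # 1430
    print(vxlan_mtu(9000))                # 8950 with a jumbo-frame underlay
```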
33. root@mhv2:~# ip addr show vxlan1
4: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether f2:af:3f:62:cf:65 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.5/24 scope global vxlan1
       valid_lft forever preferred_lft forever
    inet6 fe80::f0af:3fff:fe62:cf65/64 scope link
       valid_lft forever preferred_lft forever
Configured VXLAN Interface
46. • C. Burgess, N. Leake, L3 + VXLAN Made Practical, OpenStack Summit Spring 2014.
• M. Mahalingam et al., Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks, RFC 7348, https://tools.ietf.org/html/rfc7348
References
47. • S. K. Hooda, S. Kapadia, P. Krishnan, Using TRILL, FabricPath, and VXLAN: Designing Massively Scalable Data Centers (MSDC) with Overlays, Cisco Press, 2014.
• Introducing McRouter, http://bit.ly/introducing-mcrouter
References
48. • McRouter on GitHub, https://github.com/facebook/mcrouter
• Pyroute2, https://pypi.python.org/pypi/pyroute2
• Maintaining a set in Memcached, http://bit.ly/memcache-sets
• Ansible, http://docs.ansible.com
References