[RETIRED] Routing on the Host: An Introduction – Cumulus Networks® Knowledge Base
Retrieved 12/06/2017 from https://support.cumulusnetworks.com/hc/en-us/articles/216805858-RETIRED-Routing-on-the-Host-An-Introduction
[RETIRED] Routing on the Host: An Introduction
Important! Now that Routing on the Host has officially launched, you can find updated content in the technical documentation.
In order to build more resilient data centers, many Cumulus Networks customers are leveraging the Linux ecosystem to run routing protocols directly
on their servers. This is often referred to as routing on the host: running layer 3 protocols like OSPF (Open Shortest Path First) or BGP
(Border Gateway Protocol) all the way down to the host level. It can be done in a variety of ways, by running Quagga:
Within Linux containers (such as Docker)
Within a VM as a virtual router on the hypervisor
Directly on the hypervisor
Directly on the host (such as an Ubuntu server)
Contents
Why Route on the Host?
Simplifying Troubleshooting
Three or More Top of Rack Switches
Clear Upgrade Strategy
Application Availability
Multi-vendor Support
Host, VM and Container Mobility
BGP Unnumbered Interfaces
Why Have Networks not Done this in the Past?
Lack of a Fully-featured Host Routing Application
Cost of Layer 3 Licensing
See Also
Why Route on the Host?
Why do customers do this? Why should you care?
Simplifying Troubleshooting
Troubleshooting layer 2 network problems in the data center has been a persistent challenge in modern networks, so expanding the layer 3 footprint
further into your data center by routing on the host alleviates many issues described below.
Consider a network where layer 2 MLAG is configured between all devices. Although this is a common data center design, and can be deployed on
Cumulus Linux, it suffers from a number of shortcomings.
Sean Cavanaugh
July 08, 2016 02:00
Traceroute is not effective, since it only shows layer 3 hops in the network; this design uses layer 2 devices only. All traceroute outputs,
regardless of the path taken, only show the layer 3 exit leafs. There is no way to determine which spine is forwarding traffic.
MAC address tables become the only way to track down hosts. For the diagram above, to hunt down a particular host you would need to run
commands to show the MAC addresses on the exit leafs, the spine switches and the leaf switches. If a host or VM migrates while
troubleshooting, or a loop occurs from a misconfiguration, you may have to show the addresses multiple times.
Duplicate MAC addresses and MAC flaps become frustratingly hard to track down. Orphan ports and dealing with MLAG and non-MLAG pairs
increase network complexity. The fastest way to find a specific MAC address is to check the MAC address table of every single network switch in
the data center.
Proving that load balancing is working correctly can become cumbersome. With layer 2 solutions, LACP (Link Aggregation Control Protocol) is
very prevalent, so you need multiple bonds/EtherChannels between the switches. Performing a simple ping doesn't help, because a single flow
always produces the same hash; layer 2 EtherChannels are most commonly hashed on SRC IP, DST IP, SRC port and DST port. In the end, you
need multiple streams that hash evenly across the LACP bond, which often means buying test tools from companies like Spirent and Ixia.
With a layer 3 design, you can run ip route show and see all of the equal cost routes. Tools like mtr and scamper can reveal all
possible ECMP routes; that is, which switches traffic is being load balanced across.
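For example, on a device with layer 3 ECMP in place, the equal-cost next hops are directly visible. A sketch of what this might look like (the prefix, addresses and interface names below are hypothetical):

```shell
# Show the ECMP next hops installed for a destination prefix
ip route show 10.0.10.0/24
#   example (hypothetical) output:
#   10.0.10.0/24  proto zebra
#       nexthop via 169.254.64.1 dev swp51 weight 1
#       nexthop via 169.254.64.2 dev swp52 weight 1

# Repeatedly probe toward a host; varying intermediate hops expose the ECMP paths
mtr --report --no-dns 10.0.10.50
```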
Three or More Top of Rack Switches
With solutions like Cisco's vPC (virtual Port Channel), Juniper's MC-LAG (Multi-Chassis Link Aggregation) or Arista's MLAG (Multi-chassis Link
Aggregation), you gain high availability by having two active connections. Cumulus Networks has feature parity with these solutions with its
own MLAG implementation.
High availability means having two or more active connections. However, with high density servers or hyper-converged infrastructure deployments, it
is common to see more than two NICs per host. By routing on the host, three or more ToR (top of rack) switches can be configured, giving much more
redundancy. If one ToR fails, you lose only 1/N of your bandwidth (where N is the total number of ToR switches), whereas with a layer 2 MLAG
solution, you lose 50% of your bandwidth.
Clear Upgrade Strategy
By routing on the host, you gain two huge bonuses:
Ability to gracefully remove a ToR switch from the fabric for maintenance
More redundancy by having multiple ToRs (3+)
Let's expand on these two points. With layer 2 only (like MLAG), there is no way to influence routes without being disruptive (that is, some traffic loss
must occur). With OSPF and BGP, there are multiple load balanced routes via ECMP (Equal Cost Multipath) routing. Since there is routing, it is possible
to change these routes dynamically.
For OSPF, you can increase the cost of all the links, making the switch less preferred.
With BGP, there are multiple ways to change the routes, but the most common is prepending your own BGP AS to make the switch less preferred.
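Draining a switch this way might look like the following Quagga configuration sketch (the costs, AS numbers, addresses and route-map names here are hypothetical, not a prescribed recipe):

```
! OSPF: raise the cost on each fabric-facing link so the switch is avoided
interface swp51
 ip ospf cost 65535
!
! BGP: prepend our own AS on advertisements so paths via other ToRs win
route-map DRAIN permit 10
 set as-path prepend 65101 65101 65101
!
router bgp 65101
 neighbor 10.1.1.2 route-map DRAIN out
```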
Both BGP and OSPF make the ToR switch less preferred, removing it as an ECMP choice for both protocols. However, the link doesn't get turned off.
Unlike layer 2, where the link must be shut down and all traffic currently being transmitted is lost, a routing solution notifies the rest of the network to
no longer send traffic to this switch. By watching interface counters you can determine when traffic is no longer being sent to the device under
maintenance, so you can safely remove it from the network with no impact on traffic.
Because routing on the host supports three or more ToRs, it reduces the impact of a ToR being removed from service, whether due to planned
maintenance or unexpected network failure. So, instead of losing 50% of bandwidth in a two-ToR MLAG deployment, the bandwidth loss can be
reduced to 33% with three ToRs or 25% with four.
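The bandwidth arithmetic above is easy to verify (a quick illustration only, not tied to any particular product):

```shell
# Percentage of uplink bandwidth lost when one of N ToR switches fails,
# assuming traffic is spread evenly across all N uplinks
loss_pct() {
  awk -v n="$1" 'BEGIN { printf "%.0f", 100 / n }'
}

for n in 2 3 4; do
  echo "$n ToRs: one failure costs $(loss_pct "$n")% of bandwidth"
done
```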
The redundancy with layer 3 networks is tremendous. In the image above, the network on the left can still operate even if 3 out of 4 ToR switches are
down. That is 4N redundancy. The best case for the network on the right is 2N redundancy, no matter what vendor you choose. Layer 3 allows
applications to have much more uptime, with far less risk of outages.
Application Availability
Often when deploying a new application, server or service, there can be a delay between when the new device or service is available and when it is
integrated with the network. This is typically a result of the additional configuration required to set up layer 2 high availability (HA) technologies on the
upstream switches, which is often a manual process.
Using layer 3 and routing on the host eliminates this delay entirely. Tight prefix list control, coupled with authentication, can be leveraged on leaf and
spine switches to protect the rest of the network by limiting what the downstream servers are allowed to advertise into the network. Server
admins can get their services onto the network within the bounds of a safe framework set up by the network team. This is similar to
how service providers treat their customers today.
Similarly, when an application or service moves from one part of the network to another, the application team can quickly advertise the newly
moved application to the rest of the network, allowing for more agility in service location.
A service or application can be represented by a /32 IPv4 or /128 IPv6 host route. Since the application depends on that /32 or /128 being reachable,
the application is dependent on the network. Usually this means the ToR or spine is advertising reachability. If the application is migrated or moved
(for example, by VMware vMotion or KVM migration), the network may need substantial reconfiguration to advertise it correctly. Usually this requires
multiple steps:
1. Removing the host route from the previous ToR, spine or pair of ToRs or spines so it is no longer advertised to the wrong location.
2. Adding the host route to the new ToR, spine or pair of ToRs or spines so it is advertised into the routed fabric.
3. Checking connectivity from the host to make sure it has reachability.
These steps are often done by different teams, which can also cause problems. When routing on the host, this happens automatically: Quagga
advertises the host routes no matter where the host is plugged in.
Multi-vendor Support
One problem with layer 2, especially in MLAG environments, is interoperability. If you have a Cisco device and a Juniper device, they
can't act as an MLAG pair. This causes a problem known as vendor lock-in, where the customer is locked into a vendor because of proprietary
requirements. One huge benefit of doing layer 3 is that by using OSPF or BGP, the network adheres to open standards that have been around a
long time. OSPF and BGP interoperability is highly tested, very scalable and has a track record of success. Most networks are already multi-vendor
networks that peer at layer 3. By designing the network down to the host level with layer 3, it is now possible to have multiple vendors everywhere in
your network. The following diagram is perfectly acceptable in a layer 3 environment:
Host, VM and Container Mobility
When routing on the host, all VMs, containers, subnets and so forth are advertised into the fabric automatically. This means only the subnet on the
connection between the ToR and the router on the host needs to be configured on the ToR. This greatly increases host mobility by allowing minimal
configuration on the ToR switch. All the ToR switch has to do is peer with the server.
If security is a concern, the host can be forced to authenticate before BGP or OSPF adjacencies are allowed to form. Consider the following diagram:
In the above diagram, the Quagga configuration does not need to change, no matter what ToR you plug the host into. The only configuration that needs to
change is the subnet on swp1 and eth0 (configured under /etc/network/interfaces, which is not shown here). This greatly reduces configuration
complexity and allows for easy host mobility.
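As a sketch, the host-side Quagga fragment might look like this (the addresses and AS numbers are invented for illustration; the peer statement is the only part tied to the upstream ToR):

```
! bgpd.conf on the host
router bgp 65201
 bgp router-id 10.0.0.11
 network 10.0.0.11/32               ! service address, advertised wherever the host lands
 neighbor 10.1.1.1 remote-as 65101  ! upstream ToR; with BGP unnumbered even this can stay fixed
```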
BGP Unnumbered Interfaces
Cumulus Networks enhanced Quagga with the ability to implement RFC 5549, which means you can configure BGP unnumbered interfaces on the
host. In addition to the benefit, described above, of not having to configure every subnet, you do not have to configure anything specific on the ToR
switch at all; there is no IPv4 peering address to set in /etc/network/interfaces.
BGP unnumbered interfaces enable IPv6 link-local addresses to be used for IPv4 BGP adjacencies. Link-local addresses are automatically
configured with SLAAC (StateLess Address AutoConfiguration). Each address is derived from the interface's MAC address and is unique to each layer 3
adjacency; DAD (Duplicate Address Detection) keeps duplicate addresses from being configured. As a result, the configuration remains the same no
matter where the host resides, and no specific subnet is needed on the Ethernet connection between the host and the switch.
Along with the implementation of RFC 5549, Quagga has a simpler configuration, allowing even novice users to quickly configure, understand and
troubleshoot BGP configurations within the data center. The following illustration shows a single attached host using BGP unnumbered interfaces:
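A minimal host-side sketch of such a configuration (AS numbers, addresses and interface names are hypothetical; the `neighbor ... interface` form is a Cumulus extension to Quagga):

```
! bgpd.conf on the host, using BGP unnumbered -- no peering subnets at all
router bgp 65201
 bgp router-id 10.0.0.11
 network 10.0.0.11/32
 neighbor eth0 interface remote-as 65101  ! peers over the link's IPv6 link-local address
```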
Why Have Networks not Done this in the Past?
If routing on the host has a lot of benefits, why has this not happened in the past?
Lack of a Fully-featured Host Routing Application
In the past, there were no enterprise-grade open routing applications that could be installed easily on hosts. Cumulus Networks and many other
organizations have made these open source projects robust enough to run in production for hundreds of customers. Now that applications like
Quagga have reached a high level of maturity, it is only natural for them to run directly on the host as well.
Cost of Layer 3 Licensing
Many vendors base license costs on features, and vendors like Cisco, Arista and Juniper often want to charge more money for
layer 3 features. This means that designing a layer 3-capable network is not as simple as just turning it on; the customer is forced to pay for additional
licenses to enable these features.
The licensing is often confusing (for example, "What is the upgrade path?" "Do I need additional licenses for BGP vs OSPF?" "Does scale affect my
price?"), even when the cost is budgeted for. Routing should not cost customers additional money when they buy a layer 3-capable
switch. At Cumulus Networks, our licensing model is simple, concise and publicly available.