12.
root@loadbalancer-001:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet 37.72.160.20/32 scope global lo
On each loadbalancer:
- service IP
- health-check script
- BGP software
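The 37.72.160.20/32 address on lo in the output above can be attached with a one-off command like this (a sketch; run as root — the /32 mask matters, so the node never answers ARP for a whole subnet):

```
# Attach the service IP to the loopback interface (as root).
ip addr add 37.72.160.20/32 dev lo
```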
13.
Am I healthy? Tell the network I know how to handle 37.72.160.20.
Am I unhealthy? Tell the network I withdraw my knowledge about 37.72.160.20.
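A minimal sketch of that health script, written in the style of an ExaBGP "process" script (ExaBGP reads announce/withdraw lines from the script's stdout; the VIP and the health check itself are illustrative assumptions):

```shell
#!/bin/sh
# Decide what to tell the network based on the local health check.
# ExaBGP-style syntax: the BGP daemon reads these lines from stdout.
VIP="37.72.160.20/32"

bgp_command() {  # $1 = "healthy" or "unhealthy"
  if [ "$1" = "healthy" ]; then
    echo "announce route $VIP next-hop self"
  else
    echo "withdraw route $VIP next-hop self"
  fi
}

# A real script would loop forever: run the health check (e.g. curl the
# service locally), call bgp_command on every state change, sleep, repeat.
bgp_command healthy
bgp_command unhealthy
```

On a state change the routers learn (or forget) the route via a normal BGP update, instead of waiting for VRRP timers and ARP tricks.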
14.
Active nodes announce the virtual IP + a priority to multiple BGP routers.
The network knows which loadbalancers are up and running.
Hi, I am Frank. I am a co-founder and Operations-boss at Openminds.
Openminds is a managed hosting company based here in Gent; we sponsor DevOpsdays.
We run a game, a “devops challenger”, at our booth. No marketing, no recruiting. Just a game we created for you to have fun and exercise your brain. Brag about how many points you have on Twitter. And it’s BaaS compliant: you can win Beer as a Service.
So let’s talk redundancy and failover today.
We all fixed that problem, right? Just add more servers and load-balance requests. Right?
Problem: how do we fail over or scale the loadbalancers?
“Just add more”? Sure, but how do we handle failover? How do we avoid the SPOF?
DNS round-robin? No, not really. That only works for “some” load spreading; it’s not reliable enough for failover (it’s client-dependent).
The classical answer is to use Keepalived, IPVS, Heartbeat or some other VRRP-based system.
This works well, but has its limitations.
They all share the same issues:
- They only work within a single layer-2 domain: so “close” networks only; forget failing over to another datacenter.
- Timing is essential, so high CPU load leads to flaps.
- They are usually based on multicast and UDP, so there is no guarantee your “election” packets will actually arrive.
- A very important problem: the network infrastructure is “blind” to the failover. These solutions rely on ARP flooding to tell the network the failover occurred.
I have a ton of great load-balancer failure stories. Come talk to me if you really want to hear them…
Now is there a better way?
Let’s talk about SDN. SDN is the “network-guys” part of DevOps (Dev NOps? DevOps-en?)
SDN in a greenfield implementation is great! Unicorns! Rainbows! World peace…
But most of us have to work in a “legacy” environment (even if legacy is just 6 months ago).
But we can still use SDN concepts on existing networks:
BGP is an excellent networking protocol to use in such solutions.
How does it work? Each web node has a virtual service IP attached to its local loopback interface.
Each node does a health check, and if all is fine it announces a route to its virtual IP to a neighbouring network device (or better: to a few network devices).
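On the node side this could look like the following ExaBGP configuration sketch (ExaBGP 4 syntax; all addresses, AS numbers and the script path are made-up assumptions):

```
# exabgp.conf (sketch) -- the health script feeds announce/withdraw
# lines to ExaBGP, which maintains the session to the router.
process healthcheck {
    run /usr/local/bin/vip-health.sh;
    encoder text;
}

neighbor 192.0.2.1 {               # the upstream router
    router-id 192.0.2.10;
    local-address 192.0.2.10;
    local-as 65001;
    peer-as 65000;
    api {
        processes [ healthcheck ];
    }
}
```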
Active nodes not only announce the service IP (virtual IP) they carry, but also a “priority”.
Network devices see all the online nodes, and choose one (based on priority, etc.).
If a web node goes offline, or declares itself unhealthy, the router already knows the “backup” one.
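One way to express that “priority” is the BGP MED attribute; a sketch of what two nodes might announce, in ExaBGP route syntax (the lower MED wins on the router; the values are illustrative):

```
# lb-001 (preferred):
announce route 37.72.160.20/32 next-hop self med 100
# lb-002 (backup):
announce route 37.72.160.20/32 next-hop self med 200
```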
There are a few advantages to this approach. The biggest one is that (from a protocol standpoint) it’s much, much easier.
Failover is faster and (more important than speed) more reliable, as there are no ARP floods needed.
BGP is proven technology.
There is no multicast, no UDP, no single layer-2 domain. No dragons, no cry!
Multiple datacenters? No problem! (As long as you have one routed network in place.)
Are there disadvantages? Sure there are.
BGP is probably something you haven’t done before, so you’ll need to learn it.
Your equipment needs to be able to handle it (most $500+ switches can; otherwise use a Linux/BSD box).
Service or virtual IPs need to come from a separate subnet, used only for virtual IPs.
Other cool stuff you can do with these ideas:
Build your own 8.8.8.8-style services (great for DNS, time).
Announce “active” DNS recursors in multiple places in your network, each with the same VIP.
Clients will connect to the “closest” one, but fail over to another if it goes down.
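A sketch of the anycast idea, assuming an illustrative recursor VIP of 192.0.2.53: every site attaches the same VIP to loopback and announces it only while healthy:

```
# On every recursor, in every datacenter (as root):
ip addr add 192.0.2.53/32 dev lo          # the shared anycast VIP

# From each node's health script, while the recursor answers queries:
echo "announce route 192.0.2.53/32 next-hop self"
```

The network then routes each client to the topologically closest announcing node, and reconverges to another one when an announcement is withdrawn.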