Traditional overlay networks using VXLAN are more complicated to set up and diagnose than is necessary for the majority of data centers. Calico offers an alternative Layer 3 solution; aside from simplicity, this also offers benefits in terms of improved scale and security.
These are the Calico slides from the SDN Switzerland meetup on 13/11/2015.
7. WHAT IF WE BUILT A DATA CENTER LIKE THE INTERNET?
[Diagram: applications, each with its own IP address, attached directly to routers that peer with each other over BGP]
8. WHAT IF WE BUILT A DATA CENTER LIKE THE INTERNET?
[Diagram: the same picture mapped onto a data center: VMs / LXCs with their own IP addresses sit on compute nodes, each compute node acts as a router, and the nodes peer with each other over BGP]
… this is Project Calico!
9. WHAT IS CALICO?
An (Apache licensed) open source project to enable networking of workloads in a data center / cloud environment.
Objectives:
Scale – thousands of servers, 100k’s of workloads
Simple – don’t demand that users be networking experts
Open – open source and open standards
10. TECHNICAL DETAILS
Architecture components:
Orchestrator plug-in – the Calico plugin in the cloud OS / orchestration system
etcd – distributed, highly available datastore
Felix agent – forwarding table updates, security policy
BIRD – route distribution, network integration
Linux kernel – layer 3 forwarding and ACL enforcement
Built on, and contributing to, many existing open source projects.
Runs over any physical fabric (L2, L3, MPLS, …)
[Diagram: on each compute node, Felix and BIRD program routes and ACLs into the Linux kernel; each workload (VM / container) attaches via its own interface (eth0, eth1); the Calico plugin in the cloud OS / orchestration system writes policy to etcd]
Introductions – myself, Metaswitch and Project Calico
I’ve chosen a Docker focus because there are a lot of exciting changes there at the moment – but much of this talk is relevant to other setups (OpenStack, Mesos, Kubernetes, etc.) where we also have Calico integrations.
Encapsulation adds overhead
Overlay networks are complicated to configure and diagnose
Scaling is a challenge
There are times when this is required (when a specific L2 function is needed), but in general it feels like something simpler is called for!
Mainline use case – unicast IP
What if we focused on this 80% use case?
What is the best example of a truly large scale model that we can think of? – The Internet!!
BGP as deployed today is often complicated, but that’s a policy overlay – the protocol itself is simple, scalable and high-performing.
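As an illustration of that simplicity, a per-node BGP session in BIRD takes only a few lines of configuration. This is a sketch, not Calico’s actual generated config; the AS number, peer address and protocol name are placeholders:

```
# Minimal BIRD (1.x) BGP peering stanza – illustrative only
protocol bgp peer_tor {
  local as 64511;
  neighbor 192.0.2.1 as 64511;   # e.g. the top-of-rack switch
  import all;
  export all;
}
```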
Uses standard Linux routing, iptables etc. – i.e. features that are already in the Linux kernel.
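Concretely, the routing state on a compute node is just ordinary kernel routes: a /32 per local workload, plus aggregates towards other hosts learned over BGP. A sketch of what `ip route` might show (all addresses and interface names here are invented):

```
# Illustrative `ip route` output on a Calico compute node
10.65.0.2 dev cali1a2b3c scope link       # local workload, via its veth
10.65.0.3 dev cali4d5e6f scope link       # another local workload
10.65.1.0/26 via 172.18.0.12 dev eth0     # workloads on another host, learned via BGP
```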
Scale testing – latest results:
- Docker – 100k containers across 1k hosts; 50k containers in 120s (the 100k run was not timed)
- OpenStack – 500 hosts with 20 VMs each (a patch in testing should remove a bottleneck and get us to 1k); we have a customer deployed with 140 hosts and a churn of 2 VMs per second
Part 2 – Security!!
Web/app/data
Like a medieval fortress – outer walls/inner walls/castle on the hill where you keep your crown jewels
New architecture – application-specific machines replaced with general-purpose servers.
Commodity Hardware. Easily interchangeable. Centralized monitoring. Multiply redundant.
On top of this physical infrastructure I deploy virtualized application services.
I set up my virtual network on top of the physical network and… SURPRISE!, it’s the same architecture!
Imagine building an application for a big enterprise – pretty progressive, pretty “with-it” when it comes to technology, with fully virtualized compute and storage.
Even so, the ops manager hands you a form that more or less looks like this…
Fill in all our services.
Which ones need access from the Internet? Those go in the Web tier. Which ones access data? Those go in the Data tier. Anything else? App tier! Web can’t access Data directly.
Let’s consider just services, not micro-services.
I had to fill in all the ports and IPs that needed to be opened on the firewalls.
I forgot one and the application couldn’t connect. Raise a ticket! Two days later, the firewalls were updated and I was back in business.
Then, a few weeks later, I wanted to add a new service: raise a ticket!
you want to stand up a second application stack in this environment? Maybe a third?
So this is where we are today before you even add micro-services
first part of the challenge with micro-services – fast rate of change
Do you put your foot down and insist “you have to go through security to modify the network – start early”?
If you do, congratulations, now you’re the bottleneck to innovation in your software company.
Or, do you open wide the gates?
The 2nd problem – resource fungibility
Each host is mutually interchangeable with any other.
any service instantiated in your data center should be deployable on any host,
and you should be able to scale any service across your whole data center.
e.g. Tectonic (CoreOS + Kubernetes) and Mesosphere – what they have in common is the vision of a data center operating system
– an environment that takes care of the detail that your application is distributed across hundreds or thousands of servers,
and that requires fungibility
– so that they can autoscale their applications and have service instances deployed by a scheduler that packs things for maximum utilization of the expensive hardware it takes to run a data center.
[back one slide] This is not fungible. I’ve divided my datacenter into zones and I’m dependent on the firewalls to enforce security.
Do we tear down the walls at the zoo and let the animals roam together?
But now what do we do about security?
So, that’s the problem:
- Things are moving fast and will move faster
- and people want the datacenter to present fungible resources,
But it’s not all bad news for micro-services from the perspective of network security.
Micro-services have a really useful property: they are compartmentalized.
Building applications this way naturally forces you to break down large, monolithic applications into constituent parts and isolate them. That means that if one gets compromised, your attacker gets only a small amount of information, and only a small amount of power to subvert the rest of your system.
They’re also, by definition, small! The goal is for each service to do one thing and do it well. This makes it much easier to analyze an application from a security perspective. Each service does something simple and probably only needs to talk to a few different things.
the dream:
break the application up into compartments that are easy to understand and easy to analyze, and then isolate them.
Containers give you isolation within the host OS, and what I’m talking to you about is network isolation.
Now, of course, we can’t have a functioning application if the containers are completely isolated: they need to communicate, but only over a few specific ports to specific destinations.
Firewall every instance of every micro-service, opening just the ports it needs to communicate over and to just the specific addresses it needs.
So now, attackers need to do more than just breach your castle walls: make them fight tooth and nail, room by room, container by container. Every service instance exposes just the minimum surface.
The problem of resource fungibility (containers need to be able to run anywhere in the data center) means that the firewall needs to be closely coupled to the service instance: it needs to be where the container is, and needs to live and die with the service.
To properly isolate each service, we need to instantiate a per-instance firewall right there.
In the container host Linux kernel, tied to the particular virtual interface for that instance.
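In iptables terms, that might look something like the fragment below – a per-workload chain hung off the workload’s virtual interface. This is an illustrative sketch, not Calico’s actual generated rules; the chain name, interface name and addresses are all invented:

```
# Illustrative per-workload firewall (iptables-save style)
-A FORWARD -o cali1a2b3c -j wl-web-1            # traffic towards this workload
-A wl-web-1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A wl-web-1 -p tcp -s 10.65.0.0/16 --dport 80 -j ACCEPT   # only cluster peers, only port 80
-A wl-web-1 -j DROP                             # default deny
```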
The scale of containers and rate of change means that this needs to be automated.
You can’t provision these manually based on opening tickets.
How can Calico do this?
In order to program the firewall, Felix needs to know the policy for each container
An etcd cluster is used as the data store (security policy is written to it by the orchestrator’s Calico plugin).
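For a flavour of what such a policy document might look like in the data store – a simplified, hypothetical schema for illustration, not Calico’s real data model:

```
{
  "workload": "web-1",
  "inbound_rules": [
    {"action": "allow", "protocol": "tcp", "src_tag": "app-tier", "dst_ports": [80]}
  ],
  "default": "deny"
}
```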
Felix programs local routes and sets up iptables
iptables organizes rules into chains, which lets complex firewall policy be expressed in the kernel.
Each workload has its own virtual interface for the firewall rules to be associated with
So the firewalls are in the host kernel and tied to individual workloads
Felix watches etcd for changes, as services are created, destroyed and rescaled.
So it will automatically create/update/remove the firewall rules as the workloads change
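That watch/reconcile behaviour can be sketched in a few lines of Python. This is a toy model, not Felix’s actual code: plain dicts stand in for etcd and iptables, and every name here is invented for illustration.

```python
# Sketch of a Felix-style reconciliation pass: keep programmed per-workload
# firewall rules in sync with the desired policy from the datastore.
# Plain dicts stand in for etcd and iptables; all names are invented.

def reconcile(desired, programmed):
    """Return (new_state, actions): the rules that should now be programmed,
    plus the create/update/delete actions taken to get there."""
    actions = []
    new_state = {}
    for workload, policy in desired.items():
        if workload not in programmed:
            actions.append(("create", workload))   # new workload appeared
        elif programmed[workload] != policy:
            actions.append(("update", workload))   # policy changed
        new_state[workload] = policy
    for workload in programmed:
        if workload not in desired:
            actions.append(("delete", workload))   # workload went away
    return new_state, actions
```

A real agent would run a pass like this continuously, triggered by watch events from the datastore rather than by polling.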