Alexis’s goals for this presentation are three-fold:
1) Dive into key Docker metrics
2) Explain operational complexity. In other words, I want to take what we have seen in the field and show you where the pain points will be.
3) Rethink monitoring of Docker containers. The old tricks won’t work.
7. Containers in a nutshell
• Been around for a long time
– jails, zones, cgroups
• No full-virtualization overhead
• Used for runtime isolation (e.g. jails)
• Docker is an Escape from Dependency Hell
9. Mini-host or über-process?
                   Process       Container        Host
Spec               Source        Dockerfile       Kickstart
On disk            .TEXT         /var/lib/docker  /
In memory          PID           Container ID     Hostname
In the network     Socket        veth*            eth*
Runtime context    server core   host             data center
13. Operational complexity
• Average containers per host: N (N=5 as of 10/2014)
• N times as many “hosts” to manage
• Affects
– provisioning: prepping & building containers
– configuration: passing config to containers
– orchestration: deciding where/when containers run
– monitoring: making sure containers run properly
20. Aggravating factors
• Registry-based provisioning
– new images as fast as you can git commit
• Autonomic orchestration
– from imperative to declarative
– automated
– individual containers don’t matter
– e.g. kubernetes, mesos
31. Layers of monitoring
• Access to metrics from all the layers
• Amazon CloudWatch, OS metrics, Docker metrics, app metrics in 1 place
• Shared timeline
34. Tags
• Monitoring is like Auto-Scaling Groups
• Monitoring is like Docker orchestration
• From imperative to declarative
• Query-based
• Queries operate on tags
35. Monitoring with tags and queries
“Monitor all Docker containers running image web”
“… in region us-west-2 across all availability zones”
“… and make sure resident set size < 1GB on c3.xl”
37. Monitoring with tags and queries
“Monitor all Docker containers running image web”
“… in region us-west-2 across all availability zones”
“… that use more than 1.5x the average on c3.xl”
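A query like the one on slide 35 can be thought of as a filter over tagged metric streams. Here is a minimal sketch in Python of evaluating “all containers running image web in us-west-2, RSS < 1GB on c3.xl”; the data shape, tag keys, and function name are all illustrative assumptions, not any real monitoring API.

```python
# Sketch: evaluating a tag-based monitoring query. All names and
# tag keys here are illustrative, not a real API.
GIB = 1024 ** 3

def check_rss(containers, query_tags, rss_limit):
    """Return IDs of containers matching every query tag whose RSS
    meets or exceeds the limit."""
    offenders = []
    for c in containers:
        # A container matches only if all query tags agree.
        if all(c["tags"].get(k) == v for k, v in query_tags.items()):
            if c["rss_bytes"] >= rss_limit:
                offenders.append(c["id"])
    return offenders

containers = [
    {"id": "web-1", "rss_bytes": 2 * GIB,
     "tags": {"image": "web", "region": "us-west-2", "instance_type": "c3.xl"}},
    {"id": "web-2", "rss_bytes": 2 * GIB,
     "tags": {"image": "web", "region": "us-east-1", "instance_type": "c3.xl"}},
]

query = {"image": "web", "region": "us-west-2", "instance_type": "c3.xl"}
print(check_rss(containers, query, rss_limit=1 * GIB))  # ['web-1']
```

Note that the query never names a host or container ID: any new container whose tags match is picked up automatically.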
39. Take-aways
1. Docker increases operational complexity by an order of magnitude unless…
2. You have layered monitoring, from the instance to the container and to the application, and…
3. You monitor using tags and queries
Editor's notes
My name is Alexis.
I’m the CTO of Datadog.
We monitor cloud-based infrastructures.
We have been monitoring containers for a few years now (LXC, then Docker).
Datadog is a monitoring service made for cloud environments, such as AWS, Azure, Google Cloud, etc.
By that I mean that Datadog understands that your infrastructure can change at any time and deals with it naturally.
To be able to monitor effectively, Datadog acts as an aggregator: it aggregates everything, it speaks native Cloudwatch and over 100 different other sources, like databases, web servers, etc.
My goals for this talk are three-fold.
Dive into key Docker metrics
Explain operational complexity. In other words, I want to take what we have seen in the field and show you where the pain points will be.
Rethink monitoring of Docker containers. The old tricks won’t work.
Here’s what I would like to talk about today.
I will start with very brief history of containers and docker. This is a popular topic so I will only focus on operational matters, including key metrics that containers expose.
I will focus on the inherent complexity that comes with running fleets of containers.
I will illustrate this with what we see out there, in the real world. We have a particular vantage point that gives us good insight into this.
Containers, as lightweight virtual runtimes, have been around for a while, without going back all the way to the mainframe. Depending on the operating system, they go by the name of jails, zones, or cgroups, and are like traditional VMs without the flexibility, but also without the overhead.
They were initially designed for security reasons (e.g. jails) but most recently have been used to escape dependency hell.
Dependency hell is this state where you end up having tens or hundreds of dependencies on shared code.
Before shared libraries we had compile-time dependencies to build static executables.
Shared libraries were a good idea when the size of a library was commensurate with the amount of RAM available in a machine. Now, obviously, there is a lot less memory pressure. Still, that has remained the default way to build software.
Then packages came: apt, yum, rvm, virtualenv, etc., as a partial solution to have a group of binaries that reliably work together. That proved too slow, having to wait for upstream updates, so people started to bundle their code and dependencies into /opt. Then Docker arrived as a way to make self-contained packages. And now we have come full circle to static binaries, having realized how much baggage we carried in shared code.
When you look at it a container is a hybrid between a process and a full-blown host. It has a Dockerfile, which is a manifest or a recipe to build the container, much like source code builds a binary and kickstart, chef or puppet build a full-blown host.
Then you have the actual binary representation of the container on disk, in /var/lib/docker. For a binary, it’s the .text section. For a host it’s its filesystem.
Finally when it runs a container has a unique ID, much like a process has a PID and a host has a hostname.
So a container is this intermediary between a single binary and a full-blown host. It's like a static binary with a fully functioning IP stack.
To put it simply: if you look at it from a dev point of view, a container looks like a binary. If you think about it from an operations point of view, a container is closer to a host.
Let’s recap for a minute.
We know that a container is a lightweight VM
We know roughly what current deployments look like in number of containers per instance.
We know how to measure the performance of a single container.
How do we monitor the whole thing?
Here I want to make the case that Docker introduces operational complexity
This is how the stack has evolved over the past 15 years.
On the left, without virtualization. Off-the-shelf could be your J2EE runtime, or your database.
Then virtualization and services like EC2 were introduced (in the middle). This allowed better utilization and quasi-instant provisioning, but for an engineer, few things changed.
And now running Docker containers inside EC2 instances on top of real hardware.
There is a clear trend here toward a lot more moving parts than before. It also puts engineering much closer to operations.
Specifically, by an order of magnitude or so, given the 5 containers per instance on average.
This affects a lot of different things at run-time.
provisioning: docker
configuration: etcd, confd, consul, etc.
orchestration: kubernetes, mesos
monitoring: where I can contribute the most
Let’s look at monitoring an EC2 instance.
I counted 10 CloudWatch metrics, about 100 metrics coming from the OS, 50 metrics coming from a container, 10-15 of which are critical to monitor, and let’s say 50 metrics for an off-the-shelf component, for instance a database.
This is a conservative estimate as we see our customers use many more metrics per instance.
Now let’s plug in some numbers.
Assuming you have 100 instances, and 5 containers per instance, you have 500 containers to manage and monitor.
And remember, from a management standpoint, containers behave like hosts. Single-purpose hosts, but hosts nonetheless.
So for a given instance, you have moved from 160 metrics per instance, to about 410.
Again assuming, 5 containers per host and being conservative on the number of metrics you need to keep an eye on.
If I recap, 100 instances, 41,000 metrics generated.
That’s already 3x what you had before.
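The arithmetic behind these numbers, using the estimates from the talk, can be spelled out:

```python
# Back-of-the-envelope metric counts, using the talk's estimates.
cloudwatch = 10        # CloudWatch metrics per instance
os_metrics = 100       # OS-level metrics per instance
app_metrics = 50       # one off-the-shelf component, e.g. a database
per_container = 50     # Docker metrics per container

instances = 100
containers_per_instance = 5

before = cloudwatch + os_metrics + app_metrics            # per instance, pre-Docker
after = before + containers_per_instance * per_container  # per instance, with Docker

print(before, after)              # 160 410
print(instances * after)          # 41000 metrics fleet-wide
print(round(after / before, 1))   # 2.6, roughly 3x
```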
And it gets worse. Much worse
Let’s talk about velocity.
Consider the “half-life” of an EC2 instance, by which I mean the median uptime of your instances. You likely have a mix of hourly instances and long-lived instances that will run for months.
Compare this to containers. A container’s half-life can be in minutes, days at the most.
On top of that, you'll have to layer in much faster provisioning, where new versions of containers are created daily, so you rotate your container fleet between versions just as often.
Much faster and much more often than doing an OS upgrade.
And then you add autonomic orchestration, which goes from imperative to declarative.
So you can say, I need 1 container of this kind per instance per zone, at all times. And the scheduler makes sure it’s always the case.
If you use mesos or kubernetes, this is your new reality
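The declarative rule above can be sketched as a toy reconciliation step: compare desired state against actual state and emit the actions that close the gap. Real schedulers like Kubernetes or Mesos are vastly more involved; every name below is invented for illustration.

```python
# Toy reconciliation for a declarative rule like
# "one 'web' container per instance, at all times".
# Real schedulers (Kubernetes, Mesos) are far more involved.

def reconcile(desired_image, instances, running):
    """Compare desired vs. actual state; return start/stop actions."""
    actions = []
    for inst in instances:
        have = [c for c in running
                if c["instance"] == inst and c["image"] == desired_image]
        if not have:
            # Missing on this instance: schedule one.
            actions.append(("start", desired_image, inst))
        for extra in have[1:]:
            # More than one copy: stop the extras.
            actions.append(("stop", extra["id"], inst))
    return actions

instances = ["i-1", "i-2", "i-3"]
running = [
    {"id": "c-a", "image": "web", "instance": "i-1"},
    {"id": "c-b", "image": "web", "instance": "i-1"},  # duplicate
]
print(reconcile("web", instances, running))
# [('stop', 'c-b', 'i-1'), ('start', 'web', 'i-2'), ('start', 'web', 'i-3')]
```

The key point for monitoring: the loop never cares which container satisfies the rule, so neither can your monitoring.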
In summary, from a management and monitoring standpoint, it means a lot more and a lot faster.
More moving parts that change pretty much all the time with limited predictability.
If your monitoring is still centered around hosts, this is what your world view looks like: complicated.
When we talk to customers, they feel that the move to EC2 was a key factor to rethink their monitoring. Because instances come and go, different groups within their organization would spin up new stacks with little advance notice.
Now throw containers into the mix. The old host-centric monitoring practice, the one that has you track individual hosts, simply stops working altogether.
It’s a bit like ptolemaic astronomy. Put the earth at the center of the universe and account for the movement of the planets. It gets pretty complicated.
In other words, host-centric monitoring does not really understand containers. Either you treat them as hosts, and you have a lot of hosts that come and go every few minutes, which makes your life miserable because the monitoring system thinks half of your infrastructure is on fire.
Or you don’t track containers, and you essentially have a gap. You see the OS, you see the app, and what happens in the middle, well…
So in short, if you think about monitoring containers like you’ve monitored hosts before, you’re in for a painful ride very very quickly.
So how do we do it properly?
We need a new approach, that does not treat everything like a host.
The picture here, as you've guessed, comes from Copernicus. He suggested a radical way to simplify the model of the universe: don't put the earth at the center of it. Compared to the geocentric model, his is striking in clarity and simplicity.
So what’s the secret sauce?
It’s simple: forget about hosts, think in layers and tags.
What do I mean by that?
Using a layered monitoring approach is pretty simple.
This is where you want to be: have coverage from the bottom of the stack all the way to the top.
Which means using monitoring tools that don’t leave any gap.
At the bottom, CloudWatch to know about the VMs.
In the middle, an infrastructure monitoring system that understands containers.
And at the top, an application performance monitoring tool.
So in terms of what you can see through these tools:
At the bottom, raw resources like cpu, network, io of the VM.
In the middle, anything from the OS to docker metrics.
At the top, application throughput.
The key here is to have 1 shared timeline for everything.
You want to get CloudWatch metrics, OS metrics, Docker metrics and app metrics, ideally in one place, all on the same timeline, so that you can see when things break and how changes ripple through the different layers.
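A shared timeline is just a merge of per-layer metric streams ordered by timestamp. A minimal sketch, with invented layer names and data points:

```python
# Sketch: putting CloudWatch, OS, Docker and app metrics on one shared
# timeline so a change in one layer lines up against the others.
# Each point is (timestamp, layer, metric, value); all values invented.
import heapq

cloudwatch = [(100, "cloudwatch", "cpu_credit", 40)]
os_level   = [(100, "os", "load_1", 2.5), (160, "os", "load_1", 6.0)]
docker     = [(130, "docker", "web.rss_mb", 900)]
app        = [(160, "app", "req_per_s", 12)]

# heapq.merge interleaves already-sorted streams by timestamp
# without concatenating them all in memory first.
timeline = list(heapq.merge(cloudwatch, os_level, docker, app))
for ts, layer, metric, value in timeline:
    print(ts, layer, metric, value)
```

Reading down the merged output, the container's RSS climb at t=130 sits visibly between the OS load spike and the app throughput drop, which is exactly the cross-layer correlation the shared timeline is for.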
That’s the first part of the equation. Layers.
Tags is the second half of the equation.
The good news is that you use them already. How are they relevant to monitoring in general and monitoring containers in particular?
Think of monitoring like ASG.
Think of monitoring like container orchestration.
Don’t think “imperative”, think “declarative”.
Don’t monitor host X, Y and Z. Instead, monitor everything that shares a common property, for instance being located in the same AZ.
Think in terms of queries and you will see that tags work beautifully because queries operate on tags.
Here’s an example:
Monitor… to make sure a container does not blow up in memory.
You can see the tags:
Name of container image: web
AWS Region: us-west-2
Instance type: c3.xlarge
Do you see how powerful this is?
Once you have queries in place, you can express even more interesting things such as:
Monitor …
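One such “more interesting” query is the relative one from slide 37: flag containers using more than 1.5x the average on c3.xl. A minimal sketch, with an invented data shape and function name:

```python
# Sketch: "flag containers using more than 1.5x the average RSS on
# c3.xl" -- a relative query instead of a fixed threshold.
# Data shape and names are illustrative, not a real API.

def above_average(containers, instance_type, factor=1.5):
    """Return IDs of containers on the given instance type whose RSS
    exceeds `factor` times the group average."""
    group = [c for c in containers
             if c["tags"].get("instance_type") == instance_type]
    if not group:
        return []
    avg = sum(c["rss_mb"] for c in group) / len(group)
    return [c["id"] for c in group if c["rss_mb"] > factor * avg]

containers = [
    {"id": "web-1", "rss_mb": 2000, "tags": {"instance_type": "c3.xl"}},
    {"id": "web-2", "rss_mb": 500,  "tags": {"instance_type": "c3.xl"}},
    {"id": "web-3", "rss_mb": 500,  "tags": {"instance_type": "c3.xl"}},
]
print(above_average(containers, "c3.xl"))  # ['web-1']
```

Because the threshold is computed from the tagged group itself, the alert adapts as containers come and go, with no per-host configuration to maintain.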