Performance monitoring for Docker
Challenges - Anomaly detection - CoScale demo
For more information about CoScale Docker monitoring, see: http://www.coscale.com/blog/how-to-monitor-docker-containers-with-coscale and http://www.coscale.com/blog/how-to-monitor-your-kubernetes-cluster
A summary of CoScale Docker performance monitoring can be found here: http://www.coscale.com/docker-monitoring
5. Container monitoring challenges
• Scale & dynamic behavior:
Number of containers >> number of servers
Containers come and go at a much faster pace
• Diversity:
Different application technologies
Overload of metrics to monitor and alert on
7. Microservices monitoring
(diagram: a layered stack, each layer paired with its monitoring approach)
End user → Real user monitoring (RUM)
Containers running application components → Container monitoring + in-container application monitoring
(Virtualized) OS → System / infrastructure monitoring
8. What to monitor?
Hosts (CPU, memory, disk)
Orchestrator (services, volumes, replication controllers, …)
Containers (CPU, memory, disk, network, …)
Container internals (application, database, caching, etc.)
Impact on user and application performance
Lightweight monitoring for a lightweight microservices environment
9. Docker stats API
$ docker stats
CONTAINER       CPU %   MEM USAGE / LIMIT    MEM %   NET I/O            BLOCK I/O
1285939c1fd3    0.07%   796 KiB / 64 MiB     1.21%   788 B / 648 B      3.568 MB / 512 KB
9c76f7834ae2    0.07%   2.746 MiB / 64 MiB   4.29%   1.266 KB / 648 B   12.4 MB / 0 B
d1ea048f04e4    0.03%   4.583 MiB / 64 MiB   6.30%   2.854 KB / 648 B   27.7 MB / 0 B
Docker API
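The CLI columns above are derived from raw counters in the Docker stats API (`GET /containers/{id}/stats`). A minimal sketch of how the CPU % figure is computed from two consecutive samples; the flat dict layout and the sample values are hypothetical simplifications of the API's nested `cpu_stats` counters:

```python
# Sketch: deriving CPU % from two stats samples, as the docker CLI does.
# Real API samples nest these under cpu_stats; values here are made up.

def cpu_percent(prev, cur):
    """CPU % between two stats samples: container delta over host delta."""
    cpu_delta = cur["cpu_usage"] - prev["cpu_usage"]        # container CPU time (ns)
    sys_delta = cur["system_usage"] - prev["system_usage"]  # host CPU time (ns)
    if sys_delta <= 0 or cpu_delta < 0:
        return 0.0
    return cpu_delta / sys_delta * cur["online_cpus"] * 100.0

prev = {"cpu_usage": 1_000_000, "system_usage": 10_000_000_000, "online_cpus": 4}
cur  = {"cpu_usage": 1_700_000, "system_usage": 11_000_000_000, "online_cpus": 4}
print(f"{cpu_percent(prev, cur):.2f}%")  # prints "0.28%"
```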
10. cAdvisor
docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest
open http://<your-hostname>:8080/
11. Datadog (datadoghq.com)
Agent runs in one container or on the host
Container resource usage
Basic application monitoring
$15 / month / server
13. APM vendors
Heavyweight, deep application monitoring
Designed for monolithic applications in a specific programming language
Too many dynamic metrics to handle with static alerts
Putting an agent inside a container is an anti-pattern
$100+ / month / server
14. Prometheus
Open source
● Extra work to set up, maintain, and support
● Generic tooling, no container- or cluster-specific visualizations
● No real user monitoring
● No out-of-the-box anomaly detection or predictive analytics
24. Holt-Winters
● Seasonal exponential smoothing
● Works quite well on 'laboratory data'
● Calculation of prediction intervals relies on a normal distribution after removal of seasonality
● ⇒ on our real-world seasonal data it generates too many false positives
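As a concrete reference, a minimal additive Holt-Winters (triple exponential smoothing) sketch. The smoothing parameters and the simple first-two-seasons initialization are illustrative, not tuned:

```python
# Minimal additive Holt-Winters sketch: level + trend + seasonal terms,
# each updated by its own exponential smoothing. Parameters are illustrative.

def holt_winters_additive(series, season_len, alpha=0.5, beta=0.1, gamma=0.3):
    """One-step-ahead forecasts for each point after the first season.
    Requires at least two full seasons of data for initialization."""
    level = sum(series[:season_len]) / season_len
    trend = (sum(series[season_len:2 * season_len])
             - sum(series[:season_len])) / season_len ** 2
    seasonal = [x - level for x in series[:season_len]]
    forecasts = []
    for i in range(season_len, len(series)):
        s = seasonal[i % season_len]
        forecasts.append(level + trend + s)          # predict before seeing x
        x = series[i]
        last_level = level
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[i % season_len] = gamma * (x - level) + (1 - gamma) * s
    return forecasts
```

On perfectly periodic input the forecasts reproduce the series exactly; the slide's point is that real container metrics are noisier, so prediction intervals built on a normality assumption fire too often.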
31. Local outlier factor
Existing instance-based machine learning technique (lazy, ~kNN)
Based on the concept of local density:
local outlier factor(A) = average density of kNN of point A / density at point A
LOF >> 1 ⇒ outlier
en.wikipedia.org/wiki/Local_outlier_factor
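A small self-contained sketch of the idea, using the inverse mean kNN distance as the density proxy; the full algorithm uses reachability distances, which this simplification omits while keeping the shape of the formula:

```python
# Simplified LOF sketch on 1-D points: density = 1 / mean kNN distance,
# LOF(A) = average density of A's neighbours / density at A.
# (The original LOF paper uses reachability distances instead.)

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i]."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: abs(points[j] - points[i]))
    return order[:k]

def density(points, i, k):
    neighbours = knn(points, i, k)
    mean_dist = sum(abs(points[j] - points[i]) for j in neighbours) / k
    return 1.0 / mean_dist if mean_dist > 0 else float("inf")

def lof(points, i, k=3):
    """LOF >> 1 means points[i] is much less dense than its neighbours."""
    neighbours = knn(points, i, k)
    avg_neighbour_density = sum(density(points, j, k) for j in neighbours) / k
    return avg_neighbour_density / density(points, i, k)

points = [1.0, 1.1, 0.9, 1.2, 0.8, 10.0]   # 10.0 sits far from the cluster
scores = [lof(points, i) for i in range(len(points))]
```

The isolated point 10.0 scores far above 1, while the clustered points score near 1, matching the LOF >> 1 ⇒ outlier rule above.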
36. CoScale approach
Lightweight agent
• Server metrics from the OS
• Container and cluster metrics from the Kubernetes and Docker APIs
• Application metrics from log files and management interfaces
• Business & custom metrics from various sources
Contextual events
• Container lifecycle
• Deployments & software releases
• Infrastructure changes
• Custom events
48. Local outlier factor, no strong model assumption
(diagram: a heavy process standing out as a local outlier)
49. Local outlier factor, no free lunch
Scaling: comparing apples and oranges
scale ⇒ distance ⇒ density ⇒ LOF score
Autoscaling? (Mahalanobis distance) ⇒ enlarges dimensions with low variance
"Curse of dimensionality": dimensionality reduction as preprocessing (e.g. PCA), but don't throw away the anomalies with the bathwater
Choosing cross-sections of the data to analyze together, e.g.
different metrics on the same container
the same metric on different containers
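A small illustration of the scaling point: without per-metric standardization, Euclidean distance, and hence density and the LOF score, is dominated by whichever metric has the largest numeric range. All values below are made up:

```python
# Why per-metric scaling matters before distance-based outlier detection:
# CPU % (0-100) and latency in ns (millions) live on very different scales.
# Raw Euclidean distance is dominated by the latency axis, so a CPU spike
# barely moves the distance; z-scoring each metric restores its influence.

def zscore(column):
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    return [(x - mean) / var ** 0.5 for x in column]

cpu     = [40.0, 42.0, 41.0, 90.0]       # last sample: CPU spike
latency = [2e6, 2.1e6, 1.9e6, 2.05e6]    # latency stays in its normal band

# Distance from the spiky sample (index 3) to a normal one (index 0):
raw = ((cpu[3] - cpu[0]) ** 2 + (latency[3] - latency[0]) ** 2) ** 0.5
std_cpu, std_lat = zscore(cpu), zscore(latency)
std = ((std_cpu[3] - std_cpu[0]) ** 2 + (std_lat[3] - std_lat[0]) ** 2) ** 0.5
```

In the raw distance the 50-point CPU jump is invisible next to the 50,000 ns latency difference; after standardization the CPU axis contributes more than the latency axis, which is what a density-based detector needs to see the spike.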