Performance monitoring for Docker
Challenges - Anomaly detection - CoScale demo
For more information about CoScale Docker monitoring, see: http://www.coscale.com/blog/how-to-monitor-docker-containers-with-coscale and http://www.coscale.com/blog/how-to-monitor-your-kubernetes-cluster
A summary of CoScale Docker performance monitoring can be found here: http://www.coscale.com/docker-monitoring
5. Container monitoring challenges
• Scale & dynamic behavior:
Number of containers >> number of servers
Containers come and go at a much faster pace
• Diversity:
Different application technologies
Overload of metrics to monitor and alert on
7. Microservices monitoring
(diagram: a layered stack, each layer paired with its monitoring approach)
End user → Real user monitoring (RUM)
Containers running application components → Container monitoring + in-container application monitoring
(Virtualized) OS → System / infrastructure monitoring
8. What to monitor?
Hosts (CPU, memory, disk)
Orchestrator (services, volumes, replication controllers, …)
Containers (CPU, memory, disk, network, …)
Container internals (application, database, caching, etc.)
Impact on user and application performance
Lightweight monitoring for a lightweight microservices environment
9. Docker stats API
$ docker stats
CONTAINER       CPU %   MEM USAGE / LIMIT    MEM %   NET I/O            BLOCK I/O
1285939c1fd3    0.07%   796 KiB / 64 MiB     1.21%   788 B / 648 B      3.568 MB / 512 KB
9c76f7834ae2    0.07%   2.746 MiB / 64 MiB   4.29%   1.266 KB / 648 B   12.4 MB / 0 B
d1ea048f04e4    0.03%   4.583 MiB / 64 MiB   6.30%   2.854 KB / 648 B   27.7 MB / 0 B
Docker API
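The CLI columns above are derived from raw counters in the Docker stats API (`GET /containers/{id}/stats`). A minimal sketch of how the CPU % figure is computed from two consecutive samples; the flat dict layout and the sample values are hypothetical simplifications of the API's nested `cpu_stats` counters:

```python
# Sketch: deriving CPU % from two stats samples, as the docker CLI does.
# Real API samples nest these under cpu_stats; values here are made up.

def cpu_percent(prev, cur):
    """CPU % between two stats samples: container delta over host delta."""
    cpu_delta = cur["cpu_usage"] - prev["cpu_usage"]        # container CPU time (ns)
    sys_delta = cur["system_usage"] - prev["system_usage"]  # host CPU time (ns)
    if sys_delta <= 0 or cpu_delta < 0:
        return 0.0
    return cpu_delta / sys_delta * cur["online_cpus"] * 100.0

prev = {"cpu_usage": 1_000_000, "system_usage": 10_000_000_000, "online_cpus": 4}
cur  = {"cpu_usage": 1_700_000, "system_usage": 11_000_000_000, "online_cpus": 4}
print(f"{cpu_percent(prev, cur):.2f}%")  # prints "0.28%"
```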
10. cAdvisor
docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest
open http://<your-hostname>:8080/
11. Datadog (datadoghq.com)
Agent runs in one container or on the host
Container resource usage
Basic application monitoring
$15 / month / server
13. APM vendors
Heavyweight, deep application monitoring
Designed for monolithic applications in a specific programming language
Too many dynamic metrics to handle with static alerts
Putting an agent inside a container is an anti-pattern
$100+ / month / server
14. Prometheus
Open source
● Extra work to set up, maintain, and support
● Generic tooling, no container- or cluster-specific visualizations
● No real user monitoring
● No out-of-the-box anomaly detection or predictive analytics
24. Holt-Winters
● Seasonal exponential smoothing
● Works quite well on 'laboratory data'
● Calculation of prediction intervals relies on a normal distribution after removal of seasonality
● ⇒ on our real-world seasonal data it generates too many false positives
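As a concrete reference, a minimal additive Holt-Winters (triple exponential smoothing) sketch. The smoothing parameters and the simple first-two-seasons initialization are illustrative, not tuned:

```python
# Minimal additive Holt-Winters sketch: level + trend + seasonal terms,
# each updated by its own exponential smoothing. Parameters are illustrative.

def holt_winters_additive(series, season_len, alpha=0.5, beta=0.1, gamma=0.3):
    """One-step-ahead forecasts for each point after the first season.
    Requires at least two full seasons of data for initialization."""
    level = sum(series[:season_len]) / season_len
    trend = (sum(series[season_len:2 * season_len])
             - sum(series[:season_len])) / season_len ** 2
    seasonal = [x - level for x in series[:season_len]]
    forecasts = []
    for i in range(season_len, len(series)):
        s = seasonal[i % season_len]
        forecasts.append(level + trend + s)          # predict before seeing x
        x = series[i]
        last_level = level
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[i % season_len] = gamma * (x - level) + (1 - gamma) * s
    return forecasts
```

On perfectly periodic input the forecasts reproduce the series exactly; the slide's point is that real container metrics are noisier, so prediction intervals built on a normality assumption fire too often.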
31. Local outlier factor
Existing instance-based machine learning technique (lazy, ~kNN)
Based on the concept of local density:
local outlier factor(A) = average density of kNN of point A / density at point A
LOF >> 1 ⇒ outlier
en.wikipedia.org/wiki/Local_outlier_factor
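A small self-contained sketch of the idea, using the inverse mean kNN distance as the density proxy; the full algorithm uses reachability distances, which this simplification omits while keeping the shape of the formula:

```python
# Simplified LOF sketch on 1-D points: density = 1 / mean kNN distance,
# LOF(A) = average density of A's neighbours / density at A.
# (The original LOF paper uses reachability distances instead.)

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i]."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: abs(points[j] - points[i]))
    return order[:k]

def density(points, i, k):
    neighbours = knn(points, i, k)
    mean_dist = sum(abs(points[j] - points[i]) for j in neighbours) / k
    return 1.0 / mean_dist if mean_dist > 0 else float("inf")

def lof(points, i, k=3):
    """LOF >> 1 means points[i] is much less dense than its neighbours."""
    neighbours = knn(points, i, k)
    avg_neighbour_density = sum(density(points, j, k) for j in neighbours) / k
    return avg_neighbour_density / density(points, i, k)

points = [1.0, 1.1, 0.9, 1.2, 0.8, 10.0]   # 10.0 sits far from the cluster
scores = [lof(points, i) for i in range(len(points))]
```

The isolated point 10.0 scores far above 1, while the clustered points score near 1, matching the LOF >> 1 ⇒ outlier rule above.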
36. CoScale approach
Lightweight agent
• Server metrics from the OS
• Container and cluster metrics from the Kubernetes and Docker APIs
• Application metrics from log files and management interfaces
• Business & custom metrics from various sources
Contextual events
• Container lifecycle
• Deployments & software releases
• Infrastructure changes
• Custom events
48. Local outlier factor, no strong model assumption
(diagram: a heavy process standing out as a local outlier)
49. Local outlier factor, no free lunch
Scaling: comparing apples and oranges
scale ⇒ distance ⇒ density ⇒ LOF score
Autoscaling? (Mahalanobis distance) ⇒ enlarges dimensions with low variance
"Curse of dimensionality": dimensionality reduction as preprocessing (e.g. PCA), but don't throw away the anomalies with the bathwater
Choosing cross-sections of the data to analyze together, e.g.
different metrics on the same container
the same metric on different containers
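A small illustration of the scaling point: without per-metric standardization, Euclidean distance, and hence density and the LOF score, is dominated by whichever metric has the largest numeric range. All values below are made up:

```python
# Why per-metric scaling matters before distance-based outlier detection:
# CPU % (0-100) and latency in ns (millions) live on very different scales.
# Raw Euclidean distance is dominated by the latency axis, so a CPU spike
# barely moves the distance; z-scoring each metric restores its influence.

def zscore(column):
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    return [(x - mean) / var ** 0.5 for x in column]

cpu     = [40.0, 42.0, 41.0, 90.0]       # last sample: CPU spike
latency = [2e6, 2.1e6, 1.9e6, 2.05e6]    # latency stays in its normal band

# Distance from the spiky sample (index 3) to a normal one (index 0):
raw = ((cpu[3] - cpu[0]) ** 2 + (latency[3] - latency[0]) ** 2) ** 0.5
std_cpu, std_lat = zscore(cpu), zscore(latency)
std = ((std_cpu[3] - std_cpu[0]) ** 2 + (std_lat[3] - std_lat[0]) ** 2) ** 0.5
```

In the raw distance the 50-point CPU jump is invisible next to the 50,000 ns latency difference; after standardization the CPU axis contributes more than the latency axis, which is what a density-based detector needs to see the spike.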