How to Monitor Application Performance in a Container-Based World

Fabio Giannetti and Ken Owens
CloudNative Con - November 8, 2016

Agenda
• Problem Statement and Motivations
• Proposed Solution
• Takeaways and Next Steps

Problem Statement and Motivations

Problem Statement (Requirement)
As an Application Owner,
in order to run my application in the best possible way,
I want to express an application intent and monitor, in real time, how the application measures up against it.

Motivations
• Ability to influence application performance through scaling and micro-service (re-)distribution (a.k.a. Application Intent).
• Visualize aggregated monitoring data per microservice to clearly pinpoint application bottlenecks.
• Provide real-time feedback that unifies several sources and allows the system to take multiple simultaneous corrective actions.

[Figure: Micro-service deployment environments, number of running containers, and monitoring data]

[Figure: Architecture overview. Sources: cAdvisor, µService Metrics, Host Metrics, Load Balancer → Measurements Bus → AppIntent Engine. Actions: Scheduling, Security, Compound Metrics, Alarm/Notification. Also shown: Kubernetes / Marathon, IDM.]

Solution Pillars
• Unified Measurements Format
  • Using a unified format with a handful of pre-set user labels, we can correlate data coming from several sources.
• Measurements Bus
  • A bus gives us an ecosystem of producers and consumers that act independently, at their own pace and according to their own needs (e.g. alarms vs. persisters).
• Compound Measurements
  • Compound measurements let us combine several metrics into new ones that are re-inserted into the bus; alarms can also be used to generate new measurements.
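The bus pillar can be illustrated with a minimal in-memory sketch. It is a toy under stated assumptions, not the authors' implementation: the class `MeasurementBus` and its methods are hypothetical, and a production bus would sit on a message broker rather than in-process queues. Each consumer gets its own queue, so an alarm consumer can react to every measurement immediately while a persister drains in larger batches.

```python
from collections import deque
from typing import Deque, Dict, List

Measurement = Dict[str, object]


class MeasurementBus:
    """Toy in-memory bus (hypothetical): every consumer gets its own queue."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[Measurement]] = {}

    def subscribe(self, consumer: str) -> None:
        # A consumer only sees measurements published after it subscribes.
        self._queues.setdefault(consumer, deque())

    def publish(self, measurement: Measurement) -> None:
        # Fan out: the same measurement is appended to every consumer's queue.
        for queue in self._queues.values():
            queue.append(measurement)

    def poll(self, consumer: str, max_items: int = 100) -> List[Measurement]:
        # Each consumer drains at its own pace, independently of the others.
        queue = self._queues[consumer]
        batch: List[Measurement] = []
        while queue and len(batch) < max_items:
            batch.append(queue.popleft())
        return batch


bus = MeasurementBus()
bus.subscribe("alarms")
bus.subscribe("persister")
bus.publish({"name": "container.cpu.usage.total", "value": 6455776114})
print(bus.poll("alarms", max_items=1))  # alarm consumer reads right away
print(bus.poll("persister"))            # persister could instead wait and batch
```

Compound measurements fit the same shape: a consumer polls, computes, and calls `publish` again.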

Unified Measurement Model (based on Monasca)
{
  "metric": {
    "name": "container.cpu.usage.total",
    "dimensions": {
      "project_id": "this is the application id",
      "service_id": "unique id of the microservice being part of the app",
      "env_id": "this would indicate production, staging, dev etc…",
      "container_id": "123…",
      "hostName": "host1",
      "any_other_label": "vvv"
    },
    "timestamp": 1458749286,
    "value": 6455776114
  },
  "meta": {
    "tenantId": "…..",
    "region": "…."
  },
  "creation_time": 1458749286
}
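A small builder makes the envelope above explicit: user labels go under `metric.dimensions`, tenancy under `meta`. `make_measurement` is a hypothetical helper, not part of any Monasca client API, and the label values in the usage lines (`shop`, `cart`, `tenant-a`) are illustrative. Like the example, it copies `timestamp` into `creation_time`.

```python
import json
from typing import Dict, Union

Number = Union[int, float]


def make_measurement(name: str, value: Number, dimensions: Dict[str, str],
                     timestamp: int, tenant_id: str, region: str = "") -> Dict[str, object]:
    """Build a measurement in the unified (Monasca-style) envelope shown above."""
    return {
        "metric": {
            "name": name,
            "dimensions": dict(dimensions),  # copy so callers can reuse their dict
            "timestamp": timestamp,
            "value": value,
        },
        "meta": {"tenantId": tenant_id, "region": region},
        # The example uses the same value for both timestamps; we do the same here.
        "creation_time": timestamp,
    }


m = make_measurement(
    "container.cpu.usage.total", 6455776114,
    {"project_id": "shop", "service_id": "cart", "env_id": "production",
     "container_id": "c1", "hostName": "host1"},
    timestamp=1458749286, tenant_id="tenant-a")
print(json.dumps(m, indent=2))
```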

Measurement Bus and µServices
[Figure: Containers scheduled on the worker nodes of a Marathon+Mesos or Kubernetes cluster; cAdvisor and µService Metrics → Measurements Bus → Monasca Persister (Tenant, Container)]
Dimensions become tags, so queries are fast and data is stored efficiently.
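To show what "dimensions become tags" means on disk, here is a sketch that renders one unified measurement as an InfluxDB line-protocol point. This assumes an InfluxDB-style tag store behind the persister; the slide does not name the backend, and this is not the Monasca persister's code. Each dimension becomes an indexed tag and the value becomes a field named `value`; the helper names are hypothetical.

```python
from typing import Dict


def _escape_tag(text: str) -> str:
    # Line protocol: commas, equals signs and spaces must be escaped in tag keys and values.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement: Dict, ns_per_unit: int = 1_000_000_000) -> str:
    """Render one unified measurement as a line-protocol point.

    ns_per_unit converts the timestamp to nanoseconds: 10**9 for seconds
    (as in the first example), 10**6 for milliseconds (as in the Traefik example).
    """
    metric = measurement["metric"]
    # Measurement names only need commas and spaces escaped.
    name = metric["name"].replace(",", r"\,").replace(" ", r"\ ")
    # Dimensions become tags, sorted by key; empty values are skipped
    # because line protocol does not allow empty tag values.
    tags = "".join(
        f",{_escape_tag(k)}={_escape_tag(str(v))}"
        for k, v in sorted(metric["dimensions"].items())
        if str(v) != ""
    )
    ts_ns = int(metric["timestamp"]) * ns_per_unit
    return f"{name}{tags} value={float(metric['value'])} {ts_ns}"


point = {"metric": {"name": "container.cpu.usage.total",
                    "dimensions": {"hostName": "host1", "env_id": "production"},
                    "timestamp": 1458749286, "value": 6455776114}}
print(to_line_protocol(point))
```

Because tags are indexed, a query such as "CPU for `env_id=production` on `host1`" filters on the index instead of scanning values.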

Visualization using Grafana
[Figure: µService Metrics → Measurements Bus → Monasca Persister, visualized in Grafana]

Compound Measurements
[Figure: µService Metrics flow over the Measurements Bus into the AppIntent Engine, which evaluates rules such as:]
IF diskWrite GTE 1.4
netSent GTE 2.0
MEASUREMENT(Label, Value)
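One possible reading of the rule above, sketched in Python: both conditions must hold (an assumed AND, since the slide only lists them), `GTE` means greater than or equal, and a match yields a new measurement that would be published back onto the bus. `CompoundRule` and the output label `io_pressure` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class CompoundRule:
    """Hypothetical encoding of: IF diskWrite GTE 1.4 netSent GTE 2.0 MEASUREMENT(Label, Value)."""
    conditions: List[Tuple[str, float]]  # (metric name, threshold); all must satisfy value >= threshold
    label: str                           # name of the compound measurement to emit
    value: float                         # value to emit when the rule fires

    def evaluate(self, latest: Dict[str, float],
                 dimensions: Dict[str, str]) -> Optional[Dict[str, object]]:
        # latest maps metric name -> most recent value for one service instance.
        for metric, threshold in self.conditions:
            if metric not in latest or latest[metric] < threshold:
                return None  # a missing or below-threshold input means no match
        # The new measurement keeps the source dimensions so it stays correlatable.
        return {"metric": {"name": self.label, "dimensions": dict(dimensions),
                           "value": self.value}}


rule = CompoundRule([("diskWrite", 1.4), ("netSent", 2.0)], label="io_pressure", value=1.0)
dims = {"project_id": "shop", "service_id": "cart", "env_id": "production"}
print(rule.evaluate({"diskWrite": 1.6, "netSent": 2.3}, dims))  # fires
print(rule.evaluate({"diskWrite": 1.2, "netSent": 2.3}, dims))  # None
```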

Adding Load Balancer (Traefik)
{
  "metric": {
    "name": "elapsedMs-traefik",
    "dimensions": {
      "project_id": "this is the application id",
      "service_id": "unique id of the microservice being part of the app",
      "env_id": "this would indicate production, staging, dev etc…"
    },
    "timestamp": 1461198662000,
    "value": 161
  },
  "meta": { "tenantId": "19c60964-0621-11e6-bd9b-0242ac110003", "region": "" },
  "creation_time": 1461198662000
}
NOTE: Traefik has no access to the ContainerID and HostName
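Since the Traefik measurement carries only the application-level labels, it can still be correlated with container data by grouping on the labels both sources share. This sketch assumes all inputs already fall in the same time window, sums container values per service, and keeps the worst (maximum) load-balancer latency per service; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Labels carried by both Traefik and container measurements in the examples above.
SHARED_LABELS = ("project_id", "service_id", "env_id")
Key = Tuple[str, ...]


def service_key(m: Dict) -> Key:
    dims = m["metric"]["dimensions"]
    return tuple(dims.get(label, "") for label in SHARED_LABELS)


def correlate(lb_metrics: Iterable[Dict],
              container_metrics: Iterable[Dict]) -> Dict[Key, Dict[str, float]]:
    """Join load-balancer latency with container data at the service level.

    Assumes all inputs fall in the same time window and that the container
    side reports a single metric (for example CPU usage).
    """
    container_sum: Dict[Key, float] = defaultdict(float)
    for m in container_metrics:
        # Roll individual containers up to their service.
        container_sum[service_key(m)] += float(m["metric"]["value"])

    joined: Dict[Key, Dict[str, float]] = {}
    for m in lb_metrics:
        key = service_key(m)
        if key not in container_sum:
            continue  # no container data for this service in the window
        elapsed = float(m["metric"]["value"])
        current = joined.get(key)
        if current is None or elapsed > current["lb_elapsed_ms_max"]:
            joined[key] = {"lb_elapsed_ms_max": elapsed,
                           "container_value_sum": container_sum[key]}
    return joined
```

The design choice is to aggregate down to the coarsest label set both sources have, rather than guessing which container served which request.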

[Figure: Load Balancer Metrics join µService Metrics on the Measurements Bus and are stored by the Monasca Persister]

Adding Security Scans (Cisco Norad)
{
  "metric": {
    "name": "norad_security_scan",
    "dimensions": {
      "hostName": "host1"
    },
    "timestamp": 1461198662030,
    "value": 0
  },
  "meta": { "tenantId": "19c60964-0621-11e6-bd9b-0242ac110003", "region": "" },
  "creation_time": 1461198662000
}
NOTE: A host security scan may have no knowledge of the containers running on the host.
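The host scan can still be tied back to containers through `hostName`, provided another source (such as the cAdvisor metrics) reports both `hostName` and `container_id`. This sketch assumes a non-zero scan `value` flags a finding; the slide does not define the value's semantics, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def containers_on_flagged_hosts(scans: Iterable[Dict],
                                container_metrics: Iterable[Dict]) -> Dict[str, Set[str]]:
    """Map each flagged host to the container ids seen running on it."""
    # Assumption: a non-zero norad_security_scan value means the scan found something.
    flagged = {m["metric"]["dimensions"]["hostName"]
               for m in scans if m["metric"]["value"] != 0}
    affected: Dict[str, Set[str]] = defaultdict(set)
    for m in container_metrics:
        dims = m["metric"]["dimensions"]
        host, container = dims.get("hostName"), dims.get("container_id")
        if host in flagged and container:
            affected[host].add(container)
    return dict(affected)
```

The resulting map is what a "move" or "suspend" action would need as input.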

Complete Solution
[Figure: Complete solution. Sources: cAdvisor, µService Metrics, Host Metrics, IaaS Metrics, Load Balancer, Container Vulnerability Scan, Host Vulnerability Scan → Measurements Bus. Also shown: Threshold Engine, Actions Engine, Proxy Agent, AppIntent.]

Takeaways
• Using a Measurement Bus, it is possible to create real-time alarms based on thresholds and to let 3rd parties inject metrics that are all correlated together. Root-cause analysis? Not quite, but we went a long way with that little.
• Take corrective actions using the scheduler to adjust µServices:
  • Scaling (up and down) to address increases/decreases in load
  • Moving (affinity/anti-affinity) to avoid issues with noisy neighbors, infected hosts or potentially faulty hardware
  • Suspending execution (security) to eliminate the risk of compromised containers while leaving them available for forensics
• Let the Application Owner decide what to watch for and how to respond to it.
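The corrective actions listed above could be driven by an owner-defined policy along these lines. This is a hedged sketch: the `Policy` thresholds, field names and `decide` function are hypothetical, and the code only decides which actions to request; issuing them to Kubernetes or Marathon is out of scope. Returning a list reflects the goal of taking several corrective actions at once.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    SCALE_UP = "scale_up"
    SCALE_DOWN = "scale_down"
    MOVE = "move"        # reschedule away from the current host (anti-affinity)
    SUSPEND = "suspend"  # pause execution but keep the container for forensics


@dataclass
class Policy:
    # Hypothetical owner-defined thresholds; the values are illustrative only.
    max_latency_ms: float = 500.0
    min_latency_ms: float = 50.0
    min_replicas: int = 1


def decide(latency_ms: float, replicas: int, container_compromised: bool,
           host_flagged: bool, policy: Policy) -> List[Action]:
    """Return every corrective action the current readings call for."""
    actions: List[Action] = []
    if container_compromised:
        actions.append(Action.SUSPEND)
    if host_flagged:
        actions.append(Action.MOVE)
    if latency_ms > policy.max_latency_ms:
        actions.append(Action.SCALE_UP)
    elif latency_ms < policy.min_latency_ms and replicas > policy.min_replicas:
        actions.append(Action.SCALE_DOWN)
    return actions


print(decide(latency_ms=620, replicas=3, container_compromised=False,
             host_flagged=True, policy=Policy()))
```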

Next Steps
• These concepts should be considered proofs of concept that illustrate where we would like to see the industry move in terms of supporting application intent.
• Some of these features are not yet fully implemented (and/or tested), but some are available at ciscoshipped.io.
• We would like to gauge interest in the Prometheus community in supporting/extending some of these concepts.
