Monitoring applications that consist of multiple containers is not easy, nor is it available as part of any container solution or orchestration platform. This talk looks at how to address application performance by leveraging business service level objectives, and at the architecture for implementing the solution. The solution has been prototyped at ciscoshipped.io and we would love your thoughts.
4. As an Application Owner
In order to run my application in the best possible way
I want to express an application intent and monitor how the application measures up to that intent in real time
Problem statement (requirement)
5. Motivations
• Ability to influence application performance through scale and
micro-service (re-)distribution (a.k.a. Application Intent).
• Visualize aggregated monitoring data for each microservice to point
clearly at application bottlenecks.
• Provide real-time feedback that unifies several sources and
allows the system to take multiple simultaneous corrective
actions.
10. • Unified Measurements Format
• Using a unified format with a handful of pre-set user labels we can correlate data
coming from several sources
• Measurements Bus
• Having a bus allows us to have an ecosystem of producers and consumers that can
act independently, at their own pace and needs, e.g. alarms vs. persisters
• Compound Measurements
• Compound Measurements allow us to combine several metrics and generate new
ones that are re-inserted in the bus. We can also use alarms to generate new
measurements.
Solution Pillars
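The bus and compound-measurement pillars can be illustrated with a minimal in-process sketch. This is not the prototype's implementation: the `MeasurementBus` and `ErrorRatio` classes and the metric names `requests.count`, `errors.count` and `errors.ratio` are hypothetical, chosen only to show a consumer that combines two metrics and re-inserts the result in the bus.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Measurement = Dict[str, object]


class MeasurementBus:
    """Toy in-process bus: consumers subscribe by metric name (assumption)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Measurement], None]]] = defaultdict(list)

    def subscribe(self, metric_name: str, handler: Callable[[Measurement], None]) -> None:
        self._subscribers[metric_name].append(handler)

    def publish(self, measurement: Measurement) -> None:
        # Each consumer sees the measurement independently of the others.
        for handler in list(self._subscribers[measurement["name"]]):
            handler(measurement)


class ErrorRatio:
    """Compound measurement: errors.count / requests.count per service_id.

    Both input metric names are hypothetical examples.
    """

    def __init__(self, bus: MeasurementBus) -> None:
        self._bus = bus
        self._latest: Dict[tuple, float] = {}  # (service_id, metric name) -> last value
        bus.subscribe("requests.count", self._on_measurement)
        bus.subscribe("errors.count", self._on_measurement)

    def _on_measurement(self, m: Measurement) -> None:
        service_id = m["dimensions"]["service_id"]
        self._latest[(service_id, m["name"])] = m["value"]
        requests = self._latest.get((service_id, "requests.count"))
        errors = self._latest.get((service_id, "errors.count"))
        # Only emit once both inputs are known and the divisor is non-zero.
        if requests and errors is not None:
            self._bus.publish({
                "name": "errors.ratio",
                "dimensions": m["dimensions"],
                "timestamp": m["timestamp"],
                "value": errors / requests,
            })


bus = MeasurementBus()
ErrorRatio(bus)
seen: List[Measurement] = []
bus.subscribe("errors.ratio", seen.append)  # e.g. an alarm or a persister

dims = {"project_id": "app-1", "service_id": "svc-a", "env_id": "dev"}
bus.publish({"name": "requests.count", "dimensions": dims, "timestamp": 1, "value": 200})
bus.publish({"name": "errors.count", "dimensions": dims, "timestamp": 2, "value": 5})
```

Because the compound value is published like any other measurement, an alarm or persister can consume it without knowing it was derived.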
11. Unified Measurement Model (based on Monasca)
{ "metric":
  { "name": "container.cpu.usage.total",
    "dimensions": {
      "project_id": "this is the application id",
      "service_id": "unique id of the microservice being part of the app",
      "env_id": "this would indicate production, staging, dev etc…",
      "container_id": "123…",
      "hostName": "host1",
      "any_other_label": "vvv" },
    "timestamp": 1458749286,
    "value": 6455776114 },
  "meta": {
    "tenantId": "…..",
    "region": "…." },
  "creation_time": 1458749286
}
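A producer could assemble this envelope with a small helper. The sketch below is an assumption-laden illustration: the function name `make_measurement` is hypothetical, and treating `project_id`, `service_id` and `env_id` as the mandatory pre-set labels is a guess based on the labels shown on the slides, not a documented rule.

```python
import json
import time
from typing import Dict, Optional, Union

# Assumption: these are the pre-set user labels every producer must supply.
REQUIRED_DIMENSIONS = ("project_id", "service_id", "env_id")


def make_measurement(
    name: str,
    value: Union[int, float],
    dimensions: Dict[str, str],
    tenant_id: str,
    region: str = "",
    timestamp: Optional[int] = None,
) -> dict:
    """Build a measurement envelope shaped like the slide's example."""
    missing = [key for key in REQUIRED_DIMENSIONS if key not in dimensions]
    if missing:
        raise ValueError(f"missing required dimensions: {missing}")
    now = int(time.time()) if timestamp is None else timestamp
    return {
        "metric": {
            "name": name,
            "dimensions": dict(dimensions),  # copy so callers can reuse their dict
            "timestamp": now,
            "value": value,
        },
        "meta": {"tenantId": tenant_id, "region": region},
        "creation_time": now,
    }


envelope = make_measurement(
    "container.cpu.usage.total",
    6455776114,
    {"project_id": "app-1", "service_id": "svc-a", "env_id": "dev", "hostName": "host1"},
    tenant_id="example-tenant",  # placeholder value
    timestamp=1458749286,
)
payload = json.dumps(envelope)  # what would be sent to the bus
```

Rejecting a measurement that lacks the shared labels at the producer keeps uncorrelatable data off the bus.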
12. Measurement Bus and µServices
[Diagram labels: Marathon+Mesos or Kubernetes Cluster (schedule container); Worker Nodes running cAdvisor; µService Metrics; Measurements Bus; Monasca Persister; Tenant; Container]
Dimensions become tags, so queries are fast and data is stored efficiently.
15. Adding Load Balancer (Traefik)
{ "metric": {
    "name": "elapsedMs-traefik",
    "dimensions": {
      "project_id": "this is the application id",
      "service_id": "unique id of the microservice being part of the app",
      "env_id": "this would indicate production, staging, dev etc…"
    },
    "timestamp": 1461198662000,
    "value": 161 },
  "meta": { "tenantId": "19c60964-0621-11e6-bd9b-0242ac110003", "region": "" },
  "creation_time": 1461198662000 }
NOTE: Traefik has no access to the ContainerID and HostName
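Since the load balancer cannot supply container or host labels, correlation with container-level metrics has to rely on the labels both sources share. A minimal sketch of that grouping follows; the helper names `correlation_key` and `correlate` are hypothetical, and the sample values are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Labels present in both the cAdvisor-style and the Traefik measurements.
SHARED_LABELS = ("project_id", "service_id", "env_id")


def correlation_key(envelope: dict) -> Tuple:
    """Key built only from shared labels; container_id/hostName are ignored."""
    dims = envelope["metric"]["dimensions"]
    return tuple(dims.get(label) for label in SHARED_LABELS)


def correlate(envelopes: Iterable[dict]) -> Dict[Tuple, Dict[str, List[float]]]:
    """Group metric values by (project, service, env), then by metric name."""
    groups: Dict[Tuple, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for envelope in envelopes:
        metric = envelope["metric"]
        groups[correlation_key(envelope)][metric["name"]].append(metric["value"])
    return {key: dict(by_name) for key, by_name in groups.items()}


shared = {"project_id": "app-1", "service_id": "svc-a", "env_id": "dev"}
container_metric = {
    "metric": {
        "name": "container.cpu.usage.total",
        "dimensions": {**shared, "container_id": "c1", "hostName": "host1"},
        "timestamp": 1461198662000,
        "value": 6455776114,
    }
}
traefik_metric = {
    "metric": {
        "name": "elapsedMs-traefik",
        "dimensions": dict(shared),
        "timestamp": 1461198662000,
        "value": 161,
    }
}
by_service = correlate([container_metric, traefik_metric])
```

The trade-off is granularity: latency from the load balancer can be tied to a microservice, but not to an individual container or host.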
20. • Using a Measurement Bus, it is possible to create real-time alarms
based on thresholds. It is also possible to enable third parties to
inject metrics and have them all correlated together. Root cause
analysis? Not really, but we went a long way with that little …
• Take corrective actions using the scheduler to adjust µServices:
• Scaling (Up and Down) in order to address increase/decrease in load
• Move (affinity/anti-affinity) to avoid issues with noisy neighbors, infected hosts or
potentially faulty hardware
• Suspend Execution (security) to eliminate the risk of compromised containers while
leaving them available for forensics.
• Let the Application Owner decide what to watch for and how to
respond to it
Takeaways
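The threshold-to-scaling loop can be sketched as a pure decision function. Everything here is illustrative: the `Intent` fields, the threshold values and the one-instance step are assumptions, and the actual call to the scheduler (Marathon or Kubernetes) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical application intent set by the Application Owner."""
    metric_name: str
    scale_up_above: float
    scale_down_below: float
    min_instances: int = 1
    max_instances: int = 10


def desired_instances(intent: Intent, observed_value: float, current_instances: int) -> int:
    """Return the instance count the scheduler should be asked for."""
    if observed_value > intent.scale_up_above:
        target = current_instances + 1      # alarm: metric above upper threshold
    elif observed_value < intent.scale_down_below:
        target = current_instances - 1      # load dropped: release capacity
    else:
        target = current_instances          # within the intended band
    # Never leave the bounds chosen by the Application Owner.
    return max(intent.min_instances, min(intent.max_instances, target))


# Example: keep load balancer latency between 50 ms and 150 ms (made-up numbers).
latency_intent = Intent("elapsedMs-traefik", scale_up_above=150, scale_down_below=50)
target = desired_instances(latency_intent, observed_value=161, current_instances=2)
```

Keeping the decision separate from the scheduler call makes the same intent usable for either orchestration platform and easy to test in isolation.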
21. • These concepts are to be considered proofs of concept and are an
illustration of where we would like to see the industry move in terms of
supporting application intent.
• Some of these features are still not fully implemented (and/or tested),
but some are available at ciscoshipped.io
• We would like to explore interest in the Prometheus community in
supporting/extending some of these concepts.
Next Steps