4. Logging
• Why do we need centralized logging?
• Logs in Cloud Foundry
• How to store
• How to parse
• How to see
• The Logsearch project
• Tips and tricks
5. How to see logs without a centralized entry point
• bosh ssh + less/grep/etc. for
platform logs
• cf logs for app logs
Can you call this convenient from an operator’s
point of view? I can’t.
6. Why do we need centralized logging
• Too many servers, too few displays :-)
• Convenient search
• Data manipulation
• Long-term storage
• Opportunity to create dashboards, reports,
alerts, etc.
8. Logs in Cloud Foundry: Apps
• All application logs ➡ Metron agent ➡ Firehose nozzle
• Specific application ➡ User-provided Service Instance
with syslog URL ➡ syslog receiver
• Specific application ➡ Service Instance with
syslog_drain_url ➡ syslog receiver
https://docs.cloudfoundry.org/devguide/services/log-management.html
https://docs.cloudfoundry.org/services/app-log-streaming.html
https://github.com/openservicebrokerapi/servicebroker/blob/v2.13/spec.md#log-drain
13. Logs in Cloud Foundry: Platform
• Diego
• UAA
• CC API
• Consul
• etcd
• ...
14. How to store
You need some kind of database suitable for
logs:
– dynamic fields
– indexing
– fast/convenient search
15. How to store: Example
Elasticsearch cluster
Indexes
Nodes
Shards
16. How to parse
The parser should be able to handle logs in
different formats:
– syslog (RFC 5424) for platform logs
– plain text for apps
– custom format for apps (e.g. JSON)
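A minimal sketch of the idea on this slide: try the stricter formats first and fall back to plain text. The function name `parse_line`, the `log_format` key, and the simplified header regex are assumptions made for illustration; the regex covers only the RFC 5424 header fields and leaves structured data and the message unparsed in `rest`.

```python
import json
import re

# Simplified RFC 5424 header:
# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID rest
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)$"
)


def parse_line(line):
    """Return a dict describing one log line (hypothetical helper)."""
    line = line.strip()

    # 1. Platform logs: syslog header.
    match = SYSLOG_RE.match(line)
    if match:
        event = match.groupdict()
        # PRI encodes facility * 8 + severity.
        facility, severity = divmod(int(event.pop("pri")), 8)
        event.update(facility=facility, severity=severity, log_format="syslog")
        return event

    # 2. App logs in a custom structured format: a JSON object.
    try:
        payload = json.loads(line)
    except ValueError:
        payload = None
    if isinstance(payload, dict):
        payload["log_format"] = "json"
        return payload

    # 3. Everything else is treated as plain text.
    return {"log_format": "plain", "message": line}
```

In a real pipeline this ordering matters: a plain-text fallback placed first would swallow every line, so the catch-all goes last.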
17. How to parse: Example
https://www.elastic.co/guide/en/logstash/current/input-plugins.html
https://www.elastic.co/guide/en/logstash/current/output-plugins.html
https://www.elastic.co/guide/en/logstash/current/filter-plugins.html
18. How to see
Personally, I would like to see the
following features in the UI:
– convenient search and filtering
– graphs and dashboards
21. PCF: Altoros Log Search for PCF
https://network.pivotal.io/products/altoros-log-search
22. Tips and tricks
• Reduce log verbosity in the CF deployment
(e.g. turn off the debug level) to avoid information overload
• To ease application log parsing, you might
want to consider using the JSON format
for logs
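As an illustration of the JSON tip, here is a minimal sketch of an app that writes one JSON object per log line using Python's standard logging module. The class name `JsonFormatter`, the helper `build_logger`, and the chosen field names are assumptions, not a format required by Cloud Foundry or Logsearch.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name, stream=sys.stdout):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace any previously attached handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate output via the root logger
    return logger
```

Calling `build_logger("orders").info("order %s created", 42)` writes a line whose `message` field is `order 42 created`, so the parser can read fields directly instead of applying regular expressions to free text.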
23. Metrics
• Main concepts of monitoring
• Levels of Cloud Foundry monitoring
• Monitoring approaches for each CF level
• Architecture of a simple monitoring solution
24. Why monitoring is important
• We want to know what is going on
• We want to know it before our clients do
• We want to be able to troubleshoot problems
• We want to measure (e.g. capacity planning)
25. Why we need metrics
We already have logs and maybe some checks
and alerts. So why do we need metrics?
26. Why we need metrics
With the help of metrics we can:
• do measurement
• prove assumptions
• do troubleshooting
• make predictions
• set up alerts based on historical data
Also, graphs are human-friendly :-)
28. Metrics workflow: collecting
• Push model (metrics collectors or agents send
metrics to TSDB)
• Pull model (internal capability of the system to
expose metrics)
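To make the pull model concrete, here is a minimal sketch of a process that exposes its own metrics over HTTP for a collector to scrape. The metric names, the port, and the simple `name value` text layout (the label-free form of the Prometheus text format) are assumptions chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metrics; a real app would update these as it works.
METRICS = {"app_requests_total": 0, "app_queue_depth": 0}


def render_metrics(metrics):
    """Render metrics as one 'name value' line each, sorted by name."""
    return "".join(f"{name} {value}\n" for name, value in sorted(metrics.items()))


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer every GET with the current metric values."""

    def do_GET(self):
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Block forever, serving metrics on the given port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

With pull, the collector decides when to read, and the app only has to keep its counters current. With push, the app or an agent decides when to send.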
32. Levels of CF monitoring
• IaaS
• BOSH
• CF
• Applications
• Backing services
34. IaaS monitoring
• Collect metrics for VMs
– Metrics collectors
• collectd
• diamond
• telegraf
• prometheus exporters
• Collect internal IaaS Metrics
– Internal API (so you can use a metrics collector)
– Vendor-specific monitoring systems
35. BOSH monitoring
• BOSH Health Monitor
• BOSH HM Forwarder
• PCF JMX Bridge (PCF only)
Note: these metrics are quite limited.
https://bosh.io/docs/hm-config.html
https://github.com/cloudfoundry/bosh-hm-forwarder
https://network.pivotal.io/products/ops-metrics
36. CF monitoring
• Firehose nozzles for CF’s own components:
– for your on-premises TSDB
– for SaaS monitoring
• Monitoring agents for 3rd party CF components:
– consul
– MySQL/PostgreSQL
– HAProxy
• Direct API calls (deprecated, don’t use them)
38. Event types
• ValueMetric indicates the value of a metric at an instant in time.
• CounterEvent represents the increment of a counter. It contains
only the change in the value; it is the responsibility of downstream
consumers to maintain the value of the counter.
• LogMessage contains a "log line" and associated metadata.
• Error event represents an error in the originating process.
• ContainerMetric records resource usage of an app in a container.
• HttpStartStop event represents the whole lifecycle of an HTTP
request.
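Since a CounterEvent carries only the change, a nozzle or other consumer has to keep the running total itself. A minimal sketch of that bookkeeping follows; the class name `CounterStore` and the choice to key totals by origin and counter name are assumptions for illustration.

```python
from collections import defaultdict


class CounterStore:
    """Keep running totals for counters that arrive as deltas."""

    def __init__(self):
        # Totals start at zero for any (origin, name) pair not seen before.
        self._totals = defaultdict(int)

    def apply(self, origin, name, delta):
        """Add one event's delta and return the new running total."""
        key = (origin, name)
        self._totals[key] += delta
        return self._totals[key]
```

Note that totals kept this way reset when the consumer restarts, so downstream graphs should be built on rates or tolerate counter resets.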
42. Application monitoring
• A Firehose nozzle (standard metrics)
• Application Performance Monitoring (cool, but
expensive)
• Define metrics in your apps and send them to
your own monitoring system (e.g. statsd)
• Create custom buildpacks to collect some
predefined metrics (e.g. JMX)
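For the "define metrics in your apps" option, here is a minimal sketch that formats a metric in the statsd plain-text line format (`name:value|type`) and sends it over UDP. The helper names, the metric names, and the default host and port are assumptions; 8125 is the port statsd conventionally listens on.

```python
import socket

# Metric types in the statsd line protocol: counter, gauge, timer.
STATSD_TYPES = ("c", "g", "ms")


def statsd_line(name, value, metric_type):
    """Format one metric as a statsd line, e.g. 'orders.created:1|c'."""
    if metric_type not in STATSD_TYPES:
        raise ValueError(f"unsupported statsd type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one statsd line as a UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `send_metric(statsd_line("orders.created", 1, "c"))` increments a counter. UDP keeps the app from blocking when the metrics backend is unavailable, at the cost of possibly losing datagrams.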
43. Backing services monitoring
• Via metrics collectors (they have plugins for this)
• Via internal capability of the system (like in
Cassandra and Jenkins)
• Via a firehose (some bosh-releases use it)
– e.g. via Pivotal Cloud Foundry Service Metrics SDK
50. Next time: Basic but useful metrics
• BOSH
• Diego
• Gorouter
• CC
• etcd
51. Next time: Advanced metrics
• Capacity planning
• Security
• Derived metrics (e.g. from the HttpStartStop
event)
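As a rough sketch of what a derived metric could look like, the function below computes request count, average latency, and error rate from a batch of HttpStartStop-like events. The dictionary keys (`start_timestamp` and `stop_timestamp` in nanoseconds, `status_code`) are assumed names for illustration and may not match the field names your nozzle library exposes.

```python
def summarize(events):
    """Derive count, average latency (ms), and 5xx error rate from events."""
    count = len(events)
    if count == 0:
        return {"count": 0, "avg_latency_ms": 0.0, "error_rate": 0.0}

    # Timestamps are assumed to be in nanoseconds; convert to milliseconds.
    latencies_ms = [
        (e["stop_timestamp"] - e["start_timestamp"]) / 1e6 for e in events
    ]
    # Count server-side errors only (HTTP 5xx).
    errors = sum(1 for e in events if e["status_code"] >= 500)

    return {
        "count": count,
        "avg_latency_ms": sum(latencies_ms) / count,
        "error_rate": errors / count,
    }
```

Running this per time window (say, every 10 seconds) turns a stream of per-request events into a few time series that are cheap to store and graph.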
52. Next time: Seamless integration into CF
• Deploy your monitoring solution with BOSH
• Deploy your monitoring agents by adding them
to your manifests or deploy them as BOSH
addons
• Create a service broker
• Create a custom buildpack
54. Q & A
Anton Soroko
anton.soroko@altoros.com
Thank you!
https://www.altoros.com/heartbeat/
Editor's notes
API - Users make API calls to request changes in app state
STG - The Diego cell or the Droplet Execution Agent emits STG logs when staging or restaging an app.
RTR - The Router emits RTR logs when it routes HTTP requests to the app.
Zipkin Trace Logging - If Zipkin trace logging is enabled in Cloud Foundry, then Gorouter access log messages contain Zipkin HTTP headers.
LGR - Loggregator emits LGR to indicate problems with the logging process.
APP - Every app emits logs according to choices by the developer.
SSH - The Diego cell emits SSH logs when a user accesses an application container through SSH by using the cf ssh command.
CELL - The Diego cell emits CELL logs when it starts or stops the app. The Diego cell also emits messages when an app crashes.