> 55 million messages a day
• Now ~30Gb of indexed data per day
• All our applications
• All of syslog
• Used by developers and product managers
• 2 x DL360s with 8x600Gb discs, also graphite install
About 4 months old
• Almost all apps onboard to various levels
• All of syslog was easy
• Still haven’t done apache logs
• Haven’t comprehensively done routers/switches
• Lots of apps still emit directly to graphite
On host log collector
• Application to logcollector is ZMQ
• Small amount of buffering (1000 messages)
• logcollector to logstash is ZMQ
• Large amount of buffering (disc offload, 100s of thousands of messages)
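The two buffering tiers can be sketched like this (a pure-Python stand-in for ZMQ's drop-oldest behaviour at the high-water mark; the class and names are illustrative, not the actual logcollector code):

```python
from collections import deque

class BoundedBuffer:
    """Drop-oldest buffer, mimicking a ZMQ socket hitting its high-water mark."""
    def __init__(self, maxlen):
        self.queue = deque(maxlen=maxlen)  # oldest entries fall off when full
        self.dropped = 0

    def push(self, msg):
        if len(self.queue) == self.queue.maxlen:
            self.dropped += 1  # real ZMQ would block or drop, depending on socket type
        self.queue.append(msg)

    def drain(self):
        while self.queue:
            yield self.queue.popleft()

# App -> logcollector tier: small buffer (1000 messages)
app_buffer = BoundedBuffer(1000)
for i in range(1500):
    app_buffer.push(f"event {i}")

print(len(app_buffer.queue))  # 1000 retained
print(app_buffer.dropped)     # 500 oldest messages lost
```

The logcollector -> logstash tier is the same idea with a far larger bound (plus disc offload), which is why only the small app-side buffer loses messages under sustained backpressure.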
What, no AMQP?
• Could go logcollector => AMQP => logstash for extra durability
• ZMQ buffering ‘good enough’
• logstash uses a pure Ruby AMQP decoder
• Slooooowwwwww
Reliability
• Multiple Elasticsearch servers (obvious!)
• Due to ZMQ buffering, you can:
• restart logstash: messages just buffer on hosts whilst it’s unavailable
• restart logcollector: messages from apps buffer (lose some syslog)
Redundancy
• Add a UUID to each message at emission point
• Index in Elasticsearch by UUID
• Emit to two backend logstash instances (TODO)
• Index everything twice! (TODO)
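The reason assigning the UUID at the emission point makes double-indexing safe can be sketched as follows (illustrative, not real Elasticsearch client code): using the UUID as the document id means the second write is an idempotent overwrite, not a duplicate.

```python
import uuid

def emit(message):
    """Attach a UUID once, at the emission point, before any fan-out."""
    return {"_id": str(uuid.uuid4()), "message": message}

index = {}  # stand-in for an Elasticsearch index keyed by document _id

def es_index(doc):
    index[doc["_id"]] = doc  # same _id twice == overwrite, not a second document

event = emit("user logged in")
# Emit to two backend logstash instances: both index the same document.
es_index(event)
es_index(event)

print(len(index))  # 1 -- indexed twice, stored once
```

If each backend generated its own id instead, the two writes would produce two documents and every query would double-count.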
Elasticsearch optimisation
• You need a template
• compress _source
• disable _all
• discard unwanted fields from _source / indexing
• tweak shards and replicas
• compact yesterday’s index at end of day!
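A template along those lines might look like this (field names and values are illustrative, using the 0.x-era Elasticsearch settings this deck describes; check the template syntax for your version):

```json
{
  "template": "logstash-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "_source": {
        "compress": true,
        "excludes": ["unwanted_field"]
      }
    }
  }
}
```

End-of-day compaction on old Elasticsearch was done with the `_optimize` API (e.g. `POST /logstash-YYYY.MM.DD/_optimize?max_num_segments=1`), merging the day's segments once the index stops receiving writes.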
Elasticsearch size
• 87 daily indexes
• 800Gb of data (per instance)
• Just bumped ES heap to 22G
• Just writing data - 2Gb
• Query over all indexes - 17Gb!
• Hang on - 800/87 does not = 33Gb/day!
TimedWebRequest
• Most obvious example of a standard event
• App name
• Environment
• HTTP status
• Page generation time
• Request / Response size
• Can derive loads of metrics from this!
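One way to picture a TimedWebRequest event and the metrics it yields (field and bucket names here are illustrative, not the exact schema):

```python
# A single structured TimedWebRequest event, as a plain dict.
event = {
    "type": "TimedWebRequest",
    "app": "checkout",            # App name
    "environment": "production",  # Environment
    "http_status": 200,           # HTTP status
    "page_time_ms": 142,          # Page generation time
    "request_bytes": 512,         # Request size
    "response_bytes": 20480,      # Response size
}

# Derived metric names, statsd-style: a counter per status and
# a timer per app for page generation time.
status_counter = f"{event['app']}.{event['environment']}.status.{event['http_status']}"
page_timer = f"{event['app']}.{event['environment']}.page_time"

print(status_counter)  # checkout.production.status.200
print(page_timer)      # checkout.production.page_time
```

Because every app emits the same event shape, the same derivation code gives every app request rates, status breakdowns, and timing metrics with no per-app work.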
statsd
• Rolls up counters and timers into metrics
• One bucket per stat, emits values every 10 seconds
• Counters: request rate, HTTP status rate
• Timers: total page time, mean page time, min/max page times
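The rollup can be sketched as follows (a simplified stand-in for statsd's 10-second flush cycle, not its real implementation):

```python
class MiniStatsd:
    """Roll up counters and timers into per-interval aggregates."""
    def __init__(self, interval_s=10):
        self.interval_s = interval_s
        self.counters = {}  # bucket -> running count
        self.timers = {}    # bucket -> list of observed timings

    def incr(self, bucket, n=1):
        self.counters[bucket] = self.counters.get(bucket, 0) + n

    def timing(self, bucket, ms):
        self.timers.setdefault(bucket, []).append(ms)

    def flush(self):
        """Emit aggregates for the interval, then reset every bucket."""
        out = {}
        for bucket, count in self.counters.items():
            out[bucket + ".rate"] = count / self.interval_s  # events per second
        for bucket, values in self.timers.items():
            out[bucket + ".count"] = len(values)
            out[bucket + ".mean"] = sum(values) / len(values)
            out[bucket + ".min"] = min(values)
            out[bucket + ".max"] = max(values)
        self.counters, self.timers = {}, {}
        return out

s = MiniStatsd()
s.incr("web.status.200", 50)
for ms in (100, 200, 300):
    s.timing("web.page_time", ms)
metrics = s.flush()
print(metrics["web.status.200.rate"])  # 5.0 requests/sec over the 10s window
print(metrics["web.page_time.mean"])   # 200.0
```

One flush per interval keeps graphite ingestion flat regardless of event volume: a million counter increments still produce a single metric value every 10 seconds.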
Alerting use cases:
• Replaced nsca client with standardised log pipeline
• Developers log an event and get (one!) email warning of client-side exceptions
• Passive health monitoring: ‘did we log something recently?’
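The ‘did we log something recently’ check boils down to comparing the newest event's timestamp against a freshness threshold (an illustrative sketch with made-up names, not the actual monitoring code):

```python
import time

def passive_health(last_event_ts, max_silence_s=300, now=None):
    """Return 'OK' if an event arrived within the window, else 'CRITICAL'.

    An app that normally logs steadily but has gone quiet for longer
    than max_silence_s is presumed unhealthy -- no agent-side check needed.
    """
    now = time.time() if now is None else now
    silence = now - last_event_ts
    return "OK" if silence <= max_silence_s else "CRITICAL"

now = 1000000.0
print(passive_health(now - 60, now=now))   # OK: logged a minute ago
print(passive_health(now - 900, now=now))  # CRITICAL: silent for 15 minutes
```

This inverts the usual nsca model: instead of each host pushing "I'm alive" checks, the monitoring side just queries the log pipeline it already has.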
Riemann
• Ambitious plans to do more
• Web pool health (>= n nodes)
• Replace statsd
• Transit collectd data via logstash and use it to emit to graphite
• disc usage trending / prediction
Metadata
• It’s all about the metadata
• Structured events are describable
• Common patterns give you standard metrics / alerting for free
• Dashboards!
The last point here is most important: ZMQ networking runs entirely in a background thread that Perl knows nothing about, which means you can ship messages asynchronously with no changes to your existing codebase.