Why @Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management
1. Why Loggly Loves Apache Kafka, and How We Use Its Unbreakable Messaging for Better Log Management
Infrastructure Engineering Team
June 2014
| Log management as a service Simplify Log Management
2. What Loggly Does
World’s most popular cloud-based log management service
§ More than 5,000 customers
§ Near real-time indexing of events
Distributed architecture, built on AWS
Initial production services in 2011
§ Loggly Generation 2 released in Sept 2013
3. Loggly: Addressing the first big data problem every company faces
§ Centralized logging and archival
§ Real-time processing, analysis and visualization
§ Monitoring, alerting and troubleshooting
4. Agenda for this Presentation
§ The challenges of Log Management at scale
§ Overview of Loggly’s processing pipeline
§ Alternative technologies considered
§ Why we love Apache Kafka
§ How Kafka has added flexibility to our pipeline
5. The Challenges of Log Management at Scale
§ Big data
– >750 billion events logged to date
– Sustained bursts of 100,000+ events per second
– Data space measured in petabytes
§ Need for high fault tolerance
§ Near real-time indexing requirements
§ Time-series index management
6. Log Management Processing Pipeline: Overview
[Diagram labels: Load Balancing; Kafka; Stage 2; Loggly Custom Module]
7. Collectors Can Easily Outpace Downstream Processes
[Diagram labels: Load Balancing; Kafka; Stage 2; Loggly Custom Module]
§ Written in C++
§ Designed to ingest massive data volumes
§ Need to collect regardless of what’s happening downstream
8. Solution: Queue That’s External to Collector
[Diagram labels: Load Balancing; Kafka; Stage 2; Loggly Custom Module]
§ Based on Apache Kafka
§ Highly performant and reliable
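
As a rough illustration of why an external queue helps, here is a minimal in-memory sketch in Python. It is not Loggly's collector and not Kafka itself: `ExternalQueue`, `append`, `read_from`, and `simulate` are hypothetical names, and the rates are made up. The point is only that the collector keeps appending while a slower downstream stage reads from its own offset.

```python
class ExternalQueue:
    """Toy append-only log standing in for a Kafka topic (illustration only)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        # The collector only appends; it never waits for a consumer.
        self._events.append(event)
        return len(self._events) - 1  # offset of the stored event

    def read_from(self, offset, max_events):
        # A consumer pulls at its own pace, starting from its own offset.
        return self._events[offset:offset + max_events]

    def __len__(self):
        return len(self._events)


def simulate(ticks, collect_per_tick, process_per_tick):
    """Return (collected, processed, backlog) after `ticks` rounds."""
    queue = ExternalQueue()
    consumer_offset = 0
    for tick in range(ticks):
        # Collector side: ingest everything that arrives this tick.
        for i in range(collect_per_tick):
            queue.append(f"event-{tick}-{i}")
        # Downstream side: process only what it can handle this tick.
        batch = queue.read_from(consumer_offset, process_per_tick)
        consumer_offset += len(batch)
    return len(queue), consumer_offset, len(queue) - consumer_offset


collected, processed, backlog = simulate(ticks=10, collect_per_tick=5, process_per_tick=3)
print(collected, processed, backlog)  # 50 30 20
```

Nothing is dropped when the downstream stage falls behind; the backlog simply grows in the queue until the consumer catches up.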
9. Alternative/Supplementary Approaches Considered
§ Internal buffering in collectors
– Added complexity
§ Cassandra
– Not as good a queue as Kafka
§ Apache Storm
– In initial Gen2 architecture, removed after launch
10. The Secret to Log Management at Scale: Keep It Simple, Stupid
Results:
§ Can process sustained rates of 100,000+ events per second per cluster
§ Average message 300 bytes
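
For a sense of scale, the two figures above can be multiplied out. This is back-of-the-envelope arithmetic only: it assumes the 100,000 events per second rate is held for a full day (the slide does not claim that), counts raw payload bytes in decimal units, and ignores replication and protocol overhead.

```python
EVENTS_PER_SECOND = 100_000      # sustained rate quoted on the slide
AVG_MESSAGE_BYTES = 300          # average message size quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60   # assumption: rate held for a whole day

bytes_per_second = EVENTS_PER_SECOND * AVG_MESSAGE_BYTES
bytes_per_day = bytes_per_second * SECONDS_PER_DAY

print(f"{bytes_per_second / 1e6:.0f} MB/s")   # 30 MB/s
print(f"{bytes_per_day / 1e12:.2f} TB/day")   # 2.59 TB/day
```

Under those assumptions a single cluster would move roughly 30 MB of log payload per second, on the order of a few terabytes per day.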
11. Why We Love Kafka
12. What Attracted Us in the First Place
No single point of failure
• Terabytes of data move through our Kafka cluster every day without losing a single event
• We use age-based retention to purge old data on disks
Low latency
• 99.99999% of the time our data is coming from disk cache and RAM; only very rarely do we hit disk
Performance
• Crazy good!
• We currently have a bunch of Kafka brokers running on m2.xlarge instances backed by provisioned IOPS
• One of our consumer groups (eight threads), which maps each log to a customer, can process about 200,000 events per second draining from 192 partitions spread across three brokers
Scalability
• The ability to increase the partition count per topic and the number of downstream consumer threads provides flexibility to increase throughput when desired
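
The consumer-group figures above (eight threads, 192 partitions, about 200,000 events per second) can be broken down per thread. The sketch below uses a simple round-robin split purely for illustration; `assign_partitions` is a hypothetical helper, not Kafka's own assignment logic, and an even split of load across threads is an assumption.

```python
def assign_partitions(num_partitions, num_threads):
    """Spread partition ids across threads round-robin (illustration only)."""
    assignment = {t: [] for t in range(num_threads)}
    for p in range(num_partitions):
        assignment[p % num_threads].append(p)
    return assignment


assignment = assign_partitions(num_partitions=192, num_threads=8)
partitions_per_thread = len(assignment[0])
events_per_thread = 200_000 / 8  # assumes load is evenly spread

print(partitions_per_thread)  # 24
print(events_per_thread)      # 25000.0

# With more threads than partitions, the extra threads get nothing to read,
# which is why partition count bounds consumer parallelism.
idle = [t for t, ps in assign_partitions(4, 8).items() if not ps]
print(len(idle))              # 4
```

This is one way to read the scalability point: adding consumer threads only helps while there are partitions left to hand out, so raising the partition count per topic is what creates room for more threads.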
13. How Our Kafka Crush Has Deepened
Distributed log collection
• Local pods and collectors spread all over the Internet, with local Kafka deployments, collect data from customers located all over the world
• Can collect logs even when we lose connectivity
• When the network comes back, Kafka sends the logs downstream to the rest of the pipeline
More efficient, effective DevOps
• Deploying Kafka throughout the pipeline makes it easy to disable certain parts of the system (for troubleshooting or upgrades)
• No worrying that we will lose customer data
• Example: Add support for a new log type to our automatic parsing capabilities by turning off the existing parser, deploying the new one, and processing the logs that Kafka has queued up
Controlling resource utilization
• Keep collectors as simple as possible for resilience and reliability reasons
• Add intelligence into our pipelines using Kafka
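
The parser-upgrade example above relies on the queue remembering where a consumer stopped. A minimal Python sketch of that idea follows; `Topic`, `produce`, `consume`, `old_parser`, and `new_parser` are hypothetical stand-ins rather than Kafka's API or Loggly's parsers, and the `key=value` log format is invented for the example.

```python
class Topic:
    """Toy log with per-group committed offsets (illustration only)."""

    def __init__(self):
        self.log = []
        self.committed = {}  # consumer group -> next offset to read

    def produce(self, message):
        self.log.append(message)

    def consume(self, group, handler):
        # Read everything after the group's last committed offset.
        start = self.committed.get(group, 0)
        results = [handler(m) for m in self.log[start:]]
        self.committed[group] = len(self.log)
        return results


def old_parser(line):
    return {"raw": line}


def new_parser(line):
    # Hypothetical new log type: whitespace-separated key=value pairs.
    fields = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
    return {"raw": line, "fields": fields}


topic = Topic()
topic.produce("status=200 path=/a")
before = topic.consume("parsers", old_parser)   # old parser is running

# Old parser is switched off; collectors keep producing in the meantime.
topic.produce("status=500 path=/b")
topic.produce("status=200 path=/c")

# New parser starts under the same group and picks up the queued backlog.
after = topic.consume("parsers", new_parser)
print(len(before), len(after))  # 1 2
print(after[0]["fields"])       # {'status': '500', 'path': '/b'}
```

Because the offset belongs to the consumer group rather than to a particular parser process, the replacement picks up exactly where its predecessor left off.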
15. “Noisy Neighbors” are Inherent to SaaS
§ Some customers send many times their “normal” level of logging volume, inadvertently or because their application is in big trouble
§ Routing their logs to a separate queue minimizes the impact on other customers
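
One possible shape for such a routing rule is sketched below in Python. Everything here is an assumption made for illustration: the function `choose_topic`, the topic names, the per-customer baseline, and the `burst_factor` threshold are not taken from the presentation.

```python
def choose_topic(customer_id, current_rate, baseline_rate, burst_factor=10.0):
    """Pick a queue name for a customer's events.

    current_rate and baseline_rate are events per second; burst_factor is a
    hypothetical multiple of the baseline above which a customer is "noisy".
    """
    if baseline_rate > 0 and current_rate > burst_factor * baseline_rate:
        # Isolate the burst so it cannot crowd out other customers.
        return f"overflow.{customer_id}"
    return "events.shared"


print(choose_topic("acme", current_rate=120, baseline_rate=100))    # events.shared
print(choose_topic("acme", current_rate=5000, baseline_rate=100))   # overflow.acme
```

Consumers of the shared queue keep their usual pace, while the overflow queue can be drained separately at whatever rate the system can spare.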
16. Kafka Queues Add Flexibility to Loggly Pipeline
§ Because Kafka topics are very cheap from a performance and overhead standpoint, we can create as many queues as we want
§ Scaled to the performance we want
§ Optimizing resource utilization across the system
§ Because they can be created dynamically, we can make business rules very flexible
§ Makes us confident that pipeline will scale as customer data volumes do
17. Conclusion: Kafka Frees Our Development Team to Build Differentiating Features
§ Kafka deployment working without us thinking about it
§ Plenty of other things to do to keep our position as the world’s most popular cloud-based log management service!
18. Does Log Management Sound Hard? It Should!
Let us do the heavy lifting for you!
Try Loggly FREE for 30 days
About Us:
Loggly is the world’s most popular cloud-based log management solution, used by more than 5,000 happy customers to effortlessly spot problems in real-time, easily pinpoint root causes and resolve issues faster to ensure application success.
Visit us at loggly.com or follow @loggly on Twitter.