Kafka Summit NYC 2017 - Every Message Counts: Kafka as a Foundation for Highly Reliable Logging at Airbnb
1. Every Message Counts: Kafka as a Foundation for Highly Reliable Logging at Airbnb
Youssef Francis / Jun He / Kafka Summit 2017
2. Overview
Data we collect at Airbnb
• Product Events: Click streams, metrics, experiment
reporting.
• Database Exports: Streaming and batch replicas of
production databases.
• Service Events: Services use these to communicate with
each other. Events act as a log of mutations.
• Derived Data: Computed from a combination of streaming
and batch data.
4. Tenets of Reliable Logging
• Schemas are a contract: Producers only emit well-formed events. Errors detected in development.
• Events are delivered reliably: Predictable routing and storage. Automatic recovery from transient failures.
• Events are available in real time: Monitoring enables immediately actionable alerts. Events can be inspected as they flow through the pipeline.
• Schemas and data are discoverable: Easy to find schemas in use. Ownership and metadata provide useful context.
6. What is Jitney?
Standardized message bus at Airbnb
Jitney is a unified solution for propagating business context with standardized events.
7. Jitney is an ecosystem
• Central schema repository
• Client SDK
• Producer proxy
• Consumer agent
• Self-service portal
8. Jitney Schemas
Schemas
• Contract owned by the producer and shared by consumers.
• Rich metadata provided by the Jitney API.
Central Schema Repository
• Single source of truth.
• Schemas are easily discoverable.
• All schemas undergo code review and automated validation.
Why Thrift?
• Rich ecosystem; write once, use everywhere.
• Good performance in Java and Ruby.
• Efficient binary encoding.
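The "schemas are a contract" idea can be illustrated with a small producer-side check. This is only a sketch: Jitney schemas are defined in Thrift, whereas the dictionary-based schema, the field names, and the validate function below are hypothetical stand-ins chosen to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    type_: type
    required: bool = True


# Hypothetical schema; the event name, owner, and fields are illustrative only.
SEARCH_CLICK_SCHEMA = {
    "name": "SearchClickEvent",
    "owner": "search-team",
    "fields": [
        Field("user_id", int),
        Field("listing_id", int),
        Field("position", int, required=False),
    ],
}


def validate(event: dict, schema: dict) -> list[str]:
    """Return a list of contract violations; an empty list means well-formed."""
    errors = []
    known = {f.name for f in schema["fields"]}
    for f in schema["fields"]:
        if f.name not in event:
            if f.required:
                errors.append(f"missing required field: {f.name}")
            continue
        if not isinstance(event[f.name], f.type_):
            errors.append(
                f"wrong type for {f.name}: expected {f.type_.__name__}, "
                f"got {type(event[f.name]).__name__}"
            )
    for key in event:
        if key not in known:
            errors.append(f"unknown field: {key}")
    return errors
```

Running a check like this in unit tests or at emit time is one way errors could surface in development rather than in the pipeline.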
9. Jitney client SDK
Jitney everywhere
• Standardized routing logic.
• Automatic topic creation.
• Cluster discovery.
• Seek and Replay API: fast bootstrap and recovery.
• Built-in metric reporting.
• Available on all platforms in use at Airbnb.
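Standardized routing logic might look roughly like the following. The slides do not describe Jitney's actual naming or partitioning rules, so the topic prefix, the topic_for and partition_for helpers, and the use of MD5 are all assumptions for illustration (Kafka's default Java partitioner uses murmur2, not MD5).

```python
import hashlib


def topic_for(schema_name: str, prefix: str = "events") -> str:
    # Hypothetical convention: one topic per schema, under a shared prefix.
    return f"{prefix}.{schema_name}"


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash so the same key always maps to the same partition.
    # MD5 is a stand-in here, chosen only because it is in the standard library.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Keeping this logic in a shared SDK means every producer and consumer derives the same topic and partition for a given event without per-service configuration.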
10. Jitney producer proxy and consumer agents
Powered by Kafka
Jitney producer agent
• Local and distributed proxy modes
• Open to the public internet
• Local rate limiting and event filtering
• Single and batch publishing modes
• Horizontally scalable
Jitney consumer agent
• Local Java agent to support non-Java consumers
• At-least-once delivery guarantee
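An at-least-once guarantee generally follows from committing an offset only after the message has been processed. The sketch below simulates that with an in-memory log rather than a real Kafka client; InMemoryLog, consume, and the crash parameter are hypothetical names and not part of Jitney.

```python
class InMemoryLog:
    """Stand-in for a single partition plus its committed consumer offset."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the next message to deliver


def consume(log, handler, crash_before_commit_at=None):
    offset = log.committed
    while offset < len(log.messages):
        handler(log.messages[offset])  # process first...
        if offset == crash_before_commit_at:
            raise RuntimeError("simulated crash before commit")
        log.committed = offset + 1  # ...commit only after processing
        offset += 1


seen = []
log = InMemoryLog(["a", "b", "c"])
try:
    consume(log, seen.append, crash_before_commit_at=1)
except RuntimeError:
    pass  # the consumer restarts from the last committed offset
consume(log, seen.append)
print(seen)  # ['a', 'b', 'b', 'c']
```

The message "b" is delivered twice but nothing is lost, which is why consumers under this guarantee typically need idempotent or deduplicating handlers.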
19. Some numbers
• 150 brokers across 5 clusters. 1 million messages per
second at peak.
• 10 billion logging events collected per day. Highly seasonal.
• 95% of logging events are available for querying in the
warehouse within 3 minutes.
• SLA guarantees no more than 0.01% loss. Less than
0.000001% end-to-end loss in practice.
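To put the percentages in absolute terms, the figures above can be worked through directly. This is simple arithmetic on the stated numbers; it assumes the loss percentages apply to the 10 billion daily logging events.

```python
from fractions import Fraction

EVENTS_PER_DAY = 10_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate implied by the daily volume (peaks are higher; traffic is seasonal).
avg_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Fractions avoid floating-point rounding on very small percentages.
sla_loss_budget = EVENTS_PER_DAY * Fraction("0.01") / 100
observed_loss_bound = EVENTS_PER_DAY * Fraction("0.000001") / 100

print(round(avg_per_second))     # 115741 events per second on average
print(int(sla_loss_budget))      # 1000000 events per day allowed by the SLA
print(int(observed_loss_bound))  # 100 events per day as the observed upper bound
```

Under that assumption, the SLA tolerates up to one million lost events per day, while the reported end-to-end loss corresponds to fewer than about 100.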