Keeping it Real Time: Globally distributed, high volume data
processing optimized for scale, reliability & cost
Krux is an infrastructure provider for many of the websites and
services you use online today, like NYTimes.com, WSJ.com, Roku and
Ticketmaster. For every request on those properties, Krux will get
one or more as well. We grew from zero traffic to several billion
requests per day from all over the world in the span of 2 years,
and we did so exclusively in AWS.
As anyone building a big data infrastructure will tell you,
turnaround time requirements for processing data are getting
shorter and shorter, and (near) real time is becoming the norm.
However, processing data efficiently, reliably, resiliently, at scale,
at minimal cost and without burdening your developers and operators
is a tremendous challenge, especially in a cloud environment.
This is the story of all the challenges & pitfalls we encountered,
and how, through architecture, convention and sensible trade-offs, we
managed to build a real-time infrastructure that is at the core of our
data processing and incredibly economical to build, scale & operate.
I will share the details of how we set up and operate our
globally distributed real-time pipeline, based on Kafka and related
tools, to capture and process over 100k messages every second from
virtually every part of the system, including external requests, user
apps and the operating system itself, for under $4500/month using
off-the-shelf Open Source software and some code we've released as Open
Source ourselves.
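To put those two numbers side by side, here is a back-of-the-envelope bound on the cost per message. It assumes a constant 100k messages/second and a 30-day month, both simplifications; since the real throughput is "over" 100k and the cost "under" $4500, the result is an upper bound.

```latex
\begin{aligned}
N_{\text{day}}   &= 10^{5}\ \tfrac{\text{msg}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} = 8.64 \times 10^{9}\ \text{msg/day} \\
N_{\text{month}} &\approx 30 \times 8.64 \times 10^{9} = 2.592 \times 10^{11}\ \text{msg/month} \\
c_{\text{msg}}   &\le \frac{\$4500}{2.592 \times 10^{11}} \approx \$1.74 \times 10^{-8}
                  \quad (\approx \$0.017 \text{ per million messages})
\end{aligned}
```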
This content will be applicable for anyone processing or desiring to
process vast amounts of messages in real-time in a cloud or datacenter
setting.
14. MAKING GOOD CHOICES
It All Depends™
15. Kinesis SQS RabbitMQ Kafka

Kinesis
Features: Scale up via API call; very consistent latencies; works with AWS tools
Concerns: Limited number of supported client libs; all complex logic in client; maximum life cycle 24 hours; 5 QPS/shard
Verdict: ?

SQS
Features: Scale up by AWS; dead letter queue; works with AWS tools
Concerns: Cross datacenter replication; maximum payload 64 KB; maximum life cycle 14 days; cost at scale
Verdict: X

RabbitMQ
Features: Open Source (Erlang); dead letter queue; message & queue TTL
Concerns: Poor partition tolerance (documented); performance/availability memory bound; drops acked messages
Verdict: X

Kafka
Features: Apache Open Source (Scala); performance is disk IO bound; TTL on age or queue size
Concerns: Metrics & insights need custom code; built-in management via CLI only; reliance on Zookeeper
Verdict: :)
https://aphyr.com/tags/jepsen
http://www.slideshare.net/adamw1pl/eval-repl-mq
16. BASIC QUEUE SETUP
The ‘Hello world!’ of Kafka
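To make the "Hello world!" concrete, here is a toy, in-memory model of what a Kafka topic is: a fixed number of append-only partitions in which every message gets a sequential offset, and reads that don't delete anything. It is an illustration only, not a Kafka client; the class name and the crc32-based partitioner are assumptions made for this sketch.

```python
"""Toy in-memory model of a Kafka topic (illustration only, not a client)."""
from typing import List, Tuple
from zlib import crc32


class ToyTopic:
    """A topic is a fixed set of append-only partitions (logs)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: bytes) -> int:
        # Same key -> same partition, which is what gives per-key ordering.
        # Kafka's own default partitioner uses a different hash; crc32 is a stand-in.
        return crc32(key) % len(self.partitions)

    def produce(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append a message and return its (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1

    def fetch(self, partition: int, offset: int, max_records: int = 100) -> List[bytes]:
        # Reads never remove data: each consumer tracks its own offset.
        # In real Kafka, retention (by age or size) discards old messages;
        # this toy never discards anything.
        return self.partitions[partition][offset:offset + max_records]


topic = ToyTopic(num_partitions=3)
p, off = topic.produce(b"user-42", b"Hello world!")
print(topic.fetch(p, off))  # [b'Hello world!']
```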
17. KAFKA
At least once: Default
At most once: Client logic
http://kafka.apache.org/documentation.html#quickstart
http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen
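The difference between the two delivery modes comes down to when a consumer records (commits) its offset relative to processing a message. The simulation below is a sketch, not Kafka client code: it commits after every record (real consumers usually commit periodically), and the crash is modeled with a hypothetical `SimulatedCrash` exception.

```python
"""Sketch: commit ordering decides at-least-once vs at-most-once delivery."""
from typing import List, Optional


class SimulatedCrash(Exception):
    """Mimics the consumer dying; carries the last durably committed offset."""


def consume(log: List[str], committed: int, processed: List[str],
            commit_first: bool, crash_at: Optional[int] = None) -> int:
    """Process log[committed:]; crash between the two steps at offset crash_at."""
    offset = committed
    while offset < len(log):
        if commit_first:                      # at-most-once: commit, then process
            committed = offset + 1
            if offset == crash_at:
                raise SimulatedCrash(committed)
            processed.append(log[offset])
        else:                                 # at-least-once: process, then commit
            processed.append(log[offset])
            if offset == crash_at:
                raise SimulatedCrash(committed)
            committed = offset + 1
        offset += 1
    return committed


def run_with_restart(log: List[str], commit_first: bool, crash_at: int) -> List[str]:
    processed: List[str] = []
    try:
        consume(log, 0, processed, commit_first, crash_at)
    except SimulatedCrash as crash:
        # The restarted consumer resumes from the last committed offset.
        consume(log, crash.args[0], processed, commit_first)
    return processed


print(run_with_restart(["a", "b", "c"], commit_first=False, crash_at=1))  # ['a', 'b', 'b', 'c']
print(run_with_restart(["a", "b", "c"], commit_first=True, crash_at=1))   # ['a', 'c']
```

With the default ordering, "b" is delivered twice after the crash; with commit-first client logic, it is silently lost. In an at-least-once pipeline, downstream processing therefore has to tolerate duplicates rather than gaps.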
18. ZOOKEEPER
Stores state for Kafka Producers & Consumers
Use Netflix’s Exhibitor to manage it
https://zookeeper.apache.org
https://github.com/Netflix/exhibitor/wiki
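For orientation, these helpers build the Zookeeper paths where Kafka 0.8-era brokers and the Zookeeper-based high-level consumer keep their state. They only produce path strings and don't talk to Zookeeper; the group and topic names in the example are hypothetical. Newer Kafka versions store consumer offsets in Kafka itself rather than in Zookeeper.

```python
"""Zookeeper znode paths used by Kafka 0.8-era brokers and consumers (strings only)."""


def broker_registration(broker_id: int) -> str:
    # Ephemeral node each live broker registers; it disappears if the broker dies.
    return f"/brokers/ids/{broker_id}"


def partition_state(topic: str, partition: int) -> str:
    # Leader and in-sync replica set for one partition.
    return f"/brokers/topics/{topic}/partitions/{partition}/state"


def consumer_offset(group: str, topic: str, partition: int) -> str:
    # Last committed offset of a Zookeeper-based consumer group.
    return f"/consumers/{group}/offsets/{topic}/{partition}"


def partition_owner(group: str, topic: str, partition: int) -> str:
    # Which consumer instance in the group currently owns the partition.
    return f"/consumers/{group}/owners/{topic}/{partition}"


# "stats" and "weblogs" are hypothetical group/topic names.
print(consumer_offset("stats", "weblogs", 0))  # /consumers/stats/offsets/weblogs/0
```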
24. RELAY DATA IN REALTIME
netcat & UDP are your friends
CustomLog "|/bin/nc -uj localhost 9999" response_time
https://code.google.com/p/openbsd-netcat/source/browse/trunk/netcat.c#73
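The CustomLog line pipes every access log entry into netcat, which forwards it over UDP to whatever listens on localhost:9999. Below is a minimal, standard-library-only sketch of both ends; the function names are hypothetical, the sender only imitates `nc -u`, and the listener stands in for a real transport process.

```python
"""Minimal stand-ins for both ends of `nc -u localhost 9999` (stdlib only)."""
import socket
from typing import List


def open_listener(host: str = "127.0.0.1", port: int = 9999) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock: socket.socket, max_datagrams: int,
                  timeout: float = 1.0) -> List[str]:
    """Read up to max_datagrams datagrams and split them into log lines."""
    sock.settimeout(timeout)
    lines: List[str] = []
    for _ in range(max_datagrams):
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break                              # nothing arrived in time
        # One datagram may carry several lines, so split rather than assume 1:1.
        lines.extend(ln for ln in data.decode("utf-8", "replace").splitlines() if ln)
    return lines


def send_line(line: str, host: str = "127.0.0.1", port: int = 9999) -> None:
    """Fire-and-forget like `nc -u`: no connection, no ack, no waiting on the reader."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto((line + "\n").encode("utf-8"), (host, port))
```

Because UDP is fire-and-forget, a slow or missing listener never blocks the web server's logging; the price is that lines can be dropped when nothing is listening.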
25. TRANSPORT OPTIONS
We use PipeD (Krux OSS) on localhost
Also consider Heka, Fluentd or Flume
https://github.com/krux/piped
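A local transport typically buffers incoming lines and ships them downstream (for example to Kafka) in batches, trading a little latency for far fewer requests. The class below is a hypothetical sketch of that idea, not PipeD's implementation: `flush` is whatever callable sends a batch onward, and the size and age limits are made-up defaults.

```python
"""Hypothetical batching relay: buffer lines, flush by batch size or age."""
import time
from typing import Callable, List, Optional


class BatchingRelay:
    def __init__(self, flush: Callable[[List[str]], None], max_batch: int = 500,
                 max_age_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = flush          # e.g. a wrapper around a Kafka producer (assumption)
        self.max_batch = max_batch
        self.max_age_s = max_age_s
        self._clock = clock
        self._buf: List[str] = []
        self._first_ts: Optional[float] = None

    def add(self, line: str) -> None:
        if not self._buf:
            self._first_ts = self._clock()   # age is measured from the oldest line
        self._buf.append(line)
        if len(self._buf) >= self.max_batch:
            self.flush()

    def tick(self) -> None:
        """Call periodically (e.g. on a socket timeout) so quiet periods still flush."""
        if self._buf and self._clock() - self._first_ts >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self._buf:
            batch, self._buf = self._buf, []
            self._first_ts = None
            self._send(batch)


sent: List[List[str]] = []
relay = BatchingRelay(sent.append, max_batch=2)
for line in ["a", "b", "c"]:
    relay.add(line)
relay.flush()  # e.g. on shutdown
print(sent)  # [['a', 'b'], ['c']]
```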
42. REPLICATION
Optimize for central processing
[Diagram: a User's request reaches a Web Server; the User Request Logs go through Transport & Krux Stream Listener into Kafka in DC1, followed by Message Processing with the Krux Kafka Stdlib or Stream Processors. Web Requests in DC2 land in Kafka in DC2 and are replicated with Krux Mirror Maker.]
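Replicating DC2 traffic into the central cluster boils down to consuming from one Kafka cluster and producing to another, advancing the source offset only once the destination has acknowledged. The sketch below illustrates that loop with in-memory lists; `mirror_once` and its parameters are hypothetical, and it is not Krux Mirror Maker's implementation, which the slides don't describe.

```python
"""Sketch: mirror one partition from a remote cluster into a central one."""
from typing import Callable, List


def mirror_once(source: List[bytes], source_committed: int,
                produce_central: Callable[[List[bytes]], bool],
                batch_size: int = 100) -> int:
    """Copy the next batch from source; return the new committed source offset.

    The source offset only advances after the central cluster acknowledges
    the batch, so a failure causes re-sends (duplicates), never silent loss.
    """
    batch = source[source_committed:source_committed + batch_size]
    if not batch:
        return source_committed                 # caught up, nothing to copy
    if produce_central(batch):                  # True == central cluster acked
        return source_committed + len(batch)
    return source_committed                     # not acked: retry the same batch


central: List[bytes] = []


def fake_central(batch: List[bytes]) -> bool:
    # Stand-in for a producer writing to the central cluster.
    central.extend(batch)
    return True


offset = 0
source = [b"r1", b"r2", b"r3"]
while True:
    new_offset = mirror_once(source, offset, fake_central, batch_size=2)
    if new_offset == offset:
        break
    offset = new_offset
print(offset, central)  # 3 [b'r1', b'r2', b'r3']
```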
48. A PICTURE IS WORTH A THOUSAND CLI CALLS
[Diagram: the replicated pipeline (Web Server and User Request Logs, Transport & Krux Stream Listener, Kafka in DC1, Message Processing with the Krux Kafka Stdlib or Stream Processors, Kafka in DC2 & Krux Mirror Maker), with an added Stats & Insights component marked ???.]
54. THE FINAL PICTURE
[Diagram: the replicated pipeline (Web Server and User Request Logs, Transport & Krux Stream Listener, Kafka in DC1, Message Processing with the Krux Kafka Stdlib or Stream Processors, Kafka in DC2 & Krux Mirror Maker), with Stats & Insights provided by Krux Kafka Graphite and Kafka Manager.]
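The Stats & Insights piece is about seeing, rather than querying via the CLI, things like how far consumers are behind. Here is a hedged sketch of one such metric: per-partition consumer lag, formatted for Graphite's plaintext protocol. It is not Krux Kafka Graphite's code; the metric naming scheme and the offsets in the example are hypothetical, and how you obtain log-end and committed offsets depends on your Kafka version.

```python
"""Hypothetical sketch: report per-partition consumer lag to Graphite."""
import socket
import time
from typing import Dict, List, Optional


def consumer_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    # Lag = latest offset in each partition minus the group's committed offset.
    # Partitions without a commit are treated as committed at offset 0 (an assumption).
    return {p: end - committed.get(p, 0) for p, end in log_end.items()}


def graphite_lines(prefix: str, group: str, topic: str,
                   lag: Dict[int, int], ts: Optional[int] = None) -> List[str]:
    # Graphite plaintext protocol: "<metric.path> <value> <unix timestamp>\n".
    ts = int(time.time()) if ts is None else ts
    topic = topic.replace(".", "_")  # a dot would add an extra Graphite path level
    return [f"{prefix}.{group}.{topic}.{p}.lag {v} {ts}\n" for p, v in sorted(lag.items())]


def send_to_graphite(lines: List[str], host: str, port: int = 2003) -> None:
    # 2003 is Carbon's default plaintext port; each line is one datapoint.
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall("".join(lines).encode("utf-8"))


lag = consumer_lag({0: 120, 1: 80}, {0: 100, 1: 80})
print(graphite_lines("kafka", "stats", "weblogs", lag, ts=1000))
```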