Twitter generates billions of events per day, and analyzing these events in real time presents a massive challenge. To meet this challenge, Twitter designed an end-to-end real-time stack consisting of DistributedLog, a distributed and replicated messaging system, and Heron, a streaming system for real-time computation. DistributedLog is a replicated log service built on top of Apache BookKeeper, providing infinite, ordered, append-only streams that can be used to build robust real-time systems; it is the foundation of Twitter's publish-subscribe system. Twitter Heron is the next-generation streaming system, built from the ground up to address our scalability and reliability needs. Both systems have been in production for nearly two years and are widely used at Twitter in a range of diverse applications, such as the search ingestion pipeline, ad analytics, image classification, and more. These slides describe Heron and DistributedLog in detail, covering a few use cases in depth and sharing the experience and challenges of operating real-time systems at scale.
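The append-only log abstraction at the heart of DistributedLog can be illustrated with a minimal sketch. This is not the DistributedLog API; the class and method names are invented for illustration, and the real system makes these operations durable and replicated via Apache BookKeeper. The core idea: every record gets a monotonically increasing sequence number, and readers can tail the log in order from any position.

```python
# Toy sketch (not the DistributedLog API): an ordered, append-only log.
class AppendOnlyLog:
    def __init__(self):
        self._records = []

    def append(self, payload: bytes) -> int:
        """Append a record; return its sequence number."""
        self._records.append(payload)
        return len(self._records) - 1

    def read_from(self, seq: int):
        """Yield (seq, payload) pairs starting at `seq`, in order."""
        for i in range(seq, len(self._records)):
            yield i, self._records[i]

log = AppendOnlyLog()
log.append(b"event-1")
log.append(b"event-2")
print(list(log.read_from(0)))  # [(0, b'event-1'), (1, b'event-2')]
```

Because readers only ever see records in sequence order, many independent subscribers can replay or tail the same stream, which is what makes the log a good foundation for publish-subscribe.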
3. 3
Value of Data
It’s contextual
[Figure: Value of Data to Decision-Making (information half-life in decision-making). Value decays over time: real-time data supports preventive/predictive, time-critical decisions; data that is seconds to minutes old supports actionable and reactive decisions; data that is hours to days old supports historical analysis; traditional "batch" business intelligence operates on data that is days to months old.]
[1] Courtesy Michael Franklin, BIRTE, 2015.
4. 4
What is Real-Time? It's contextual:

Batch (high throughput, > 1 hour): monthly active users, relevance for ads, ad-hoc queries
OLTP (deterministic workflows, < 1 ms): financial trading
Real Time (low latency, approximate, 10 ms - 1 sec): ad impressions count, hashtag trends
Near Real Time (latency sensitive, < 500 ms): fanout Tweets, search for Tweets
5. 5
Why Real Time?

Real-time trends: emerging breakout trends on Twitter (in the form of #hashtags)
Real-time conversations: real-time sports conversations related to a topic (a recent goal or touchdown)
Real-time recommendations: real-time product recommendations based on your behavior and profile
Real-time search: real-time search of Tweets
ANALYZING BILLIONS OF EVENTS IN REAL TIME IS A CHALLENGE!
7. 7
Real Time Use Cases

Online services (10s of ms): transaction log, queues, RPCs
Near real time (100s of ms): change propagation, streaming analytics
Data for batch analytics (secs to mins): log aggregation, client events
9. 9
Scribe

Open source log aggregation: originally from Facebook; Twitter made significant enhancements for real-time event aggregation
High throughput and scale: delivers 125M messages/min and provides tight SLAs on data reliability
Runs on every machine: simple, very reliable, and uses memory and CPU efficiently
10. Event Bus & DistributedLog
Next Generation Messaging
12. 12
Kestrel Limitations

Adding subscribers is expensive
Scales poorly as the number of queues increases
Durability is hard to achieve
Read-behind degrades performance: too many random I/Os
Cross-DC replication
13. 13
Kafka Limitations

Relies on the file system page cache
Performance degrades when subscribers fall behind: too much random I/O
14. 14
Rethinking Messaging

Durable writes, intra-cluster and geo-replication
Scale resources independently
Cost efficiency
Unified stack: tradeoffs for various workloads
Multi-tenancy
Ease of manageability
15. 15
Event Bus

Durable writes, intra-cluster and geo-replication
Scale resources independently
Cost efficiency
Unified stack: tradeoffs for various workloads
Multi-tenancy
Ease of manageability
23. 23
Twitter Heron: Design Goals

Batching of tuples: amortizing the cost of transferring tuples
Task isolation: ease of debuggability, isolation, and profiling
Fully API compatible with Storm: directed acyclic graph of topologies, spouts, and bolts
Support for back pressure: topologies should be self-adjusting
Use of mainstream languages: C++, Java, and Python
Efficiency: reduced resource consumption
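The back-pressure goal can be illustrated with a toy sketch. This is not Heron's actual mechanism (which coordinates across processes in a running topology); it only shows the principle: a bounded buffer between stages forces the upstream stage to stop emitting when the downstream stage falls behind, instead of dropping tuples or exhausting memory.

```python
# Toy sketch of back pressure: a deliberately small bounded buffer
# between an upstream "spout" loop and a downstream "bolt".
from queue import Queue, Full

buffer = Queue(maxsize=3)  # small on purpose

sent, deferred = [], []
for tuple_id in range(5):
    try:
        buffer.put_nowait(tuple_id)
        sent.append(tuple_id)
    except Full:
        # Back pressure: stop emitting until the downstream drains.
        deferred.append(tuple_id)

print(sent)      # [0, 1, 2]
print(deferred)  # [3, 4]
```

The self-adjusting behavior on the slide is this idea applied topology-wide: when any bolt slows down, pressure propagates upstream until the spouts reduce their emit rate.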
25. 25
Heron Terminology

Topology: a directed acyclic graph; vertices = computation, edges = streams of data tuples
Spouts: sources of data tuples for the topology (examples: Kafka, Kestrel, MySQL, Postgres)
Bolts: process incoming tuples and emit outgoing tuples (examples: filtering, aggregation, join, any function)
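The spout/bolt/topology terminology can be sketched with plain Python generators. This is not the Storm/Heron API; the function names are invented for illustration, and a real topology runs each stage as parallel tasks.

```python
# Toy sketch of the spout -> bolt -> bolt model as a simple DAG.
from collections import Counter

def sentence_spout():
    """Spout: a source of data tuples (here, sentences)."""
    for s in ["heron processes tuples", "tuples flow through bolts"]:
        yield s

def split_bolt(sentences):
    """Bolt: processes incoming tuples, emits outgoing tuples (words)."""
    for sentence in sentences:
        yield from sentence.split()

def count_bolt(words):
    """Bolt: aggregates the word stream into running counts."""
    return Counter(words)

# Topology: wire the stages into a directed acyclic graph.
counts = count_bolt(split_bolt(sentence_spout()))
print(counts["tuples"])  # 2
```

In a real topology each stage runs with its own parallelism and tuples flow continuously, but the dataflow shape is the same.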
27. 27
Stream Groupings

01 Shuffle grouping: random distribution of tuples
02 Fields grouping: group tuples by one or more fields
03 All grouping: replicates tuples to all tasks
04 Global grouping: sends the entire stream to one task
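The four groupings can be sketched as routing functions that map a tuple to the set of downstream task indices. This is a toy illustration, not Heron's implementation; `NUM_TASKS` and the dict-shaped tuple are assumptions made for the example.

```python
# Toy sketch: each grouping decides which downstream task(s) get a tuple.
import random

NUM_TASKS = 3

def shuffle_grouping(tup):
    """Shuffle: random distribution of tuples across tasks."""
    return [random.randrange(NUM_TASKS)]

def fields_grouping(tup, field="user"):
    """Fields: the same field value always lands on the same task."""
    return [hash(tup[field]) % NUM_TASKS]

def all_grouping(tup):
    """All: replicate the tuple to every task."""
    return list(range(NUM_TASKS))

def global_grouping(tup):
    """Global: send the entire stream to a single task."""
    return [0]

t = {"user": "alice", "action": "click"}
assert fields_grouping(t) == fields_grouping(t)  # deterministic per key
assert all_grouping(t) == [0, 1, 2]
assert global_grouping(t) == [0]
```

Fields grouping is what makes per-key aggregations (e.g., counts per hashtag) correct under parallelism: every tuple for a given key is processed by the same task.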
42. 42
Lambda Architecture - The Good

Scribe → Event Bus (collection pipeline) → Heron (analytics pipeline) → Results
43. 43
Lambda Architecture - The Bad

Have to fix everything (maybe twice)!
How much duct tape is required?
Have to write everything twice!
Subtle differences in semantics
What about graphs, ML, SQL, etc.?
44. 44
Summingbird to the Rescue

Summingbird program → Scalding/MapReduce → HDFS → batch key-value result store
Summingbird program → message broker → Heron topology → online key-value result store
Client: merges the batch and online key-value result stores
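The Summingbird idea can be sketched in Python (Summingbird itself is a Scala API; this toy uses `Counter` as the monoid and invented variable names): the aggregation logic is written once, executed over both the batch dataset and the live stream, and the client merges the two result stores at read time.

```python
# Toy sketch of write-once, run-twice aggregation over a monoid.
from collections import Counter

def aggregate(events):
    """The single program: count events per key (Counter is a monoid)."""
    return Counter(events)

batch_events  = ["#heron", "#storm", "#heron"]  # e.g. historical data on HDFS
stream_events = ["#heron", "#kafka"]            # e.g. live data from a broker

batch_store  = aggregate(batch_events)   # Scalding/MapReduce path
online_store = aggregate(stream_events)  # Heron topology path

# Client view: merge the batch and online stores with the monoid's '+'.
merged = batch_store + online_store
print(merged["#heron"])  # 3
```

Because the same function runs on both paths and the values form a monoid, the merge is well-defined and the "write everything twice / subtle semantic differences" problems of hand-built lambda architectures go away.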
45. 45
Curious to Learn More?

Twitter Heron: Stream Processing at Scale
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, Siddarth Taneja
Twitter, Inc., and University of Wisconsin - Madison

Storm @Twitter
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy
Twitter, Inc., and University of Wisconsin - Madison
46. 46
Interested in Heron?
CONTRIBUTIONS ARE WELCOME!
https://github.com/twitter/heron
http://heronstreaming.io
HERON IS OPEN SOURCED
FOLLOW US @HERONSTREAMING
47. 47
Interested in Distributed Log?
CONTRIBUTIONS ARE WELCOME!
https://github.com/twitter/distributedlog
http://distributedlog.io
DISTRIBUTED LOG IS OPEN SOURCED
FOLLOW US @DISTRIBUTEDLOG