SlideShare une entreprise Scribd logo
1  sur  50
Télécharger pour lire hors ligne
Twitter Real Time Stack
Processing Billions of Events Using
Distributed Log and Heron
Karthik	
  Ramasamy	
  
Twi/er
@karthikz
2
3
Value of Data
It’s contextual
Value&of&Data&to&Decision/Making&
Time&
Preven8ve/&
Predic8ve&
Ac8onable&
Reac8ve&
Historical&
Real%&
Time&
Seconds& Minutes& Hours& Days&
Tradi8onal&“Batch”&&&&&&&&&&&&&&&
Business&&Intelligence&
Informa9on&Half%Life&
In&Decision%Making&
Months&
Time/cri8cal&
Decisions&
[1]	
  Courtesy	
  Michael	
  Franklin,	
  BIRTE,	
  2015.	
  
4
What is Real-Time?
BATCH
high throughput
> 1 hour
monthly active users
relevance for ads
adhoc
queries
REAL TIME
low latency
< 1 ms
Financial
Trading
ad impressions count
hash tag trends
approximate
10 ms - 1 sec
Near Real
Time
latency sensitive
< 500 ms
fanout Tweets
search for Tweets
deterministic
workflows
OLTP
It’s contextual
5
Why Real Time?
G
Emerging break out
trends in Twitter (in the
form #hashtags)
Ü
Real time sports
conversations related
with a topic (recent goal
or touchdown)
!
Real time product
recommendations based
on your behavior &
profile
real time searchreal time trends real time conversations real time recommendations
Real time search of
tweets
s
ANALYZING BILLIONS OF EVENTS IN REAL TIME IS A CHALLENGE!
6
Real Time: Analytics
STREAMING
Analyze	
  data	
  as	
  it	
  is	
  being	
  
produced
INTERACTIVE
Store	
  data	
  and	
  provide	
  results	
  
instantly	
   when	
   a	
   query	
   is	
  
posed
H
C
7
Real Time Use Cases
Online Services
10s of ms
Near Real Time
100s of ms
Data for Batch Analytics
secs to mins
TransacKon	
  log,	
  Queues,	
  
RPCs
Change	
  propagaKon,	
  
Streaming	
  analyKcs
Log	
  aggregaKon,	
  Client	
  
events
I
8
Real Time Stack
Components: Many moving parts
TWITTER REAL
TIME
!
scribe
s
heron
J
Event
Bus
a
dlog
b
9
Scribe
Open source log aggregation
Originally	
  from	
  Facebook.	
  TwiRer	
  
made	
  significant	
  enhancements	
  for	
  
real	
  Kme	
  event	
  aggregaKon
High throughput and scale
Delivers	
  125M	
  messages/min.	
  	
  
Provides	
  Kght	
  SLAs	
  on	
  data	
  reliability
Runs on every machine
Simple,	
  very	
  reliable	
  and	
  efficiently	
  
uses	
  memory	
  and	
  CPU
!
{
"
Event	
  Bus	
  &	
  Distributed	
  Log
Next Generation Messaging
"
11
Twitter Messaging
Kestrel
Core	
  Business	
  Logic	
  
(tweets,	
  fanouts	
  …)
Kestrel
HDFS
Kestrel
Book	
  
Keeper
My	
  SQL Ka]a
Scribe
Deferred	
  
RPC
Gizzard Database Search
12
Kestrel Limitations
Adding subscribers is expensive
Scales poorly as #queues increase
Durability is hard to achieve
Read-behind degrades performance
Too many random I/Os
Cross DC replication
!
#"
7!
13
Kafka Limitations
Relies on file system page cache
Performance degradation when subscribers
fall behind - too much random I/O
!
"
14
Rethinking Messaging
Durable writes, intra cluster and
geo-replication
Scale resources independently
Cost efficiency
Unified Stack - tradeoffs for
various workloads
Multi tenancy
Ease of Manageability
!
#"
7!
15
Event Bus
Durable writes, intra cluster and
geo-replication
Scale resources independently
Cost efficiency
Unified Stack - tradeoffs for
various workloads
Multi tenancy
Ease of Manageability
!
#"
7!
16
Event Bus - Pub-Sub
Write	
  
Proxy
Read	
  
Proxy
Publisher Subscriber
Metadata
Distributed	
  	
  
Log
Distributed	
  Log
17
Distributed Log
Write	
  
Proxy
Read	
  
Proxy
Publisher Subscriber
Metadata
Distributed	
  	
  
Log
18
Distributed Log @Twitter
01 02 03 04
Manhattan
Key Value Store
Durable Deferred
RPC
Real Time
Search Indexing
Pub Sub
System
/
.
-
,
05
/
Globally
Replicated Log
19
Distributed Log @Twitter
400	
  TB/Day	
  
IN
10	
  PB/Day	
  	
  
OUT
2	
  Trillion	
  Events/Day	
  
PROCESSED
100	
  MS	
  
latency
ALGORITHMS
Mining
Streaming Data
Twi/er	
  Heron
Next Generation Streaming Engine
"
22
Better Storm
Twitter Heron
Container	
  Based	
  Architecture
Separate	
  Monitoring	
  and	
  Scheduling
-
Simplified	
  ExecuTon	
  Model
2
Much	
  Be/er	
  Performance$
23
Twitter Heron
Batching of tuples
AmorKzing	
  the	
  cost	
  of	
  transferring	
  tuples !
Task isolation
Ease	
  of	
  debug-­‐ability/isolaKon/profiling
#Fully API compatible with Storm
Directed	
  acyclic	
  graph	
  	
  
	
  Topologies,	
  Spouts	
  and	
  Bolts
"
Support for back pressure
Topologies	
  should	
  self	
  adjusKng
gUse of main stream languages
C++,	
  Java	
  and	
  Python !
Efficiency
Reduce resource consumption
G
Design: Goals
24
Twitter Heron
Guaranteed
Message
Passing
Horizontal
Scalability
Robust
Fault
Tolerance
Concise
Code-Focus
on Logic
b  Ñ /
25
Heron Terminology
Topology
Directed	
  acyclic	
  graph	
  	
  
verKces	
  =	
  computaKon,	
  and	
  	
  
edges	
  =	
  streams	
  of	
  data	
  tuples
Spouts
Sources	
  of	
  data	
  tuples	
  for	
  the	
  topology	
  
Examples	
  -­‐	
  Ka]a/Kestrel/MySQL/Postgres
Bolts
Process	
  incoming	
  tuples,	
  and	
  emit	
  outgoing	
  tuples	
  
Examples	
  -­‐	
  filtering/aggregaKon/join/any	
  funcKon
,
%
26
Heron Topology
%
%
%
%
%
Spout 1
Spout 2
Bolt 1
Bolt 2
Bolt 3
Bolt 4
Bolt 5
27
Stream Groupings
01 02 03 04
Shuffle Grouping
Random distribution of tuples
Fields Grouping
Group tuples by a field or
multiple fields
All Grouping
Replicates tuples to all tasks
Global Grouping
Send the entire stream to one
task
/
.
-
,
28
Heron
Topology 1
Topology
Submission
Scheduler
Topology 2
Topology N
Architecture: High Level
29
Heron
Topology
Master
ZK
Cluster
Stream
Manager
I1 I2 I3 I4
Stream
Manager
I1 I2 I3 I4
Logical Plan,
Physical Plan and
Execution State
Sync Physical Plan
CONTAINER CONTAINER
Metrics
Manager
Metrics
Manager
Architecture: Topology
30
Heron
% %
S1 B2 B3
%
B4
Stream Manager: BackPressure
31
Stream Manager
S1 B2
B3
Stream
Manager
Stream
Manager
Stream
Manager
Stream
Manager
S1 B2
B3 B4
S1 B2
B3
S1 B2
B3 B4
B4
Stream Manager: BackPressure
S1 S1
S1S1S1 S1
S1S1
32
Heron
B2
B3
Stream
Manager
Stream
Manager
Stream
Manager
Stream
Manager
B2
B3 B4
B2
B3
B2
B3 B4
B4
Stream Manager: Spout BackPressure
33
Heron Use Cases
REALTIME
ETL
REAL TIME
BI
SPAM
DETECTION
REAL TIME
TRENDS
REALTIME
ML
REAL TIME
OPS
34
Heron
Sample Topologies
35
Heron @Twitter
1 stage 10 stages
3x reduction in cores and memory
Heron has been in production for 2 years
36
Heron
COMPONENTS EXPT #1 EXPT #2 EXPT #3
Spout 25 100 200
Bolt 25 100 200
# Heron containers 25 100 200
# Storm workers 25 100 200
Performance: Settings
37
Heron
Throughput CPU usage
milliontuples/min
0
2750
5500
8250
11000
Spout Parallelism
25 100 200
10,200
5,820
1,545
1,920
965249
Heron (paper) Heron (master)
#coresused
0
112.5
225
337.5
450
Spout Parallelism
25 100 200
397.5
217.5
54
261
137
32
Heron (paper) Heron (master)
Performance: Atmost Once
5 - 6x 1.4 -1.6x
38
Heronmilliontuples/min
0
10
20
30
40
Spout Parallelism
25 100 200
Heron (paper) Heron (master)
4-5x
Performance: CPU Usage
39
Heron @Twitter
>	
  400	
  Real	
  
Time	
  Jobs
500	
  Billions	
  Events/Day	
  
PROCESSED
25-­‐200	
  
MS	
  
latency
Tying	
  Together
"
41
Combining batch and real time
Lambda Architecture
New	
  Data
Client
42
Lambda Architecture - The Good
Event	
  BusScribe	
  CollecKon	
  Pipeline Heron	
  AnalyKcs	
  Pipeline Results
43
Lambda Architecture - The Bad
Have to fix everything (may be twice)!
How much Duct Tape required?
Have to write everything twice!
Subtle differences in semantics
What about Graphs, ML, SQL, etc?
!
#"
7!
44
Summingbird to the Rescue
Summingbird	
  Program
Scalding/Map	
  Reduce
HDFS
Message	
  broker
Heron	
  Topology
Online	
  key	
  value	
  result	
  
store
Batch	
  key	
  value	
  result	
  
store
Client
45
Curious to Learn More?
Twitter Heron: Stream Processing at Scale
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg,
Sailesh Mittal, Jignesh M. Patel*,1
, Karthik Ramasamy, Siddarth Taneja
@sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg,
@saileshmittal, @pateljm, @karthikz, @staneja
Twitter, Inc., *University of Wisconsin – Madison
ABSTRACT
Storm has long served as the main platform for real-time analytics
at Twitter. However, as the scale of data being processed in real-
time at Twitter has increased, along with an increase in the
diversity and the number of use cases, many limitations of Storm
have become apparent. We need a system that scales better, has
better debug-ability, has better performance, and is easier to
manage – all while working in a shared cluster infrastructure. We
considered various alternatives to meet these needs, and in the end
concluded that we needed to build a new real-time stream data
processing system. This paper presents the design and
implementation of this new system, called Heron. Heron is now
system process, which makes debugging very challenging. Thus, we
needed a cleaner mapping from the logical units of computation to
each physical process. The importance of such clean mapping for
debug-ability is really crucial when responding to pager alerts for a
failing topology, especially if it is a topology that is critical to the
underlying business model.
In addition, Storm needs dedicated cluster resources, which requires
special hardware allocation to run Storm topologies. This approach
leads to inefficiencies in using precious cluster resources, and also
limits the ability to scale on demand. We needed the ability to work
in a more flexible way with popular cluster scheduling software that
Storm @Twitter
Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni,
Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy
@ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk,
@jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog
Twitter, Inc., *University of Wisconsin – Madison
46
Interested in Heron?
CONTRIBUTIONS ARE WELCOME!
https://github.com/twitter/heron
http://heronstreaming.io
HERON IS OPEN SOURCED
FOLLOW US @HERONSTREAMING
47
Interested in Distributed Log?
CONTRIBUTIONS ARE WELCOME!
https://github.com/twitter/heron
http://distributedlog.io
DISTRIBUTED LOG IS OPEN SOURCED
FOLLOW US @DISTRIBUTEDLOG
48
WHAT WHY WHERE WHEN WHO HOW
Any Question ???
49
@karthikz
Get in Touch
THANKS	
  FOR	
  ATTENDING	
  !!!

Contenu connexe

Tendances

Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...Flink Forward
 
Analyzing Historical Data of Applications on YARN for Fun and Profit
Analyzing Historical Data of Applications on YARN for Fun and ProfitAnalyzing Historical Data of Applications on YARN for Fun and Profit
Analyzing Historical Data of Applications on YARN for Fun and ProfitDataWorks Summit
 
Deploy Secure and Scalable Services Across Kubernetes Clusters with NATS
Deploy Secure and Scalable Services Across Kubernetes Clusters with NATSDeploy Secure and Scalable Services Across Kubernetes Clusters with NATS
Deploy Secure and Scalable Services Across Kubernetes Clusters with NATSNATS
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemDatabricks
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...Imply
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in KafkaJayesh Thakrar
 
IIJ Technical DAY 2019 ~ IIJのサーバインフラはここまでやっています
IIJ Technical DAY 2019 ~ IIJのサーバインフラはここまでやっていますIIJ Technical DAY 2019 ~ IIJのサーバインフラはここまでやっています
IIJ Technical DAY 2019 ~ IIJのサーバインフラはここまでやっていますIIJ
 
Meet up roadmap cloudera 2020 - janeiro
Meet up   roadmap cloudera 2020 - janeiroMeet up   roadmap cloudera 2020 - janeiro
Meet up roadmap cloudera 2020 - janeiroThiago Santiago
 
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Databricks
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkDataWorks Summit
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin PodvalMartin Podval
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...Amazon Web Services
 
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)Shivji Kumar Jha
 
Extending Druid Index File
Extending Druid Index FileExtending Druid Index File
Extending Druid Index FileNavis Ryu
 
Building an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarBuilding an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarScyllaDB
 

Tendances (20)

Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
Virtual Flink Forward 2020: Lessons learned on Apache Flink application avail...
 
Analyzing Historical Data of Applications on YARN for Fun and Profit
Analyzing Historical Data of Applications on YARN for Fun and ProfitAnalyzing Historical Data of Applications on YARN for Fun and Profit
Analyzing Historical Data of Applications on YARN for Fun and Profit
 
Deploy Secure and Scalable Services Across Kubernetes Clusters with NATS
Deploy Secure and Scalable Services Across Kubernetes Clusters with NATSDeploy Secure and Scalable Services Across Kubernetes Clusters with NATS
Deploy Secure and Scalable Services Across Kubernetes Clusters with NATS
 
Apache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing SystemApache Pulsar: The Next Generation Messaging and Queuing System
Apache Pulsar: The Next Generation Messaging and Queuing System
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
 
FASTER Key-Value Store and Log
FASTER Key-Value Store and LogFASTER Key-Value Store and Log
FASTER Key-Value Store and Log
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in Kafka
 
IIJ Technical DAY 2019 ~ IIJのサーバインフラはここまでやっています
IIJ Technical DAY 2019 ~ IIJのサーバインフラはここまでやっていますIIJ Technical DAY 2019 ~ IIJのサーバインフラはここまでやっています
IIJ Technical DAY 2019 ~ IIJのサーバインフラはここまでやっています
 
Meet up roadmap cloudera 2020 - janeiro
Meet up   roadmap cloudera 2020 - janeiroMeet up   roadmap cloudera 2020 - janeiro
Meet up roadmap cloudera 2020 - janeiro
 
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote...
 
Bootstrapping state in Apache Flink
Bootstrapping state in Apache FlinkBootstrapping state in Apache Flink
Bootstrapping state in Apache Flink
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
Amazon Aurora Storage Demystified: How It All Works (DAT363) - AWS re:Invent ...
 
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
How pulsar stores data at Pulsar-na-summit-2021.pptx (1)
 
Extending Druid Index File
Extending Druid Index FileExtending Druid Index File
Extending Druid Index File
 
Building an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarBuilding an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache Pulsar
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 

En vedette

Storm@Twitter, SIGMOD 2014 paper
Storm@Twitter, SIGMOD 2014 paperStorm@Twitter, SIGMOD 2014 paper
Storm@Twitter, SIGMOD 2014 paperKarthik Ramasamy
 
Rihards Olups - Zabbix log management
Rihards Olups - Zabbix log managementRihards Olups - Zabbix log management
Rihards Olups - Zabbix log managementZabbix
 
Back to the future with C++ and Seastar
Back to the future with C++ and SeastarBack to the future with C++ and Seastar
Back to the future with C++ and SeastarTzach Livyatan
 
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsCloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsStefan Marr
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitterTwitter Developers
 
Graph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsGraph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsParis Carbone
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkVasia Kalavri
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopJason Plurad
 
Gelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area MeetupGelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area MeetupVasia Kalavri
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Vinoth Chandar
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveYifeng Jiang
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph DatabasesAntonio Maccioni
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 

En vedette (20)

Storm@Twitter, SIGMOD 2014 paper
Storm@Twitter, SIGMOD 2014 paperStorm@Twitter, SIGMOD 2014 paper
Storm@Twitter, SIGMOD 2014 paper
 
Rihards Olups - Zabbix log management
Rihards Olups - Zabbix log managementRihards Olups - Zabbix log management
Rihards Olups - Zabbix log management
 
Back to the future with C++ and Seastar
Back to the future with C++ and SeastarBack to the future with C++ and Seastar
Back to the future with C++ and Seastar
 
Flink. Pure Streaming
Flink. Pure StreamingFlink. Pure Streaming
Flink. Pure Streaming
 
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile ActorsCloud PARTE: Elastic Complex Event Processing based on Mobile Actors
Cloud PARTE: Elastic Complex Event Processing based on Mobile Actors
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter
 
Graph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analyticsGraph Stream Processing : spinning fast, large scale, complex analytics
Graph Stream Processing : spinning fast, large scale, complex analytics
 
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache FlinkGelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
Gelly-Stream: Single-Pass Graph Streaming Analytics with Apache Flink
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
Gelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area MeetupGelly in Apache Flink Bay Area Meetup
Gelly in Apache Flink Bay Area Meetup
 
ETL into Neo4j
ETL into Neo4jETL into Neo4j
ETL into Neo4j
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph Databases
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 

Similaire à Twitter's Real Time Stack - Processing Billions of Events Using Distributed Log and Heron

Real Time Processing Using Twitter Heron by Karthik Ramasamy
Real Time Processing Using Twitter Heron by Karthik RamasamyReal Time Processing Using Twitter Heron by Karthik Ramasamy
Real Time Processing Using Twitter Heron by Karthik RamasamyData Con LA
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Data Con LA
 
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...Vinu Charanya
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
times ten in-memory database for extreme performance
times ten in-memory database for extreme performancetimes ten in-memory database for extreme performance
times ten in-memory database for extreme performanceOracle Korea
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsSamantha Quiñones
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Soroosh Khodami
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Summit
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWSSungmin Kim
 
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Sumeet Singh
 
Times ten 18.1_overview_meetup
Times ten 18.1_overview_meetupTimes ten 18.1_overview_meetup
Times ten 18.1_overview_meetupByung Ho Lee
 
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...Databricks
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics HeroTechWell
 
Top Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your PipelineTop Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your PipelineAndreas Grabner
 
Complex Event Processing: What?, Why?, How?
Complex Event Processing: What?, Why?, How?Complex Event Processing: What?, Why?, How?
Complex Event Processing: What?, Why?, How?Alexandre Vasseur
 

Similaire à Twitter's Real Time Stack - Processing Billions of Events Using Distributed Log and Heron (20)

Handson with Twitter Heron
Handson with Twitter HeronHandson with Twitter Heron
Handson with Twitter Heron
 
Real Time Processing Using Twitter Heron by Karthik Ramasamy
Real Time Processing Using Twitter Heron by Karthik RamasamyReal Time Processing Using Twitter Heron by Karthik Ramasamy
Real Time Processing Using Twitter Heron by Karthik Ramasamy
 
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa...
 
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
[Kubecon 2017 Austin, TX] How We Built a Framework at Twitter to Solve Servic...
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
times ten in-memory database for extreme performance
times ten in-memory database for extreme performancetimes ten in-memory database for extreme performance
times ten in-memory database for extreme performance
 
Drinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time MetricsDrinking from the Firehose - Real-time Metrics
Drinking from the Firehose - Real-time Metrics
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 
Realtime Analytics on AWS
Realtime Analytics on AWSRealtime Analytics on AWS
Realtime Analytics on AWS
 
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
Strata Conference + Hadoop World NY 2016: Lessons learned building a scalable...
 
Times ten 18.1_overview_meetup
Times ten 18.1_overview_meetupTimes ten 18.1_overview_meetup
Times ten 18.1_overview_meetup
 
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
Large Scale Feature Aggregation Using Apache Spark with Pulkit Bhanot and Ami...
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Malstone KDD 2010
Malstone KDD 2010Malstone KDD 2010
Malstone KDD 2010
 
Become a Performance Diagnostics Hero
Become a Performance Diagnostics HeroBecome a Performance Diagnostics Hero
Become a Performance Diagnostics Hero
 
Top Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your PipelineTop Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your Pipeline
 
Complex Event Processing: What?, Why?, How?
Complex Event Processing: What?, Why?, How?Complex Event Processing: What?, Why?, How?
Complex Event Processing: What?, Why?, How?
 

Plus de Karthik Ramasamy

Scaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/dayScaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/dayKarthik Ramasamy
 
Pulsar summit-keynote-final
Pulsar summit-keynote-finalPulsar summit-keynote-final
Pulsar summit-keynote-finalKarthik Ramasamy
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupKarthik Ramasamy
 
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarKarthik Ramasamy
 
Creating Data Fabric for #IOT with Apache Pulsar
Creating Data Fabric for #IOT with Apache PulsarCreating Data Fabric for #IOT with Apache Pulsar
Creating Data Fabric for #IOT with Apache PulsarKarthik Ramasamy
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarKarthik Ramasamy
 
Exactly once in Apache Heron
Exactly once in Apache HeronExactly once in Apache Heron
Exactly once in Apache HeronKarthik Ramasamy
 
Tutorial - Modern Real Time Streaming Architectures
Tutorial - Modern Real Time Streaming ArchitecturesTutorial - Modern Real Time Streaming Architectures
Tutorial - Modern Real Time Streaming ArchitecturesKarthik Ramasamy
 
Streaming Pipelines in Kubernetes Using Apache Pulsar, Heron and BookKeeper
Streaming Pipelines in Kubernetes Using Apache Pulsar, Heron and BookKeeperStreaming Pipelines in Kubernetes Using Apache Pulsar, Heron and BookKeeper
Streaming Pipelines in Kubernetes Using Apache Pulsar, Heron and BookKeeperKarthik Ramasamy
 
Storm@Twitter, SIGMOD 2014
Storm@Twitter, SIGMOD 2014Storm@Twitter, SIGMOD 2014
Storm@Twitter, SIGMOD 2014Karthik Ramasamy
 

Plus de Karthik Ramasamy (12)

Scaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/dayScaling Apache Pulsar to 10 PB/day
Scaling Apache Pulsar to 10 PB/day
 
Apache Pulsar @Splunk
Apache Pulsar @SplunkApache Pulsar @Splunk
Apache Pulsar @Splunk
 
Pulsar summit-keynote-final
Pulsar summit-keynote-finalPulsar summit-keynote-final
Pulsar summit-keynote-final
 
Apache Pulsar Seattle - Meetup
Apache Pulsar Seattle - MeetupApache Pulsar Seattle - Meetup
Apache Pulsar Seattle - Meetup
 
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
 
Creating Data Fabric for #IOT with Apache Pulsar
Creating Data Fabric for #IOT with Apache PulsarCreating Data Fabric for #IOT with Apache Pulsar
Creating Data Fabric for #IOT with Apache Pulsar
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache Pulsar
 
Exactly once in Apache Heron
Exactly once in Apache HeronExactly once in Apache Heron
Exactly once in Apache Heron
 
Tutorial - Modern Real Time Streaming Architectures
Tutorial - Modern Real Time Streaming ArchitecturesTutorial - Modern Real Time Streaming Architectures
Tutorial - Modern Real Time Streaming Architectures
 
Streaming Pipelines in Kubernetes Using Apache Pulsar, Heron and BookKeeper
Streaming Pipelines in Kubernetes Using Apache Pulsar, Heron and BookKeeperStreaming Pipelines in Kubernetes Using Apache Pulsar, Heron and BookKeeper
Streaming Pipelines in Kubernetes Using Apache Pulsar, Heron and BookKeeper
 
Modern Data Pipelines
Modern Data PipelinesModern Data Pipelines
Modern Data Pipelines
 
Storm@Twitter, SIGMOD 2014
Storm@Twitter, SIGMOD 2014Storm@Twitter, SIGMOD 2014
Storm@Twitter, SIGMOD 2014
 

Dernier

WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 

Dernier (20)

WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AI
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 

Twitter's Real Time Stack - Processing Billions of Events Using Distributed Log and Heron

  • 1. Twitter Real Time Stack Processing Billions of Events Using Distributed Log and Heron Karthik  Ramasamy   Twi/er @karthikz
  • 2. 2
  • 3. 3 Value of Data It’s contextual Value&of&Data&to&Decision/Making& Time& Preven8ve/& Predic8ve& Ac8onable& Reac8ve& Historical& Real%& Time& Seconds& Minutes& Hours& Days& Tradi8onal&“Batch”&&&&&&&&&&&&&&& Business&&Intelligence& Informa9on&Half%Life& In&Decision%Making& Months& Time/cri8cal& Decisions& [1]  Courtesy  Michael  Franklin,  BIRTE,  2015.  
  • 4. 4 What is Real-Time? BATCH high throughput > 1 hour monthly active users relevance for ads adhoc queries REAL TIME low latency < 1 ms Financial Trading ad impressions count hash tag trends approximate 10 ms - 1 sec Near Real Time latency sensitive < 500 ms fanout Tweets search for Tweets deterministic workflows OLTP It’s contextual
  • 5. 5 Why Real Time? G Emerging break out trends in Twitter (in the form #hashtags) Ü Real time sports conversations related with a topic (recent goal or touchdown) ! Real time product recommendations based on your behavior & profile real time searchreal time trends real time conversations real time recommendations Real time search of tweets s ANALYZING BILLIONS OF EVENTS IN REAL TIME IS A CHALLENGE!
  • 6. 6 Real Time: Analytics STREAMING Analyze  data  as  it  is  being   produced INTERACTIVE Store  data  and  provide  results   instantly   when   a   query   is   posed H C
  • 7. 7 Real Time Use Cases Online Services 10s of ms Near Real Time 100s of ms Data for Batch Analytics secs to mins TransacKon  log,  Queues,   RPCs Change  propagaKon,   Streaming  analyKcs Log  aggregaKon,  Client   events I
  • 8. 8 Real Time Stack Components: Many moving parts TWITTER REAL TIME ! scribe s heron J Event Bus a dlog b
  • 9. 9 Scribe Open source log aggregation Originally  from  Facebook.  TwiRer   made  significant  enhancements  for   real  Kme  event  aggregaKon High throughput and scale Delivers  125M  messages/min.     Provides  Kght  SLAs  on  data  reliability Runs on every machine Simple,  very  reliable  and  efficiently   uses  memory  and  CPU ! { "
  • 10. Event  Bus  &  Distributed  Log Next Generation Messaging "
  • 11. 11 Twitter Messaging Kestrel Core  Business  Logic   (tweets,  fanouts  …) Kestrel HDFS Kestrel Book   Keeper My  SQL Ka]a Scribe Deferred   RPC Gizzard Database Search
  • 12. 12 Kestrel Limitations Adding subscribers is expensive Scales poorly as #queues increase Durability is hard to achieve Read-behind degrades performance Too many random I/Os Cross DC replication ! #" 7!
  • 13. 13 Kafka Limitations Relies on file system page cache Performance degradation when subscribers fall behind - too much random I/O ! "
  • 14. 14 Rethinking Messaging Durable writes, intra cluster and geo-replication Scale resources independently Cost efficiency Unified Stack - tradeoffs for various workloads Multi tenancy Ease of Manageability ! #" 7!
  • 15. 15 Event Bus Durable writes, intra cluster and geo-replication Scale resources independently Cost efficiency Unified Stack - tradeoffs for various workloads Multi tenancy Ease of Manageability ! #" 7!
  • 16. 16 Event Bus - Pub-Sub Write   Proxy Read   Proxy Publisher Subscriber Metadata Distributed     Log Distributed  Log
  • 17. 17 Distributed Log Write   Proxy Read   Proxy Publisher Subscriber Metadata Distributed     Log
  • 18. 18 Distributed Log @Twitter 01 02 03 04 Manhattan Key Value Store Durable Deferred RPC Real Time Search Indexing Pub Sub System / . - , 05 / Globally Replicated Log
  • 19. 19 Distributed Log @Twitter 400  TB/Day   IN 10  PB/Day     OUT 2  Trillion  Events/Day   PROCESSED 100  MS   latency
  • 21. Twi/er  Heron Next Generation Streaming Engine "
  • 22. 22 Better Storm Twitter Heron Container  Based  Architecture Separate  Monitoring  and  Scheduling - Simplified  ExecuTon  Model 2 Much  Be/er  Performance$
  • 23. 23 Twitter Heron Batching of tuples AmorKzing  the  cost  of  transferring  tuples ! Task isolation Ease  of  debug-­‐ability/isolaKon/profiling #Fully API compatible with Storm Directed  acyclic  graph      Topologies,  Spouts  and  Bolts " Support for back pressure Topologies  should  self  adjusKng gUse of main stream languages C++,  Java  and  Python ! Efficiency Reduce resource consumption G Design: Goals
  • 25. 25 Heron Terminology Topology Directed  acyclic  graph     verKces  =  computaKon,  and     edges  =  streams  of  data  tuples Spouts Sources  of  data  tuples  for  the  topology   Examples  -­‐  Ka]a/Kestrel/MySQL/Postgres Bolts Process  incoming  tuples,  and  emit  outgoing  tuples   Examples  -­‐  filtering/aggregaKon/join/any  funcKon , %
  • 26. 26 Heron Topology % % % % % Spout 1 Spout 2 Bolt 1 Bolt 2 Bolt 3 Bolt 4 Bolt 5
  • 27. 27 Stream Groupings 01 02 03 04 Shuffle Grouping Random distribution of tuples Fields Grouping Group tuples by a field or multiple fields All Grouping Replicates tuples to all tasks Global Grouping Send the entire stream to one task / . - ,
  • 29. 29 Heron Topology Master ZK Cluster Stream Manager I1 I2 I3 I4 Stream Manager I1 I2 I3 I4 Logical Plan, Physical Plan and Execution State Sync Physical Plan CONTAINER CONTAINER Metrics Manager Metrics Manager Architecture: Topology
  • 30. 30 Heron % % S1 B2 B3 % B4 Stream Manager: BackPressure
  • 31. 31 Stream Manager S1 B2 B3 Stream Manager Stream Manager Stream Manager Stream Manager S1 B2 B3 B4 S1 B2 B3 S1 B2 B3 B4 B4 Stream Manager: BackPressure
  • 33. 33 Heron Use Cases REALTIME ETL REAL TIME BI SPAM DETECTION REAL TIME TRENDS REALTIME ML REAL TIME OPS
  • 35. 35 Heron @Twitter 1 stage 10 stages 3x reduction in cores and memory Heron has been in production for 2 years
  • 36. 36 Heron COMPONENTS EXPT #1 EXPT #2 EXPT #3 Spout 25 100 200 Bolt 25 100 200 # Heron containers 25 100 200 # Storm workers 25 100 200 Performance: Settings
  • 37. 37 Heron Throughput CPU usage milliontuples/min 0 2750 5500 8250 11000 Spout Parallelism 25 100 200 10,200 5,820 1,545 1,920 965249 Heron (paper) Heron (master) #coresused 0 112.5 225 337.5 450 Spout Parallelism 25 100 200 397.5 217.5 54 261 137 32 Heron (paper) Heron (master) Performance: Atmost Once 5 - 6x 1.4 -1.6x
  • 38. 38 Heronmilliontuples/min 0 10 20 30 40 Spout Parallelism 25 100 200 Heron (paper) Heron (master) 4-5x Performance: CPU Usage
  • 39. 39 Heron @Twitter >  400  Real   Time  Jobs 500  Billions  Events/Day   PROCESSED 25-­‐200   MS   latency
  • 41. 41 Combining batch and real time Lambda Architecture New  Data Client
  • 42. 42 Lambda Architecture - The Good Event  BusScribe  CollecKon  Pipeline Heron  AnalyKcs  Pipeline Results
  • 43. 43 Lambda Architecture - The Bad Have to fix everything (may be twice)! How much Duct Tape required? Have to write everything twice! Subtle differences in semantics What about Graphs, ML, SQL, etc? ! #" 7!
  • 44. 44 Summingbird to the Rescue Summingbird  Program Scalding/Map  Reduce HDFS Message  broker Heron  Topology Online  key  value  result   store Batch  key  value  result   store Client
  • 45. 45 Curious to Learn More? Twitter Heron: Stream Processing at Scale Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel*,1 , Karthik Ramasamy, Siddarth Taneja @sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg, @saileshmittal, @pateljm, @karthikz, @staneja Twitter, Inc., *University of Wisconsin – Madison ABSTRACT Storm has long served as the main platform for real-time analytics at Twitter. However, as the scale of data being processed in real- time at Twitter has increased, along with an increase in the diversity and the number of use cases, many limitations of Storm have become apparent. We need a system that scales better, has better debug-ability, has better performance, and is easier to manage – all while working in a shared cluster infrastructure. We considered various alternatives to meet these needs, and in the end concluded that we needed to build a new real-time stream data processing system. This paper presents the design and implementation of this new system, called Heron. Heron is now system process, which makes debugging very challenging. Thus, we needed a cleaner mapping from the logical units of computation to each physical process. The importance of such clean mapping for debug-ability is really crucial when responding to pager alerts for a failing topology, especially if it is a topology that is critical to the underlying business model. In addition, Storm needs dedicated cluster resources, which requires special hardware allocation to run Storm topologies. This approach leads to inefficiencies in using precious cluster resources, and also limits the ability to scale on demand. We needed the ability to work in a more flexible way with popular cluster scheduling software that Storm @Twitter Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy @ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk, @jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog Twitter, Inc., *University of Wisconsin – Madison
  • 46. 46 Interested in Heron? CONTRIBUTIONS ARE WELCOME! https://github.com/twitter/heron http://heronstreaming.io HERON IS OPEN SOURCED FOLLOW US @HERONSTREAMING
  • 47. 47 Interested in Distributed Log? CONTRIBUTIONS ARE WELCOME! https://github.com/twitter/heron http://distributedlog.io DISTRIBUTED LOG IS OPEN SOURCED FOLLOW US @DISTRIBUTEDLOG
  • 48. 48 WHAT WHY WHERE WHEN WHO HOW Any Question ???