Stream Groupings
Fields Grouping
Ensures that all tuples with the same field value(s) are always routed to the same task.
(This is a simple hash of the field values, modulo the number of tasks.)
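The hash-and-modulo routing described above can be sketched in a few lines. This is an illustration only: the function name `choose_task` is hypothetical, and `zlib.crc32` stands in for whatever hash Storm uses internally, which the slide does not specify.

```python
import zlib


def choose_task(field_values, num_tasks):
    """Return the index of the task that should receive a tuple.

    Hypothetical helper: hashes the grouping fields and takes the result
    modulo the number of tasks. crc32 is an assumed stand-in hash.
    """
    # Join the grouping field values into one deterministic byte string.
    key = "\x1f".join(str(v) for v in field_values).encode("utf-8")
    return zlib.crc32(key) % num_tasks


# Three (word, count) tuples, grouped on the "word" field only.
tuples = [("apple", 1), ("banana", 2), ("apple", 3)]
targets = [choose_task((word,), num_tasks=4) for word, _ in tuples]

# Both "apple" tuples map to the same task index.
same_task = targets[0] == targets[2]
```

Because the target depends only on the grouping fields and the task count, equal field values always land on the same task, while changing the number of tasks changes the mapping.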
Fault Tolerance
Workers heartbeat back to Supervisors and Nimbus via ZooKeeper, as well as locally.
[Diagram: a Topology Submitter, Nimbus, ZooKeeper, four Supervisors, and four Workers]
Fault Tolerance
If a worker dies (fails to heartbeat), the Supervisor will restart it.
[Diagram: the same cluster, with one failed component marked X]
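One way to picture "fails to heartbeat" is a timeout check over the last heartbeat each worker reported. The sketch below is a simplified model, not Storm's implementation: `find_dead_workers`, the timestamps, and the 30-second timeout are all assumptions chosen for illustration.

```python
def find_dead_workers(last_heartbeat, now, timeout_secs):
    """Return the ids of workers whose last heartbeat is older than the timeout.

    Hypothetical helper: last_heartbeat maps worker id -> timestamp (seconds).
    """
    return sorted(
        worker_id
        for worker_id, seen_at in last_heartbeat.items()
        if now - seen_at > timeout_secs
    )


# Assumed sample data: worker-2 last reported 31 seconds before "now".
last_heartbeat = {"worker-1": 100.0, "worker-2": 71.0, "worker-3": 98.0}
dead = find_dead_workers(last_heartbeat, now=102.0, timeout_secs=30.0)
# dead == ["worker-2"]; in the model on this slide, the Supervisor would restart it.
```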
Fault Tolerance
If a worker dies repeatedly, Nimbus will reassign the work to other nodes in the cluster.
[Diagram: the same cluster, with one failed component marked X]
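The escalation from local restart to reassignment can be modeled as a small policy object. The slide does not say how many failures count as "repeatedly", so the threshold `max_local_restarts` and the class `RecoveryPolicy` are hypothetical names used only for this sketch.

```python
from collections import Counter


class RecoveryPolicy:
    """Toy model: restart a worker locally a few times, then reassign its work."""

    def __init__(self, max_local_restarts):
        # Assumed threshold; the real criterion is not given in the slides.
        self.max_local_restarts = max_local_restarts
        self.failures = Counter()

    def on_worker_failure(self, worker_id):
        """Record a failure and return the action to take."""
        self.failures[worker_id] += 1
        if self.failures[worker_id] <= self.max_local_restarts:
            return "restart"   # handled by the Supervisor
        return "reassign"      # handled by Nimbus


policy = RecoveryPolicy(max_local_restarts=2)
actions = [policy.on_worker_failure("worker-1") for _ in range(3)]
# actions == ["restart", "restart", "reassign"]
```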
Fault Tolerance
If a supervisor node dies, Nimbus will reassign the work to other nodes.
[Diagram: the same cluster, with two failed components marked X]
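Reassignment after a node failure amounts to moving the dead node's tasks onto the surviving nodes. The sketch below uses a least-loaded-first rule purely as an assumption; it is not Storm's scheduler, and `reassign` and the node and task names are hypothetical.

```python
def reassign(assignments, dead_node):
    """Move the tasks of dead_node onto the surviving nodes.

    Hypothetical helper: each task goes to the survivor with the fewest
    tasks, with ties broken by node name so the result is deterministic.
    """
    survivors = {n: list(t) for n, t in assignments.items() if n != dead_node}
    if not survivors:
        raise RuntimeError("no live nodes left to take over the work")
    for task in assignments.get(dead_node, []):
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(task)
    return survivors


before = {"node-a": ["t1", "t2"], "node-b": ["t3"], "node-c": ["t4", "t5"]}
after = reassign(before, "node-c")
# after == {"node-a": ["t1", "t2", "t5"], "node-b": ["t3", "t4"]}
```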
Fault Tolerance
If Nimbus dies, topologies will continue to function normally, but won’t be able to perform reassignments.
[Diagram: the same cluster, with one failed component marked X]
Storm on YARN
Hadoop 2.0: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …)
Engines running on YARN: MapReduce (batch), Tez (interactive), Apache Storm (streaming)
YARN (cluster resource management)
HDFS2 (redundant, reliable storage)
Batch and real-time on the same cluster
Security and Multi-tenancy
Elasticity
Storm on YARN
Storm components:
Nimbus: resource management, scheduling
Supervisor: node and process management
Workers: run topology tasks
YARN components:
YARN RM: resource management
Storm AM: manages the topology
Containers: run topology tasks
YARN NM: process management
Storm’s resource management system maps very naturally to the YARN model.
High Availability
Detect and scale around bottlenecks
Optimize for available resources