SlideShare une entreprise Scribd logo
1  sur  108
Télécharger pour lire hors ligne
Processing
“BIG-DATA”
In Real Time
Yanai Franchi , Tikal

1
2
Vacation to Barcelona

3
After a Long Travel Day

4
Going to a Salsa Club

5
Best Salsa Club
NOW
●

Good Music

●

Crowded –
Now!
6
Same Problem in “gogobot”

7
8
Lets' Develop
“Gogobot Checkins Heat-Map”

gogobot checkin
Heat Map Service

9
Key Notes
●

Collector Service - Collects checkins as text addresses
–

We need to use GeoLocation Service

●

Upon elapsed interval, the last locations list will be
displayed as Heat-Map in GUI.

●

Web Scale service – 10Ks checkins/seconds all over the
world (imaginary, but lets do it for the exercise).

●

Accuracy – Sample data, NOT critical data.
–
–

Proportionately representative
Data volume is large enough to compensate for data loss.
10
Heat-Map Context
Heat-Map
Gogobot System
Text-Address
Checkins Heat-Map
Service

Gogobot
Micro Service
Gogobot
Micro Service

Get-GeoCode(Address)

Gogobot
Micro Service

Geo Location
Service

Last Interval Locations
11
Plan A
Simulate Checkins with a File
Check-in #1
Check-in #2
Check-in #3
Check-in #4
Check-in #5
Check-in #6
Check-in #7
Check-in #8
Check-in #9
...

Geo Location
Service
GET Geo
Location

Read
Text Address
Processing
Checkins

Persist Checkin
Intervals

Database
12
Tons of Addresses
Arriving Every Second

13
Architect - First Reaction...

14
Second Reaction...

15
Developer
First
Reaction

16
Second
Reaction

17
Problems ?
●

Tedious: Spend time conf guring where to send
i
messages, deploying workers, and deploying
intermediate queues.

●

Brittle: There's little fault-tolerance.

●

Painful to scale: Partition of running worker/s is
complicated.

18
What We Want ?
● Horizontal scalability
● Fault-tolerance
● No intermediate message brokers!
● Higher level abstraction than message
passing
● “Just works”
● Guaranteed data processing (not in this
case)
19
Apache Storm

✔Horizontal scalability
✔Fault-tolerance
✔No intermediate message brokers!
✔Higher level abstraction than message
passing
✔“Just works”
✔Guaranteed data processing

20
Anatomy of Storm

21
What is Storm ?
●

CEP - Open source and distributed realtime
computation system.
–
–

●

Makes it easy to reliably process unbounded streams of
tuples
Doing for realtime processing what Hadoop did for batch
processing.

Fast - 1M Tuples/sec per node.
–

It is scalable,fault-tolerant, guarantees your data will be
processed, and is easy to set up and operate.
22
Streams

Tuple

Tuple

Tuple

Tuple

Tuple

Tuple

Unbounded sequence of tuples

23
Spouts
Tuple

Tuple

Tuple

Tuple

Sources of Streams
24
Bolts
Tuple

Tuple

Tuple

Tuple

Tuple

TupleTuple

uple
ple T
Tu
Tuple

Processes input streams and produces
new streams

25
Storm Topology
Tuple
Tuple

TupleTuple

TupleTuple
Tuple

Tuple
Tuple
Tuple

le
e Tup
Tupl
Tuple

Tuple

Tuple

TupleTupleTuple

Network of spouts and bolts

26
Guarantee for Processing
●

●
●

Storm guarantees the full processing of a tuple by
tracking its state
In case of failure, Storm can re-process it.
Source tuples with full “acked” trees are removed
from the system

27
Tasks (Bolt/Spout Instance)

Spouts and bolts execute as
many tasks across the cluster

28
Stream Grouping

When a tuple is emitted, which task
(instance) does it go to?

29
Stream Grouping
●
●

●
●

Shuff e grouping: pick a random task
l
Fields grouping: consistent hashing on a subset of
tuple f elds
i
All grouping: send to all tasks
Global grouping: pick task with lowest id

30
Tasks , Executors , Workers

Worker Process

Executor

Task

JVM

Executor

Task

Task

Thread

=

Thread

Sput /
Bolt

Sput / Sput /
Bolt Bolt

31
Node
Worker Process
Executor

Spout A

Supervisor

Executor

Bolt C Bolt C

Executor
Bolt B Bolt B

Node
Worker Process

Supervisor

Executor

Executor

Spout A

Bolt C Bolt C

Executor
Bolt B Bolt B

32
Storm Architecture
NOT critical
for running topology
Upload/Rebalance
Heat-Map Topology
Supervisor
Supervisor

Supervisor

Supervisor

Nimbus

Supervisor

Supervisor

Zoo Keeper
Nodes

Master Node
(similar to Hadoop JobTracker)

33
Storm Architecture
A few
nodes
Upload/Rebalance
Heat-Map Topology
Supervisor
Supervisor

Supervisor

Supervisor

Nimbus

Supervisor

Supervisor

Zoo Keeper

Used For Cluster Coordination
34
Storm Architecture

Upload/Rebalance
Heat-Map Topology
Supervisor
Supervisor

Supervisor

Supervisor

Nimbus

Supervisor

Supervisor

Zoo Keeper

Run Worker Processes
35
Assembling Heatmap Topology

36
HeatMap Input/Output Tuples
●

Input Tuples: Timestamp and Text Address :
–

●

(9:00:07 PM , “287 Hudson St New York NY 10013”)

Output Tuple: Time interval, and a list of points for
it:
–

(9:00:00 PM to 9:00:15 PM,
List((40.719,-73.987),(40.726,-74.001),(40.719,-73.987))

37
Checkins
Spout

Heat Map
Storm
Topology

(9:01 PM @ 287 Hudson st)
Geocode
Lookup
Bolt
(9:01 PM , (40.736, -74,354)))
Heatmap
Builder
Bolt

Upon
Elapsed Interval

(9:00 PM – 9:15 PM , List((40.73, -74,34),
(51.36, -83,33),(69.73, -34,24))
Persistor
Bolt
38
Checkins Spout

public class CheckinsSpout extends BaseRichSpout {

private List<String> sampleLocations;
private int nextEmitIndex;
private SpoutOutputCollector outputCollector;

We hold state
No need for thread safety

@Override
public void open(Map map, TopologyContext topologyContext,
SpoutOutputCollector spoutOutputCollector) {
this.outputCollector = spoutOutputCollector;
this.nextEmitIndex = 0;

sampleLocations = IOUtils.readLines(

}

ClassLoader.getSystemResourceAsStream("sanple-locations.txt"));

@Override
Been called
public void nextTuple() {
iteratively by Storm
String address = checkins.get(nextEmitIndex);
String checkin = new Date().getTime()+"@ADDRESS:"+address;

outputCollector.emit(new Values(checkin));

}

nextEmitIndex = (nextEmitIndex + 1) % sampleLocations.size();

@Override

declareOutputFields(OutputFieldsDeclarer
declarer.declare(new Fields("str"));

public void
}

Declare
output fields

declarer) {

39
Geocode Lookup Bolt
public class GeocodeLookupBolt extends BaseBasicBolt {
private LocatorService locatorService;
@Override
public void prepare(Map stormConf, TopologyContext context) {
locatorService = new GoogleLocatorService();
}
@Override
public void execute(Tuple tuple, BasicOutputCollector outputCollector) {

String str = tuple.getStringByField("str");
String[] parts = str.split("@");
Long time = Long.valueOf(parts[0]);
String address = parts[1];

Get Geocode,
Create DTO

LocationDTO locationDTO = locatorService.getLocation(address);
if(checkinDTO!=null)
}

outputCollector.emit(new Values(time,locationDTO) );

@Override
public void declareOutputFields(OutputFieldsDeclarer fieldsDeclarer) {
}

fieldsDeclarer.declare(new Fields("time", "location"));

40
}
Tick Tuple – Repeating Mantra

41
Two Streams to Heat-Map Builder
Checkin 1

Checkin 4 Checkin 5 Checkin 6

HeatMapBuilder Bolt

On tick tuple, we f ush our Heat-Map
l
42
Tick Tuple in Action
public class HeatMapBuilderBolt extends BaseBasicBolt {
private Map<String, List<LocationDTO>> heatmaps;

Hold latest intervals

@Override
public Map<String, Object> getComponentConfiguration() {
Config conf = new Config();
conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60 );
return conf;
}

Tick interval

@Override
public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
if (isTickTuple(tuple)) {
// Emit accumulated intervals
} else {
// Add check-in info to the current interval in the Map
}
}
private boolean isTickTuple(Tuple tuple) {
return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID)
&& tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID);
}
43
Persister Bolt
public class PersistorBolt extends BaseBasicBolt {
private Jedis jedis;
@Override
public void execute(Tuple tuple, BasicOutputCollector outputCollector) {
Long timeInterval = tuple.getLongByField("time-interval");
String city = tuple.getStringByField("city");
String locationsList = objectMapper.writeValueAsString
( tuple.getValueByField("locationsList"));
String dbKey = "checkins-" + timeInterval+"@"+city;

Persist in Redis
for 24h

jedis.setex(dbKey, 3600*24 ,locationsList);
jedis.publish("location-key", dbKey);
}

}

Publish in
Redis channel
for debugging
44
Transforming the Tuples
Checkins
Spout

Sample Checkins File
Check-in #1
Check-in #2
Check-in #3
Check-in #4
Check-in #5
Check-in #6
Check-in #7
Check-in #8
Check-in #9
...

Read
Text Addresses

Shuffle Grouping
Geocode
Lookup
Bolt

Get Geo
Location
Geo Location
Service

Group by city

Field Grouping(city)
Heatmap
Builder
Bolt
Shuffle Grouping

Database

Persistor
Bolt

45
Heat Map Topology
public class LocalTopologyRunner {
public static void main(String[] args) {
TopologyBuilder builder = buildTopolgy();
StormSubmitter.submitTopology(
"local-heatmap", new Config(), builder.createTopology());
}
private static TopologyBuilder buildTopolgy() {
topologyBuilder builder = new TopologyBuilder();
builder.setSpout("checkins", new CheckinsSpout());
builder.setBolt("geocode-lookup", new GeocodeLookupBolt() )
.shuffleGrouping("checkins");
builder.setBolt("heatmap-builder", new HeatMapBuilderBolt() )

.fieldsGrouping("geocode-lookup",

new Fields("city"));

builder.setBolt("persistor", new PersistorBolt() )
.shuffleGrouping("heatmap-builder");

}

}

return builder;
46
Its NOT Scaled

47
48
Scaling the Topology
public class LocalTopologyRunner {
conf.setNumWorkers(20);
public static void main(String[] args) {
TopologyBuilder builder = buildTopolgy();
Config conf = new Config();

Set no. of workers

conf.setNumWorkers(2);

}

StormSubmitter.submitTopology(
"local-heatmap", conf, builder.createTopology());

Parallelism hint
private static TopologyBuilder buildTopolgy() {
topologyBuilder builder = new TopologyBuilder();
builder.setSpout("checkins", new CheckinsSpout(),

4

);

Increase Tasks
For Future

8 )
.shuffleGrouping("checkins").setNumTasks(64);

builder.setBolt("geocode-lookup", new GeocodeLookupBolt() ,

builder.setBolt("heatmap-builder", new HeatMapBuilderBolt() , 4)
.fieldsGrouping("geocode-lookup", new Fields("city"));
builder.setBolt("persistor", new PersistorBolt() ,

2

)

.shuffleGrouping("heatmap-builder").setNumTasks(4);49
return builder;
Demo

50
Recap – Plan A
Sample Checkins File
Check-in #1
Check-in #2
Check-in #3
Check-in #4
Check-in #5
Check-in #6
Check-in #7
Check-in #8
Check-in #9
...

Geo Location
Service
GET Geo
Location

Read
Text Address
Storm Heat-Map
Topology

Persist Checkin
Intervals

Database
51
We have
something working

52
Add Kafka Messaging

53
Plan B Kafka Spout&Bolt to HeatMap
Publish
Checkins

Geo Location
Service

Checkin
Kafka Topic

Read
Kafka
Text Addresses Checkins
Spout

Geocode
Lookup
Bolt
Heatmap
Builder
Bolt

Database

Persistor
Bolt

Kafka
Locations
Bolt
Locations
Topic

54
55
They all are Good
But not for all use-cases

56
Kafka
A little introduction

57
58
Pub-Sub Messaging System

59
60
61
62
63
Doesn't Fear the File System

64
65
66
67
Topics
●
●

Logical collections of partitions (the physical f les).
i
A broker contains some of the partitions for a topic

68
A partition is Consumed by
Exactly One Group's Consumer

69
Distributed &
Fault-Tolerant
70
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Zoo Keeper

Consumer 1

Consumer 2

71
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Broker 4

Zoo Keeper

Consumer 1

Consumer 2

72
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Broker 4

Zoo Keeper

Consumer 1

Consumer 2

73
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Broker 4

Zoo Keeper

Consumer 1

Consumer 2

74
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Broker 4

Zoo Keeper

Consumer 1

Consumer 2

75
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Broker 4

Zoo Keeper

Consumer 1

Consumer 2

76
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Broker 4

Zoo Keeper

Consumer 1

Consumer 2

77
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Zoo Keeper

Consumer 1

Consumer 2

78
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Zoo Keeper

Consumer 1

Consumer 2

79
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Zoo Keeper

Consumer 1

Consumer 2

80
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Zoo Keeper

Consumer 1

81
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Zoo Keeper

Consumer 1

82
Producer 2

Producer 1

Broker 1

Broker 2

Broker 3

Zoo Keeper

Consumer 1

83
Performance Benchmark
1 Broker
1 Producer
1 Consumer
84
85
86
Add Kafka to our Topology
public class LocalTopologyRunner {
...
private static TopologyBuilder buildTopolgy() {

Kafka Spout

...
builder.setSpout("checkins", new KafkaSpout(kafkaConfig));
...
builder.setBolt("kafkaProducer", new KafkaOutputBolt
( "localhost:9092",
"kafka.serializer.StringEncoder",
"locations-topic"))
.shuffleGrouping("persistor");

}

}

Kafka Bolt

return builder;

87
Plan C – Add Reactor
Text-Address

Publish
Checkins

Checkin HTTP
Reactor

Checkin
Kafka Topic

Consume Checkins

Storm Heat-Map
Topology

Persist Checkin
Intervals

Database

GET Geo
Location

Publish
Interval Key
Locations
Kafka Topic

Geo Location
Service

Index Interval
Locations

Search
Server
Index

88
Why Reactor ?

89
C10K
Problem
90
2008:
Thread Per Request/Response

91
Reactor Pattern Paradigm

Application
registers
handlers

...events
trigger
handlers

92
Reactor Pattern – Key Points
●
●
●
●

Single thread / single event loop
EVERYTHING runs on it
You MUST NOT block the event loop
Many Implementations (partial list):
–

Node.js (JavaScrip), EventMachine (Ruby), Twisted
(Python)... and Vert.X

93
Reactor Pattern Problems
●

Some work is naturally blocking:
–
–

●

Intensive data crunching
3rd-party blocking API’s (e.g. JDBC)

Pure reactor (e.g. Node.js) is not a good f t for this
i
kind of work!

94
95
Vert.X Architecture
Vert.X Architecture

Event Bus

96
Vert.X Goodies
●

Growing Module

●

TCP/SSL servers/clients

●

Repository

●

●

web server

HTTP/HTTPS servers/
clients

●

WebSockets support

●

SockJS support

●

Persistors (Mongo,
JDBC, ...)

●

Work queue

●

Timers

●

Authentication

●

Buffers

●

Manager

●

Streams and Pumps

●

Session manager

●

Routing

●

Socket.IO

●

Asynchronous File I/O
97
Node.JS vs Vert.X

98
Node.js vs Vert.X
●

Node.js

●

Vert.X

–

JavaScript Only

–

Polyglot (JavaScript,
Java, Ruby, Python...)

–

Inherently Single
Threaded

–

Leverages JVM multithreading

–

No help much with IPC

–

Nervous Event Bus

–

All code MUST be in
Event loop

–

Blocking work can be
done off the event loop
99
Node.js vs Vert.X Benchmark

http://vertxproject.wordpress.com/2012/05/09/vert-x-vs-node-js-simple-http-benchmarks/

AMD Phenom II X6 (6 core), 8GB
RAM, Ubuntu 11.04

100
HeatMap Reactor Architecture
Vert.X Instance

Vert.X Instance
Automatically sends
EventBus Msg → KafkaTopic

HTTP
Server
Verticle

Kafka
module
Kafka Topic

Event Bus

Storm
Topology
101
Heat-Map Server – Only 6 LOC !
var
var
var
var

vertx = require('vertx');
container = require('vertx/container');
console = require('vertx/console');
config = container.config;

Send checkin
to Vert.X EventBus

vertx.createHttpServer().requestHandler(function(request) {
request.dataHandler(function(buffer) {
vertx.eventBus.send(config.address, {payload:buffer.toString()});
});
request.response.end();
}).listen(config.httpServerPort, config.httpServerHost);
console.log("HTTP CheckinsReactor started on port "+config.httpServerPort);

102
Publish
Checkins

Checkin HTTP
Reactor

Checkin
Kafka Topic

Consume Checkins

Checkin HTTP
Firehose

GET Geo
Location

Storm Heat-Map
Topology

Persist Checkin
Intervals

Publish
Interval Key

Index Interval
Locations

Hotzones
Kafka Topic

Database

Consume Intervals Keys
Search

Get Interval Locations

Geo Location
Service

Search
Server

Web App
Push via WebSocket

Index

103
Demo

104
Summary

105
When You go out to Salsa Club

●

Good Music

●

Crowded

106
More Conclusions..
●

Storm – Great for real-time BigData processing.
Complementary for Hadoop batch jobs.

●

Kafka – Great messaging for logs/events data, been
served as a good “source” for Storm spout

●

Vert.X – Worth trial and check as an alternative for
reactor.

107
Thanks

108

Contenu connexe

Tendances

Super COMPUTING Journal
Super COMPUTING JournalSuper COMPUTING Journal
Super COMPUTING Journal
Pandey_G
 
Computer Performance Microscopy with SHIM
Computer Performance Microscopy with SHIMComputer Performance Microscopy with SHIM
Computer Performance Microscopy with SHIM
hiyangxi
 

Tendances (19)

Fun421 stephens
Fun421 stephensFun421 stephens
Fun421 stephens
 
Taking Your Database Beyond the Border of a Single Kubernetes Cluster
Taking Your Database Beyond the Border of a Single Kubernetes ClusterTaking Your Database Beyond the Border of a Single Kubernetes Cluster
Taking Your Database Beyond the Border of a Single Kubernetes Cluster
 
Super COMPUTING Journal
Super COMPUTING JournalSuper COMPUTING Journal
Super COMPUTING Journal
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cache
 
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie...
 
Alternative cryptocurrencies
Alternative cryptocurrencies Alternative cryptocurrencies
Alternative cryptocurrencies
 
Smallworld Data Check-Out to Microstation
Smallworld Data Check-Out to MicrostationSmallworld Data Check-Out to Microstation
Smallworld Data Check-Out to Microstation
 
SkyhookDM - Towards an Arrow-Native Storage System
SkyhookDM - Towards an Arrow-Native Storage SystemSkyhookDM - Towards an Arrow-Native Storage System
SkyhookDM - Towards an Arrow-Native Storage System
 
Computer Performance Microscopy with SHIM
Computer Performance Microscopy with SHIMComputer Performance Microscopy with SHIM
Computer Performance Microscopy with SHIM
 
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all ...
 
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...
 NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic... NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...
NRP Engagement webinar - Running a 51k GPU multi-cloud burst for MMA with Ic...
 
Burst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud runBurst data retrieval after 50k GPU Cloud run
Burst data retrieval after 50k GPU Cloud run
 
Smallworld to GreGG - FME Server Automation
Smallworld to GreGG - FME Server AutomationSmallworld to GreGG - FME Server Automation
Smallworld to GreGG - FME Server Automation
 
Pig TPC-H Benchmark and Performance Tuning
Pig TPC-H Benchmark and Performance TuningPig TPC-H Benchmark and Performance Tuning
Pig TPC-H Benchmark and Performance Tuning
 
Progress_190315
Progress_190315Progress_190315
Progress_190315
 
Pharo Status Fosdem 2015
Pharo Status Fosdem 2015Pharo Status Fosdem 2015
Pharo Status Fosdem 2015
 
Data-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud BurstData-intensive IceCube Cloud Burst
Data-intensive IceCube Cloud Burst
 
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
하이퍼커넥트 데이터 팀이 데이터 증가에 대처해온 기록
 
"Building and running the cloud GPU vacuum cleaner"
"Building and running the cloud GPU vacuum cleaner""Building and running the cloud GPU vacuum cleaner"
"Building and running the cloud GPU vacuum cleaner"
 

Similaire à Processing Big Data in Realtime

S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverS4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
Praveen Narayanan
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
Dan Lynn
 

Similaire à Processing Big Data in Realtime (20)

Heatmap
HeatmapHeatmap
Heatmap
 
Processing Big Data in Real-Time - Yanai Franchi, Tikal
Processing Big Data in Real-Time - Yanai Franchi, TikalProcessing Big Data in Real-Time - Yanai Franchi, Tikal
Processing Big Data in Real-Time - Yanai Franchi, Tikal
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Where in the world is Franz Kafka? | Will LaForest, Confluent
Where in the world is Franz Kafka? | Will LaForest, ConfluentWhere in the world is Franz Kafka? | Will LaForest, Confluent
Where in the world is Franz Kafka? | Will LaForest, Confluent
 
Real time and reliable processing with Apache Storm
Real time and reliable processing with Apache StormReal time and reliable processing with Apache Storm
Real time and reliable processing with Apache Storm
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
RxJava on Android
RxJava on AndroidRxJava on Android
RxJava on Android
 
Storm is coming
Storm is comingStorm is coming
Storm is coming
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solverS4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
S4495-plasma-turbulence-sims-gyrokinetic-tokamak-solver
 
Deuce STM - CMP'09
Deuce STM - CMP'09Deuce STM - CMP'09
Deuce STM - CMP'09
 
Yevhen Tatarynov "From POC to High-Performance .NET applications"
Yevhen Tatarynov "From POC to High-Performance .NET applications"Yevhen Tatarynov "From POC to High-Performance .NET applications"
Yevhen Tatarynov "From POC to High-Performance .NET applications"
 
Storm
StormStorm
Storm
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
 
Concurrency in Go by Denys Goldiner.pdf
Concurrency in Go by Denys Goldiner.pdfConcurrency in Go by Denys Goldiner.pdf
Concurrency in Go by Denys Goldiner.pdf
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
 
Stevens 3rd Annual Conference Hfc2011
Stevens 3rd Annual Conference Hfc2011Stevens 3rd Annual Conference Hfc2011
Stevens 3rd Annual Conference Hfc2011
 

Plus de Tikal Knowledge

Docking your services_with_docker
Docking your services_with_dockerDocking your services_with_docker
Docking your services_with_docker
Tikal Knowledge
 
Tikal Fuse Day Access Layer Implementation (C#) Based On Mongo Db
Tikal Fuse Day   Access Layer Implementation (C#) Based On Mongo DbTikal Fuse Day   Access Layer Implementation (C#) Based On Mongo Db
Tikal Fuse Day Access Layer Implementation (C#) Based On Mongo Db
Tikal Knowledge
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud Computing
Tikal Knowledge
 

Plus de Tikal Knowledge (20)

Clojure - LISP on the JVM
Clojure - LISP on the JVM Clojure - LISP on the JVM
Clojure - LISP on the JVM
 
Clojure presentation
Clojure presentationClojure presentation
Clojure presentation
 
Tabtale story: Building a publishing and monitoring mobile games architecture...
Tabtale story: Building a publishing and monitoring mobile games architecture...Tabtale story: Building a publishing and monitoring mobile games architecture...
Tabtale story: Building a publishing and monitoring mobile games architecture...
 
Docking your services_with_docker
Docking your services_with_dockerDocking your services_with_docker
Docking your services_with_docker
 
Who moved my_box
Who moved my_boxWho moved my_box
Who moved my_box
 
Writing a Fullstack Application with Javascript - Remote media player
Writing a Fullstack Application with Javascript - Remote media playerWriting a Fullstack Application with Javascript - Remote media player
Writing a Fullstack Application with Javascript - Remote media player
 
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013 .Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013
 
Tikal's Backbone_js introduction workshop
Tikal's Backbone_js introduction workshopTikal's Backbone_js introduction workshop
Tikal's Backbone_js introduction workshop
 
TCE Automation
TCE AutomationTCE Automation
TCE Automation
 
Tce automation-d4
Tce automation-d4Tce automation-d4
Tce automation-d4
 
Cloud computing - an insight into "how does it really work ?"
Cloud computing - an insight into "how does it really work ?" Cloud computing - an insight into "how does it really work ?"
Cloud computing - an insight into "how does it really work ?"
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 
Tikal Fuse Day Access Layer Implementation (C#) Based On Mongo Db
Tikal Fuse Day   Access Layer Implementation (C#) Based On Mongo DbTikal Fuse Day   Access Layer Implementation (C#) Based On Mongo Db
Tikal Fuse Day Access Layer Implementation (C#) Based On Mongo Db
 
Ship early ship often with Django
Ship early ship often with DjangoShip early ship often with Django
Ship early ship often with Django
 
Google App Engine
Google App EngineGoogle App Engine
Google App Engine
 
AWS Case Study
AWS Case StudyAWS Case Study
AWS Case Study
 
Introduction To Cloud Computing
Introduction To Cloud ComputingIntroduction To Cloud Computing
Introduction To Cloud Computing
 
Building Components In Flex3
Building Components In Flex3Building Components In Flex3
Building Components In Flex3
 
Osgi Democamp
Osgi DemocampOsgi Democamp
Osgi Democamp
 
JBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
JBUG 11 - Django-The Web Framework For Perfectionists With DeadlinesJBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
JBUG 11 - Django-The Web Framework For Perfectionists With Deadlines
 

Dernier

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Processing Big Data in Realtime

  • 2. 2
  • 4. After a Long Travel Day 4
  • 5. Going to a Salsa Club 5
  • 6. Best Salsa Club NOW ● Good Music ● Crowded – Now! 6
  • 7. Same Problem in “gogobot” 7
  • 8. 8
  • 9. Lets' Develop “Gogobot Checkins Heat-Map” gogobot checkin Heat Map Service 9
  • 10. Key Notes ● Collector Service - Collects checkins as text addresses – We need to use GeoLocation Service ● Upon elapsed interval, the last locations list will be displayed as Heat-Map in GUI. ● Web Scale service – 10Ks checkins/seconds all over the world (imaginary, but lets do it for the exercise). ● Accuracy – Sample data, NOT critical data. – – Proportionately representative Data volume is large enough to compensate for data loss. 10
  • 11. Heat-Map Context Heat-Map Gogobot System Text-Address Checkins Heat-Map Service Gogobot Micro Service Gogobot Micro Service Get-GeoCode(Address) Gogobot Micro Service Geo Location Service Last Interval Locations 11
  • 12. Plan A Simulate Checkins with a File Check-in #1 Check-in #2 Check-in #3 Check-in #4 Check-in #5 Check-in #6 Check-in #7 Check-in #8 Check-in #9 ... Geo Location Service GET Geo Location Read Text Address Processing Checkins Persist Checkin Intervals Database 12
  • 13. Tons of Addresses Arriving Every Second 13
  • 14. Architect - First Reaction... 14
  • 18. Problems ? ● Tedious: Spend time conf guring where to send i messages, deploying workers, and deploying intermediate queues. ● Brittle: There's little fault-tolerance. ● Painful to scale: Partition of running worker/s is complicated. 18
  • 19. What We Want ? ● Horizontal scalability ● Fault-tolerance ● No intermediate message brokers! ● Higher level abstraction than message passing ● “Just works” ● Guaranteed data processing (not in this case) 19
  • 20. Apache Storm ✔Horizontal scalability ✔Fault-tolerance ✔No intermediate message brokers! ✔Higher level abstraction than message passing ✔“Just works” ✔Guaranteed data processing 20
  • 22. What is Storm ? ● CEP - Open source and distributed realtime computation system. – – ● Makes it easy to reliably process unbounded streams of tuples Doing for realtime processing what Hadoop did for batch processing. Fast - 1M Tuples/sec per node. – It is scalable,fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. 22
  • 27. Guarantee for Processing ● ● ● Storm guarantees the full processing of a tuple by tracking its state In case of failure, Storm can re-process it. Source tuples with full “acked” trees are removed from the system 27
  • 28. Tasks (Bolt/Spout Instance) Spouts and bolts execute as many tasks across the cluster 28
  • 29. Stream Grouping When a tuple is emitted, which task (instance) does it go to? 29
  • 30. Stream Grouping ● ● ● ● Shuff e grouping: pick a random task l Fields grouping: consistent hashing on a subset of tuple f elds i All grouping: send to all tasks Global grouping: pick task with lowest id 30
  • 31. Tasks , Executors , Workers Worker Process Executor Task JVM Executor Task Task Thread = Thread Sput / Bolt Sput / Sput / Bolt Bolt 31
  • 32. Node Worker Process Executor Spout A Supervisor Executor Bolt C Bolt C Executor Bolt B Bolt B Node Worker Process Supervisor Executor Executor Spout A Bolt C Bolt C Executor Bolt B Bolt B 32
  • 33. Storm Architecture NOT critical for running topology Upload/Rebalance Heat-Map Topology Supervisor Supervisor Supervisor Supervisor Nimbus Supervisor Supervisor Zoo Keeper Nodes Master Node (similar to Hadoop JobTracker) 33
  • 34. Storm Architecture A few nodes Upload/Rebalance Heat-Map Topology Supervisor Supervisor Supervisor Supervisor Nimbus Supervisor Supervisor Zoo Keeper Used For Cluster Coordination 34
  • 37. HeatMap Input/Output Tuples ● Input Tuples: Timestamp and Text Address : – ● (9:00:07 PM , “287 Hudson St New York NY 10013”) Output Tuple: Time interval, and a list of points for it: – (9:00:00 PM to 9:00:15 PM, List((40.719,-73.987),(40.726,-74.001),(40.719,-73.987)) 37
  • 38. Checkins Spout Heat Map Storm Topology (9:01 PM @ 287 Hudson st) Geocode Lookup Bolt (9:01 PM , (40.736, -74,354))) Heatmap Builder Bolt Upon Elapsed Interval (9:00 PM – 9:15 PM , List((40.73, -74,34), (51.36, -83,33),(69.73, -34,24)) Persistor Bolt 38
  • 39. Checkins Spout public class CheckinsSpout extends BaseRichSpout { private List<String> sampleLocations; private int nextEmitIndex; private SpoutOutputCollector outputCollector; We hold state No need for thread safety @Override public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) { this.outputCollector = spoutOutputCollector; this.nextEmitIndex = 0; sampleLocations = IOUtils.readLines( } ClassLoader.getSystemResourceAsStream("sanple-locations.txt")); @Override Been called public void nextTuple() { iteratively by Storm String address = checkins.get(nextEmitIndex); String checkin = new Date().getTime()+"@ADDRESS:"+address; outputCollector.emit(new Values(checkin)); } nextEmitIndex = (nextEmitIndex + 1) % sampleLocations.size(); @Override declareOutputFields(OutputFieldsDeclarer declarer.declare(new Fields("str")); public void } Declare output fields declarer) { 39
  • 40. Geocode Lookup Bolt public class GeocodeLookupBolt extends BaseBasicBolt { private LocatorService locatorService; @Override public void prepare(Map stormConf, TopologyContext context) { locatorService = new GoogleLocatorService(); } @Override public void execute(Tuple tuple, BasicOutputCollector outputCollector) { String str = tuple.getStringByField("str"); String[] parts = str.split("@"); Long time = Long.valueOf(parts[0]); String address = parts[1]; Get Geocode, Create DTO LocationDTO locationDTO = locatorService.getLocation(address); if(checkinDTO!=null) } outputCollector.emit(new Values(time,locationDTO) ); @Override public void declareOutputFields(OutputFieldsDeclarer fieldsDeclarer) { } fieldsDeclarer.declare(new Fields("time", "location")); 40 }
  • 41. Tick Tuple – Repeating Mantra 41
  • 42. Two Streams to Heat-Map Builder Checkin 1 Checkin 4 Checkin 5 Checkin 6 HeatMapBuilder Bolt On tick tuple, we f ush our Heat-Map l 42
  • 43. Tick Tuple in Action public class HeatMapBuilderBolt extends BaseBasicBolt { private Map<String, List<LocationDTO>> heatmaps; Hold latest intervals @Override public Map<String, Object> getComponentConfiguration() { Config conf = new Config(); conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60 ); return conf; } Tick interval @Override public void execute(Tuple tuple, BasicOutputCollector outputCollector) { if (isTickTuple(tuple)) { // Emit accumulated intervals } else { // Add check-in info to the current interval in the Map } } private boolean isTickTuple(Tuple tuple) { return tuple.getSourceComponent().equals(Constants.SYSTEM_COMPONENT_ID) && tuple.getSourceStreamId().equals(Constants.SYSTEM_TICK_STREAM_ID); } 43
  • 44. Persister Bolt public class PersistorBolt extends BaseBasicBolt { private Jedis jedis; @Override public void execute(Tuple tuple, BasicOutputCollector outputCollector) { Long timeInterval = tuple.getLongByField("time-interval"); String city = tuple.getStringByField("city"); String locationsList = objectMapper.writeValueAsString ( tuple.getValueByField("locationsList")); String dbKey = "checkins-" + timeInterval+"@"+city; Persist in Redis for 24h jedis.setex(dbKey, 3600*24 ,locationsList); jedis.publish("location-key", dbKey); } } Publish in Redis channel for debugging 44
  • 45. Transforming the Tuples Checkins Spout Sample Checkins File Check-in #1 Check-in #2 Check-in #3 Check-in #4 Check-in #5 Check-in #6 Check-in #7 Check-in #8 Check-in #9 ... Read Text Addresses Shuffle Grouping Geocode Lookup Bolt Get Geo Location Geo Location Service Group by city Field Grouping(city) Heatmap Builder Bolt Shuffle Grouping Database Persistor Bolt 45
  • 46. Heat Map Topology public class LocalTopologyRunner { public static void main(String[] args) { TopologyBuilder builder = buildTopolgy(); StormSubmitter.submitTopology( "local-heatmap", new Config(), builder.createTopology()); } private static TopologyBuilder buildTopolgy() { topologyBuilder builder = new TopologyBuilder(); builder.setSpout("checkins", new CheckinsSpout()); builder.setBolt("geocode-lookup", new GeocodeLookupBolt() ) .shuffleGrouping("checkins"); builder.setBolt("heatmap-builder", new HeatMapBuilderBolt() ) .fieldsGrouping("geocode-lookup", new Fields("city")); builder.setBolt("persistor", new PersistorBolt() ) .shuffleGrouping("heatmap-builder"); } } return builder; 46
  • 48. 48
  • 49. Scaling the Topology public class LocalTopologyRunner { conf.setNumWorkers(20); public static void main(String[] args) { TopologyBuilder builder = buildTopolgy(); Config conf = new Config(); Set no. of workers conf.setNumWorkers(2); } StormSubmitter.submitTopology( "local-heatmap", conf, builder.createTopology()); Parallelism hint private static TopologyBuilder buildTopolgy() { topologyBuilder builder = new TopologyBuilder(); builder.setSpout("checkins", new CheckinsSpout(), 4 ); Increase Tasks For Future 8 ) .shuffleGrouping("checkins").setNumTasks(64); builder.setBolt("geocode-lookup", new GeocodeLookupBolt() , builder.setBolt("heatmap-builder", new HeatMapBuilderBolt() , 4) .fieldsGrouping("geocode-lookup", new Fields("city")); builder.setBolt("persistor", new PersistorBolt() , 2 ) .shuffleGrouping("heatmap-builder").setNumTasks(4);49 return builder;
  • 51. Recap – Plan A Sample Checkins File Check-in #1 Check-in #2 Check-in #3 Check-in #4 Check-in #5 Check-in #6 Check-in #7 Check-in #8 Check-in #9 ... Geo Location Service GET Geo Location Read Text Address Storm Heat-Map Topology Persist Checkin Intervals Database 51
  • 54. Plan B Kafka Spout&Bolt to HeatMap Publish Checkins Geo Location Service Checkin Kafka Topic Read Kafka Text Addresses Checkins Spout Geocode Lookup Bolt Heatmap Builder Bolt Database Persistor Bolt Kafka Locations Bolt Locations Topic 54
  • 55. 55
  • 56. They all are Good But not for all use-cases 56
  • 58. 58
  • 60. 60
  • 61. 61
  • 62. 62
  • 63. 63
  • 64. Doesn't Fear the File System 64
  • 65. 65
  • 66. 66
  • 67. 67
  • 68. Topics ● ● Logical collections of partitions (the physical f les). i A broker contains some of the partitions for a topic 68
  • 69. A partition is Consumed by Exactly One Group's Consumer 69
  • 71. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Zoo Keeper Consumer 1 Consumer 2 71
  • 72. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Broker 4 Zoo Keeper Consumer 1 Consumer 2 72
  • 73. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Broker 4 Zoo Keeper Consumer 1 Consumer 2 73
  • 74. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Broker 4 Zoo Keeper Consumer 1 Consumer 2 74
  • 75. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Broker 4 Zoo Keeper Consumer 1 Consumer 2 75
  • 76. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Broker 4 Zoo Keeper Consumer 1 Consumer 2 76
  • 77. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Broker 4 Zoo Keeper Consumer 1 Consumer 2 77
  • 78. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Zoo Keeper Consumer 1 Consumer 2 78
  • 79. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Zoo Keeper Consumer 1 Consumer 2 79
  • 80. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Zoo Keeper Consumer 1 Consumer 2 80
  • 81. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Zoo Keeper Consumer 1 81
  • 82. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Zoo Keeper Consumer 1 82
  • 83. Producer 2 Producer 1 Broker 1 Broker 2 Broker 3 Zoo Keeper Consumer 1 83
  • 84. Performance Benchmark 1 Broker 1 Producer 1 Consumer 84
  • 85. 85
  • 86. 86
  • 87. Add Kafka to our Topology public class LocalTopologyRunner { ... private static TopologyBuilder buildTopolgy() { Kafka Spout ... builder.setSpout("checkins", new KafkaSpout(kafkaConfig)); ... builder.setBolt("kafkaProducer", new KafkaOutputBolt ( "localhost:9092", "kafka.serializer.StringEncoder", "locations-topic")) .shuffleGrouping("persistor"); } } Kafka Bolt return builder; 87
  • 88. Plan C – Add Reactor Text-Address Publish Checkins Checkin HTTP Reactor Checkin Kafka Topic Consume Checkins Storm Heat-Map Topology Persist Checkin Intervals Database GET Geo Location Publish Interval Key Locations Kafka Topic Geo Location Service Index Interval Locations Search Server Index 88
  • 93. Reactor Pattern – Key Points ● ● ● ● Single thread / single event loop EVERYTHING runs on it You MUST NOT block the event loop Many Implementations (partial list): – Node.js (JavaScrip), EventMachine (Ruby), Twisted (Python)... and Vert.X 93
  • 94. Reactor Pattern Problems ● Some work is naturally blocking: – – ● Intensive data crunching 3rd-party blocking API’s (e.g. JDBC) Pure reactor (e.g. Node.js) is not a good f t for this i kind of work! 94
  • 95. 95
  • 97. Vert.X Goodies ● Growing Module ● TCP/SSL servers/clients ● Repository ● ● web server HTTP/HTTPS servers/ clients ● WebSockets support ● SockJS support ● Persistors (Mongo, JDBC, ...) ● Work queue ● Timers ● Authentication ● Buffers ● Manager ● Streams and Pumps ● Session manager ● Routing ● Socket.IO ● Asynchronous File I/O 97
  • 99. Node.js vs Vert.X ● Node.js ● Vert.X – JavaScript Only – Polyglot (JavaScript, Java, Ruby, Python...) – Inherently Single Threaded – Leverages JVM multithreading – No help much with IPC – Nervous Event Bus – All code MUST be in Event loop – Blocking work can be done off the event loop 99
  • 100. Node.js vs Vert.X Benchmark http://vertxproject.wordpress.com/2012/05/09/vert-x-vs-node-js-simple-http-benchmarks/ AMD Phenom II X6 (6 core), 8GB RAM, Ubuntu 11.04 100
  • 101. HeatMap Reactor Architecture Vert.X Instance Vert.X Instance Automatically sends EventBus Msg → KafkaTopic HTTP Server Verticle Kafka module Kafka Topic Event Bus Storm Topology 101
  • 102. Heat-Map Server – Only 6 LOC ! var var var var vertx = require('vertx'); container = require('vertx/container'); console = require('vertx/console'); config = container.config; Send checkin to Vert.X EventBus vertx.createHttpServer().requestHandler(function(request) { request.dataHandler(function(buffer) { vertx.eventBus.send(config.address, {payload:buffer.toString()}); }); request.response.end(); }).listen(config.httpServerPort, config.httpServerHost); console.log("HTTP CheckinsReactor started on port "+config.httpServerPort); 102
  • 103. Publish Checkins Checkin HTTP Reactor Checkin Kafka Topic Consume Checkins Checkin HTTP Firehose GET Geo Location Storm Heat-Map Topology Persist Checkin Intervals Publish Interval Key Index Interval Locations Hotzones Kafka Topic Database Consume Intervals Keys Search Get Interval Locations Geo Location Service Search Server Web App Push via WebSocket Index 103
  • 106. When You go out to Salsa Club ● Good Music ● Crowded 106
  • 107. More Conclusions.. ● Storm – Great for real-time BigData processing. Complementary for Hadoop batch jobs. ● Kafka – Great messaging for logs/events data, been served as a good “source” for Storm spout ● Vert.X – Worth trial and check as an alternative for reactor. 107