Page1
Developing Java Streaming Applications
with Apache Storm
Lester Martin www.ajug.org - Nov 2017
Page2
Connection before Content
Lester Martin – Hadoop/Spark/Storm Trainer & Consultant
lester.martin@gmail.com
http://lester.website (links to blog, twitter, github, LI, FB, etc)
Page3
Agenda – Needs Updating!!!!
• What is Storm?
• Conceptual Model
• Compile Time
• DEMO: Develop Word Count Topology
• Runtime
• DEMO: Submit Word Count Topology
• Additional Features
• DEMO: Kafka > Storm > HBase Topology in Local Cluster
Page4
What is Storm?
Page5
Storm is …
• Streaming
  – Key enabler of the Lambda Architecture
• Fast
  – Clocked at 1M+ messages per second per node
• Scalable
  – Thousands of workers per cluster
• Fault Tolerant
  – Failure is expected, and embraced
• Reliable
  – Guaranteed message delivery
  – Exactly-once semantics
Page6
Storm in the Lambda Architecture
[Diagram labels: real-time data feeds into Storm; batch feeds into Hadoop (persists data, batch processing); update event models (pattern templates, key performance indicators, and alerts); dashboards and applications]
Page7
Conceptual Model
Page8
TUPLE
{…}
Page9
Tuple
• Unit of work to be processed
• Immutable ordered set of serializable values
• Each field must have an assigned name
{…}
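The three tuple properties above can be illustrated without Storm on the classpath. The following is a minimal sketch in plain Java; NamedTuple is a hypothetical class made up for illustration and is not Storm's own Tuple interface. It shows ordered values, a name for every field, and immutability after construction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a Storm tuple; NOT Storm's Tuple interface.
final class NamedTuple {
    private final List<String> fields;   // field names, in order
    private final List<Object> values;   // values, in the same order as fields

    NamedTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("every value needs a field name");
        }
        // Defensive copies wrapped as unmodifiable lists keep the tuple immutable.
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }

    // Positional access relies on the values being ordered.
    Object getValue(int index) {
        return values.get(index);
    }

    // Named access relies on every field having an assigned name.
    Object getValueByField(String field) {
        int i = fields.indexOf(field);
        if (i < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(i);
    }

    List<String> getFields() {
        return fields;
    }

    int size() {
        return values.size();
    }
}
```

A tuple built with fields ("word", "count") and values ("storm", 3) can then be read either by position or by name, and any attempt to modify the returned field list throws UnsupportedOperationException.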
Page10
Stream
• Core abstraction of Storm
• Unbounded sequence of Tuples
{…} {…} {…} {…} {…} {…} {…}
Page11
SPOUT
Page12
Spout
• Source of Streams
• Wrap an event source and emit Tuples
Page13
Message Queues
Message queues are often the source of the data processed by Storm.
Storm Spouts integrate with many types of message queues.
[Diagram: a real-time data source (operating systems, services and applications, sensors) sends log entries, events, errors, status messages, etc. to a message queue (Kestrel, RabbitMQ, AMQP, Kafka, JMS, others…); data from the queue is read by Storm.]
Page14
BOLT
Page15
Bolt
• Core unit of computation
• Receive Tuples and do stuff
• Optionally, emit additional Tuples
Page16
Bolt
• Write to a data store
Page17
Bolt
• Read from a data store
Page18
Bolt
• Perform arbitrary computation
Page19
Bolt
• (Optionally) Emit additional Stream(s)
Page20
TOPOLOGY
Page21
Topology
• DAG of Spouts and Bolts
• Data Flow Representation
• Streaming Computation
Page22
Topology
• Storm executes Spouts and Bolts as Tasks that run in parallel on multiple machines
Page23
Parallel Execution of Topology Components
[Diagram: a logical topology (spout A, bolt A, bolt B, bolt C) shown next to a physical implementation spread across machines A through G, with spout A running as two tasks, bolt A as two tasks, bolt B as two tasks, and bolt C as one task.]
Page24
Stream Groupings
Stream Groupings determine how Storm routes Tuples between Tasks
Grouping Type   Routing Behavior
Shuffle         Randomized round-robin (evenly distributes load to downstream Bolts)
Fields          Ensures all Tuples with the same Field value(s) are always routed to the same Task
All             Replicates the Stream across all of the Bolt’s Tasks (use with care)
Other options   Including custom roll-your-own (RYO) grouping logic
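The routing behaviors in the table can be sketched in a few lines of plain Java. GroupingSketch and its methods are hypothetical names chosen for illustration; this is not how Storm implements groupings internally, only a picture of the contract each grouping offers. Tasks are represented by integer indexes, and the shuffle variant is simplified to a plain round-robin.

```java
import java.util.List;

// Hypothetical routing helpers; they illustrate the idea, not Storm's internals.
final class GroupingSketch {
    private int next = 0; // round-robin cursor for the shuffle-style routing

    // Shuffle-style: spread tuples evenly across tasks (plain round-robin here).
    int shuffle(int numTasks) {
        int task = next;
        next = (next + 1) % numTasks;
        return task;
    }

    // Fields-style: equal grouping-field values always map to the same task.
    static int fields(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    // All-style: every task of the bolt receives a copy of the tuple.
    static int[] all(int numTasks) {
        int[] tasks = new int[numTasks];
        for (int i = 0; i < numTasks; i++) {
            tasks[i] = i;
        }
        return tasks;
    }
}
```

The fields variant is what makes a word count work: every occurrence of the same word lands on the same task, so a per-task counter sees all of them.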
Page25
Compile Time
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
Page26
Example Spout Code (1 of 2)
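    // Imports assume the Storm 1.x package names (org.apache.storm).
    import java.util.Map;
    import java.util.Random;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;

    public class RandomSentenceSpout extends BaseRichSpout {
        SpoutOutputCollector _collector;
        Random _rand;
        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;   // kept so nextTuple() can emit
            _rand = new Random();
        }
        @Override
        public void nextTuple() {
            Utils.sleep(100);         // throttle the demo to roughly ten tuples per second
            String[] sentences = new String[]{
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature" };
            String sentence = sentences[_rand.nextInt(sentences.length)];
            _collector.emit(new Values(sentence));
        }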
Continued next page…
• RandomSentenceSpout is the name of the spout class; BaseRichSpout is the Storm spout class used as a “template”.
• Storm uses open to open the spout and provide it with its configuration, a context object providing information about components in the topology, and an output collector used to emit tuples.
• Storm uses nextTuple to request that the spout emit the next tuple.
• The spout uses emit to send a tuple to one or more bolts.
Page27
Example Spout Code (2 of 2)
        @Override
        public void ack(Object id) {
            // Nothing to do in this example.
        }
        @Override
        public void fail(Object id) {
            // Nothing to do in this example; failed tuples are not replayed.
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("sentence"));
        }
    }
• Storm calls the spout’s ack method to signal that a tuple has been fully processed.
• Storm calls the spout’s fail method to signal that a tuple has not been fully processed.
• The declareOutputFields method names the fields in a tuple.
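The ack and fail methods in the example spout are empty, so a failed tuple is simply dropped. Storm also only calls them for tuples that were emitted with a message id, which the example's emit call does not supply. The sketch below shows, in plain Java, the kind of bookkeeping a spout could keep if it wanted replay. PendingMessages and its method names are hypothetical, not part of Storm's API; each method is annotated with the spout callback it would be called from.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical bookkeeping a reliable spout might keep; not part of Storm's API.
final class PendingMessages {
    private final Queue<String> source = new ArrayDeque<>();    // messages not yet emitted
    private final Map<Long, String> pending = new HashMap<>();  // emitted, outcome unknown
    private long nextId = 0;

    void offer(String message) {
        source.add(message);
    }

    // Would be called from nextTuple(): returns the message id used, or -1 if idle.
    long emitNext() {
        String message = source.poll();
        if (message == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, message); // remember the message until Storm reports the outcome
        return id;
    }

    // Would be called from ack(Object id): the tuple was fully processed.
    void ack(long id) {
        pending.remove(id);
    }

    // Would be called from fail(Object id): queue the message to be emitted again.
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) {
            source.add(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    int queuedCount() {
        return source.size();
    }
}
```

In a real spout the id returned by emitNext would be passed as the message id when emitting, and Storm would hand the same object back to ack or fail.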
Page28
Example Bolt Code
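    // Nested inside the topology class, hence static. Needs java.util.Map plus classes
    // from org.apache.storm.task, .topology, .topology.base and .tuple (Storm 1.x names assumed).
    public static class ExclamationBolt extends BaseRichBolt {
        OutputCollector _collector;
        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            _collector = collector;
        }
        @Override
        public void execute(Tuple tuple) {
            // Emit a new tuple anchored to the input, then acknowledge the input.
            _collector.emit(tuple, new Values(tuple.getString(0) + "!!!"));
            _collector.ack(tuple);
        }
        @Override
        public void cleanup() {
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }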
• ExclamationBolt is the name of the bolt class; BaseRichBolt is the bolt class used as a “template.”
• The prepare method provides the bolt with its configuration and an OutputCollector used to emit tuples.
• The execute method receives a tuple from a stream and emits a new tuple anchored to it. It then calls the collector’s ack method to acknowledge that the input tuple was processed successfully.
• The cleanup method releases system resources when the bolt is shut down.
• The declareOutputFields method names the fields in the output tuples. More detail later.
Page29
Example Topology Code
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Wire the spout to two bolts in sequence: words -> exclaim1 -> exclaim2.
        builder.setSpout("words", new TestWordSpout());
        builder.setBolt("exclaim1", new NewExclamationBolt()).shuffleGrouping("words");
        builder.setBolt("exclaim2", new NewExclamationBolt()).shuffleGrouping("exclaim1");
        Config conf = new Config();
        StormSubmitter.submitTopology("add-exclamation", conf, builder.createTopology());
    }
This code builds this Topology:
words --shuffleGrouping--> exclaim1 --shuffleGrouping--> exclaim2
• words runs code in TestWordSpout()
• exclaim1 runs code in NewExclamationBolt()
• exclaim2 runs code in NewExclamationBolt()
Page30
DEMO
Develop Word Count Topology
Page31
Runtime
Nimbus
Supervisor
Supervisor
Supervisor
Supervisor
Page32
Physical View
Page33
Topology Deployment
Topology Submitter uploads topology:
• topology.jar
• topology.ser
• conf.ser
Page34
Topology Deployment
Nimbus calculates assignments and sends them to ZooKeeper
Page35
Topology Deployment
Supervisor nodes receive assignment information via ZooKeeper watches
Page36
Topology Deployment
Supervisor nodes download topology from Nimbus:
• topology.jar
• topology.ser
• conf.ser
Page37
Topology Deployment
Supervisors spawn workers (JVM processes)
Page38
DEMO
Submit Word Count Topology to Storm
Page39
Additional Features
Page40
Local Versus Distributed Storm Clusters
The topology program code submitted to Storm using storm jar differs between local mode and a distributed cluster.
The submitTopology method is used in both cases.
• The difference is the class that contains the submitTopology method.

Local cluster:
    Config conf = new Config();
    LocalCluster cluster = new LocalCluster();               // instantiate a local cluster object
    cluster.submitTopology("mytopology", conf, topology);    // submit a topology to the local cluster

Distributed cluster:
    Config conf = new Config();
    StormSubmitter.submitTopology("mytopology", conf, topology);  // submit a topology to a distributed cluster

Same method name, different classes: submitTopology is called on a LocalCluster instance in local mode and as a static method of StormSubmitter for a distributed cluster.
Page41
Reliable Processing
Bolts may emit Tuples Anchored to one received.
Tuple “B” is a descendant of Tuple “A”
Page42
Reliable Processing
Multiple Anchorings form a Tuple tree
(bolts not shown)
Page43
Reliable Processing
Bolts can Acknowledge that a tuple
has been processed successfully.
ACK
Page44
Reliable Processing
Bolts can also Fail a tuple to trigger a spout to
replay the original.
FAIL
Page45
Reliable Processing
Any failure in the Tuple tree will trigger a
replay of the original tuple
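One way to picture how a whole tuple tree can be tracked cheaply: Storm's documentation on guaranteed message processing describes acker tasks that XOR together the ids of tuples as they are anchored and acked, so the running value returns to zero once every tuple in the tree has been acked. The sketch below illustrates that bookkeeping in plain Java. TupleTreeTracker and its method names are hypothetical, the ids are assumed to be non-zero, and this is not Storm's acker code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tuple-tree tracker; illustrates XOR bookkeeping, not Storm's acker.
final class TupleTreeTracker {
    // root id -> XOR of the ids of all tuples in the tree that are still outstanding
    private final Map<Long, Long> ackVals = new HashMap<>();

    // The spout tuple enters the tree under its root id.
    void track(long rootId, long tupleId) {
        xor(rootId, tupleId);
    }

    // A bolt acks `input` after emitting `anchored` child tuples from it.
    // Returns true when the whole tree for rootId has been processed.
    boolean ack(long rootId, long input, long... anchored) {
        long delta = input;            // XOR-ing the input id again cancels it out
        for (long child : anchored) {
            delta ^= child;            // each child stays outstanding until it is acked
        }
        return xor(rootId, delta) == 0L;
    }

    // Any failure abandons the tree; the spout would then replay the original tuple.
    void fail(long rootId) {
        ackVals.remove(rootId);
    }

    boolean isTracked(long rootId) {
        return ackVals.containsKey(rootId);
    }

    private long xor(long rootId, long value) {
        long updated = ackVals.getOrDefault(rootId, 0L) ^ value;
        if (updated == 0L) {
            ackVals.remove(rootId);    // nothing outstanding: the tree is complete
        } else {
            ackVals.put(rootId, updated);
        }
        return updated;
    }
}
```

For example, if tuple 5 is tracked under root 1 and a bolt acks it while emitting children 9 and 12, the tree stays open until both 9 and 12 have been acked as well.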
Page46
More Stuff
• Topology description/deployment options
  – Flux
  – Storm SQL
• Polyglot development
• Micro-batching with Trident
• Fault tolerance & deployment isolation
• Integrations
  – Messaging: Kafka, Redis, Kestrel, Kinesis, MQTT, JMS
  – Databases: HBase, Hive, Druid, Cassandra, MongoDB, JDBC
  – Search Engines: Solr, Elasticsearch
  – HDFS
  – And more!
Page47
DEMO
Kafka > Storm > HBase Topology in a Local Cluster
Page48
Kafka > Storm > HBase Example
Requirements:
• Land simulated server logs into Kafka
• Configure a Kafka Spout to consume the server log messages
• Ignore all messages that are not either WARN or ERROR
• Persist WARN and ERROR messages into HBase
– Keep 10 most recent messages for each server
– Maintain a running total of these concerning messages
• Publish these messages back to Kafka
[Diagram: topology flow with components labeled Kafka, Parse, Filter, HBase, and Kafka]
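The filtering and bookkeeping requirements above can be sketched without Kafka or HBase. The plain Java class below is a hypothetical in-memory stand-in (ConcerningLogStore is an invented name, and nothing here uses the Storm, Kafka, or HBase APIs). It keeps only WARN and ERROR messages, retains the 10 most recent per server, and maintains a running total.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the filter and persistence steps.
final class ConcerningLogStore {
    static final int MAX_RECENT = 10;
    private final Map<String, Deque<String>> recentByServer = new HashMap<>();
    private long runningTotal = 0;

    // Filter step: only WARN and ERROR messages are kept.
    static boolean isConcerning(String level) {
        return "WARN".equals(level) || "ERROR".equals(level);
    }

    // Returns true if the message was stored (and would be published onward).
    boolean process(String server, String level, String message) {
        if (!isConcerning(level)) {
            return false;
        }
        Deque<String> recent = recentByServer.computeIfAbsent(server, s -> new ArrayDeque<>());
        recent.addFirst(level + " " + message); // newest first
        if (recent.size() > MAX_RECENT) {
            recent.removeLast();                // drop the oldest entry
        }
        runningTotal++;
        return true;
    }

    // Most recent messages for a server, newest first; empty if none were stored.
    List<String> recent(String server) {
        Deque<String> recent = recentByServer.get(server);
        return recent == null ? new ArrayList<String>() : new ArrayList<>(recent);
    }

    long runningTotal() {
        return runningTotal;
    }
}
```

In a topology, the boolean returned by process would decide whether a bolt emits the message onward for publishing back to Kafka.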
Page49
Questions?
Lester Martin – Hadoop/Spark/Storm Trainer & Consultant
lester.martin@gmail.com
http://lester.website (links to blog, twitter, github, LI, FB, etc)
THANKS FOR YOUR TIME!!
