Introduction to Storm
Md. Shamsur Rahim
Student of MScCS
American International University
Bangladesh
Types of Big Data Processing
• Batch Processing
▫ Takes a large amount of data at once, analyzes it, and
produces a large output.
• Real-Time Processing
▫ Collects data, analyzes it, and produces output in real
time.
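A minimal plain-Java sketch can illustrate the difference between the two styles. This is not Storm code, and `BatchVsStream` and its methods are hypothetical names. The batch version waits for the complete input and returns one result. The streaming version updates a running result as each event arrives.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Contrasts the two processing styles on the same input.
class BatchVsStream {

    // Batch style: the whole dataset is available up front,
    // and a single result is produced at the end.
    static Map<String, Integer> batchCount(List<String> events) {
        Map<String, Integer> counts = new HashMap<>();
        for (String e : events) {
            counts.merge(e, 1, Integer::sum);
        }
        return counts;
    }

    // Streaming style: running state is updated per event, so an
    // up-to-date answer exists after every arrival.
    static class StreamingCounter {
        private final Map<String, Integer> counts = new HashMap<>();

        // Returns the new running count for this event type.
        int onEvent(String e) {
            return counts.merge(e, 1, Integer::sum);
        }

        Map<String, Integer> snapshot() {
            return new HashMap<>(counts);
        }
    }

    public static void main(String[] args) {
        List<String> events = List.of("click", "view", "click");
        System.out.println("batch result: " + batchCount(events));

        StreamingCounter live = new StreamingCounter();
        for (String e : events) {
            System.out.println(e + " -> running count " + live.onEvent(e));
        }
    }
}
```

Storm is built around the second style: a topology keeps running and updates its results tuple by tuple.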
Let’s get Familiar with Storm
• Storm is open source software.
• It was incubated at the Apache Software Foundation and
became a top-level Apache project in 2014.
• Industry leaders choose it because it is a distributed, real-time
data processing platform.
Possible Areas of Using Storm
• Stream processing:
▫ Storm processes a stream of data and updates a
variety of databases in real time.
▫ The processing speed needs to keep up with the rate at
which input data arrives.
• Continuous computation:
▫ Storm can run a continuous computation over data streams
and stream the results to clients in real time.
• Distributed RPC (DRPC):
▫ Storm can parallelize an intensive query so that you can
compute it in real time.
• Real-time analytics:
▫ Storm can analyze and respond to data from different
sources as events happen, in real time.
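Plain JDK threads can sketch the idea behind distributed RPC: split one expensive request into parallel pieces, then merge the partial answers. This is a single-machine illustration. It assumes the work divides into independent slices. `ParallelQuery` is a hypothetical name, and the code does not use Storm's DRPC API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

// Split one expensive query into slices, evaluate the slices in parallel,
// then merge the partial answers into a single reply.
class ParallelQuery {

    static long countMatches(int[] data, IntPredicate predicate, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            int chunk = Math.max(1, (data.length + parallelism - 1) / parallelism); // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, data.length);
                // Each task scans only its own slice of the input.
                partials.add(pool.submit(() -> {
                    long n = 0;
                    for (int i = from; i < to; i++) {
                        if (predicate.test(data[i])) n++;
                    }
                    return n;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get(); // merge step: sum the partial counts
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("query interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a slice failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        System.out.println(countMatches(data, x -> x % 7 == 0, 4)); // 142858
    }
}
```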
Features of Storm
• Fast
• Horizontally scalable:
▫ Because Storm is a distributed platform, we can add nodes
(1 node = 1 machine) to a Storm cluster.
▫ Ideally, doubling the number of nodes roughly doubles the
processing capacity.
• Fault tolerant
• Guaranteed data processing
• Easy to operate
• Programming language agnostic
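The scaling claim is easiest to see as arithmetic. The sketch makes an idealized assumption: every node sustains the same throughput, and coordination adds no overhead. The per-node rate below is a made-up number, and `ScalingEstimate` is a hypothetical helper.

```java
// Idealized model: cluster capacity grows linearly with node count.
// Real clusters fall short of this (network, coordination, data skew),
// so treat these figures as an upper bound.
class ScalingEstimate {

    // Capacity of the whole cluster, assuming identical nodes.
    static long clusterCapacity(int nodes, long tuplesPerSecondPerNode) {
        return nodes * tuplesPerSecondPerNode;
    }

    // Smallest node count whose ideal capacity covers the input rate (ceiling division).
    static int nodesNeeded(long inputTuplesPerSecond, long tuplesPerSecondPerNode) {
        return (int) ((inputTuplesPerSecond + tuplesPerSecondPerNode - 1) / tuplesPerSecondPerNode);
    }

    public static void main(String[] args) {
        long perNode = 50_000; // hypothetical per-node throughput
        System.out.println(clusterCapacity(4, perNode));    // 200000
        System.out.println(clusterCapacity(8, perNode));    // 400000
        System.out.println(nodesNeeded(320_000, perNode));  // 7
    }
}
```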
Storm components
A Storm cluster follows a master-slave model where the master and
slave processes are coordinated through ZooKeeper.
Storm components: Nimbus
• Nimbus:
▫ is the master in a Storm cluster.
▫ It is responsible for:
▪ distributing application code across various worker nodes
▪ assigning tasks to different machines
▪ monitoring tasks for any failures
▪ restarting them as and when required
▫ It is stateless and fail-fast.
▫ It stores all of its state in ZooKeeper.
▫ Traditionally there is only one Nimbus node in a Storm cluster
(Storm 1.0 added Nimbus high availability, which allows several).
▫ It can be restarted without affecting the tasks already
running on the worker nodes.
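Nimbus's scheduling role can be modeled in a few lines, along with the fact that it keeps its state outside itself. In this sketch a `ConcurrentHashMap` stands in for ZooKeeper. Tasks are spread round-robin, which is a simplification; Storm's real schedulers are more sophisticated. `ToyNimbus` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Toy Nimbus: every decision is written to the shared store (standing in for
// ZooKeeper), so the object itself can be discarded and re-created at will.
class ToyNimbus {
    private final Map<String, String> store; // taskId -> supervisorId

    ToyNimbus(Map<String, String> store) {
        this.store = store;
    }

    // Spread tasks round-robin over the given supervisors and persist the result.
    void assign(List<String> taskIds, List<String> supervisors) {
        for (int i = 0; i < taskIds.size(); i++) {
            store.put(taskIds.get(i), supervisors.get(i % supervisors.size()));
        }
    }

    // Reassign every task of a failed supervisor to the surviving ones.
    void handleFailure(String failedSupervisor, List<String> liveSupervisors) {
        List<String> orphaned = new ArrayList<>();
        store.forEach((task, sup) -> {
            if (sup.equals(failedSupervisor)) orphaned.add(task);
        });
        Collections.sort(orphaned); // deterministic order for the round-robin pass
        assign(orphaned, liveSupervisors);
    }

    // Snapshot of the assignment, sorted by task ID.
    Map<String, String> currentAssignment() {
        return new TreeMap<>(store);
    }

    public static void main(String[] args) {
        Map<String, String> zk = new ConcurrentHashMap<>();
        new ToyNimbus(zk).assign(List.of("t1", "t2", "t3"), List.of("sup-a", "sup-b"));

        // A "restarted" Nimbus sees the same assignment, because it lives in the store.
        ToyNimbus restarted = new ToyNimbus(zk);
        restarted.handleFailure("sup-b", List.of("sup-a"));
        System.out.println(restarted.currentAssignment()); // {t1=sup-a, t2=sup-a, t3=sup-a}
    }
}
```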
Storm components: Supervisor nodes
• Supervisor nodes
▫ are the worker nodes in a Storm cluster.
▫ Each supervisor node runs a supervisor daemon that creates,
starts, and stops the worker processes that execute the tasks
assigned to that node.
▫ Like Nimbus, a supervisor daemon is fail-fast and
stores all of its state in ZooKeeper, so it can be
restarted without any state loss.
▫ A single supervisor daemon normally manages multiple
worker processes.
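A supervisor's job can be pictured as a reconciliation loop. It compares the workers Nimbus assigned to this node with the workers actually running, then starts or stops the difference. In the sketch below, worker IDs are plain strings rather than JVM processes, and `ToySupervisor` is a hypothetical name.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy supervisor: converge the set of running workers toward the assigned set.
class ToySupervisor {
    private final Set<String> running = new TreeSet<>();

    // One reconciliation pass. Returns a log of the actions taken.
    String reconcile(Set<String> assigned) {
        StringBuilder log = new StringBuilder();
        // Stop workers that are no longer assigned to this node.
        for (String w : new TreeSet<>(running)) {
            if (!assigned.contains(w)) {
                running.remove(w);
                log.append("stop ").append(w).append('\n');
            }
        }
        // Start assigned workers that are not yet running.
        for (String w : new TreeSet<>(assigned)) {
            if (running.add(w)) { // add() is true only if w was not already running
                log.append("start ").append(w).append('\n');
            }
        }
        return log.toString();
    }

    Set<String> runningWorkers() {
        return new TreeSet<>(running);
    }

    public static void main(String[] args) {
        ToySupervisor sup = new ToySupervisor();
        System.out.print(sup.reconcile(Set.of("worker-1", "worker-2")));
        // Nimbus moved worker-1 elsewhere and assigned worker-3 here.
        System.out.print(sup.reconcile(Set.of("worker-2", "worker-3")));
    }
}
```

Because a pass only looks at "assigned" versus "running", repeating it is harmless. This fits a daemon that may be killed and restarted at any time.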
Storm components: The ZooKeeper cluster
• ZooKeeper is a service that coordinates processes and shares
configuration information among them.
▫ Being a distributed application, Storm uses a
ZooKeeper cluster to coordinate its processes.
▫ All state associated with the cluster and with the topologies
submitted to Storm is stored in ZooKeeper.
▫ Nimbus and the supervisor nodes communicate only through
ZooKeeper.
▫ Because all state is stored in ZooKeeper, the Nimbus or supervisor
daemons can be killed at any time without affecting the running topologies.
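Heartbeats can illustrate "communicate only through ZooKeeper". Supervisors write timestamps into the shared store, and Nimbus reads them to decide which supervisors are alive. Neither side holds a reference to the other. The map-based `Store`, the timeout value, and all class names are assumptions for illustration; real Storm uses ZooKeeper nodes and its own timeout settings.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Daemons that talk only through a shared store. Timestamps are passed in
// explicitly so the example is deterministic.
class HeartbeatDemo {

    // Shared coordination store: supervisorId -> last heartbeat time (ms).
    static class Store {
        final Map<String, Long> heartbeats = new ConcurrentHashMap<>();
    }

    static class Supervisor {
        private final String id;
        private final Store store;

        Supervisor(String id, Store store) {
            this.id = id;
            this.store = store;
        }

        // Publish "I am alive" by writing to the store, never to Nimbus directly.
        void heartbeat(long nowMillis) {
            store.heartbeats.put(id, nowMillis);
        }
    }

    static class Nimbus {
        private final Store store;
        private final long timeoutMillis;

        Nimbus(Store store, long timeoutMillis) {
            this.store = store;
            this.timeoutMillis = timeoutMillis;
        }

        // A supervisor counts as alive if its last heartbeat is recent enough.
        List<String> aliveSupervisors(long nowMillis) {
            return new TreeMap<String, Long>(store.heartbeats).entrySet().stream()
                    .filter(e -> nowMillis - e.getValue() <= timeoutMillis)
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        Store zk = new Store();
        new Supervisor("sup-a", zk).heartbeat(1_000);
        new Supervisor("sup-b", zk).heartbeat(9_000);
        // Nimbus can be created, or re-created after a crash, at any time:
        // everything it needs is already in the store.
        Nimbus nimbus = new Nimbus(zk, 5_000);
        System.out.println(nimbus.aliveSupervisors(10_000)); // [sup-b]
    }
}
```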
The Storm data model
• The basic unit of data that a Storm application processes
is called a tuple.
• Each tuple consists of a predefined list of fields.
• By default, a field value can be an integer, long, short, byte,
string, double, float, Boolean, or byte array.
• Storm also lets you register your own data types (serialized
with Kryo) so they can be used as fields in a tuple.
• A tuple is dynamically typed: you only need to
define the names of the fields in a tuple, not
their data types.
• Fields in a tuple can be accessed by name with
getValueByField(String) or by positional
index with getValue(int).
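A stripped-down tuple that supports both access styles might look like this. `SimpleTuple` is a hypothetical class that only shows the idea of shared field names plus untyped values. Storm's own `Tuple` interface is much richer.

```java
import java.util.ArrayList;
import java.util.List;

// Field names are shared by every tuple on a stream; values carry no declared
// types, so any Object can occupy a field.
class SimpleTuple {
    private final List<String> fields;
    private final List<Object> values;

    SimpleTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException(
                    "expected " + fields.size() + " values, got " + values.size());
        }
        this.fields = new ArrayList<>(fields);
        this.values = new ArrayList<>(values);
    }

    // Access by position, like Storm's Tuple.getValue(int).
    Object getValue(int index) {
        return values.get(index);
    }

    // Access by name, like Storm's Tuple.getValueByField(String).
    Object getValueByField(String field) {
        int index = fields.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        List<String> schema = List.of("word", "count");
        SimpleTuple t = new SimpleTuple(schema, List.of("storm", 3));
        System.out.println(t.getValueByField("word")); // storm
        System.out.println(t.getValue(1));             // 3
    }
}
```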
Definition of a Storm topology
• A topology is an abstraction that
defines the graph of the
computation.
• We create a Storm topology and
deploy it on a Storm cluster to
process the data.
• A topology can be represented as a
directed acyclic graph (DAG).
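Treating a topology as a graph of named components makes the acyclic property checkable. This sketch assumes components are identified by string names (all hypothetical). It uses Kahn's topological-sort algorithm to order the components so that each one appears after everything that feeds it. If the graph contains a cycle, it reports that instead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Components (spouts and bolts) as nodes; edges point from producer to consumer.
class TopologyGraph {
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();

    void addComponent(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
    }

    // Declare that `consumer` subscribes to the output of `producer`.
    void connect(String producer, String consumer) {
        addComponent(producer);
        addComponent(consumer);
        downstream.get(producer).add(consumer);
    }

    // Kahn's algorithm: repeatedly take a component with no unprocessed inputs.
    List<String> topologicalOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        downstream.keySet().forEach(n -> inDegree.put(n, 0));
        downstream.values().forEach(cs -> cs.forEach(c -> inDegree.merge(c, 1, Integer::sum)));

        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((n, d) -> { if (d == 0) ready.add(n); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            order.add(n);
            for (String c : downstream.get(n)) {
                if (inDegree.merge(c, -1, Integer::sum) == 0) ready.add(c);
            }
        }
        if (order.size() != downstream.size()) {
            throw new IllegalStateException("topology contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        TopologyGraph g = new TopologyGraph();
        g.connect("sentence-spout", "split-bolt");
        g.connect("split-bolt", "count-bolt");
        g.connect("count-bolt", "report-bolt");
        System.out.println(g.topologicalOrder());
        // [sentence-spout, split-bolt, count-bolt, report-bolt]
    }
}
```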
The Components of a Storm topology:
Stream
• Stream:
▫ The key abstraction in Storm is the stream.
▫ A stream is an unbounded sequence of tuples that Storm can
process in parallel.
▫ Each stream can be consumed by one or more bolts.
▫ Each stream in a Storm application is given an ID.
▫ Bolts produce and consume tuples on these streams
by stream ID.
▫ Each stream also has an associated schema that names the
fields of the tuples flowing through it; components declare it
through OutputFieldsDeclarer.
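Stream IDs and schemas can be modeled as a registry that maps each ID to its field names and its subscribers. The `StreamRegistry` class is hypothetical. In Storm, a component declares the schema through `OutputFieldsDeclarer` in its `declareOutputFields` method.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Named streams, each with a schema (field names) and a list of subscribing bolts.
class StreamRegistry {
    private final Map<String, List<String>> schemas = new HashMap<>();
    private final Map<String, List<Consumer<List<Object>>>> subscribers = new HashMap<>();

    // Declare a stream ID together with the field names of its tuples.
    void declareStream(String streamId, List<String> fields) {
        schemas.put(streamId, List.copyOf(fields));
        subscribers.putIfAbsent(streamId, new ArrayList<>());
    }

    // A bolt subscribes to a stream by its ID.
    void subscribe(String streamId, Consumer<List<Object>> bolt) {
        if (!schemas.containsKey(streamId)) {
            throw new IllegalArgumentException("undeclared stream: " + streamId);
        }
        subscribers.get(streamId).add(bolt);
    }

    // Check the tuple against the stream's schema, then deliver it to every subscriber.
    void emit(String streamId, List<Object> values) {
        List<String> schema = schemas.get(streamId);
        if (schema == null) {
            throw new IllegalArgumentException("undeclared stream: " + streamId);
        }
        if (schema.size() != values.size()) {
            throw new IllegalArgumentException(streamId + " expects fields " + schema);
        }
        subscribers.get(streamId).forEach(bolt -> bolt.accept(values));
    }

    public static void main(String[] args) {
        StreamRegistry streams = new StreamRegistry();
        streams.declareStream("words", List.of("word"));
        streams.declareStream("errors", List.of("line", "reason"));
        streams.subscribe("words", t -> System.out.println("word bolt got " + t));
        streams.subscribe("errors", t -> System.out.println("error bolt got " + t));
        streams.emit("words", List.of("storm"));                    // word bolt got [storm]
        streams.emit("errors", List.of("bad line", "unparseable")); // error bolt got [bad line, unparseable]
    }
}
```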
The Components of a Storm topology:
Spout
• Spout:
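▫ A spout is the source of tuples in a Storm topology.
▫ It reads or listens to data from an external source and
publishes it by emitting tuples into a stream.
▫ A spout can emit multiple streams.
▫ Some important methods of a spout are:
▪ open(): called once when the spout is initialized
▪ nextTuple(): emits the next tuple, if one is available
▪ ack(Object msgId): called when a tuple has been fully processed
▪ fail(Object msgId): called when a tuple failed or timed out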
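A toy spout can sketch how `nextTuple()`, `ack()`, and `fail()` work together. It remembers every emitted message until that message is acknowledged, and replays it on failure. This is a plain-Java model, not a subclass of Storm's spout classes. `ReplayingSpout` and `Emitted` are hypothetical. `open()` takes a message list instead of Storm's configuration arguments, and `nextTuple()` returns the emitted message instead of calling a collector. The sketch assigns a fresh message ID on replay, which is one design choice among several.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy spout with at-least-once replay: messages stay pending until acked.
class ReplayingSpout {

    // What one call to nextTuple() produced: a message ID plus the payload.
    static final class Emitted {
        final Long msgId;
        final String message;

        Emitted(Long msgId, String message) {
            this.msgId = msgId;
            this.message = message;
        }
    }

    private final Deque<String> source = new ArrayDeque<>();
    private final Map<Long, String> pending = new LinkedHashMap<>(); // emitted, not yet acked
    private long nextId = 0;

    // Simplified open(): load the messages this spout will read from.
    void open(List<String> messages) {
        source.addAll(messages);
    }

    // Emit the next message, or return null if nothing is waiting.
    Emitted nextTuple() {
        String msg = source.poll();
        if (msg == null) {
            return null;
        }
        Long id = nextId++;
        pending.put(id, msg);
        return new Emitted(id, msg);
    }

    // Processing succeeded downstream: stop tracking the message.
    void ack(Object msgId) {
        pending.remove(msgId);
    }

    // Processing failed or timed out: put the message back at the front for replay.
    void fail(Object msgId) {
        String msg = pending.remove(msgId);
        if (msg != null) {
            source.addFirst(msg);
        }
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        ReplayingSpout spout = new ReplayingSpout();
        spout.open(List.of("event-1", "event-2"));
        Emitted first = spout.nextTuple();
        Emitted second = spout.nextTuple();
        spout.ack(first.msgId);   // fully processed: forget it
        spout.fail(second.msgId); // failed: queue it for replay
        System.out.println(spout.nextTuple().message); // event-2
    }
}
```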
The Components of a Storm topology:
Bolt
• Bolt:
▫ A bolt is the processing powerhouse of a Storm topology.
▫ It is responsible for transforming a stream.
▫ Each bolt in the topology should perform a simple
transformation of the tuples.
▫ Many such bolts can coordinate with each other to perform a
complex transformation.
▫ Some important methods of a bolt are:
▪ execute(Tuple input): called for each tuple that arrives on
the subscribed input streams.
▪ prepare(Map stormConf, TopologyContext context,
OutputCollector collector): called once when the bolt is
initialized; use it to set up whatever the bolt needs (for
example, keeping a reference to the collector) before it
starts executing tuples.
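A bolt that performs one simple transformation, splitting sentences into words, shows how work is divided between `prepare()` and `execute()`. The `TupleCollector` interface, the `split.lowercase` configuration key, and the simplified method signatures are assumptions that keep the sketch self-contained. They are not Storm's API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, minimal stand-in for Storm's OutputCollector.
interface TupleCollector {
    void emit(List<Object> values);
}

class SplitSentenceBolt {
    private TupleCollector collector;
    private boolean lowerCase;

    // Runs once, before any tuple arrives: keep the collector and read configuration.
    void prepare(Map<String, Object> conf, TupleCollector collector) {
        this.collector = collector;
        this.lowerCase = Boolean.TRUE.equals(conf.get("split.lowercase")); // hypothetical config key
    }

    // Runs for every input tuple (here simplified to the sentence string itself).
    void execute(String sentence) {
        for (String word : sentence.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an all-whitespace sentence yields one empty token
            }
            collector.emit(List.of(lowerCase ? word.toLowerCase() : word));
        }
    }

    public static void main(String[] args) {
        SplitSentenceBolt bolt = new SplitSentenceBolt();
        bolt.prepare(Map.of("split.lowercase", true), values -> System.out.println("emit " + values));
        bolt.execute("Storm processes Streams");
        // prints: emit [storm], then emit [processes], then emit [streams]
    }
}
```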
Operation modes
Operation modes determine how a topology is
deployed in Storm.
• Local mode:
▫ The topology runs on the local machine in a
single JVM, which is useful for development and testing.
• Remote mode:
▫ The Storm client submits the topology, along with all
the code required to execute it, to the master (Nimbus),
which distributes it across the cluster.
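Local mode can be pictured as the whole topology running as ordinary objects inside a single JVM, driven by one loop. The sketch below wires a stand-in spout, a split bolt, and a count bolt this way. It is not Storm's `LocalCluster`, and `LocalModeRun` is a hypothetical name. In remote mode, the same logic would instead be packaged and submitted to Nimbus, which spreads it across worker processes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.Function;

// Spout, split bolt, and count bolt wired together on one thread in one JVM.
class LocalModeRun {

    static Map<String, Integer> run(List<String> sentences) {
        Map<String, Integer> counts = new TreeMap<>();
        Consumer<String> countBolt = word -> counts.merge(word, 1, Integer::sum);
        Function<String, String[]> splitBolt = s -> s.toLowerCase().split("\\W+");

        for (String sentence : sentences) { // stand-in spout: emit each sentence in turn
            for (String word : splitBolt.apply(sentence)) {
                if (!word.isEmpty()) {
                    countBolt.accept(word);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cow jumped", "the moon")));
        // {cow=1, jumped=1, moon=1, the=2}
    }
}
```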