SlideShare une entreprise Scribd logo
1  sur  74
Télécharger pour lire hors ligne
Storm
The Real-Time Layer Your Big
    Data’s Been Missing


        Dan Lynn
      dan@fullcontact.com
          @danklynn
Keeps Contact Information Current and Complete


  Based in Denver, Colorado




                              CTO & Co-Founder
                               dan@fullcontact.com
                                   @danklynn
Turn Partial Contacts
 Into Full Contacts
Your data is old
Old data is confusing
First, there was the spreadsheet
Next, there was SQL
Then, there wasn’t SQL
Batch Processing
        =
   Stale Data
Streaming
Computation
Queues / Workers
Queues / Workers




Messages
Messages
 Messages
 Messages
  Messages
  Messages          Queue
   Messages




                                 Workers
Queues / Workers
Me ssage rou ting
can be com ple x
 Messages
 Messages
  Messages
  Messages
   Messages
   Messages          Queue
    Messages




                                  Workers
Queues / Workers
Brittle
Hard to scale
Queues / Workers
Queues / Workers
Queues / Workers
Queues / Workers
Rou ting mus t be
reco nfig ured whe n
sca ling out
Storm
Storm
Distributed and fault-tolerant real-time computation
Storm
Distributed and fault-tolerant real-time computation
Storm
Distributed and fault-tolerant real-time computation
Storm
Distributed and fault-tolerant real-time computation
Key Concepts
Tuples
Ordered list of elements
Tuples
           Ordered list of elements


("search-01384", "e:dan@fullcontact.com")
Streams
Unbounded sequence of tuples
Streams
        Unbounded sequence of tuples

Tuple    Tuple   Tuple   Tuple   Tuple   Tuple
Spouts
 Source of streams
Spouts
 Source of streams
Spouts  Source of streams

Tuple   Tuple   Tuple   Tuple   Tuple   Tuple
Spouts can talk with




               some images from http://commons.wikimedia.org
Spouts can talk with


•Queues




                     some images from http://commons.wikimedia.org
Spouts can talk with


•Queues

•Web logs




                      some images from http://commons.wikimedia.org
Spouts can talk with


•Queues

•Web logs

•API calls




                       some images from http://commons.wikimedia.org
Spouts can talk with


•Queues

•Web logs

•API calls

•Event data


                       some images from http://commons.wikimedia.org
Bolts
Process tuples and create new streams
Bolts



                                                                                             Tuple
                                                                                    Tuple
                                                                           Tuple
                                                                  Tuple
                                                         Tuple
                                                Tuple

Tuple   Tuple   Tuple   Tuple   Tuple   Tuple
                                                 Tuple
                                                          Tuple
                                                                   Tuple
                                                                            Tuple
                                                                                     Tuple
                                                                                             Tuple




                                                                               some images from http://commons.wikimedia.org
Bolts




        some images from http://commons.wikimedia.org
Bolts


•Apply functions / transforms




                     some images from http://commons.wikimedia.org
Bolts


•Apply functions / transforms
•Filter




                     some images from http://commons.wikimedia.org
Bolts


•Apply functions / transforms
•Filter
•Aggregation




                       some images from http://commons.wikimedia.org
Bolts


•Apply functions / transforms
•Filter
•Aggregation
•Streaming joins




                       some images from http://commons.wikimedia.org
Bolts


•Apply functions / transforms
•Filter
•Aggregation
•Streaming joins
•Access DBs, APIs, etc...


                       some images from http://commons.wikimedia.org
Topologies
A directed graph of Spouts and Bolts
This is a Topology




               some images from http://commons.wikimedia.org
This is also a topology




                 some images from http://commons.wikimedia.org
Tasks
Processes which execute Streams or Bolts
Running a Topology




$ storm jar my-code.jar com.example.MyTopology arg1 arg2
Storm Cluster




                Nathan Marz
Storm Cluster
If thi s we re
Hadoo p...




                                 Nathan Marz
Storm Cluster
If thi s we re
Hadoo p...




Job Tracke r
                                 Nathan Marz
Storm Cluster
If thi s we re
Hadoo p...




         Tas k Tracke rs         Nathan Marz
Storm Cluster

But it’s not Hado op




Coo rdi nates eve ry thi ng
                                       Nathan Marz
Example:
Streaming Word Count
Streaming Word Count


TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
        .shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
        .fieldsGrouping("split", new Fields("word"));
Streaming Word Count


TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
        .shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
        .fieldsGrouping("split", new Fields("word"));
Streaming Word Count
public static class SplitSentence extends ShellBolt implements IRichBolt {
        
    public SplitSentence() {
        super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}


                                                                   SplitSentence.java
Streaming Word Count
public static class SplitSentence extends ShellBolt implements IRichBolt {
        
    public SplitSentence() {
        super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}


                                                                        SplitSentence.java




                                                     splitsentence.py
Streaming Word Count
public static class SplitSentence extends ShellBolt implements IRichBolt {
        
    public SplitSentence() {
        super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}


                                                                   SplitSentence.java
Streaming Word Count


TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("sentences", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
        .shuffleGrouping("sentences");
builder.setBolt("count", new WordCount(), 12)
        .fieldsGrouping("split", new Fields("word"));



                                                               java
Streaming Word Count
public static class WordCount extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if(count==null) count = 0;
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
                                                                    WordCount.java
Streaming Word Count


 TopologyBuilder builder = new TopologyBuilder();

 builder.setSpout("sentences", new RandomSentenceSpout(), 5);
 builder.setBolt("split", new SplitSentence(), 8)
         .shuffleGrouping("sentences");
 builder.setBolt("count", new WordCount(), 12)
         .fieldsGrouping("split", new Fields("word"));



                                                                java



Gro upings con tro l how tup les are rou ted
Shuffle grouping
Tuples are randomly distributed across all of the
             tasks running the bolt
Fields grouping
Groups tuples by specific named fields and routes
             them to the same task
o p’s
                    ado r
               t o H av i o
          o us b e h
An a lo g i ng
   rt i t io n
pa
              Fields grouping
     Groups tuples by specific named fields and routes
                  them to the same task
Distributed RPC
Before Distributed RPC,
time-sensitive queries relied on a
      pre-computed index
What if you didn’t need an index?
Distributed RPC
Try it out!

Huge thanks to Nathan Marz - @nathanmarz

   http://github.com/nathanmarz/storm

https://github.com/nathanmarz/storm-starter
             @stormprocessor
Questions?
 dan@fullcontact.com

Contenu connexe

Tendances

Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop Grid
DataWorks Summit
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
Chandler Huang
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
Dan Lynn
 

Tendances (20)

Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop Grid
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing Frameworks
 
Improved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleImproved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as example
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
Real time and reliable processing with Apache Storm
Real time and reliable processing with Apache StormReal time and reliable processing with Apache Storm
Real time and reliable processing with Apache Storm
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Spark vs storm
Spark vs stormSpark vs storm
Spark vs storm
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 

Similaire à Storm - The Real-Time Layer Your Big Data's Been Missing

Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with Storm
Mariusz Gil
 
Presentation on nesting of loops
Presentation on nesting of loopsPresentation on nesting of loops
Presentation on nesting of loops
bsdeol28
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integration
Uday Vakalapudi
 

Similaire à Storm - The Real-Time Layer Your Big Data's Been Missing (20)

Streams processing with Storm
Streams processing with StormStreams processing with Storm
Streams processing with Storm
 
Twitter Stream Processing
Twitter Stream ProcessingTwitter Stream Processing
Twitter Stream Processing
 
Presentation on nesting of loops
Presentation on nesting of loopsPresentation on nesting of loops
Presentation on nesting of loops
 
Apache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integrationApache Storm and twitter Streaming API integration
Apache Storm and twitter Streaming API integration
 
Realtime processing with storm presentation
Realtime processing with storm presentationRealtime processing with storm presentation
Realtime processing with storm presentation
 
C++ Standard Template Library
C++ Standard Template LibraryC++ Standard Template Library
C++ Standard Template Library
 
Domain-Specific Languages
Domain-Specific LanguagesDomain-Specific Languages
Domain-Specific Languages
 
Twitter Big Data
Twitter Big DataTwitter Big Data
Twitter Big Data
 
STORM
STORMSTORM
STORM
 
Storm 0.8.2
Storm 0.8.2Storm 0.8.2
Storm 0.8.2
 
Apache PIG - User Defined Functions
Apache PIG - User Defined FunctionsApache PIG - User Defined Functions
Apache PIG - User Defined Functions
 
Mastering Python lesson3b_for_loops
Mastering Python lesson3b_for_loopsMastering Python lesson3b_for_loops
Mastering Python lesson3b_for_loops
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
storm-170531123446.pptx
storm-170531123446.pptxstorm-170531123446.pptx
storm-170531123446.pptx
 
Intro to Apache Storm
Intro to Apache StormIntro to Apache Storm
Intro to Apache Storm
 
Storm introduction
Storm introductionStorm introduction
Storm introduction
 
understand Storm in pictures
understand Storm in picturesunderstand Storm in pictures
understand Storm in pictures
 
Loops_in_Rv1.2b
Loops_in_Rv1.2bLoops_in_Rv1.2b
Loops_in_Rv1.2b
 
My lecture stack_queue_operation
My lecture stack_queue_operationMy lecture stack_queue_operation
My lecture stack_queue_operation
 
06 Loops
06 Loops06 Loops
06 Loops
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Storm - The Real-Time Layer Your Big Data's Been Missing