SlideShare une entreprise Scribd logo
1  sur  62
Télécharger pour lire hors ligne
1
Stream Processing
in the Cloud
Rafał Leszko (@RafalLeszko)
Cloud Software Engineer at Hazelcast
Hands Up
Hands Up
Raise your hand if…
● ...you know what Stream Processing is?
Hands Up
Raise your hand if…
● ...you know what Stream Processing is?
● ...you have ever used Stream Processing?
Hands Up
Raise your hand if…
● ...you know what Stream Processing is?
● ...you have ever used Stream Processing?
● ...you have ever used Hazelcast Jet?
Agenda
● Part 1: Stream Processing Basics
○ What is Stream Processing and Hazelcast Jet?
○ Example: Word Count
● Part 2: Jet Under the Hood
○ How does it work?
○ Infinite Streams
○ Example: Twitter Cryptocurrency Analysis
● Part 3: Jet in the Cloud
○ Cloud (Kubernetes) integration
○ Example: Stock Trade Aggregator
● Part 4: Jet Features & Use Cases
○ Why would I need it?
○ Example: Web Crawler
Part 1: Stream Processing Basics
What is Hazelcast?
What is Hazelcast?
Products:
What is Hazelcast?
Products:
What is Hazelcast Jet?
What is Hazelcast Jet?
DAG - Direct Acyclic Graph
What is Hazelcast Jet?
What is Hazelcast Jet?
Example 1: Word Count
Problem:
Count the number of occurrences of each word in the given text.
Sample Input:
Lorem ipsum dolor, dolor.
Sample Output:
lorem=1
ipsum=1
dolor=2
Example 1: Word Count
Pure Java
Pattern delimiter = Pattern.compile("W+");
return lines.entrySet().stream()
.map(e -> e.getValue().toLowerCase())
.flatMap(t -> Arrays.stream(delimiter.split(t)))
.filter(word -> !word.isEmpty())
.collect(
groupingBy(
identity(),
counting()));
Example 1: Word Count
Example 1: Word Count
Example 1: Word Count
Example 1: Word Count
Example 1: Word Count
Example 1: Word Count
Hazelcast Jet
Pattern delimiter = Pattern.compile("W+");
Pipeline pipeline = Pipeline.create();
pipeline.drawFrom(Sources.<Long, String>map(LINES))
.map(e -> e.getValue().toLowerCase())
.flatMap(t -> traverseArray(delimiter.split(t)))
.filter(word -> !word.isEmpty())
.groupingKey(wholeItem())
.aggregate(counting())
.drainTo(Sinks.map(COUNTS));
return pipeline;
Example 1: Word Count
Pure Java
Pattern delimiter = Pattern.compile("W+");
return lines.entrySet().stream()
.map(e -> e.getValue().toLowerCase())
.flatMap(t -> Arrays.stream(delimiter.split(t)))
.filter(word -> !word.isEmpty())
.collect(
groupingBy(
identity(),
counting()));
Example 1: Word Count
Hazelcast Jet
Pattern delimiter = Pattern.compile("W+");
Pipeline pipeline = Pipeline.create();
pipeline.drawFrom(Sources.<Long, String>map(LINES))
.map(e -> e.getValue().toLowerCase())
.flatMap(t -> traverseArray(delimiter.split(t)))
.filter(word -> !word.isEmpty())
.groupingKey(wholeItem())
.aggregate(counting())
.drainTo(Sinks.map(COUNTS));
return pipeline;
Example 1: Word Count
Example 1: Word Count
Example 1: Word Count
Hazelcast Jet
Pattern delimiter = Pattern.compile("W+");
Pipeline pipeline = Pipeline.create();
pipeline.drawFrom(Sources.<Long, String>map(LINES))
.map(e -> e.getValue().toLowerCase())
.flatMap(t -> traverseArray(delimiter.split(t)))
.filter(word -> !word.isEmpty())
.groupingKey(wholeItem())
.aggregate(counting())
.drainTo(Sinks.map(COUNTS));
return pipeline;
Example 1: Word Count
Demo:
https://github.com/hazelcast/hazelcast-jet-code-samples
Part 2: Jet Under the Hood
How does it work?
How does it work?
How does it work?
How does it work?
Under the Hood:
● Generate DAG representation from Pipeline
● Serialize DAG
● Send DAG to every Node
● Deserialize DAG
● Executes DAG on each Node
Infinite Streams
Infinite Streams
Examples:
● Currency Exchange Rates
● Tweets from Twitter
● Events in some Event-Based system
● ...
Windowing
pipeline.drawFrom(...)
.withNativeTimestamps(0)
.window(sliding(30_000, 10_000))
Example 2: Twitter Cryptocurrency Analysis
Problem:
Present in real-time the sentiments about cryptocurrencies
Input:
Tweets are streamed from Twitter and categorized by coin type
(BTC, ETC, XRP, etc)
Output:
Tweets sentiments (last 30 sec, last minute, last 5 minutes)
Example 2: Twitter Cryptocurrency Analysis
Demo:
https://jet.hazelcast.org/demos/
Part 3: Jet in the Cloud
Jet in the Cloud: discovery plugins
Jet in the Cloud: discovery plugins
Jet in the Cloud: discovery plugins
Jet in the Cloud: discovery plugins
Jet in the Cloud: discovery plugins
Jet in the Cloud: discovery plugins
Jet in the Cloud: deploying on k8
Jet in the Cloud: deploying on k8
$ helm install stable/hazelcast-jet
Jet in the Cloud: deploying on k8
$ kubectl scale <name> --replicas=6
Example 3: Stock Trade Aggregator
Problem:
Present in real-time the aggregated trade price of stocks
Input:
Stock trades with name and price
Output:
Sum of prices per stock name
Example 3: Stock Trade Aggregator
Demo:
https://github.com/hazelcast/hazelcast-jet-code-samples/t
ree/master/integration/kubernetes
Part 4: Jet Features & Use Cases
Jet Features
Categories of Features
● Easy to Use
● Performance
Jet Features: Performance
Jet Features: Performance
Jet Features: other features
Why would I need it?
● Big Data Projects
Why would I need it?
● Big Data Projects
● Speed up Everything
Why would I need it?
● Big Data Projects
● Speed up Everything
Example 4: Web Crawler
Problem:
Parse all blog posts from the webpage
Input:
URL of Blog Trips
Output:
All the content from the Blog
Example 4: Web Crawler
Demo:
https://github.com/leszko/geodump
Thank You!

Contenu connexe

Tendances

20100712-OTcl Command -- Getting Started
20100712-OTcl Command -- Getting Started20100712-OTcl Command -- Getting Started
20100712-OTcl Command -- Getting Started
Teerawat Issariyakul
 
NS2: Binding C++ and OTcl variables
NS2: Binding C++ and OTcl variablesNS2: Binding C++ and OTcl variables
NS2: Binding C++ and OTcl variables
Teerawat Issariyakul
 
jimmy hacking (at) Microsoft
jimmy hacking (at) Microsoftjimmy hacking (at) Microsoft
jimmy hacking (at) Microsoft
Jimmy Schementi
 
Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)
Sean Krail
 
Gaucheで本を作る
Gaucheで本を作るGaucheで本を作る
Gaucheで本を作る
guest7a66b8
 

Tendances (20)

20100712-OTcl Command -- Getting Started
20100712-OTcl Command -- Getting Started20100712-OTcl Command -- Getting Started
20100712-OTcl Command -- Getting Started
 
Parallel computing with GPars
Parallel computing with GParsParallel computing with GPars
Parallel computing with GPars
 
LCDS - State Presentation
LCDS - State PresentationLCDS - State Presentation
LCDS - State Presentation
 
How the Go runtime implement maps efficiently
How the Go runtime implement maps efficientlyHow the Go runtime implement maps efficiently
How the Go runtime implement maps efficiently
 
TypeScript
TypeScriptTypeScript
TypeScript
 
Іван Лаврів "Transducers for ruby developers"
Іван Лаврів "Transducers for ruby developers"Іван Лаврів "Transducers for ruby developers"
Іван Лаврів "Transducers for ruby developers"
 
NS2: Binding C++ and OTcl variables
NS2: Binding C++ and OTcl variablesNS2: Binding C++ and OTcl variables
NS2: Binding C++ and OTcl variables
 
jimmy hacking (at) Microsoft
jimmy hacking (at) Microsoftjimmy hacking (at) Microsoft
jimmy hacking (at) Microsoft
 
Kotlin workshop 2018-06-11
Kotlin workshop 2018-06-11Kotlin workshop 2018-06-11
Kotlin workshop 2018-06-11
 
Cilk - An Efficient Multithreaded Runtime System
Cilk - An Efficient Multithreaded Runtime SystemCilk - An Efficient Multithreaded Runtime System
Cilk - An Efficient Multithreaded Runtime System
 
1548 PROJECT DEMO
1548 PROJECT DEMO1548 PROJECT DEMO
1548 PROJECT DEMO
 
Incremental and parallel computation of structural graph summaries for evolvi...
Incremental and parallel computation of structural graph summaries for evolvi...Incremental and parallel computation of structural graph summaries for evolvi...
Incremental and parallel computation of structural graph summaries for evolvi...
 
DConf 2016: Bitpacking Like a Madman by Amaury Sechet
DConf 2016: Bitpacking Like a Madman by Amaury SechetDConf 2016: Bitpacking Like a Madman by Amaury Sechet
DConf 2016: Bitpacking Like a Madman by Amaury Sechet
 
Object Detection with Tensorflow
Object Detection with TensorflowObject Detection with Tensorflow
Object Detection with Tensorflow
 
Open GL Programming Training Session I
Open GL Programming Training Session IOpen GL Programming Training Session I
Open GL Programming Training Session I
 
Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)Internship - Final Presentation (26-08-2015)
Internship - Final Presentation (26-08-2015)
 
FOSDEM 2020: Querying over millions and billions of metrics with M3DB's index
FOSDEM 2020: Querying over millions and billions of metrics with M3DB's indexFOSDEM 2020: Querying over millions and billions of metrics with M3DB's index
FOSDEM 2020: Querying over millions and billions of metrics with M3DB's index
 
packet destruction in NS2
packet destruction in NS2packet destruction in NS2
packet destruction in NS2
 
NS2 Shadow Object Construction
NS2 Shadow Object ConstructionNS2 Shadow Object Construction
NS2 Shadow Object Construction
 
Gaucheで本を作る
Gaucheで本を作るGaucheで本を作る
Gaucheで本を作る
 

Similaire à Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019

Threaded Programming
Threaded ProgrammingThreaded Programming
Threaded Programming
Sri Prasanna
 
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Raffi Khatchadourian
 

Similaire à Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019 (20)

Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17Deep Dive with Spark Streaming - Tathagata  Das - Spark Meetup 2013-06-17
Deep Dive with Spark Streaming - Tathagata Das - Spark Meetup 2013-06-17
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
GraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational DatabasesGraphGen: Conducting Graph Analytics over Relational Databases
GraphGen: Conducting Graph Analytics over Relational Databases
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
 
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
Distributed Real-Time Stream Processing: Why and How: Spark Summit East talk ...
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Apache Flink Stream Processing
Apache Flink Stream ProcessingApache Flink Stream Processing
Apache Flink Stream Processing
 
Qt for beginners
Qt for beginnersQt for beginners
Qt for beginners
 
Threaded Programming
Threaded ProgrammingThreaded Programming
Threaded Programming
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangalore
 
L Fu - Dao: a novel programming language for bioinformatics
L Fu - Dao: a novel programming language for bioinformaticsL Fu - Dao: a novel programming language for bioinformatics
L Fu - Dao: a novel programming language for bioinformatics
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
Towards Safe Automated Refactoring of Imperative Deep Learning Programs to Gr...
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
 

Plus de Rafał Leszko

Where is my cache? Architectural patterns for caching microservices by example
Where is my cache? Architectural patterns for caching microservices by exampleWhere is my cache? Architectural patterns for caching microservices by example
Where is my cache? Architectural patterns for caching microservices by example
Rafał Leszko
 

Plus de Rafał Leszko (20)

Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!Build Your Kubernetes Operator with the Right Tool!
Build Your Kubernetes Operator with the Right Tool!
 
Mutation Testing with PIT
Mutation Testing with PITMutation Testing with PIT
Mutation Testing with PIT
 
Distributed Locking in Kubernetes
Distributed Locking in KubernetesDistributed Locking in Kubernetes
Distributed Locking in Kubernetes
 
Architectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetesArchitectural patterns for high performance microservices in kubernetes
Architectural patterns for high performance microservices in kubernetes
 
Architectural caching patterns for kubernetes
Architectural caching patterns for kubernetesArchitectural caching patterns for kubernetes
Architectural caching patterns for kubernetes
 
Architectural patterns for caching microservices
Architectural patterns for caching microservicesArchitectural patterns for caching microservices
Architectural patterns for caching microservices
 
Mutation testing with PIT
Mutation testing with PITMutation testing with PIT
Mutation testing with PIT
 
[jLove 2020] Where is my cache architectural patterns for caching microservi...
[jLove 2020] Where is my cache  architectural patterns for caching microservi...[jLove 2020] Where is my cache  architectural patterns for caching microservi...
[jLove 2020] Where is my cache architectural patterns for caching microservi...
 
Where is my cache architectural patterns for caching microservices by example
Where is my cache  architectural patterns for caching microservices by exampleWhere is my cache  architectural patterns for caching microservices by example
Where is my cache architectural patterns for caching microservices by example
 
Architectural caching patterns for kubernetes
Architectural caching patterns for kubernetesArchitectural caching patterns for kubernetes
Architectural caching patterns for kubernetes
 
Build your operator with the right tool
Build your operator with the right toolBuild your operator with the right tool
Build your operator with the right tool
 
5 levels of high availability from multi instance to hybrid cloud
5 levels of high availability  from multi instance to hybrid cloud5 levels of high availability  from multi instance to hybrid cloud
5 levels of high availability from multi instance to hybrid cloud
 
Where is my cache? Architectural patterns for caching microservices by example
Where is my cache? Architectural patterns for caching microservices by exampleWhere is my cache? Architectural patterns for caching microservices by example
Where is my cache? Architectural patterns for caching microservices by example
 
5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid Cloud5 Levels of High Availability: From Multi-instance to Hybrid Cloud
5 Levels of High Availability: From Multi-instance to Hybrid Cloud
 
Where is my cache architectural patterns for caching microservices by example
Where is my cache architectural patterns for caching microservices by exampleWhere is my cache architectural patterns for caching microservices by example
Where is my cache architectural patterns for caching microservices by example
 
Where is my cache architectural patterns for caching microservices by example
Where is my cache architectural patterns for caching microservices by exampleWhere is my cache architectural patterns for caching microservices by example
Where is my cache architectural patterns for caching microservices by example
 
Where is my cache? Architectural patterns for caching microservices by example
Where is my cache? Architectural patterns for caching microservices by exampleWhere is my cache? Architectural patterns for caching microservices by example
Where is my cache? Architectural patterns for caching microservices by example
 
[DevopsDays India 2019] Where is my cache? Architectural patterns for caching...
[DevopsDays India 2019] Where is my cache? Architectural patterns for caching...[DevopsDays India 2019] Where is my cache? Architectural patterns for caching...
[DevopsDays India 2019] Where is my cache? Architectural patterns for caching...
 
Where is my cache? Architectural patterns for caching microservices by example
Where is my cache? Architectural patterns for caching microservices by exampleWhere is my cache? Architectural patterns for caching microservices by example
Where is my cache? Architectural patterns for caching microservices by example
 
Stream Processing with Hazelcast Jet - Voxxed Days Thessaloniki 19.11.2018
Stream Processing with Hazelcast Jet - Voxxed Days Thessaloniki 19.11.2018Stream Processing with Hazelcast Jet - Voxxed Days Thessaloniki 19.11.2018
Stream Processing with Hazelcast Jet - Voxxed Days Thessaloniki 19.11.2018
 

Dernier

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Stream Processing in the Cloud - Athens Kubernetes Meetup 16.07.2019

  • 1. 1 Stream Processing in the Cloud Rafał Leszko (@RafalLeszko) Cloud Software Engineer at Hazelcast
  • 3. Hands Up Raise your hand if… ● ...you know what Stream Processing is?
  • 4. Hands Up Raise your hand if… ● ...you know what Stream Processing is? ● ...you have ever used Stream Processing?
  • 5. Hands Up Raise your hand if… ● ...you know what Stream Processing is? ● ...you have ever used Stream Processing? ● ...you have ever used Hazelcast Jet?
  • 6. Agenda ● Part 1: Stream Processing Basics ○ What is Stream Processing and Hazelcast Jet? ○ Example: Word Count ● Part 2: Jet Under the Hood ○ How does it work? ○ Infinite Streams ○ Example: Twitter Cryptocurrency Analysis ● Part 3: Jet in the Cloud ○ Cloud (Kubernetes) integration ○ Example: Stock Trade Aggregator ● Part 4: Jet Features & Use Cases ○ Why would I need it? ○ Example: Web Crawler
  • 7. Part 1: Stream Processing Basics
  • 12. What is Hazelcast Jet? DAG - Direct Acyclic Graph
  • 15. Example 1: Word Count Problem: Count the number of occurrences of each word in the given text. Sample Input: Lorem ipsum dolor, dolor. Sample Output: lorem=1 ipsum=1 dolor=2
  • 16. Example 1: Word Count Pure Java Pattern delimiter = Pattern.compile("W+"); return lines.entrySet().stream() .map(e -> e.getValue().toLowerCase()) .flatMap(t -> Arrays.stream(delimiter.split(t))) .filter(word -> !word.isEmpty()) .collect( groupingBy( identity(), counting()));
  • 22. Example 1: Word Count Hazelcast Jet Pattern delimiter = Pattern.compile("W+"); Pipeline pipeline = Pipeline.create(); pipeline.drawFrom(Sources.<Long, String>map(LINES)) .map(e -> e.getValue().toLowerCase()) .flatMap(t -> traverseArray(delimiter.split(t))) .filter(word -> !word.isEmpty()) .groupingKey(wholeItem()) .aggregate(counting()) .drainTo(Sinks.map(COUNTS)); return pipeline;
  • 23. Example 1: Word Count Pure Java Pattern delimiter = Pattern.compile("W+"); return lines.entrySet().stream() .map(e -> e.getValue().toLowerCase()) .flatMap(t -> Arrays.stream(delimiter.split(t))) .filter(word -> !word.isEmpty()) .collect( groupingBy( identity(), counting()));
  • 24. Example 1: Word Count Hazelcast Jet Pattern delimiter = Pattern.compile("W+"); Pipeline pipeline = Pipeline.create(); pipeline.drawFrom(Sources.<Long, String>map(LINES)) .map(e -> e.getValue().toLowerCase()) .flatMap(t -> traverseArray(delimiter.split(t))) .filter(word -> !word.isEmpty()) .groupingKey(wholeItem()) .aggregate(counting()) .drainTo(Sinks.map(COUNTS)); return pipeline;
  • 27. Example 1: Word Count Hazelcast Jet Pattern delimiter = Pattern.compile("W+"); Pipeline pipeline = Pipeline.create(); pipeline.drawFrom(Sources.<Long, String>map(LINES)) .map(e -> e.getValue().toLowerCase()) .flatMap(t -> traverseArray(delimiter.split(t))) .filter(word -> !word.isEmpty()) .groupingKey(wholeItem()) .aggregate(counting()) .drainTo(Sinks.map(COUNTS)); return pipeline;
  • 28. Example 1: Word Count Demo: https://github.com/hazelcast/hazelcast-jet-code-samples
  • 29. Part 2: Jet Under the Hood
  • 30. How does it work?
  • 31. How does it work?
  • 32. How does it work?
  • 33. How does it work? Under the Hood: ● Generate DAG representation from Pipeline ● Serialize DAG ● Send DAG to every Node ● Deserialize DAG ● Executes DAG on each Node
  • 34.
  • 36. Infinite Streams Examples: ● Currency Exchange Rates ● Tweets from Twitter ● Events in some Event-Based system ● ...
  • 38. Example 2: Twitter Cryptocurrency Analysis Problem: Present in real-time the sentiments about cryptocurrencies Input: Tweets are streamed from Twitter and categorized by coin type (BTC, ETC, XRP, etc) Output: Tweets sentiments (last 30 sec, last minute, last 5 minutes)
  • 39. Example 2: Twitter Cryptocurrency Analysis Demo: https://jet.hazelcast.org/demos/
  • 40. Part 3: Jet in the Cloud
  • 41. Jet in the Cloud: discovery plugins
  • 42. Jet in the Cloud: discovery plugins
  • 43. Jet in the Cloud: discovery plugins
  • 44. Jet in the Cloud: discovery plugins
  • 45. Jet in the Cloud: discovery plugins
  • 46. Jet in the Cloud: discovery plugins
  • 47. Jet in the Cloud: deploying on k8
  • 48. Jet in the Cloud: deploying on k8 $ helm install stable/hazelcast-jet
  • 49. Jet in the Cloud: deploying on k8 $ kubectl scale <name> --replicas=6
  • 50. Example 3: Stock Trade Aggregator Problem: Present in real-time the aggregated trade price of stocks Input: Stock trades with name and price Output: Sum of prices per stock name
  • 51. Example 3: Stock Trade Aggregator Demo: https://github.com/hazelcast/hazelcast-jet-code-samples/t ree/master/integration/kubernetes
  • 52. Part 4: Jet Features & Use Cases
  • 53. Jet Features Categories of Features ● Easy to Use ● Performance
  • 57. Why would I need it? ● Big Data Projects
  • 58. Why would I need it? ● Big Data Projects ● Speed up Everything
  • 59. Why would I need it? ● Big Data Projects ● Speed up Everything
  • 60. Example 4: Web Crawler Problem: Parse all blog posts from the webpage Input: URL of Blog Trips Output: All the content from the Blog
  • 61. Example 4: Web Crawler Demo: https://github.com/leszko/geodump