SlideShare une entreprise Scribd logo
1  sur  41
Télécharger pour lire hors ligne
Presented By: Anuj & Jashan
Let’s get to know
Streaming
- A developer’s point of view
Our Agenda
01 Streaming: What, Why,
Benefits
02 Different Architectures
03 Challenges in Stream Processing
04 Types of Stream Processing: Stateless &
Stateful
05 Stateful Stream Processing:
Elaborated
What is Data Streaming?
● A continuous flow of data is called Data
Streaming.
● Ex: Surge in IoT devices caused more data to
gather.
● Data gathered at real time can be processed
to get real time results.
● Stream processing is the practice of taking
action on a series of data at the time the
data is created.
Why Data Streaming ?
● Providing insights faster.
● Handle never-ending stream of events
● Easy inspection of data from multiple streams
simultaneously
● Stream processing can work with a lot less hardware
than batch processing
● To Design data processing engine with infinite data sets
in mind.
Streaming: Benefits
● Lot less hardware required.
● Real-time fraud and anomaly detection.
● Internet of Things (IoT) real-time analytics.
● Real-time personalization, marketing, and
advertising.
Evolution of Stream Processing
beginning 1970 Early 2000s 2015 Current
Fortran / C
Started Simple processing
SQL and RDBMs
Databases invented
Batch
processing
Bulk processing and Big
Data like Map-Reduce
Streaming or
Micro Batching
Stream processing started
showing promises
Streaming SQL
Unified
Processing
Streaming or Micro-Batching ?
Different Architectures
Different Architectures: At a Glance
Lambda vs. Kappa
Lambda Architecture
Batch Processing + Streaming Power
Kappa Architecture
Pure Streaming
Challenges
Stream Processing Challenges
3 4
Late Data
Data received at a later time
than the actual event time
Deduplication
Removing duplicate data in
stream
1 2
Stream Joins
Joining Data from two
Streams
Aggregations
Aggregation operations for
SQL
5 6
Fast Incoming
DataStream Processor
Not Upto Speed
Fault tolerance
This paragraph actually is a
good place for title
description
Solution in Streaming
1 2
3 4
5 6
Stream Joins
Managing state in Streaming
Watermarking
Late Data
Managing state in Streaming
window, watermarking
Fast Incoming
DataBackpressure
Aggregations
Apply grouping (window)
and watermarking
Deduplication
Managing state in Streaming
with watermarking
Fault tolerance
Checkpointing
Late Data in Streaming
Types of Stream
Processing
Types of Stream Processing
Stateless & Stateful
Stateless Stream Processing
What
This streaming is the straight
forward streaming we don’t need to
maintain state
Where
Where we need to perform some
operation per individual
message/event like filter, select, etc
When
when result is not dependent upon
previous events
Stateful Stream Processing
What
This stream is maintaining the state to
perform Aggregations, Deduplication,
Joins
Where
where we need to perform
operations like groupBy, count, etc
When
When result is dependent upon
previous events.
Stateful Stream
Processing: Elaborated
Stateful Streaming: Aggregation
● Aggregation by key only
● Aggregation by event time windows
● Aggregation by both
Windowing
01
02
03
This is simplest window.. This
window is pretty straight forward
We can perform both
windows by with respect to
the processing time and
event time.
This is window is bit complex then the Fixed
window. This window gives us two insights
like window and slice
Windowing by processing time vs
event time
Fixed window (Tumbling windows)
Sliding window (Hopping
windows)
Event Time & Processing Time
Windowing by Processing Time vs Event
Time
Processing Time Window
● Processing time window is based upon
the clock time window.
● All the late events will keep into current
window
● Do not reorder the out of order events
Event Time Window
● Event time window is based upon time
when event get produced
● Event will be keep in the belonging
window.
● Re-order the out of order events.
Fixed Window (Tumbling Window)
Fixed/tumbling: time is partitioned into same-length,
non-overlapping chunks. Each event belongs to exactly one
window
Fixed Window (Tumbling Window)
Sliding Window (Hopping Window)
Sliding: windows have fixed length, but are separated by a time
interval (step) which can be smaller than the window length. Typically
the window interval is a multiplicity of the step. Each event belongs to
a number of windows ([window interval]/[window step]).
Sliding Window (Hopping Window)
Late Data in Streaming
Watermarking
● Data newer than watermark may be late, but allowed
to aggregate
● Windows older than watermark automatically deleted
to
limit the amount of intermediate state
Handle more late data -> Keep more state
Reduced the state -> Handle less lateness
Internal working of Watermarking
Stateful Streaming: Deduplication
● Drop duplicate records in a Stream
● Specify Columns which uniquely identify
records
● State will store unique keys in stream and
drop any record matching the state
Stateful Streaming: Deduplication
● Too large Key Set in state for
deduplication will make the streaming
unstable
● Solution: Drop the state after a specified
period.
Stateful Streaming: Joins
● Each of the stream should buffer events in
state for matching any future events of other
stream.
Stateful Streaming: Joins
Stateful Streaming: Joins
● Impressions can be 2 hours late
● Clicks can be 3 hours late
● Clicks can occur within 1 hour after the
corresponding impression
Some Use case of Streaming
● Algorithmic Trading, Stock Market Surveillance,
● Smart Patient Care
● Monitoring a production line
● Supply chain optimizations
● Intrusion, Surveillance and Fraud Detection ( e.g. Uber)
● Most Smart Device Applications: Smart Car, Smart Home ..
● Smart Grid — (e.g. load prediction and outlier plug detection see Smart grids, 4 Billion events, throughout in range
of 100Ks)
● Traffic Monitoring, Geofencing, Vehicle, and Wildlife tracking — e.g. TFL London Transport Management System
● Sports analytics — Augment Sports with real-time analytics (e.g. this is a work we did with a real football game (e.g.
Overlaying real time analytics on Football Broadcasts)
● Context-aware promotions and advertising
● Computer system and network monitoring
● Predictive Maintenance, (e.g. Machine Learning Techniques for Predictive Maintenance)
● Geospatial data processing
Some Use case of Streaming
Some Use case of Streaming
References
● https://hazelcast.com/glossary/kappa-architecture/
● https://hazelcast.com/glossary/lambda-architecture/
● https://databricks.com/glossary/lambda-architecture
● Streams Concepts — Confluent Platform
● Stateful Stream Processing: Databricks
● Spark Strcutured Streaming Documentation
Thank You !
Get in touch with us:
anuj.saxena@knoldus.com || anuj1207 || anuj-saxena
jashan.goyal@knoldus.com || jashangoyal09 || jashan-goyal

Contenu connexe

Similaire à Let's get to know the Data Streaming

Empowering Real-Time Decision Making with Data Streaming
Empowering Real-Time Decision Making with Data StreamingEmpowering Real-Time Decision Making with Data Streaming
Empowering Real-Time Decision Making with Data StreamingSafe Software
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...GetInData
 
Scalability truths and serverless architectures
Scalability truths and serverless architecturesScalability truths and serverless architectures
Scalability truths and serverless architecturesRegunath B
 
Streaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapStreaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapWithTheBest
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingDoiT International
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Big Data Spain
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streamingdatamantra
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQLWSO2
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Ververica
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
AI-Powered Streaming Analytics for Real-Time Customer Experience
AI-Powered Streaming Analytics for Real-Time Customer ExperienceAI-Powered Streaming Analytics for Real-Time Customer Experience
AI-Powered Streaming Analytics for Real-Time Customer ExperienceDatabricks
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uberconfluent
 
Ecetera uses Splunk to facilitate DevOps in forex
Ecetera uses Splunk to facilitate DevOps in forexEcetera uses Splunk to facilitate DevOps in forex
Ecetera uses Splunk to facilitate DevOps in forexOcean Software
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesEd Hunter
 
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...StampedeCon
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...Flink Forward
 

Similaire à Let's get to know the Data Streaming (20)

Empowering Real-Time Decision Making with Data Streaming
Empowering Real-Time Decision Making with Data StreamingEmpowering Real-Time Decision Making with Data Streaming
Empowering Real-Time Decision Making with Data Streaming
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
 
Scalability truths and serverless architectures
Scalability truths and serverless architecturesScalability truths and serverless architectures
Scalability truths and serverless architectures
 
Streaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapStreaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara Prathap
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data ProcessingCloud Dataflow - A Unified Model for Batch and Streaming Data Processing
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Apache flink
Apache flinkApache flink
Apache flink
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
AI-Powered Streaming Analytics for Real-Time Customer Experience
AI-Powered Streaming Analytics for Real-Time Customer ExperienceAI-Powered Streaming Analytics for Real-Time Customer Experience
AI-Powered Streaming Analytics for Real-Time Customer Experience
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Ecetera uses Splunk to facilitate DevOps in forex
Ecetera uses Splunk to facilitate DevOps in forexEcetera uses Splunk to facilitate DevOps in forex
Ecetera uses Splunk to facilitate DevOps in forex
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Netflix SRE perf meetup_slides
Netflix SRE perf meetup_slidesNetflix SRE perf meetup_slides
Netflix SRE perf meetup_slides
 
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
Real Time Event Processing and In-­memory analysis of Big Data - StampedeCon ...
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
 

Plus de Knoldus Inc.

Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxKnoldus Inc.
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxKnoldus Inc.
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxKnoldus Inc.
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxKnoldus Inc.
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationKnoldus Inc.
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationKnoldus Inc.
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIsKnoldus Inc.
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II PresentationKnoldus Inc.
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAKnoldus Inc.
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Knoldus Inc.
 
Azure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxAzure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxKnoldus Inc.
 
The Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinThe Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinKnoldus Inc.
 
Data Engineering with Databricks Presentation
Data Engineering with Databricks PresentationData Engineering with Databricks Presentation
Data Engineering with Databricks PresentationKnoldus Inc.
 
Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Knoldus Inc.
 
NoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxNoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxKnoldus Inc.
 
Mastering Distributed Performance Testing
Mastering Distributed Performance TestingMastering Distributed Performance Testing
Mastering Distributed Performance TestingKnoldus Inc.
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxKnoldus Inc.
 
Introduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationIntroduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationKnoldus Inc.
 
CQRS with dot net services presentation.
CQRS with dot net services presentation.CQRS with dot net services presentation.
CQRS with dot net services presentation.Knoldus Inc.
 

Plus de Knoldus Inc. (20)

Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptx
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptx
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptx
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptx
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake Presentation
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics Presentation
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIs
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II Presentation
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRA
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)
 
Azure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptxAzure Databricks (For Data Analytics).pptx
Azure Databricks (For Data Analytics).pptx
 
The Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and KotlinThe Power of Dependency Injection with Dagger 2 and Kotlin
The Power of Dependency Injection with Dagger 2 and Kotlin
 
Data Engineering with Databricks Presentation
Data Engineering with Databricks PresentationData Engineering with Databricks Presentation
Data Engineering with Databricks Presentation
 
Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)Databricks for MLOps Presentation (AI/ML)
Databricks for MLOps Presentation (AI/ML)
 
NoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptxNoOps - (Automate Ops) Presentation.pptx
NoOps - (Automate Ops) Presentation.pptx
 
Mastering Distributed Performance Testing
Mastering Distributed Performance TestingMastering Distributed Performance Testing
Mastering Distributed Performance Testing
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptx
 
Introduction to Ansible Tower Presentation
Introduction to Ansible Tower PresentationIntroduction to Ansible Tower Presentation
Introduction to Ansible Tower Presentation
 
CQRS with dot net services presentation.
CQRS with dot net services presentation.CQRS with dot net services presentation.
CQRS with dot net services presentation.
 

Dernier

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Dernier (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Let's get to know the Data Streaming

  • 1. Presented By: Anuj & Jashan Let’s get to know Streaming - A developer’s point of view
  • 2. Our Agenda 01 Streaming: What, Why, Benefits 02 Different Architectures 03 Challenges in Stream Processing 04 Types of Stream Processing: Stateless & Stateful 05 Stateful Stream Processing: Elaborated
  • 3. What is Data Streaming? ● A continuous flow of data is called Data Streaming. ● Ex: Surge in IoT devices caused more data to gather. ● Data gathered at real time can be processed to get real time results. ● Stream processing is the practice of taking action on a series of data at the time the data is created.
  • 4. Why Data Streaming ? ● Providing insights faster. ● Handle never-ending stream of events ● Easy inspection of data from multiple streams simultaneously ● Stream processing can work with a lot less hardware than batch processing ● To Design data processing engine with infinite data sets in mind.
  • 5. Streaming: Benefits ● Lot less hardware required. ● Real-time fraud and anomaly detection. ● Internet of Things (IoT) real-time analytics. ● Real-time personalization, marketing, and advertising.
  • 6. Evolution of Stream Processing beginning 1970 Early 2000s 2015 Current Fortran / C Started Simple processing SQL and RDBMs Databases invented Batch processing Bulk processing and Big Data like Map-Reduce Streaming or Micro Batching Stream processing started showing promises Streaming SQL Unified Processing
  • 9. Different Architectures: At a Glance Lambda vs. Kappa
  • 13. Stream Processing Challenges 3 4 Late Data Data received at a later time than the actual event time Deduplication Removing duplicate data in stream 1 2 Stream Joins Joining Data from two Streams Aggregations Aggregation operations for SQL 5 6 Fast Incoming DataStream Processor Not Upto Speed Fault tolerance This paragraph actually is a good place for title description
  • 14. Solution in Streaming 1 2 3 4 5 6 Stream Joins Managing state in Streaming Watermarking Late Data Managing state in Streaming window, watermarking Fast Incoming DataBackpressure Aggregations Apply grouping (window) and watermarking Deduplication Managing state in Streaming with watermarking Fault tolerance Checkpointing
  • 15. Late Data in Streaming
  • 17. Types of Stream Processing Stateless & Stateful
  • 18. Stateless Stream Processing What This streaming is the straight forward streaming we don’t need to maintain state Where Where we need to perform some operation per individual message/event like filter, select, etc When when result is not dependent upon previous events
  • 19. Stateful Stream Processing What This stream is maintaining the state to perform Aggregations, Deduplication, Joins Where where we need to perform operations like groupBy, count, etc When When result is dependent upon previous events.
  • 21. Stateful Streaming: Aggregation ● Aggregation by key only ● Aggregation by event time windows ● Aggregation by both
  • 22. Windowing 01 02 03 This is simplest window.. This window is pretty straight forward We can perform both windows by with respect to the processing time and event time. This is window is bit complex then the Fixed window. This window gives us two insights like window and slice Windowing by processing time vs event time Fixed window (Tumbling windows) Sliding window (Hopping windows)
  • 23. Event Time & Processing Time
  • 24. Windowing by Processing Time vs Event Time Processing Time Window ● Processing time window is based upon the clock time window. ● All the late events will keep into current window ● Do not reorder the out of order events Event Time Window ● Event time window is based upon time when event get produced ● Event will be keep in the belonging window. ● Re-order the out of order events.
  • 25. Fixed Window (Tumbling Window) Fixed/tumbling: time is partitioned into same-length, non-overlapping chunks. Each event belongs to exactly one window
  • 27. Sliding Window (Hopping Window) Sliding: windows have fixed length, but are separated by a time interval (step) which can be smaller than the window length. Typically the window interval is a multiplicity of the step. Each event belongs to a number of windows ([window interval]/[window step]).
  • 29. Late Data in Streaming
  • 30. Watermarking ● Data newer than watermark may be late, but allowed to aggregate ● Windows older than watermark automatically deleted to limit the amount of intermediate state Handle more late data -> Keep more state Reduced the state -> Handle less lateness
  • 31. Internal working of Watermarking
  • 32. Stateful Streaming: Deduplication ● Drop duplicate records in a Stream ● Specify Columns which uniquely identify records ● State will store unique keys in stream and drop any record matching the state
  • 33. Stateful Streaming: Deduplication ● Too large Key Set in state for deduplication will make the streaming unstable ● Solution: Drop the state after a specified period.
  • 34. Stateful Streaming: Joins ● Each of the stream should buffer events in state for matching any future events of other stream.
  • 36. Stateful Streaming: Joins ● Impressions can be 2 hours late ● Clicks can be 3 hours late ● Clicks can occur within 1 hour after the corresponding impression
  • 37. Some Use case of Streaming ● Algorithmic Trading, Stock Market Surveillance, ● Smart Patient Care ● Monitoring a production line ● Supply chain optimizations ● Intrusion, Surveillance and Fraud Detection ( e.g. Uber) ● Most Smart Device Applications: Smart Car, Smart Home .. ● Smart Grid — (e.g. load prediction and outlier plug detection see Smart grids, 4 Billion events, throughout in range of 100Ks) ● Traffic Monitoring, Geofencing, Vehicle, and Wildlife tracking — e.g. TFL London Transport Management System ● Sports analytics — Augment Sports with real-time analytics (e.g. this is a work we did with a real football game (e.g. Overlaying real time analytics on Football Broadcasts) ● Context-aware promotions and advertising ● Computer system and network monitoring ● Predictive Maintenance, (e.g. Machine Learning Techniques for Predictive Maintenance) ● Geospatial data processing
  • 38. Some Use case of Streaming
  • 39. Some Use case of Streaming
  • 40. References ● https://hazelcast.com/glossary/kappa-architecture/ ● https://hazelcast.com/glossary/lambda-architecture/ ● https://databricks.com/glossary/lambda-architecture ● Streams Concepts — Confluent Platform ● Stateful Stream Processing: Databricks ● Spark Strcutured Streaming Documentation
  • 41. Thank You ! Get in touch with us: anuj.saxena@knoldus.com || anuj1207 || anuj-saxena jashan.goyal@knoldus.com || jashangoyal09 || jashan-goyal