Data Stream
Processing
Agenda
• Overview
• What is Streaming Data?
• Streaming Data Pipeline
• Streaming Platform components
• What is Stishovite?
Overview
• Monitoring Events in real time
• Monitoring Alerts: sending alerts based on detection of event patterns in data streams
• Dashboards: real-time operational dashboards
• Search: full-text querying, aggregations, and geo data in near real time
• Analytics: analyze big volumes of data quickly and in near real time
Streaming data is data that is generated continuously by thousands of data sources, which
typically send their data records simultaneously and in small sizes (on the order of kilobytes).
This data needs to be processed sequentially and incrementally, either record by record or
over sliding time windows, and is used for a wide variety of analytics including correlations,
aggregations, filtering, and sampling.
Stream processing has become the de facto standard for building real-time ETL and stream
analytics applications. Batch workloads are moving to stream processing so that teams can act
on the data and derive insights faster. With the explosion of data such as IoT and
machine-generated data, stream processing combined with predictive analytics is driving
tremendous business value.
Streaming Data
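A minimal Python sketch of incremental processing over a sliding time window, as described above. The record layout (timestamp, value), the function name, and the sample readings are assumptions made for illustration, and the sketch assumes records arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Record = Tuple[float, float]  # (timestamp in seconds, value); assumed layout


def sliding_window_average(
    records: Iterable[Record], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """For each record, yield (timestamp, mean of values in the last window_seconds)."""
    window: Deque[Record] = deque()
    running_sum = 0.0
    for timestamp, value in records:
        window.append((timestamp, value))
        running_sum += value
        # Evict records that are no longer inside (timestamp - window_seconds, timestamp].
        while window and window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            running_sum -= old_value
        yield timestamp, running_sum / len(window)


# Hypothetical sensor readings, processed one record at a time.
readings = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (5.0, 40.0)]
averages = list(sliding_window_average(readings, window_seconds=3.0))
```

Only the records inside the current window are kept in memory, so the same loop works on an unbounded stream.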
Streaming Data examples include:
• Website, Network and Applications monitoring
• Fraud detection
• Advertising
• Internet of Things: sensors (trucks, transportation vehicles, industrial equipment)
• Machine-generated data
• Social analytics
• Private Searching
• Others
Streaming Data Examples
o Persistence
o Performance
o Scale
o Parallel & Partitioned
o Messaging
o Processing
o Storage
Key Requirements for Streaming Data
State of Stream Processing
Stateless
• Filter
• Map
Stateful
• Aggregate
• Join
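A small Python sketch of the difference between the two groups above. Filter and map look at one record at a time and keep nothing between records, while an aggregate has to remember state (here, a count per key). The event fields and function names are assumptions for illustration, and a join is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Dict[str, str]  # assumed event shape: {"service": ..., "level": ...}


def only_errors(events: Iterable[Event]) -> Iterator[Event]:
    """Stateless filter: each event is kept or dropped on its own."""
    return (event for event in events if event["level"] == "ERROR")


def to_service_name(events: Iterable[Event]) -> Iterator[str]:
    """Stateless map: each event is transformed independently."""
    return (event["service"] for event in events)


def running_counts(keys: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stateful aggregate: the counts dictionary survives across records."""
    counts: Dict[str, int] = defaultdict(int)
    for key in keys:
        counts[key] += 1
        yield key, counts[key]


events = [
    {"service": "web", "level": "INFO"},
    {"service": "web", "level": "ERROR"},
    {"service": "db", "level": "ERROR"},
    {"service": "web", "level": "ERROR"},
]
error_counts = list(running_counts(to_service_name(only_errors(events))))
```

In a distributed setting the state held by the aggregate is what has to be partitioned and persisted, which is why stateful operators are harder to scale than stateless ones.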
Typical Streaming Workflow
[Diagram components: Producer, Producer, Streaming Platform, Streaming Processing, Persistence, Consumer]
We need to collect the data, process the data, store the data, and finally serve the data for
analysis, searching, machine learning and dashboards.
Streaming Data Pipeline
[Diagram: Data Sources -> Collect & Ingest Data (?) -> Process Data (?) -> Store Data (?) -> Serve Data (?)]
We need to collect the data from a wide array of inputs and write them into a wide array of
outputs in real time.
Collect Data
• Pull-based
• Push-based
Change Data Capture (CDC)
Database Changefeeds
Collectors
Custom Collectors
• Java
• Python
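As an illustration of the pull-based and push-based styles listed above, here is a minimal sketch of a custom collector in Python. The class names and the in-memory source are hypothetical; a real collector would talk to a database, an API, or a message queue.

```python
from typing import Callable, List


class PullCollector:
    """Pull-based: the collector asks the source for new records on its own schedule."""

    def __init__(self, fetch: Callable[[int], List[str]]) -> None:
        self._fetch = fetch  # returns all records starting at a given offset
        self._offset = 0     # position of the next record not yet collected

    def poll(self) -> List[str]:
        records = self._fetch(self._offset)
        self._offset += len(records)
        return records


class PushCollector:
    """Push-based: the source calls the collector whenever it has a new record."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def on_record(self, record: str) -> None:
        self.received.append(record)


# Pull: the collector remembers how far it has read in the (hypothetical) source.
source_log = ["a", "b", "c"]
pull = PullCollector(lambda offset: source_log[offset:])
first_batch = pull.poll()
source_log.append("d")
second_batch = pull.poll()

# Push: the source drives delivery.
push = PushCollector()
for record in ["x", "y"]:
    push.on_record(record)
```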
When data is ingested in real time, each data item is imported as it is emitted by the source. An
effective data ingestion process begins by prioritizing data sources, validating individual files
and routing data items to the correct destination.
Streaming Data Ingestion
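The validate-and-route step described above could look roughly like this in Python. The JSON item format, the "type" field, and the destination names are assumptions for illustration only.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def route(
    raw_items: Iterable[str],
    routes: Dict[str, str],
    dead_letter: str = "invalid",
) -> Dict[str, List[object]]:
    """Validate each item as it arrives and append it to its destination."""
    destinations: Dict[str, List[object]] = defaultdict(list)
    for raw in raw_items:
        try:
            item = json.loads(raw)
        except json.JSONDecodeError:
            destinations[dead_letter].append(raw)  # not parseable at all
            continue
        # Items without a known "type" are treated as invalid too.
        target = routes.get(item.get("type")) if isinstance(item, dict) else None
        if target is None:
            destinations[dead_letter].append(raw)
        else:
            destinations[target].append(item)
    return dict(destinations)


# Hypothetical incoming items and routing table.
raw_items = [
    '{"type": "click", "page": "/home"}',
    "not json",
    '{"type": "sensor", "temp": 21.5}',
    '{"page": "/x"}',
]
routed = route(raw_items, {"click": "web-events", "sensor": "iot-events"})
```

Invalid items are kept in a separate destination instead of being dropped, so they can be inspected later.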
Kafka Topics
Apache Kafka is a distributed system designed for streams. It is built to be fault-tolerant,
high-throughput, and horizontally scalable, and it allows data streams and stream processing
applications to be distributed geographically.
Apache Kafka
Kafka’s system design can be thought of as that of a distributed commit log, where incoming
data is written sequentially to disk. There are four main components involved in moving data in
and out of Kafka:
• Topics
• Producers
• Consumers
• Brokers
How Kafka Works
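To make the commit-log idea concrete, here is a toy in-memory model in Python. It is not the Kafka client API: the class names are hypothetical, and partitions, replication, and writing to disk are all left out. It only shows how a producer appends to a topic, how each message gets an offset, and how a consumer tracks its own position.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyBroker:
    """Stand-in for a broker: one append-only list (log) per topic."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[str]] = defaultdict(list)

    def append(self, topic: str, message: str) -> int:
        """Producer side: append to the end of the log and return the offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, topic: str, offset: int, max_messages: int = 10) -> List[Tuple[int, str]]:
        """Consumer side: read sequentially, starting at a given offset."""
        log = self._topics[topic]
        return list(enumerate(log[offset:offset + max_messages], start=offset))


class ToyConsumer:
    """Remembers the next offset to read; the broker keeps no per-consumer state."""

    def __init__(self, broker: ToyBroker, topic: str) -> None:
        self._broker = broker
        self._topic = topic
        self.offset = 0

    def poll(self) -> List[str]:
        batch = self._broker.read(self._topic, self.offset)
        if batch:
            self.offset = batch[-1][0] + 1
        return [message for _, message in batch]


broker = ToyBroker()
broker.append("page-views", "/home")      # acting as a producer
broker.append("page-views", "/pricing")
consumer = ToyConsumer(broker, "page-views")
first_poll = consumer.poll()
broker.append("page-views", "/docs")
second_poll = consumer.poll()
```

Because messages are never removed when read, a second consumer starting at offset 0 would see the same three messages.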
We need to collect the data, process the data, store the data, and finally serve the data for
analysis, machine learning, and dashboards.
Streaming Data Pipeline
[Diagram: Data Sources -> Collect & Ingest Data (Kafka Streaming Platform) -> Process Data (?) -> Store Data (?) -> Serve Data (?)]
Data Stream Processing
There is a wide variety of technologies, frameworks, and libraries for building applications
that process streams of data. Frameworks such as Flink, Storm, Samza, and Spark can all
process streams of data in real time with code written in Java, Python, or Scala, and they do
an excellent job. But if you are looking for something simpler for building data pipelines with
minimal data processing, you should try:
Apache NiFi is an integrated data platform that enables the automation of data flow between
systems. It provides real-time control that makes it easy to manage the movement of data
between any source and any destination. Apache NiFi helps move and track data.
Apache NiFi
Apache NiFi is used for:
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
  o Conversion between formats
  o Extraction/Parsing/Splitting/Aggregation
  o Schema translation
  o Routing decisions
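NiFi itself is configured as a visual flow of processors rather than written as code. The Python sketch below is only an illustration of the kind of "conversion between formats" and "enrichment" step listed above: it turns CSV rows into JSON lines and adds a field to each one. The function name, the sample data, and the "site" field are assumptions.

```python
import csv
import io
import json
from typing import Dict, Iterator


def csv_to_json_lines(csv_text: str, extra_fields: Dict[str, str]) -> Iterator[str]:
    """Convert each CSV row to one JSON document, enriched with extra_fields."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first line is the header
    for row in reader:
        row.update(extra_fields)                    # enrichment step
        yield json.dumps(row, sort_keys=True)       # format conversion step


# Hypothetical input: two sensor readings in CSV form.
csv_text = "sensor_id,temp_c\ns1,21.5\ns2,19.0\n"
json_lines = list(csv_to_json_lines(csv_text, {"site": "plant-a"}))
```

Values stay strings here because CSV carries no type information. Turning "21.5" into a number would be a separate schema translation step.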
Streaming Data Pipeline
[Diagram: Data Sources -> Collect & Ingest Data -> Process Data (Data Stream Processing) -> Store Data (?) -> Serve Data (?)]
For storing lots of streaming data, we need a data store that supports fast writes and scales.
Storing Streaming Data
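One common way to get fast writes is to buffer incoming records and write them in batches. The sketch below shows the idea in Python using the standard library's SQLite module purely as a stand-in. The deck does not name a data store, SQLite is not a scalable streaming store, and the table layout and class name are assumptions.

```python
import sqlite3
from typing import List, Tuple


class BatchingWriter:
    """Buffer records in memory and insert them in one transaction per batch."""

    def __init__(self, connection: sqlite3.Connection, batch_size: int) -> None:
        self._conn = connection
        self._batch_size = batch_size
        self._buffer: List[Tuple[float, str]] = []
        self._conn.execute("CREATE TABLE IF NOT EXISTS events (ts REAL, payload TEXT)")

    def write(self, ts: float, payload: str) -> None:
        self._buffer.append((ts, payload))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            with self._conn:  # commits on success, rolls back on error
                self._conn.executemany("INSERT INTO events VALUES (?, ?)", self._buffer)
            self._buffer.clear()


conn = sqlite3.connect(":memory:")
writer = BatchingWriter(conn, batch_size=2)
for i in range(5):
    writer.write(float(i), f"event-{i}")

# Four records were written as two full batches; the fifth is still buffered.
stored_before_flush = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
writer.flush()
stored_after_flush = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The trade-off is that buffered records are lost if the process dies before a flush, so batch size is a balance between throughput and durability.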
Streaming Data Pipeline
[Diagram: Data Sources -> Collect & Ingest Data -> Process Data -> Store Data (Storing Streaming Data) -> Serve Data (?)]
The data is served to end applications such as dashboards, business intelligence tools, and
other applications that use the processed event data.
Serving the Data
Streaming Data Pipeline
[Diagram: Complete workflow of streaming data: Data Sources -> Collect & Ingest Data -> Process Data -> Store Data -> Serve Data]
Stishovite is a centralized console to manage the entire pipeline of the xGem Streaming
Platform.
The xGem Stream Platform is an integration of different open source products.
https://gitlab.com/xgem/stishovite
What is Stishovite?
Thanks!
Jorge Hirtz
@jahtux

xGem Data Stream Processing

  • 1.
  • 3. Agenda • Overview • What is Streaming Data? • Streaming Data Pipeline • Streaming Platform components • What is Stishovite?
  • 4. Overview • Monitoring Events: in real time • Monitoring Alerts: sending alerts based on detection of event patterns in data streams • Dashboards: real-time operational dashboards • Search: full-text querying, aggregations, and geo data in near real time • Analytics: analyze big volumes of data quickly and in near real time
  • 5. Streaming Data: Streaming data is generated continuously by thousands of data sources, which typically send data records simultaneously and in small sizes (on the order of kilobytes). This data needs to be processed sequentially and incrementally, either record by record or over sliding time windows, and is used for a wide variety of analytics including correlations, aggregations, filtering, and sampling. Stream processing has become the de facto standard for building real-time ETL and stream analytics applications. We see batch workloads move into stream processing to act on the data and derive insights faster. With the explosion of data such as IoT and machine-generated data, stream processing combined with predictive analytics is driving tremendous business value.
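As an illustration of incremental processing over a sliding time window, here is a minimal Python sketch. It is not taken from the slides: the function name `sliding_window_average`, the `(timestamp, value)` record shape, and the sample readings are all assumptions made for the example, and records are assumed to arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Record = Tuple[float, float]  # (timestamp in seconds, value); hypothetical shape


def sliding_window_average(
    records: Iterable[Record], window_seconds: float
) -> Iterator[Record]:
    """Yield (timestamp, average) for each record as it arrives.

    The average covers records whose timestamp lies in
    (current_timestamp - window_seconds, current_timestamp].
    """
    window: Deque[Record] = deque()
    total = 0.0
    for ts, value in records:
        # Incremental update: add the new record to the running total.
        window.append((ts, value))
        total += value
        # Evict records that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


# Hypothetical sensor readings: (timestamp, value)
readings = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (5.0, 40.0)]
averages = list(sliding_window_average(readings, window_seconds=3.0))
```

Each record is handled once and the running total is adjusted rather than recomputed, which is what "sequentially and incrementally" means in practice.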
  • 6. Streaming Data Examples: • Website, network, and application monitoring • Fraud detection • Advertising • Internet of Things: sensors (trucks, transportation vehicles, industrial equipment) • Machine-generated data • Social analytics • Private searching • Others
  • 7. Key Requirements for Streaming Data: • Persistence • Performance • Scale • Parallel & Partitioned • Messaging • Processing • Storage
  • 8. State of Stream Processing Stateless • Filter • Map Stateful • Aggregate • Join
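The difference between stateless and stateful operators can be sketched in a few lines of Python. This is an illustrative example only: the event fields (`service`, `level`) and the function names are hypothetical and do not come from the slides. The filter and map steps look at one record at a time, while the aggregate step must remember counts between records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Dict[str, str]  # hypothetical event shape


def only_errors(events: Iterable[Event]) -> Iterator[Event]:
    """Stateless filter: each event is kept or dropped on its own."""
    return (e for e in events if e["level"] == "ERROR")


def to_service(events: Iterable[Event]) -> Iterator[str]:
    """Stateless map: each event is transformed on its own."""
    return (e["service"] for e in events)


def running_count(keys: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stateful aggregate: the per-key counts persist across records."""
    counts: Dict[str, int] = defaultdict(int)
    for key in keys:
        counts[key] += 1
        yield key, counts[key]


events = [
    {"service": "api", "level": "ERROR"},
    {"service": "db", "level": "INFO"},
    {"service": "api", "level": "ERROR"},
    {"service": "db", "level": "ERROR"},
]
result = list(running_count(to_service(only_errors(events))))
```

Because the stateless steps keep nothing between records, they can be restarted or parallelized freely; the `counts` dictionary is the kind of state a stream processor has to persist and partition.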
  • 10. Streaming Data Pipeline: We need to collect the data, process the data, store the data, and finally serve the data for analysis, searching, machine learning and dashboards. Pipeline stages: Data Sources → Collect & Ingest Data (?) → Process Data (?) → Store Data (?) → Serve Data (?)
  • 11. Collect Data: We need to collect the data from a wide array of inputs and write them into a wide array of outputs in real time. • Pull-based • Push-based. Sources: Change Data Capture (CDC), Database Changefeeds, Collectors. Custom Collectors: • Java • Python
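To make the pull-based versus push-based distinction concrete, here is a small in-memory Python sketch. All class names (`InMemorySource`, `PullCollector`, `PushCollector`) are hypothetical and exist only for this example; a real collector would talk to a database changefeed, an HTTP endpoint, or a similar source instead.

```python
from typing import Callable, List


class InMemorySource:
    """Hypothetical data source that supports both collection styles."""

    def __init__(self, records: List[str]) -> None:
        self._pending = list(records)
        self._subscribers: List[Callable[[str], None]] = []

    def fetch(self) -> List[str]:
        """Pull style: hand over whatever is pending, then clear it."""
        batch, self._pending = self._pending, []
        return batch

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._subscribers.append(callback)

    def emit(self, record: str) -> None:
        """Push style: deliver a new record to every subscriber."""
        for callback in self._subscribers:
            callback(record)


class PullCollector:
    """The collector decides when to ask the source for data."""

    def __init__(self, fetch: Callable[[], List[str]]) -> None:
        self._fetch = fetch
        self.collected: List[str] = []

    def poll_once(self) -> int:
        batch = self._fetch()
        self.collected.extend(batch)
        return len(batch)


class PushCollector:
    """The source decides when to hand data to the collector."""

    def __init__(self) -> None:
        self.collected: List[str] = []

    def on_record(self, record: str) -> None:
        self.collected.append(record)


pull_source = InMemorySource(["a", "b"])
pull = PullCollector(pull_source.fetch)
first_poll = pull.poll_once()
second_poll = pull.poll_once()

push_source = InMemorySource([])
push = PushCollector()
push_source.subscribe(push.on_record)
push_source.emit("c")
```

In the pull case the collector controls the polling rate; in the push case latency is lower but the collector has to keep up with whatever rate the source chooses.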
  • 12. When data is ingested in real time, each data item is imported as it is emitted by the source. An effective data ingestion process begins by prioritizing data sources, validating individual files and routing data items to the correct destination. Streaming Data Ingestion Kafka Topics
  • 13. Apache Kafka: Apache Kafka is a distributed system designed for streams. It is built to be fault-tolerant, high-throughput, and horizontally scalable, and it allows data streams and stream processing applications to be distributed geographically.
  • 14. Kafka’s system design can be thought of as that of a distributed commit log, where incoming data is written sequentially to disk. There are four main components involved in moving data in and out of Kafka: • Topics • Producers • Consumers • Brokers How Kafka Works
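The commit-log idea can be modelled with a toy in-memory Python sketch. This is not the Kafka client API: `ToyBroker`, `ToyProducer`, and `ToyConsumer` are hypothetical names, everything lives in a Python list rather than on disk, and each topic is a single log, whereas real Kafka splits a topic into partitions spread across brokers.

```python
from typing import Dict, List


class ToyBroker:
    """Holds one append-only log per topic (a simplification of Kafka)."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[bytes]] = {}

    def append(self, topic: str, message: bytes) -> int:
        """Append to the end of the topic's log and return the new offset."""
        log = self._topics.setdefault(topic, [])
        log.append(message)
        return len(log) - 1

    def read(self, topic: str, offset: int, max_records: int = 10) -> List[bytes]:
        """Read sequentially, starting at the given offset."""
        return self._topics.get(topic, [])[offset:offset + max_records]


class ToyProducer:
    def __init__(self, broker: ToyBroker) -> None:
        self._broker = broker

    def send(self, topic: str, message: bytes) -> int:
        return self._broker.append(topic, message)


class ToyConsumer:
    """Tracks its own position (offset) in the topic's log."""

    def __init__(self, broker: ToyBroker, topic: str) -> None:
        self._broker = broker
        self._topic = topic
        self._offset = 0

    def poll(self) -> List[bytes]:
        records = self._broker.read(self._topic, self._offset)
        self._offset += len(records)
        return records


broker = ToyBroker()
producer = ToyProducer(broker)
consumer = ToyConsumer(broker, "clicks")

offsets = [producer.send("clicks", b"a"), producer.send("clicks", b"b")]
first = consumer.poll()
empty = consumer.poll()
producer.send("clicks", b"c")
later = consumer.poll()
```

Reading does not remove anything from the log; the consumer only advances its own offset, so a second consumer could start from offset 0 and see the same messages.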
  • 16. Streaming Data Pipeline: We need to collect the data, process the data, store the data, and finally serve the data for analysis, machine learning, and dashboards. Pipeline stages: Data Sources → Collect & Ingest Data → Process Data (?) → Store Data (?) → Serve Data (?)
  • 17. Data Stream Processing: There is a wide variety of technologies, frameworks, and libraries for building applications that process streams of data. Frameworks such as Flink, Storm, Samza, and Spark can all process streams of data in real time, with code written in Java, Python, or Scala, and they do an excellent job. But if you are looking for something simpler for building data pipelines with minimal data processing, you should try:
  • 18. Apache NiFi: Apache NiFi is an integrated data platform that enables the automation of data flow between systems. It provides real-time control that makes it easy to manage the movement of data between any source and any destination. Apache NiFi helps move and track data. Apache NiFi is used for: • Reliable and secure transfer of data between systems • Delivery of data from sources to analytic platforms • Enrichment and preparation of data: • Conversion between formats • Extraction/Parsing/Splitting/Aggregation • Schema translation • Routing decisions
  • 19. Streaming Data Pipeline: Data Sources → Collect & Ingest Data → Process Data (Data Stream Processing) → Store Data (?) → Serve Data (?)
  • 20. For storing lots of streaming data, we need a data store that supports fast writes and scales. Storing Streaming Data
  • 21. Streaming Data Pipeline: Data Sources → Collect & Ingest Data → Process Data → Store Data (Storing Streaming Data) → Serve Data (?)
  • 22. Serving the Data: End applications such as dashboards, business intelligence tools, and other applications consume the processed event data.
  • 23. Streaming Data Pipeline: Complete workflow of streaming data. Data Sources → Collect & Ingest Data → Process Data → Store Data → Serve Data
  • 24. What is Stishovite? Stishovite is a centralized console to manage the entire pipeline of the xGem Streaming Platform. The xGem Streaming Platform is an integration of different open source products. https://gitlab.com/xgem/stishovite