Machine Learning on
Streaming Data
with Apache Kafka, Apache Beam, & TensorFlow
About Us
Mikhail Chrestkha
Machine Learning Specialist
Google Cloud
linkedin.com/in/mchrestkha
Stéphane Maarek
CEO & Kafka Instructor
DataCumulus
linkedin.com/in/stephanemaarek
Big Thanks to:
Julianne Cuneo
Big Data Specialist, Google Cloud
Kai Waehner
Technology Evangelist, Confluent
Agenda
1. Motivation
2. Architecture
3. Use Case Walk-Through w/ Demo
4. Summary
1 Motivation
Technology Landscape
Smart
Analytics
Streaming
InfoWorld’s 2019 Technology of the
Year Award Winners:
● Apache Beam
● Apache Kafka
● Elastic Stack
● DataStax Enterprise
● Firebase
● Horovod
● H2O Driverless AI
● Keras
● Kubernetes
● LLVM
● .NET Core
● PyTorch
● Redis
● TensorFlow
● Visual Studio Code
● XGBoost
Cloud
?
https://www.globenewswire.com/news-release/2019/01/30/1707685/0/en/InfoWorld-Announces-2019-Technology-of-the-Year-Award-Winners.html
Data Ingestion
Data Analysis &
Transformation
Trainer
Model Evaluation
& Validation
Serving
Notebook
Orchestration
ML Framework
ML Platform
OSS → Managed Service
● Apache Kafka (event streaming platform) → Confluent Cloud (monitoring, replication, data balancing)
● Apache Beam (data processing pipelines; unified batch & streaming) → Dataflow (automated resource management of workers)
● TensorFlow (robust foundation for machine and deep learning) → Cloud Machine Learning Engine
○ Training: distributed training infrastructure that supports CPUs, GPUs, and TPUs
○ Serving: host models for batch & online prediction
2 Architecture
Reference Kafka ML Architecture
● Data pipelines are simplified
● Building analytic modules is decoupled
from servicing them
● Usage of real time or batch as needed
● Analytic models can be deployed in a
performant, scalable and
mission-critical environment
Kai Waehner
Technology Evangelist, Confluent
https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/
Confluent Cloud
Managed by Confluent
Analytics, ML training & deployment path
ML serving path
Data warehouse
BigQuery
ML Training
Cloud ML Engine
Topic 1
Raw transaction
Topic 2
Predictions
Kafka
Cluster
Processing
Cloud Dataflow
Leverage managed services to simplify & focus on code not infrastructure
Producer
Consumer
Consumer
ML Notebook
KSQL
SQL Submit ML
Training jobs
ML Serving
Cloud ML Engine
ML notebook development / experimentation
Deploy
ML model
Automate w/
Airflow
Dataflow
Template
3 Use Case Walk-Through
Kaggle Case Study
Fraud Detection of Credit Card Transactions
● Collect transaction data
● Analyze historical data
● Train model on historic sample
● Evaluate model based on precision & recall
● Predict fraud on new streaming data
492 fraud cases (0.172%) out of 284,807 transactions
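The class imbalance above is why the case study evaluates on precision and recall rather than accuracy. A minimal sketch of the arithmetic, using only the two counts quoted on the slide; the `precision_recall` helper and the "never flag fraud" baseline are illustrative assumptions, not part of the talk's demo code:

```python
from typing import Tuple

# Counts quoted on the slide for the Kaggle credit card dataset.
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall); an empty denominator yields 0.0."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

# Hypothetical baseline that never flags fraud: it has no positives at all,
# so every one of the fraud cases is a false negative.
baseline_accuracy = (TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS) / TOTAL_TRANSACTIONS
baseline_precision, baseline_recall = precision_recall(0, 0, FRAUD_TRANSACTIONS)

print(f"fraud rate:        {fraud_rate:.5f}")
print(f"baseline accuracy: {baseline_accuracy:.5f}")
print(f"baseline recall:   {baseline_recall:.1f}")
```

The baseline scores above 99.8% accuracy while catching no fraud at all, which is the case precision and recall are meant to expose.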
● Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In
Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015
● Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a
practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon
● Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning
strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE
○ Dal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)
● Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming
credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier
● Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection:
assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing
https://opendatacommons.org/licenses/dbcl/1.0/
DEMO 1 - 5 min
Sending our credit card data
Confluent Cloud, Creating a Topic, Python Script, Security
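The demo script itself is not reproduced in the slides. As a rough sketch of what such a Python sender could look like: the topic name `raw-transactions`, the three sample columns, the sample values, and the `RecordingProducer` stand-in are all assumptions made for illustration. The config keys are the usual librdkafka settings for a SASL_SSL cluster; with the `confluent-kafka` package installed, a real `Producer(confluent_cloud_config(...))` could be passed in place of the stand-in.

```python
import csv
import io
import json
from typing import Any, Dict, Iterable


def confluent_cloud_config(bootstrap: str, api_key: str, api_secret: str) -> Dict[str, str]:
    """librdkafka-style client settings for a SASL_SSL cluster (placeholder values)."""
    return {
        "bootstrap.servers": bootstrap,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,
        "sasl.password": api_secret,
    }


def publish_rows(producer: Any, topic: str, rows: Iterable[Dict[str, str]]) -> int:
    """Send each CSV row as a JSON-encoded value; return how many rows were handed over."""
    sent = 0
    for row in rows:
        producer.produce(topic, value=json.dumps(row).encode("utf-8"))
        sent += 1
    producer.flush()  # block until buffered messages are delivered
    return sent


class RecordingProducer:
    """Hypothetical stand-in with produce/flush methods, so the sketch runs without a broker."""

    def __init__(self) -> None:
        self.messages = []

    def produce(self, topic: str, value: bytes = None, key: bytes = None) -> None:
        self.messages.append((topic, value))

    def flush(self) -> None:
        pass


# Made-up sample rows; the real dataset has more columns.
sample_csv = "Time,Amount,Class\n0,149.62,0\n406,0.00,1\n"
demo_config = confluent_cloud_config("broker.example:9092", "API_KEY", "API_SECRET")
demo_producer = RecordingProducer()
sent_count = publish_rows(demo_producer, "raw-transactions", csv.DictReader(io.StringIO(sample_csv)))
print(sent_count, "rows sent")
```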
Kafka to BigQuery
Dataflow Template
Java Code
KafkaIO.<String, String>read() → BigQueryIO.writeTableRows()
Create a template for easy re-usability by an analyst
Images from https://beam.apache.org/documentation/pipelines/design-your-pipeline/
redacted
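Between the Kafka read and the BigQuery write, each record value has to become a typed row. The talk's pipeline is Java and its code is redacted, so the following is only a Python sketch of that per-record step, assuming JSON-encoded values and a three-column subset (`Time`, `Amount`, `Class`) chosen for illustration; the function name `to_table_row` is hypothetical.

```python
import json
from typing import Dict, Optional, Union

Row = Dict[str, Union[int, float]]


def to_table_row(kafka_value: bytes) -> Optional[Row]:
    """Turn one JSON-encoded record value into a typed row, or None if it is malformed."""
    try:
        record = json.loads(kafka_value.decode("utf-8"))
        return {
            "Time": float(record["Time"]),
            "Amount": float(record["Amount"]),
            "Class": int(record["Class"]),
        }
    except (ValueError, KeyError, TypeError):
        # Bad encoding, invalid JSON, a missing field, or a non-numeric value.
        return None


good = to_table_row(b'{"Time": "0", "Amount": "149.62", "Class": "0"}')
bad = to_table_row(b"not json")
print(good, bad)
```

Returning `None` for malformed input lets a pipeline filter such records or route them to a separate output instead of failing the whole job.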
Explore data & train ML model
● Query directly from topic: from ksql import KSQLAPI
● Query petabytes of data: %%bigquery
● Submit ML training job: gcloud ml-engine jobs submit training
DEMO 2 - 5 min
Dataflow template & job
Jupyter: KSQL, BQML, TensorFlow CMLE job
Send Predictions back to Kafka
Java Code
Cloud Machine Learning
Engine
Request Response
Hosted ML Model
Image from https://beam.apache.org/documentation/pipelines/design-your-pipeline/
Publish models
KafkaIO.<String, String>read() → KafkaIO.<String, String>write()
Train
Model
DEMO 3 - 5 min
(1) Deploy model as an endpoint
(2) Prediction sent to Kafka topic to be consumed
(3) Track models & monitor predictions in CMLE UI
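Steps (1) and (2) amount to a request/response round trip followed by a write to the predictions topic. A sketch of the payload handling only, assuming the online-prediction JSON shape with an `instances` list in the request and a `predictions` list in the response. The output key `fraud_probability`, the 0.5 threshold, and the message layout are hypothetical, since the real keys depend on the deployed model's signature.

```python
import json
from typing import Dict, List


def build_request(transactions: List[Dict[str, float]]) -> str:
    """Wrap a batch of transactions in the online-prediction request body."""
    return json.dumps({"instances": transactions})


def to_prediction_messages(
    transactions: List[Dict[str, float]], response_body: str, threshold: float = 0.5
) -> List[str]:
    """Pair each transaction with its score and build one JSON message per prediction."""
    predictions = json.loads(response_body)["predictions"]
    messages = []
    for txn, pred in zip(transactions, predictions):
        score = pred["fraud_probability"]  # hypothetical output name
        messages.append(
            json.dumps({"transaction": txn, "score": score, "is_fraud": score >= threshold})
        )
    return messages


batch = [{"Time": 0.0, "Amount": 149.62}, {"Time": 406.0, "Amount": 0.0}]
request_body = build_request(batch)

# Made-up response standing in for the hosted model's reply.
fake_response = json.dumps(
    {"predictions": [{"fraud_probability": 0.02}, {"fraud_probability": 0.91}]}
)
outgoing = to_prediction_messages(batch, fake_response)
print(outgoing)
```

Each string in `outgoing` would then be written to the predictions topic for downstream consumers.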
Futuristic Architecture: Pure Kafka-based ML
Resilient, highly available, sync & async
Confluent Cloud
Managed by Confluent
Topic 1
Raw transaction
Topic 2
Predictions
Producer
Consumer
Consumer
Kafka Streams ML
Synchronous
Application
training
serving
Interactive
Query API
gRPC or
REST API
Internal Topic
ML Model
(compacted?)
Model
state
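The slide leaves open whether the internal model topic would be compacted. If it were, a restarting application could rebuild its model state by replaying the topic and keeping the latest value per key, treating a null value as a delete. A small sketch of that replay logic over an in-memory list; the keys, the byte payloads, and the function name `rebuild_model_state` are illustrative assumptions.

```python
from typing import Dict, Iterable, Optional, Tuple


def rebuild_model_state(records: Iterable[Tuple[str, Optional[bytes]]]) -> Dict[str, bytes]:
    """Replay (key, value) records in log order; the last value per key wins."""
    state: Dict[str, bytes] = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)  # tombstone: the key was deleted
        else:
            state[key] = value
    return state


# Hypothetical log: two versions of one model, plus a model that was later removed.
log = [
    ("fraud-model", b"weights-v1"),
    ("legacy-model", b"weights-old"),
    ("fraud-model", b"weights-v2"),
    ("legacy-model", None),
]
model_state = rebuild_model_state(log)
print(model_state)
```

Compaction only bounds how much of the log has to be replayed; the result is the same as replaying the full history.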
4 Summary
Summary
● Kafka + Beam + TensorFlow = a great foundation
for the future
○ Batch today → streaming tomorrow
○ Small data today → big data tomorrow
○ Shallow learning today → deep learning tomorrow
● Make data & ML easier for yourself by using
managed services
● Build for many other use cases:
○ Predictive maintenance
○ Logistics routing
○ Image search & recommendations in e-commerce
Smart
Analytics
Streaming
Cloud
Talk to Google Cloud
K1Booth
Learn More
Blog: Enabling connected
transformation with Apache Kafka
and TensorFlow on Google Cloud
Platform
bit.ly/2CHERol
KafkaIO on Beam
bit.ly/2YwL3Jc
KafkaToBigQuery Dataflow
Template Example
bit.ly/2HQqVN0
Contact us
linkedin.com/in/mchrestkha
linkedin.com/in/stephanemaarek
Confluent Cloud
Managed by Confluent
Analytics, ML training & deployment path
ML serving path
Data warehouse
BigQuery
ML Training
Cloud ML Engine
Topic 1
Raw transaction
Topic 2
Predictions
Kafka
Cluster
Processing
Cloud Dataflow
Questions
Producer
Consumer
Consumer
Cloud ML Notebook
KSQL
SQL Submit ML
Training jobs
ML Serving
Cloud ML Engine
ML notebook development / experimentation
Deploy
ML model
Automate w/
Airflow
Dataflow
Template

Machine Learning on Streaming Data using Kafka, Beam, and TensorFlow (Mikhail Chrestkha, Google Cloud; Stephane Maarek, DataCumulus) Kafka Summit NYC 2019
