SlideShare une entreprise Scribd logo
1  sur  42
Télécharger pour lire hors ligne
Dataflows for machine learning operations
Andrei Paleyes and Alex Rakowski
Part I – Dataflow and Streams
Dataflows for machine learning operations
What is ML@CL group?
3
Machine
Learning
@
Computer
Lab
3
Dataflows for machine learning operations
Dataflow research at ML@CL
4 4
Dataflows for machine learning operations
Open Source: Seldon Core
5 5
Dataflows for machine learning operations
Inference graph
6 6
Dataflows for machine learning operations
Pipelines
7
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
name: road-counter
spec:
steps:
- name: vectorize
- name: faulty_image_filter
inputs:
- vectorize.outputs
- name: object_detection
inputs:
- vectorize.outputs
- faulty_image_filter.outputs
. . . . .
output:
steps:
- counter
7
Core V1 - Central orchestration
8 8
Dataflows for machine learning operations
Dataflow architecture
9 9
Dataflows for machine learning operations
Dean, J. & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
Dataflow vs control flow
10
“In control flow, the processor follows
explicit order, executing instructions one
after another. In dataflow, by contrast, an
instruction is ready to execute as soon as
all its inputs are available.”
M. Schwarzkopf, The Remarkable Utility of Dataflow Computing
10
Dataflows for machine learning operations
Dataflow features
11
Separation of data
and operations
Data as a first
class citizen
Dataflow graph of
the entire system
Decentralisation
11
Dataflows for machine learning operations
Dataflows for machine learning operations
Dataflow and inference graph
12 12
Role of streaming in dataflow
13 13
Dataflows for machine learning operations
Control plane
14 14
Dataflows for machine learning operations
Part II – Implementation
Tech Stack — Constraints
16
OSS Platform agnostic Lightweight
16
Dataflows for machine learning operations
Tech Stack — Contenders
17 17
Dataflows for machine learning operations
Pub/Sub – Multiplexing Pipelines
18 18
Dataflows for machine learning operations
Pub/Sub – Extending Pipelines
19 19
Dataflows for machine learning operations
Pub/Sub – Repeatable Streams
20 20
Dataflows for machine learning operations
Tech Stack — Choices
21 21
Dataflows for machine learning operations
Anatomy of a Topic Name
22
<prefix>.<namespace>.<pipeline|model>.<name>.<inputs|outputs>
seldon.default.pipeline.foo.inputs
seldon.recommendations.model.bar.outputs
<prefix>.<namespace>.errors.errors
ml.default.errors.errors
22
Dataflows for machine learning operations
Anatomy of an Inference Message
23 23
Dataflows for machine learning operations
Open Inference Protocol
24 24
Dataflows for machine learning operations
Open Inference Protocol – Batching
25 25
Dataflows for machine learning operations
Dataflow Operations – Topic Chaining
26 26
Dataflows for machine learning operations
Dataflow Operations – Tensor Projection
27 27
Dataflows for machine learning operations
28
Dataflow Operations – Batching
Dataflows for machine learning operations
29 29
Dataflow Operations – Inner Join
Dataflows for machine learning operations
Dataflow Operations – Outer Join
30 30
Dataflows for machine learning operations
31 31
Dataflow Operations – Any Join
Dataflows for machine learning operations
Dataflow Operations – Triggers
32 32
Dataflows for machine learning operations
Core v2 Pipeline = KStream
Topology
Topologies
33 33
Dataflows for machine learning operations
apiVersion: mlops.seldon.io/v1alpha1
kind: Pipeline
metadata:
name: road-counter
spec:
steps:
- name: vectorize
- name: faulty_image_filter
inputs:
- vectorize.outputs
- name: object_detection
inputs:
- vectorize.outputs
- faulty_image_filter.outputs
...
val builder = StreamsBuilder()
builder
.stream(inputTopic.topicName, consumerSerde)
.filterForPipeline(inputTopic.pipelineNam
e)
...
val topology = builder.build()
Pipeline Allocation – 1 Topology = 1 Engine
34 34
Dataflows for machine learning operations
Pipeline Allocation – Shared Engines
35 35
Dataflows for machine learning operations
Pipeline Allocation — Threads
36 36
Dataflows for machine learning operations
How Many Is Too Many?
37 37
Dataflows for machine learning operations
How Many Is Too Many?
38 38
Dataflows for machine learning operations
KSQL Limitations in Confluent Cloud
Kafka Summit 2022 - Modular Topologies
KIP-809 (Modular Topologies)
Advanced ML Applications
39
Dataflows for machine learning operations
39
Challenges
40 40
Dataflows for machine learning operations
Dynamism Scalability Variety
Thanks for listening!
41
Dataflows for machine learning operations
41
42
The Seldon Core v2 Team
hello@seldon.io
Core v2 on GitHub Team
Clive Cox
CTO
Rafal Skolasinski
Machine Learning Engineer
Alex Rakowski
Software Engineer
Sherif Akoush
MLOps Engineer
Adrian Gonzalez-Martin
Machine Learning Engineer
Andrei Paleyes
MLOps Researcher
Contact us:

Contenu connexe

Similaire à Dataflows for Machine Learning Operations with Alex Rakowski & Andrei Paleyes

Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningEdunomica
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science PlatformQAware GmbH
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEWShiyong Lu
 
[DSC Adria 23] Mikhail Rozhkov DVC in Machine Learning Engineering and MLOps ...
[DSC Adria 23] Mikhail Rozhkov DVC in Machine Learning Engineering and MLOps ...[DSC Adria 23] Mikhail Rozhkov DVC in Machine Learning Engineering and MLOps ...
[DSC Adria 23] Mikhail Rozhkov DVC in Machine Learning Engineering and MLOps ...DataScienceConferenc1
 
Datacamp @ Transparency Camp 2010
Datacamp @ Transparency Camp 2010Datacamp @ Transparency Camp 2010
Datacamp @ Transparency Camp 2010Knowerce
 
Data Mining with Excel 2010 and PowerPivot 201106
Data Mining with Excel 2010 and PowerPivot 201106Data Mining with Excel 2010 and PowerPivot 201106
Data Mining with Excel 2010 and PowerPivot 201106Mark Tabladillo
 
Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsPaolo Missier
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowDaniel S. Katz
 
Machine learning on streams of data
Machine learning on streams of dataMachine learning on streams of data
Machine learning on streams of dataTomasz Sosiński
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017Manish Pandey
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsLightbend
 
Automation of the Drilling System: What has been done, what is being done, an...
Automation of the Drilling System: What has been done, what is being done, an...Automation of the Drilling System: What has been done, what is being done, an...
Automation of the Drilling System: What has been done, what is being done, an...Society of Petroleum Engineers
 
Data Lineage, Property Based Testing & Neo4j
Data Lineage, Property Based Testing & Neo4j Data Lineage, Property Based Testing & Neo4j
Data Lineage, Property Based Testing & Neo4j Neo4j
 
(ATS6-APP01) Unleashing the Power of Your Data with Discoverant
(ATS6-APP01) Unleashing the Power of Your Data with Discoverant(ATS6-APP01) Unleashing the Power of Your Data with Discoverant
(ATS6-APP01) Unleashing the Power of Your Data with DiscoverantBIOVIA
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesTuri, Inc.
 
Developing and deploying NLP services on the cloud using Azure ML and the Tea...
Developing and deploying NLP services on the cloud using Azure ML and the Tea...Developing and deploying NLP services on the cloud using Azure ML and the Tea...
Developing and deploying NLP services on the cloud using Azure ML and the Tea...Debraj GuhaThakurta
 
Rejunevating software reengineering processes
Rejunevating software reengineering processesRejunevating software reengineering processes
Rejunevating software reengineering processesmanishthaper
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolKellyton Brito
 

Similaire à Dataflows for Machine Learning Operations with Alex Rakowski & Andrei Paleyes (20)

Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science Platform
 
An Overview of VIEW
An Overview of VIEWAn Overview of VIEW
An Overview of VIEW
 
[DSC Adria 23] Mikhail Rozhkov DVC in Machine Learning Engineering and MLOps ...
[DSC Adria 23] Mikhail Rozhkov DVC in Machine Learning Engineering and MLOps ...[DSC Adria 23] Mikhail Rozhkov DVC in Machine Learning Engineering and MLOps ...
[DSC Adria 23] Mikhail Rozhkov DVC in Machine Learning Engineering and MLOps ...
 
Datacamp @ Transparency Camp 2010
Datacamp @ Transparency Camp 2010Datacamp @ Transparency Camp 2010
Datacamp @ Transparency Camp 2010
 
Data Mining with Excel 2010 and PowerPivot 201106
Data Mining with Excel 2010 and PowerPivot 201106Data Mining with Excel 2010 and PowerPivot 201106
Data Mining with Excel 2010 and PowerPivot 201106
 
Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 
Machine learning on streams of data
Machine learning on streams of dataMachine learning on streams of data
Machine learning on streams of data
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
 
Automation of the Drilling System: What has been done, what is being done, an...
Automation of the Drilling System: What has been done, what is being done, an...Automation of the Drilling System: What has been done, what is being done, an...
Automation of the Drilling System: What has been done, what is being done, an...
 
Data Lineage, Property Based Testing & Neo4j
Data Lineage, Property Based Testing & Neo4j Data Lineage, Property Based Testing & Neo4j
Data Lineage, Property Based Testing & Neo4j
 
(ATS6-APP01) Unleashing the Power of Your Data with Discoverant
(ATS6-APP01) Unleashing the Power of Your Data with Discoverant(ATS6-APP01) Unleashing the Power of Your Data with Discoverant
(ATS6-APP01) Unleashing the Power of Your Data with Discoverant
 
Deep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep FeaturesDeep Learning Made Easy with Deep Features
Deep Learning Made Easy with Deep Features
 
Developing and deploying NLP services on the cloud using Azure ML and the Tea...
Developing and deploying NLP services on the cloud using Azure ML and the Tea...Developing and deploying NLP services on the cloud using Azure ML and the Tea...
Developing and deploying NLP services on the cloud using Azure ML and the Tea...
 
Rejunevating software reengineering processes
Rejunevating software reengineering processesRejunevating software reengineering processes
Rejunevating software reengineering processes
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Linux capacity planning
Linux capacity planningLinux capacity planning
Linux capacity planning
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval Tool
 

Plus de HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

Plus de HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Dernier

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Dernier (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Dataflows for Machine Learning Operations with Alex Rakowski & Andrei Paleyes

  • 1. Dataflows for machine learning operations Andrei Paleyes and Alex Rakowski
  • 2. Part I – Dataflow and Streams
  • 3. Dataflows for machine learning operations What is ML@CL group? 3 Machine Learning @ Computer Lab 3
  • 4. Dataflows for machine learning operations Dataflow research at ML@CL 4 4
  • 5. Dataflows for machine learning operations Open Source: Seldon Core 5 5
  • 6. Dataflows for machine learning operations Inference graph 6 6
  • 7. Dataflows for machine learning operations Pipelines 7 apiVersion: mlops.seldon.io/v1alpha1 kind: Pipeline metadata: name: road-counter spec: steps: - name: vectorize - name: faulty_image_filter inputs: - vectorize.outputs - name: object_detection inputs: - vectorize.outputs - faulty_image_filter.outputs . . . . . output: steps: - counter 7
  • 8. Core V1 - Central orchestration 8 8 Dataflows for machine learning operations
  • 9. Dataflow architecture 9 9 Dataflows for machine learning operations Dean, J. & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
  • 10. Dataflow vs control flow 10 “In control flow, the processor follows explicit order, executing instructions one after another. In dataflow, by contrast, an instruction is ready to execute as soon as all its inputs are available.” M. Schwarzkopf, The Remarkable Utility of Dataflow Computing 10 Dataflows for machine learning operations
  • 11. Dataflow features 11 Separation of data and operations Data as a first class citizen Dataflow graph of the entire system Decentralisation 11 Dataflows for machine learning operations
  • 12. Dataflows for machine learning operations Dataflow and inference graph 12 12
  • 13. Role of streaming in dataflow 13 13 Dataflows for machine learning operations
  • 14. Control plane 14 14 Dataflows for machine learning operations
  • 15. Part II – Implementation
  • 16. Tech Stack — Constraints 16 OSS Platform agnostic Lightweight 16 Dataflows for machine learning operations
  • 17. Tech Stack — Contenders 17 17 Dataflows for machine learning operations
  • 18. Pub/Sub – Multiplexing Pipelines 18 18 Dataflows for machine learning operations
  • 19. Pub/Sub – Extending Pipelines 19 19 Dataflows for machine learning operations
  • 20. Pub/Sub – Repeatable Streams 20 20 Dataflows for machine learning operations
  • 21. Tech Stack — Choices 21 21 Dataflows for machine learning operations
  • 22. Anatomy of a Topic Name 22 <prefix>.<namespace>.<pipeline|model>.<name>.<inputs|outputs> seldon.default.pipeline.foo.inputs seldon.recommendations.model.bar.outputs <prefix>.<namespace>.errors.errors ml.default.errors.errors 22 Dataflows for machine learning operations
  • 23. Anatomy of an Inference Message 23 23 Dataflows for machine learning operations
  • 24. Open Inference Protocol 24 24 Dataflows for machine learning operations
  • 25. Open Inference Protocol – Batching 25 25 Dataflows for machine learning operations
  • 26. Dataflow Operations – Topic Chaining 26 26 Dataflows for machine learning operations
  • 27. Dataflow Operations – Tensor Projection 27 27 Dataflows for machine learning operations
  • 28. 28 Dataflow Operations – Batching Dataflows for machine learning operations
  • 29. 29 29 Dataflow Operations – Inner Join Dataflows for machine learning operations
  • 30. Dataflow Operations – Outer Join 30 30 Dataflows for machine learning operations
  • 31. 31 31 Dataflow Operations – Any Join Dataflows for machine learning operations
  • 32. Dataflow Operations – Triggers 32 32 Dataflows for machine learning operations
  • 33. Core v2 Pipeline = KStream Topology Topologies 33 33 Dataflows for machine learning operations apiVersion: mlops.seldon.io/v1alpha1 kind: Pipeline metadata: name: road-counter spec: steps: - name: vectorize - name: faulty_image_filter inputs: - vectorize.outputs - name: object_detection inputs: - vectorize.outputs - faulty_image_filter.outputs ... val builder = StreamsBuilder() builder .stream(inputTopic.topicName, consumerSerde) .filterForPipeline(inputTopic.pipelineNam e) ... val topology = builder.build()
  • 34. Pipeline Allocation – 1 Topology = 1 Engine 34 34 Dataflows for machine learning operations
  • 35. Pipeline Allocation – Shared Engines 35 35 Dataflows for machine learning operations
  • 36. Pipeline Allocation — Threads 36 36 Dataflows for machine learning operations
  • 37. How Many Is Too Many? 37 37 Dataflows for machine learning operations
  • 38. How Many Is Too Many? 38 38 Dataflows for machine learning operations KSQL Limitations in Confluent Cloud Kafka Summit 2022 - Modular Topologies KIP-809 (Modular Topologies)
  • 39. Advanced ML Applications 39 Dataflows for machine learning operations 39
  • 40. Challenges 40 40 Dataflows for machine learning operations Dynamism Scalability Variety
  • 41. Thanks for listening! 41 Dataflows for machine learning operations 41
  • 42. 42 The Seldon Core v2 Team hello@seldon.io Core v2 on GitHub Team Clive Cox CTO Rafal Skolasinski Machine Learning Engineer Alex Rakowski Software Engineer Sherif Akoush MLOps Engineer Adrian Gonzalez-Martin Machine Learning Engineer Andrei Paleyes MLOps Researcher Contact us: