SlideShare une entreprise Scribd logo
1  sur  61
DevOps for Data Science
by Stepan Pushkarev
CTO of Hydrosphere.io
DevOps is a catchy buzzword to optimise things
© Josh Wills
http://www.slideshare.net/g33ktalk/dataengconf-sf16-bridging-the-gap-between-data-science-and-data-engineerin
g
Is there life after marriage data science?
Dating, Flowers,
Dreams
Marriage
Happily lived
forever?
Collect & prepare
data
Build ML Model
This talk is for people who are married aware of
“other 99% of data science”
Dating, Flowers,
Dreams
Marriage
Happily lived
forever?
Collect &
prepare data
Build ML Model
This talk is NOT about
- Setting up Apache Spark/Hadoop cluster
- Configuring CI/CD tools like Jenkins
- Configuring monitoring tools & dashboards
- Agile/DevOps brainwashing & consulting story
Agenda
- Challenges in deploying analytics into
production
- Deploying analytics as a service
- Feedback loops: testing, monitoring,
analytics of analytics
Why do companies hire data scientists?
Why do companies hire data scientists?
To make products smarter.
What is a deliverable of data scientist and data
engineer?
What is a deliverable of data scientist?
Academic
paper?
ML Model? R/Python
script?
Jupiter
Notebook?
BI
Dashboard?
What has to be a deliverable of data scientist?
Data pipelines and machine
learning models that deployed as
pluggable, testable, supportable,
monitorable analytics services.
Option 1: Engineer to implement academic paper
Option 2: Engineer to re-implement R/Python script
Option 3: Run notebook as it is using cron
Option 3: Run notebook as it is using cron
Option 4: Build software to eat the world Data Science
Eating data science
© Daniel Tunkelang - Where should you put your data scientists? -
www.slideshare.net/dtunkelang/where-should-you-put-your-data-scientists
Step 1 (management): Integrate data scientists into
cross-functional teams
Eating data science
Step 2 (operations): Make environments scalable
and elastic. Finally.
Eating data science
Step 3: Make data scientists to write less code
Eating data science
Step 4: Deploy analytics as services
Step 5: Use feedback loops: testing, monitoring,
analytics for analytics
Build ML Model
Test
Monitor,
maintain,
analyze
Deploy as a service
Collect & prepare
data
Agenda
- Challenges in deploying analytics into
production
- Deploying analytics as a service
- Feedback loops: testing, monitoring,
analytics of analytics
Deploying analytics as a service
- Defines deliverable for Data Scientist / Data Engineer.
- Plugs analytics into end-to-end products through API.
- With the right tooling allows Data Scientist to deploy it in self
serve
Look around - proprietary ML based APIs
- Alchemy API
- Google Prediction API
- Cloud Vision API
- Azure ML
Can we do our own on top of Apache Spark?
Bad Practice #1. Business logic in Spark? WTF?
Bad Practice #2. Database as API
Execute reporting job
Mark Job as complete &
save result
Poll for new tasks
Poll for resultSet a flag to build a report
Bad Practice #3. Low level HTTP API
When Data Scientists
design an API...
Hydrosphere Mist - a service for exposing analytics
jobs and machine learning models as web services
Types of analytics services
- Enterprise Analytics services
- Reactive or Streaming services
- Realtime ML services
Enterprise analytics services
- Could not be
pre-calculated
- On-demand
parametrized jobs
- Requires large scale
processing
- Reporting
- Simulation (pricing, bank
stress testing, taxi rides)
- Forecasting (ad
campaign, energy
savings, others)
- Ad-hoc analytics tools
for business users
Demo #1
Reactive or Streaming services
Reactive or Streaming Reporting services
Demo #2
Realtime Machine Learning Services
Train models in Apache Spark and deploy it for realtime
low latency serving/scoring with high throughput
PMML is not an option
Spark ML, TensorFlow, H2O, Vowpal Wabbit, and every new ML
library invents uses own serialisation format
Format is not an issue if we re-define a deliverable for
ML model
xml, json, parquet, pojo, other
Single row Serving / Scoring
layer
Large Scale,
Batch
processing
engine
Monitoring,
testing
integration
Deliverable artifact for Machine Learning Model
Repository
Zooming out
MLLib model TensorFlow model Other model
Unified Serving/Scoring API
Demo #3
Agenda
- Challenges in deploying analytics into
production
- Deploying analytics as a service
- Feedback loops: testing,
monitoring, analytics of analytics
Testing, monitoring, analytics of analytics
- Poorly discussed in community.
- We are in production, baby!
- Regression.
- State matters. Model lifetime is limited.
- Data drifts, pipelines and model fail silently.
● Saves time
● Saves money
● Saves lifes
TDD world
TDD world does not work here
Pff… easy:
- Unit tests - by platform developers
- Integration tests - often impossible
Not clear who and not clear how:
- Regression
- Data Validation
- Production testing
- Data and ML pipelines quality monitoring
Need either “Data QA” & “Data Ops” people
or … AI
(formula for the next 10 000 startups - take something and add AI)
Smart data structures and dumb code works a lot
better than the other way around
Can we develop DSL and Data Structure which is
smart enough to learn from data patterns, trends and
anomalies to be self-QAed?
QA view: a universe vs. big data analytics system
People observe and monitor signals from stars to
check that universe is not broken today
Monitoring! Grafana, Kibana
And Marijuana to make sense out of it
Metrics processing, monitoring, correlation insights...
...Isn’t it a big data analytics task on its own?
ML pipeline Kafka
Analytics jobs
for metrics
Emit Metrics
Stream it back
into Spark
Context
Use insights to
make our data
structures
smart
Solution: loop of analytics for analytics
Benefits
● Don’t need to talk to Ops! :)
● Already have Apache Spark and Kafka in place
● Data Scientist in the loop!
● Unlimited flexibility in analytics, correlation and
using ML for ML
● Models could feeded back into Smart self
QA-ed data structures.
Hydrosphere Swirl - a system that creates a swirl of
analytics for analytics
Original ML
pipeline
Kafka
Streaming or
Batch Swirl
jobs
Hydrosphere
Swirl
Plug, modify,
deploy, run jobs &
consume results
Metrics
definition,
Notebook
integration
Hydrosphere
Mist
(1) Emit metrics
Hydrosphere Swirl: Vision
Demo #4
Classify by
sentiment
Twitter
Prepare
Data
Serve ads to
user
Hydrosphere Swirl
Invalid records 10/sec 2k/sec0.8Ratio Clicks
Swirl Demo: Serve Ads to users with positive Tweets
Classify by
sentiment
Twitter
Ingest &
transform
Serve ads to
user
Hydrosphere Swirl
Invalid records 20k/sec 10/sec0.2 Clicks
Data pipeline
is broken
Ratio
Swirl Demo: Serve Ads to users with positive Tweets
Twitter
Ingest &
transform
Serve ads to
user
Hydrosphere Swirl
Invalid records 10/sec 10/sec0.2 Clicks
New ML model
deployment
Deployed
bug in ML
code
Ratio
Swirl Demo: Serve Ads to users with positive Tweets
Thank you
Looking for
- Feedback
- Advisors, mentors & partners
- Pilots and early adopters
Stay in touch
- @hydrospheredata
- https://github.com/Hydrospheredata
- http://hydrosphere.io/
- spushkarev@hydrosphere.io

Contenu connexe

Tendances

Service Mesh - Observability
Service Mesh - ObservabilityService Mesh - Observability
Service Mesh - ObservabilityAraf Karsh Hamid
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...confluent
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaGuido Schmutz
 
Benefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use CasesBenefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use Casesconfluent
 
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageAnimesh Singh
 
Azure Migrate
Azure MigrateAzure Migrate
Azure MigrateMustafa
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&MDatabricks
 
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusRobust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusManasi Vartak
 
Cloud Migration, Application Modernization, and Security
Cloud Migration, Application Modernization, and Security Cloud Migration, Application Modernization, and Security
Cloud Migration, Application Modernization, and Security Tom Laszewski
 
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016Amazon Web Services
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsWeaveworks
 
The Ideal Approach to Application Modernization; Which Way to the Cloud?
The Ideal Approach to Application Modernization; Which Way to the Cloud?The Ideal Approach to Application Modernization; Which Way to the Cloud?
The Ideal Approach to Application Modernization; Which Way to the Cloud?Codit
 
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...Amazon Web Services Korea
 
FinOps-Azure Capabilities
FinOps-Azure CapabilitiesFinOps-Azure Capabilities
FinOps-Azure CapabilitiesGordonByers3
 
Migrate to Microsoft Azure with Confidence
Migrate to Microsoft Azure with ConfidenceMigrate to Microsoft Azure with Confidence
Migrate to Microsoft Azure with ConfidenceDavid J Rosenthal
 

Tendances (20)

Service Mesh - Observability
Service Mesh - ObservabilityService Mesh - Observability
Service Mesh - Observability
 
MLOps with Kubeflow
MLOps with Kubeflow MLOps with Kubeflow
MLOps with Kubeflow
 
infrastructure as code
infrastructure as codeinfrastructure as code
infrastructure as code
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Building Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache KafkaBuilding Event Driven (Micro)services with Apache Kafka
Building Event Driven (Micro)services with Apache Kafka
 
Machine Learning Operations & Azure
Machine Learning Operations & AzureMachine Learning Operations & Azure
Machine Learning Operations & Azure
 
Benefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use CasesBenefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use Cases
 
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and ManageEnd to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
End to end Machine Learning using Kubeflow - Build, Train, Deploy and Manage
 
Azure Migrate
Azure MigrateAzure Migrate
Azure Migrate
 
Apply MLOps at Scale by H&M
Apply MLOps at Scale by H&MApply MLOps at Scale by H&M
Apply MLOps at Scale by H&M
 
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusRobust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
 
Cloud Migration, Application Modernization, and Security
Cloud Migration, Application Modernization, and Security Cloud Migration, Application Modernization, and Security
Cloud Migration, Application Modernization, and Security
 
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016
An Agile Approach to Accelerate Mass Migration | AWS Public Sector Summit 2016
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
The Ideal Approach to Application Modernization; Which Way to the Cloud?
The Ideal Approach to Application Modernization; Which Way to the Cloud?The Ideal Approach to Application Modernization; Which Way to the Cloud?
The Ideal Approach to Application Modernization; Which Way to the Cloud?
 
AWS networking fundamentals
AWS networking fundamentalsAWS networking fundamentals
AWS networking fundamentals
 
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...금융 회사를 위한 클라우드 이용 가이드 –  신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
금융 회사를 위한 클라우드 이용 가이드 – 신은수 AWS 솔루션즈 아키텍트, 김호영 AWS 정책협력 담당:: AWS Cloud Week ...
 
FinOps-Azure Capabilities
FinOps-Azure CapabilitiesFinOps-Azure Capabilities
FinOps-Azure Capabilities
 
Migrate to Microsoft Azure with Confidence
Migrate to Microsoft Azure with ConfidenceMigrate to Microsoft Azure with Confidence
Migrate to Microsoft Azure with Confidence
 

Similaire à DevOps for DataScience

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operationsStepan Pushkarev
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)dtz001
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycleDatabricks
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....Databricks
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
EPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUEPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUDmitrii Suslov
 
Hydrosphere.io Platform for AI/ML Operations Automation
Hydrosphere.io Platform for AI/ML Operations AutomationHydrosphere.io Platform for AI/ML Operations Automation
Hydrosphere.io Platform for AI/ML Operations AutomationRustem Zakiev
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architectureStepan Pushkarev
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for SparkMark Kerzner
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...ScyllaDB
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...Karthik Murugesan
 

Similaire à DevOps for DataScience (20)

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
03_aiops-1.pptx
03_aiops-1.pptx03_aiops-1.pptx
03_aiops-1.pptx
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycle
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
EPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUEPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHU
 
Hydrosphere.io Platform for AI/ML Operations Automation
Hydrosphere.io Platform for AI/ML Operations AutomationHydrosphere.io Platform for AI/ML Operations Automation
Hydrosphere.io Platform for AI/ML Operations Automation
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
Simplifying the Creation of Machine Learning Workflow Pipelines for IoT Appli...
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
Uber - Building Intelligent Applications, Experimental ML with Uber’s Data Sc...
 

Plus de Stepan Pushkarev

AI for the Human Retina to Protect Newborn Vision
AI for the Human Retina to Protect Newborn VisionAI for the Human Retina to Protect Newborn Vision
AI for the Human Retina to Protect Newborn VisionStepan Pushkarev
 
Automating machine learning lifecycle with kubeflow
Automating machine learning lifecycle with kubeflowAutomating machine learning lifecycle with kubeflow
Automating machine learning lifecycle with kubeflowStepan Pushkarev
 
Handling inference in anomalous ever changing environment
Handling inference in anomalous ever changing environmentHandling inference in anomalous ever changing environment
Handling inference in anomalous ever changing environmentStepan Pushkarev
 
Data ops: Machine Learning in production
Data ops: Machine Learning in productionData ops: Machine Learning in production
Data ops: Machine Learning in productionStepan Pushkarev
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningStepan Pushkarev
 

Plus de Stepan Pushkarev (7)

AI for the Human Retina to Protect Newborn Vision
AI for the Human Retina to Protect Newborn VisionAI for the Human Retina to Protect Newborn Vision
AI for the Human Retina to Protect Newborn Vision
 
Automating machine learning lifecycle with kubeflow
Automating machine learning lifecycle with kubeflowAutomating machine learning lifecycle with kubeflow
Automating machine learning lifecycle with kubeflow
 
Handling inference in anomalous ever changing environment
Handling inference in anomalous ever changing environmentHandling inference in anomalous ever changing environment
Handling inference in anomalous ever changing environment
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Data ops: Machine Learning in production
Data ops: Machine Learning in productionData ops: Machine Learning in production
Data ops: Machine Learning in production
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learning
 
Spark ML Pipeline serving
Spark ML Pipeline servingSpark ML Pipeline serving
Spark ML Pipeline serving
 

Dernier

WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxAnnaArtyushina1
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 

Dernier (20)

WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 

DevOps for DataScience

  • 1. DevOps for Data Science by Stepan Pushkarev CTO of Hydrosphere.io
  • 2. DevOps is a catchy buzzword to optimise things
  • 4. Is there life after marriage data science? Dating, Flowers, Dreams Marriage Happily lived forever? Collect & prepare data Build ML Model
  • 5. This talk is for people who are married aware of “other 99% of data science” Dating, Flowers, Dreams Marriage Happily lived forever? Collect & prepare data Build ML Model
  • 6. This talk is NOT about - Setting up Apache Spark/Hadoop cluster - Configuring CI/CD tools like Jenkins - Configuring monitoring tools & dashboards - Agile/DevOps brainwashing & consulting story
  • 7. Agenda - Challenges in deploying analytics into production - Deploying analytics as a service - Feedback loops: testing, monitoring, analytics of analytics
  • 8. Why do companies hire data scientists?
  • 9. Why do companies hire data scientists? To make products smarter.
  • 10. What is a deliverable of data scientist and data engineer?
  • 11. What is a deliverable of data scientist? Academic paper? ML Model? R/Python script? Jupiter Notebook? BI Dashboard?
  • 12. What has to be a deliverable of data scientist? Data pipelines and machine learning models that deployed as pluggable, testable, supportable, monitorable analytics services.
  • 13. Option 1: Engineer to implement academic paper
  • 14. Option 2: Engineer to re-implement R/Python script
  • 15. Option 3: Run notebook as it is using cron
  • 16. Option 3: Run notebook as it is using cron
  • 17. Option 4: Build software to eat the world Data Science
  • 18. Eating data science © Daniel Tunkelang - Where should you put your data scientists? - www.slideshare.net/dtunkelang/where-should-you-put-your-data-scientists Step 1 (management): Integrate data scientists into cross-functional teams
  • 19. Eating data science Step 2 (operations): Make environments scalable and elastic. Finally.
  • 20. Eating data science Step 3: Make data scientists to write less code
  • 21. Eating data science Step 4: Deploy analytics as services
  • 22. Step 5: Use feedback loops: testing, monitoring, analytics for analytics Build ML Model Test Monitor, maintain, analyze Deploy as a service Collect & prepare data
  • 23. Agenda - Challenges in deploying analytics into production - Deploying analytics as a service - Feedback loops: testing, monitoring, analytics of analytics
  • 24. Deploying analytics as a service - Defines deliverable for Data Scientist / Data Engineer. - Plugs analytics into end-to-end products through API. - With the right tooling allows Data Scientist to deploy it in self serve
  • 25. Look around - proprietary ML based APIs - Alchemy API - Google Prediction API - Cloud Vision API - Azure ML Can we do our own on top of Apache Spark?
  • 26. Bad Practice #1. Business logic in Spark? WTF?
  • 27. Bad Practice #2. Database as API Execute reporting job Mark Job as complete & save result Poll for new tasks Poll for resultSet a flag to build a report
  • 28. Bad Practice #3. Low level HTTP API When Data Scientists design an API...
  • 29. Hydrosphere Mist - a service for exposing analytics jobs and machine learning models as web services
  • 30. Types of analytics services - Enterprise Analytics services - Reactive or Streaming services - Realtime ML services
  • 31. Enterprise analytics services - Could not be pre-calculated - On-demand parametrized jobs - Requires large scale processing - Reporting - Simulation (pricing, bank stress testing, taxi rides) - Forecasting (ad campaign, energy savings, others) - Ad-hoc analytics tools for business users
  • 34. Reactive or Streaming Reporting services
  • 36. Realtime Machine Learning Services Train models in Apache Spark and deploy it for realtime low latency serving/scoring with high throughput
  • 37. PMML is not an option Spark ML, TensorFlow, H2O, Vowpal Wabbit, and every new ML library invents uses own serialisation format
  • 38. Format is not an issue if we re-define a deliverable for ML model xml, json, parquet, pojo, other Single row Serving / Scoring layer Large Scale, Batch processing engine Monitoring, testing integration Deliverable artifact for Machine Learning Model
  • 39. Repository Zooming out MLLib model TensorFlow model Other model Unified Serving/Scoring API
  • 41. Agenda - Challenges in deploying analytics into production - Deploying analytics as a service - Feedback loops: testing, monitoring, analytics of analytics
  • 42. Testing, monitoring, analytics of analytics - Poorly discussed in community. - We are in production, baby! - Regression. - State matters. Model lifetime is limited. - Data drifts, pipelines and model fail silently. ● Saves time ● Saves money ● Saves lifes
  • 44. TDD world does not work here Pff… easy: - Unit tests - by platform developers - Integration tests - often impossible Not clear who and not clear how: - Regression - Data Validation - Production testing - Data and ML pipelines quality monitoring
  • 45. Need either “Data QA” & “Data Ops” people or … AI (formula for the next 10 000 startups - take something and add AI)
  • 46. Smart data structures and dumb code works a lot better than the other way around
  • 47. Can we develop DSL and Data Structure which is smart enough to learn from data patterns, trends and anomalies to be self-QAed?
  • 48. QA view: a universe vs. big data analytics system People observe and monitor signals from stars to check that universe is not broken today
  • 50. And Marijuana to make sense out of it
  • 51. Metrics processing, monitoring, correlation insights... ...Isn’t it a big data analytics task on its own?
  • 52.
  • 53. ML pipeline Kafka Analytics jobs for metrics Emit Metrics Stream it back into Spark Context Use insights to make our data structures smart Solution: loop of analytics for analytics
  • 54. Benefits ● Don’t need to talk to Ops! :) ● Already have Apache Spark and Kafka in place ● Data Scientist in the loop! ● Unlimited flexibility in analytics, correlation and using ML for ML ● Models could feeded back into Smart self QA-ed data structures.
  • 55. Hydrosphere Swirl - a system that creates a swirl of analytics for analytics
  • 56. Original ML pipeline Kafka Streaming or Batch Swirl jobs Hydrosphere Swirl Plug, modify, deploy, run jobs & consume results Metrics definition, Notebook integration Hydrosphere Mist (1) Emit metrics Hydrosphere Swirl: Vision
  • 58. Classify by sentiment Twitter Prepare Data Serve ads to user Hydrosphere Swirl Invalid records 10/sec 2k/sec0.8Ratio Clicks Swirl Demo: Serve Ads to users with positive Tweets
  • 59. Classify by sentiment Twitter Ingest & transform Serve ads to user Hydrosphere Swirl Invalid records 20k/sec 10/sec0.2 Clicks Data pipeline is broken Ratio Swirl Demo: Serve Ads to users with positive Tweets
  • 60. Twitter Ingest & transform Serve ads to user Hydrosphere Swirl Invalid records 10/sec 10/sec0.2 Clicks New ML model deployment Deployed bug in ML code Ratio Swirl Demo: Serve Ads to users with positive Tweets
  • 61. Thank you Looking for - Feedback - Advisors, mentors & partners - Pilots and early adopters Stay in touch - @hydrospheredata - https://github.com/Hydrospheredata - http://hydrosphere.io/ - spushkarev@hydrosphere.io