The Role of the
DevOps in the
Data Analytics
Teams
J ON THE BEACH
05/21/16
MORPHED WITH
DEEP LEARNING™
TYPICAL OPS GUY
(source: Reddit)
TYPICAL YOUNG DATA SCIENTIST
(source: Common Sense)
My initial interests
Type Systems, Automated Proving, Abstract Program Interpretation, Functional Programming, Garbage Collection and VMs,
Graph Analytics, Chess AI, Natural Language Processing, 80% Emacs / 20% Vim
So to sum it up …
I (USED TO?)
BE A BIG NERD
Collaboration
CLICKERS CODERS
Software is a Human Problem
I ended up building
collaborative software
for data science...
DEV OPS
&&
DATA
Let’s get back to the (brief) history of DevOps
Agile Conference, 2008
Scrum, and Agile
in an operational context
Hey! We should have our own Velocity in Belgium
10+ Deploys per Day: Dev and Ops Cooperation at Flickr
O’Reilly Velocity, June 2009
Patrick Debois
2007
Dev
Ops
QA
DevOpsDays
Ghent, October 2009
DevOps
DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support.
DevOps is also characterized by operations staff making use of many of the same techniques as developers for their systems work.
Invite Ops to the Dev Meeting
Oh. And let them SPEAK
Ops should know how to code
Let’s take an example: John DevOps, from 2009
Learnt Python the Hard Way
Started with Puppet 1.0
Used EC2 before ELB and EBS!
Hegelian perspective
Conflict and Frustration
Concept
Combination
Catharsis
Create Culture
Share
Create Tools
Dev
+
Ops
Haven’t there been ops associated with data for a while?
It’s called Business Intelligence!
History of Data Analytics (Oversimplified)
2013 2014 2015 2016 2017 2018
Moving to a world of automated decision making
DATA
FOR MORE INSIGHTS
DATA
FOR AUTOMATED DECISIONS
The Age Of Distributed Intelligence
Global, Personalised and Real-Time Data-Driven Services
Data, Analytics and Data Science
Conflict and Frustration
Concept
Combination
Catharsis
Create Culture
Share
Create Tools
Data
+
Science
Welcome to Technoslavia !
Classic Business Intelligence Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer Business Project
Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
Specs
Dim
Big Boss
Data Science Team Organization
Business Leader
Data Consumer
Line-of-business
Data Consumer
Business Project
Sponsor
Data Engineer
Data Analyst
System Engineer /
Data Architect
Business
Needs
Data Scientist
IT
Constraints
I.T.
Is there room for a new role ?
Data
Plumberer
Data
Engineer
Data
Scientist
Data
Waiter
Data
Cleaner
Data
Analyst
REAL
JOB
DREAM
JOB
DevOps For Data?
Imagine
a company building
a new “smart car” app: AutoFine™
“A revolutionary collaborative network that checks the quality of your driving and punishes
you with virtual fines if you’re a bad driver”
Imagine
a company building
a new “smart car” service: AutoFine™
10 TB of Data
Every Month
Hive / Spark /
Python
10 Different
Predictive Models
Real-Time API
/ Workflow
????
??
??
OPERATIONS: Who is responsible for …
Checking that the newly trained model performs as expected
Checking that the product catalog and the website tags remain consistent
Checking that the Hadoop cluster scales as expected and has enough bandwidth to handle the workload
Testing the performance of the real-time API
Monitoring the performance of the model and deciding to roll back / maintain / roll out
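The last responsibility in this list, deciding whether to roll back, maintain or roll out a model, can be sketched as a simple rule. This is only an illustration under stated assumptions: the function name `decide`, the `tolerance` threshold and the idea that a single score (higher is better) summarises model quality are all hypothetical and do not come from the slides.

```python
def decide(live_score, expected_score, candidate_score=None, tolerance=0.02):
    """Suggest an action for the model currently in production.

    live_score      -- metric measured on recent production data (higher is better)
    expected_score  -- metric the live model reached at validation time
    candidate_score -- metric of a newly trained model, if there is one
    tolerance       -- hypothetical margin below which differences are ignored
    """
    # The live model has degraded noticeably: go back to the previous version.
    if live_score < expected_score - tolerance:
        return "rollback"
    # A newly trained model is clearly better than what is live: ship it.
    if candidate_score is not None and candidate_score > live_score + tolerance:
        return "rollout"
    # Otherwise keep things as they are.
    return "maintain"


print(decide(live_score=0.70, expected_score=0.80))
print(decide(live_score=0.80, expected_score=0.80, candidate_score=0.85))
```

In practice someone still has to own the metric, the threshold and the action, which is the point of the question on this slide.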
DATA OPS
As a
Philosophy
X OPS PHILOSOPHY
Highly
consensual
Highly
controversial
Create an API culture
Do not share
• Random pieces of code
• Flat files
• Emails
Do share
• Reproducible, documented workflows
• Clean, documented APIs
Defensive Data
Programming
• Software has errors.
• You are not your software, yet you are responsible for the errors.
• You can never remove the errors, only reduce their probability.
Defensive Data Programming
• Handle the case when one of the input files is empty
• Handle the case when a new value appears
• Handle the case when two columns become completely correlated
• Handle the case when a column is 16k long
• Etc.
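The cases above can be sketched as a small set of checks run on each incoming batch before it reaches a model. This is a minimal standard-library sketch, not a method from the talk: the names `check_batch`, `DataQualityError` and `MAX_FIELD_LENGTH`, the row-as-dict layout and the 0.9999 correlation threshold are all assumptions made for illustration.

```python
import math

MAX_FIELD_LENGTH = 16_000  # hypothetical limit, echoing the "16k long" case


class DataQualityError(Exception):
    """Raised when an input batch fails a blocking defensive check."""


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def check_batch(rows, known_values, numeric_pairs):
    """Return a list of warnings; raise DataQualityError if the batch is empty.

    rows          -- list of dicts, one per record
    known_values  -- {column: set of values seen at training time}
    numeric_pairs -- [(col_a, col_b), ...] pairs to test for full correlation
    """
    # Case 1: an empty input file.
    if not rows:
        raise DataQualityError("input batch is empty")

    warnings = []
    # Case 2: a value that was never seen before.
    for column, allowed in known_values.items():
        unseen = {row[column] for row in rows} - allowed
        if unseen:
            warnings.append(f"new values in {column}: {sorted(unseen)}")

    # Case 4: an unexpectedly long field.
    for row_number, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, str) and len(value) >= MAX_FIELD_LENGTH:
                warnings.append(f"row {row_number}: {column} is {len(value)} chars long")

    # Case 3: two columns that have become completely correlated.
    for col_a, col_b in numeric_pairs:
        r = pearson([row[col_a] for row in rows], [row[col_b] for row in rows])
        if abs(r) > 0.9999:
            warnings.append(f"{col_a} and {col_b} are completely correlated")
    return warnings
```

Treating an empty batch as a hard failure and the other cases as warnings is a design choice of this sketch; a real pipeline would decide per check whether to block, alert or continue.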
Monitoring : the alerts for people who love it
• Performance ….
• Time Spent …
• Number of Errors …
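As a rough illustration of the three technical signals above (performance, time spent, number of errors), a pipeline step can be wrapped so that each call updates simple counters. This is a sketch under assumptions: the `monitored` decorator, the in-memory `METRICS` dictionary and the `parse_speed` step are hypothetical names, and a real setup would send these numbers to a metrics or alerting backend instead.

```python
import functools
import time
from collections import defaultdict

# In-memory counters, keyed by function name (illustrative only).
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "seconds": 0.0})


def monitored(func):
    """Record call count, error count and time spent for the decorated step."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats = METRICS[func.__name__]
        stats["calls"] += 1
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            # Count the failure, then let the caller see the original error.
            stats["errors"] += 1
            raise
        finally:
            # Runs on success and on failure, so time spent is always recorded.
            stats["seconds"] += time.perf_counter() - start
    return wrapper


@monitored
def parse_speed(raw):
    """Hypothetical pipeline step: parse one speed reading."""
    return float(raw)
```

After a few calls, `METRICS["parse_speed"]` holds the totals an alert rule could be built on, for example an error rate computed as `errors / calls`.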
Monitoring : Business Informal Monitoring
• % Opening
• Market Spent
• Exception User Events …
Resource Allocation
I’ve got this strange
“OutOfMemory” error. Do you know what it is?
Why is the Hadoop cluster running slower than
my laptop?
The Philosophy of pre-allocating
more resources than necessary
Get to the latest package culture …
Data Scientist
I need the latest version of scikit
and NetworkX…
And could you repackage that
to enable TensorFlow optimizations?
System Administrator
…..
The culture of containers
Developers’ Sandbox
DATA OPS
As a
Job Title
Job Title : a matter of name, $$ and social ladder
Data scientist Data Ops
Developer
Statistician
Full Stack Developer
Sys Admin
DevOps
Job Role : A matter of Do or Don’t
DO DON’T
Things you really want to do Things you really don’t want to get into
FIGHT THE
TOY PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure
Your brand new Hadoop cluster
is perceived as slow, little used
and not reliable
FIGHT THE
TECHNO MISMATCH ANTI-PATTERN
Embrace Being Polyglot
or
Be a Dictator
VS
VS
The Python Clan
The R Tribe
The Old Elephant Fraternity
The New Elephant Club
GETTING DATA POLITICS
> DATA NOT
AVAILABLE
GETTING DATA POLITICS
THE FOX
Hunt for a Big Problem!
Convince the CEO that you can
solve a business-critical problem
and use it as an excuse to get all
the data you want!
THE SPIDER
Create a Network!
Create a set of trackers or
addictive data collection
internally
to get data on your side!
PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
Websites: winners of the 2000s
Companies that were able to release fast
“Artificial Intelligence with Data for Internet of Things”: winners of the 2010s
Companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS”
IN PRODUCTION
OWN ANONYMISATION / PRIVACY
/ DATA SECURITY WITH PARTNERS ISSUES
Technical Feasibility ? What can or cannot be done ?
Let’s Wrap IT Up !
A Company Building a GPS-Powered Automated Car Fine System
10 TB of Data
Every Month
Hive / Spark /
Python
10 Different
Predictive Models
Real-Time API
/ Workflow
Robust
Workflow
With
Data Quality
Checks
Functional
Monitoring
By Business
People
through
Slack and
Dashboards
Monitoring
for the API
Feature
Engineering
Pipeline in
Python
But where do you stand?
???? ???? ???? ?????
What's your roll-back strategy like?
What kind of multivariate testing or strategies do
you have in place for predictive models?
How do you manage the robustness of your data flow production scripts?
How can business people monitor the
performance of the application?
http://bit.ly/production-survey
Food for thought
www.dataiku.com/blog
THANK YOU!

The Rise of the DataOps - Dataiku - J On the Beach 2016
