SNOWCAMP 2020
DataOps is not just DevOps applied to
data projects!
Frédéric PETIT (@madgicweb)
Head of Architecture, Head of Data Department
MNT-VYV (@mutuelleMNT, @Groupe_VYV)
Adrien BLIND (@AdrienBlind)
DataOps Evangelist, Docker Captain
SAAGIE (@saagie_io)
Kevin KELLY, founding executive editor of Wired magazine
“The business plans of the next 10,000 startups are easy to forecast: take X and add AI”
Photo: @USIEvents
Programming approach for a “traditional product”
[Diagram, “traditional” approach. Labels: Source Code, Data, Compute, Output, FeedBack]
Programming approach for an “intelligent product”
[Diagram, “traditional” approach. Labels: Source Code, Data, Compute, Output, FeedBack]
[Diagram, “machine learning” approach. Labels: Training Code, Labeled Data, Compute, Model(s), FeedBack]
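The contrast between the two approaches can be illustrated with a minimal sketch. It is not taken from the talk: the spam example, the function names and the data are all hypothetical. In the traditional approach a developer writes the decision rule by hand; in the machine learning approach a training step derives the rule (here a single threshold) from labeled data.

```python
# Traditional approach: the decision rule is written by hand in the source code.
def is_spam_rule(exclamation_marks):
    return exclamation_marks >= 3  # threshold chosen by a developer


# Machine learning approach: training code derives the rule from labeled data.
def train_threshold(labeled_data):
    """Return the threshold that misclassifies the fewest labeled examples."""
    candidates = sorted({value for value, _ in labeled_data})

    def errors(threshold):
        return sum((value >= threshold) != label for value, label in labeled_data)

    return min(candidates, key=errors)


# Hypothetical labeled data: (number of exclamation marks, is_spam)
labeled_data = [(0, False), (1, False), (2, False), (4, True), (5, True), (7, True)]
threshold = train_threshold(labeled_data)  # the "model" is this learned value


def is_spam_model(exclamation_marks):
    return exclamation_marks >= threshold
```

Changing the labeled data changes the learned threshold without touching the code, which is one way to read the idea that data becomes an input to the system on the same footing as code.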
Data is the raw material of the system!
[Diagram labels: Compute, Source Code, Output, FeedBack; Compute, Training Code, Labeled Data, Model(s), FeedBack; Data Factory, Data Sources, Back-End, Data Extraction, APIs, FeedBack; Data Viz, Request, Analytics, Consumer, Data, Code, Dashboards, Dashboard Code]
Data information system vs. operational information system
[Same diagram as on the previous slide, split into two zones: data information system and operational information system]
DATA is the new oil - Clive Humby
The associated “cultures”
[Same diagram as on the previous slides, annotated with three cultures: MLOps, DevOps, DataOps]
#1 Data Engineers need pipelines to deliver DATA
#2 Data Scientists need pipelines to deliver MODELS
#3 Developers need pipelines to deliver APPLICATIONS
The associated pipelines
Application pipeline: Develop, Build, Test, Release, Deploy. Result: APPLICATION operated
Model pipeline: Develop, Training, Test, Evaluate, Release. Result: MODELS optimized
Data pipeline: Extract, Prepare, Analyse, Storage, Publish. Result: DATA exposed
Driven by: Needs, Intelligence, Data Capital
The real issue is not the datalake, it is data PROCESSING
If data is the new black gold, then:
● Your datalakes are your oil fields, your capital (large masses of raw data)
● Hive/Impala & co. are your oil wells, which let you query the resource
● But it is the orchestrators that are your refineries, turning that capital into use value
[Diagram labels: DATALAKE (data storing: datalakes, object storage, ...); Data processing (Extract, Prepare, Analyse, Publish); Datamarts, Shared Dataset(s); Consumers]
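The Extract, Prepare, Analyse, Publish chain named on this slide can be sketched as four small functions composed into one pipeline. This is only an illustration under stated assumptions: the records, the function names and the in-memory `datamart` dictionary are hypothetical stand-ins for a real datalake, orchestrator and datamart.

```python
from statistics import mean


def extract():
    # Stand-in for reading raw records from a datalake or object store.
    return [
        {"city": "Grenoble", "temp": "3.5"},
        {"city": "Grenoble", "temp": "n/a"},  # dirty record
        {"city": "Lyon", "temp": "7.0"},
        {"city": "Grenoble", "temp": "1.5"},
    ]


def prepare(raw_records):
    # Keep only the records whose temperature parses as a number.
    clean = []
    for record in raw_records:
        try:
            clean.append({"city": record["city"], "temp": float(record["temp"])})
        except ValueError:
            continue
    return clean


def analyse(records):
    # Average temperature per city.
    by_city = {}
    for record in records:
        by_city.setdefault(record["city"], []).append(record["temp"])
    return {city: mean(temps) for city, temps in by_city.items()}


def publish(result, datamart):
    # Stand-in for writing a shared dataset that consumers can query.
    datamart["avg_temp_by_city"] = result
    return datamart


def run_pipeline(datamart):
    return publish(analyse(prepare(extract())), datamart)


datamart = run_pipeline({})
```

In this reading, the raw records are the capital and `run_pipeline` plays the role of the refinery: the value for consumers only appears once the processing steps have run.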
From batch processing to streaming and event processing
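As a rough illustration of that shift (not from the talk; the function names and sample values are hypothetical), the same aggregate can be computed once over a complete dataset, or updated incrementally as each event arrives:

```python
def batch_total(events):
    # Batch: the whole dataset is available, the result is computed once.
    return sum(events)


def streaming_totals(events):
    # Streaming: events arrive one at a time, the result is updated after each.
    total = 0.0
    for amount in events:
        total += amount
        yield total


events = [10.0, 5.0, 2.5]
final_total = batch_total(events)
running_totals = list(streaming_totals(iter(events)))
```

Both paths end on the same total; the streaming version simply exposes every intermediate state, which is what lets a consumer react to each event instead of waiting for the next batch run.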
The associated pipelines
[Same three pipelines as on the earlier “associated pipelines” slide, now annotated with the cultures MLOps, DevOps and DataOps and with the names Innovation Pipeline, Value Pipeline and Analysis Pipeline]
Which holistic culture brings all these initiatives together?
[Same three pipelines, delivering Application (operated), Models (optimized) and Dataset (exposed); driven by: Needs, Analysis, Value; annotated with MLOps, DevOps, DataOps and with Innovation Pipeline, Value Pipeline, Analytics Pipeline]
Make your suggestions!
http://bit.ly/36eqHqL
We digress, so back to DataOps!
Let's approach DataOps starting from one
premise:
You have already established a DevOps
culture in your
company :)
Source: Giphy @ Snuls
Required technical skill areas
[Same three pipelines as on the “associated pipelines” slide, grouped by culture: DataOps, MLOps, DevOps]
Required operational skill areas
DevOps: DEV, OPS, BIZ
DataOps: DEV, OPS, BIZ, Data Scientist, Data Engineer
Organization of the DevOps culture
[Org chart labels: Tribe; three Squads; Chapter Dev; Chapter Ops]
The DataOps organization that would seem natural
[Org chart labels: Tribe; three Squads; Chapter Dev; Chapter ...; Chapter DATA]
The DataOps organization actually observed
[Org chart labels: DATA LABS; Tribe; three Squads; Chapter Dev; Chapter ...; two data-oriented Squads]
The DataOps organization “post-maturation”
[Org chart labels: Tribe; three Squads; Chapter Dev; Chapter ...; Chapter DATA; DATA LABS; one data-oriented Squad]
Data Factory
Centralized / Distributed
Data Dictionary
Data Extraction / Lineage
Data Catalog
Data Exposition
Data Processing
Data Warehouse / Data Lake
Data Collection
Data Exploration & Analysis tools
Data Viz
ML Code
ML Training (Model)
Monitoring
Data Viz
Data Verification
Data Quality
Governance / Security
Modelization
Service
Presentation
Shadow Data
Human after all
TAKE AWAY
POWER (OF AI) IS NOTHING WITHOUT (DATA) CONTROL
Without control (of the data), the power (of AI) is nothing
Go and meet each other!
THE BEST
TAKE AWAY
Frédéric PETIT (@madgicweb)
Adrien BLIND (@AdrienBlind)
