A DATA SCIENTIST JOURNEY TO
INDUSTRIALIZATION OF MACHINE
LEARNING MODELS
DataXDay 2018
17th May 2018
@DataXDay
DATA SCIENCE
FOUNDATIONS FOR DATA SCIENCE AT AIR FRANCE
3
Adoption of Operations Research for crew scheduling
Extension to other business domains: Revenue Management, Cargo, Ground services, …
Adoption of Hadoop
Focus on Machine Learning
Ops Research is now 120 engineers in Paris and Amsterdam
Adoption of data science within AFKL IT was favored by existing Operations Research practice
@DataXDay
DATA SCIENCE
MACHINE LEARNING, SPONSORED BY ORGANIZATION
4
Organization, through Customer Data Management, is one of the key sponsors of
industrialized data science within AFKL
Customer Data Management
Customer data strategy
Customer knowledge
Personalization
Coordinates IT efforts
@DataXDay
DATA SCIENCE
STARTING POINT FOR DATA SCIENCE PROJECT IS A POC LOGIC
[Diagram labels: DWH, Historical Data, Business Intelligence, LOCAL, Data Sample, Proof of Concept]
5
@DataXDay
DATA SCIENCE
WHAT IS AN « INDUSTRIALIZED » ENGINE?
Jupyter notebook, R → Executable package
On my own → Integrated within AFKL IT live ecosystem
Manual launch or crontab → Automated calibration and prediction
"I guess my code is flawless" → Unit tested
Theoretical performance → Live feedback on performance
6
@DataXDay
DATA SCIENCE
FROM LOCAL STUDIES… TO A ROBUST LIVE DATA PRODUCT
[Diagram labels: LOCAL, Data Sample, Proof of Concept, LIVE, Data feed, DWH, Historical Data, Business Intelligence, EXPLORATION, Historical Data, Proof of Concept, MODELS Repository, Predictions, DATA API]
7
@DataXDay
DATA SCIENTISTS X DATA ENGINEERS
Fellowship
@DataXDay
DATA SCIENTISTS X DATA ENGINEERS
IT TAKES TWO TO BRING DATA PRODUCTS LIVE (AT LEAST)
9
PoC
Start of industrialization
Help! How to ingest and expose data?
Live Product V1
Translates business ideas into data science
Stats, ML, AI
Data Scientist
Dev, Big data, project architecture
Data Engineer
@DataXDay
DATA SCIENTISTS X DATA ENGINEERS
KEEP THE FRONTIER LOOSE
10
Data scientist and data engineer are roles, not persons
Awareness of the data scientist's role in live environments is key
@DataXDay
DATA SCIENTISTS X DATA ENGINEERS
A LIVE ECOSYSTEM
[Diagram labels: LIVE, Data feed, DWH, Historical Data, Business Intelligence, EXPLORATION, Historical Data, Proof of Concept, MODELS Repository, Predictions, DATA API]
11
@DataXDay
PACKAGING DATA SCIENCE
Spark and PEX
@DataXDay
PACKAGING DATA SCIENCE
WHAT DO YOU EXPECT?
13
[Diagram labels: Features engineering, Algorithm « Model », Model, Training data, Trained model, Prediction data, Predictions; steps: Setup, Train, Predict]
We expect two main functionalities: training and predicting
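The two functionalities named on this slide can be sketched as a pair of entry points around a persisted model. This is only an illustration under assumptions: the `train` and `predict` names, the JSON file used as model storage and the trivial "mean" model are all hypothetical and are not taken from the talk.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def train(training_labels: list[float], model_path: Path) -> None:
    """Train: fit a (deliberately trivial) model and persist it."""
    trained_model = {"mean": sum(training_labels) / len(training_labels)}
    model_path.write_text(json.dumps(trained_model))


def predict(prediction_rows: list[dict], model_path: Path) -> list[float]:
    """Predict: reload the trained model and score each input row."""
    trained_model = json.loads(model_path.read_text())
    return [trained_model["mean"] for _ in prediction_rows]


def demo() -> list[float]:
    """Run train then predict against a temporary model file."""
    with TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.json"
        train([1.0, 2.0, 3.0], path)
        return predict([{"id": 1}, {"id": 2}], path)
```

Keeping the trained model as an artifact on disk is what lets training and prediction run as separate, independently scheduled jobs.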
@DataXDay
PACKAGING DATA SCIENCE
STANDARDIZATION WITH THE PIPELINE PATTERN
14
[Diagram labels: Feature Engineering (SQLTransformer, VectorAssembler); Model training (LogisticRegression.fit(dataset)); LogisticRegressionModel.transform(dataset); Dataset; Dataset + Predictions; Pipeline Model]
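The fit/transform contract behind the pipeline pattern can be sketched in plain Python. The classes below are hypothetical stand-ins written for this example, not Spark ML code: `AddRatio` plays the part of a feature-engineering stage, `ThresholdEstimator` the part of an estimator, and the toy `Pipeline` assumes that only its last stage needs fitting.

```python
class AddRatio:
    """Hypothetical feature-engineering stage: adds a derived column."""

    def transform(self, rows):
        return [{**r, "ratio": r["a"] / r["b"]} for r in rows]


class ThresholdModel:
    """Fitted model: transform() appends a prediction to each row."""

    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        return [{**r, "prediction": int(r["ratio"] > self.threshold)} for r in rows]


class ThresholdEstimator:
    """Hypothetical estimator: fit() returns a fitted model."""

    def fit(self, rows):
        pos = [r["ratio"] for r in rows if r["label"] == 1]
        neg = [r["ratio"] for r in rows if r["label"] == 0]
        # Decision threshold halfway between the two class means.
        return ThresholdModel((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


class PipelineModel:
    """Fitted pipeline: applies every stage's transform() in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains transformers and one final estimator behind a single fit()."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        *transformers, estimator = self.stages
        for stage in transformers:
            rows = stage.transform(rows)
        return PipelineModel(transformers + [estimator.fit(rows)])


training = [
    {"a": 1.0, "b": 4.0, "label": 0},
    {"a": 3.0, "b": 4.0, "label": 1},
]
model = Pipeline([AddRatio(), ThresholdEstimator()]).fit(training)
scored = model.transform([{"a": 9.0, "b": 10.0}])
```

The point of the pattern is that the fitted object carries the feature engineering with it, so prediction code cannot drift away from what was applied at training time.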
@DataXDay
PACKAGING DATA SCIENCE
PEX, JUST LIKE UBERJAR
15
[Diagram labels: PEX; Project package; Company packages (×4); External packages (×4)]
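The "single executable archive" idea can be tried with the standard library alone. The sketch below uses Python's `zipapp` module as a stand-in for PEX, which is an assumption of this example: `zipapp` only bundles the files it is given, whereas PEX also resolves and embeds dependencies. The `scoring` package and its `main` function are hypothetical.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_archive() -> str:
    """Bundle a tiny project into one executable archive and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        pkg = src / "scoring"  # hypothetical project package
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        (pkg / "cli.py").write_text(
            "def main():\n"
            "    print('predict: ok')\n"
        )
        target = Path(tmp) / "scoring.pyz"
        # main= tells zipapp to generate the archive's __main__.py entry point.
        zipapp.create_archive(src, target=target, main="scoring.cli:main")
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
```

As with an uberjar, the consumer only needs the interpreter and one file, which makes the archive easy to hand over to another team or to a scheduler.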
@DataXDay
PACKAGING DATA SCIENCE
A LIVE ECOSYSTEM
[Diagram labels: LIVE, Data feed, DWH, Historical Data, Business Intelligence, EXPLORATION, Historical Data, Proof of Concept, MODELS Repository, Predictions, DATA API]
16
@DataXDay
PACKAGING DATA SCIENCE
A LIVE ECOSYSTEM… BUT TRAINING DATA AND LIVE DATA ARE DIFFERENT
[Diagram labels: LIVE, Data feed, DWH, Historical Data, Business Intelligence, EXPLORATION, Historical Data, Proof of Concept, MODELS Repository, Predictions, DATA API]
17
@DataXDay
FROM DWH TO DATALAKE
A detour
@DataXDay
FROM DWH TO DATALAKE
TRAINING DATA MUST BE THE SAME AS PRODUCTION
• The data warehouse holds the full historical data
• The production platform processes only what live apps need from the raw data
• Data processing on the two sides is not identical
• The production platform has to build its own full historical data
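One way to keep training and production data identical is to route both through a single feature definition, as sketched below. The event fields (`booking`, `departure`) and the function names are hypothetical and chosen only for illustration; the talk does not describe its actual features.

```python
from datetime import datetime


def build_features(raw_event: dict) -> dict:
    """Single feature definition shared by training and live scoring."""
    departure = datetime.fromisoformat(raw_event["departure"])
    booking = datetime.fromisoformat(raw_event["booking"])
    return {
        "days_before_departure": (departure - booking).days,
        "is_weekend_departure": departure.weekday() >= 5,
    }


def training_set(raw_history: list[dict]) -> list[dict]:
    """Training path: replay the raw history kept by the production platform."""
    return [build_features(event) for event in raw_history]


def live_features(raw_event: dict) -> dict:
    """Live path: the same function applied to one incoming event."""
    return build_features(raw_event)


event = {"booking": "2018-05-01", "departure": "2018-05-19"}
```

Because both paths call `build_features`, a replayed historical event and the same event seen live produce the same feature values by construction.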
19
@DataXDay
FROM DWH TO DATALAKE
FROM A HISTORICAL/LIVE SYSTEM
[Diagram labels: LIVE, Data feed, DWH, Historical Data, Business Intelligence, EXPLORATION, Historical Data, Proof of Concept, MODELS Repository, Predictions, DATA API]
20
@DataXDay
FROM DWH TO DATALAKE
TO A FULL LIVE SYSTEM
[Diagram labels: LIVE, Data feed, Historical Data, EXPLORATION, Historical Data, Proof of Concept, MODELS Repository, Predictions, DATA API]
21
@DataXDay
CONTINUOUS IMPROVEMENT
Growing up
22
@DataXDay
CONTINUOUS IMPROVEMENT
FROM BUD TO FLOWER
• Ease of deploying a new model
• Ease of extracting a new feature
• Ease of accessing new data
• Stay innovative
• Time To Market
23
@DataXDay
CONTINUOUS IMPROVEMENT
CRAFTSMANSHIP FROM DATA SCIENTIST SIDE
24
@DataXDay
CONTINUOUS IMPROVEMENT
CONTINUOUS INTEGRATION
Goal: make sure each code modification does not break anything
What to do? Regularly fetch sources, build the project and run tests
Needs: tools to automate all tedious and repetitive tasks
Because we are lazy
25
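The "run tests" step of continuous integration could look like the following for data science code. It is a minimal sketch: the `load_factor` feature and its tests are hypothetical examples, and `run_tests` simply mimics what a CI job would do after fetching and building the project.

```python
import unittest


def load_factor(passengers: int, seats: int) -> float:
    """Hypothetical feature: share of seats sold on a flight."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    return passengers / seats


class LoadFactorTest(unittest.TestCase):
    def test_nominal_case(self):
        self.assertAlmostEqual(load_factor(150, 200), 0.75)

    def test_rejects_zero_seats(self):
        with self.assertRaises(ValueError):
            load_factor(10, 0)


def run_tests() -> bool:
    """Run the suite the way a CI job would and report success."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoadFactorTest)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

A CI server would call the equivalent of `run_tests()` on every change and block the build when it returns `False`.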
@DataXDay
CONTINUOUS IMPROVEMENT
DATA SCIENTIST - SOFTWARE FACTORY
26
Development, CI, CD
Exploration
Build PEX
Expose PEX for other IT teams
@DataXDay
CONTINUOUS IMPROVEMENT
TRACK MODEL VERSIONING
• Calibration metadata
  • Dataset used
  • Timestamp + code version
• Keep track of the link between models and predictions
  • Model used
  • Unique ID of prediction
  • Input dataset
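The tracking items listed on this slide map naturally onto two small records, sketched here with dataclasses. All field names (`model_id`, `dataset_ref`, `input_ref` and so on) are assumptions made for the example; the talk only lists what should be tracked, not how.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Calibration:
    """Metadata recorded each time a model is calibrated (trained)."""

    model_id: str
    dataset_ref: str   # which dataset was used
    code_version: str  # e.g. a VCS revision of the training code
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


@dataclass(frozen=True)
class PredictionRecord:
    """Links one prediction back to the model and input that produced it."""

    model_id: str
    input_ref: str
    value: float
    prediction_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def record_prediction(calibration: Calibration, input_ref: str, value: float) -> PredictionRecord:
    """Create a prediction record tied to the calibration that produced it."""
    return PredictionRecord(calibration.model_id, input_ref, value)
```

With a unique ID on every prediction and the model ID stored next to it, any live result can later be traced back to the exact calibration run.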
27
@DataXDay
CONTINUOUS IMPROVEMENT
FEEDBACK LOOP
[Diagram labels: LIVE, Data feed, Historical Data, EXPLORATION, Historical Data, Proof of Concept, MODELS Repository, Predictions, DATA API, feedback, Metrics]
28
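A feedback loop of this kind boils down to joining stored predictions with the outcomes observed later and computing a metric. The sketch below assumes predictions and feedback are both keyed by a prediction ID and uses accuracy as the metric; both choices are illustrative assumptions, not details from the talk.

```python
def live_accuracy(predictions: dict[str, int], feedback: dict[str, int]) -> float:
    """Share of correct predictions among those with an observed outcome."""
    matched = [pid for pid in predictions if pid in feedback]
    if not matched:
        raise ValueError("no feedback received yet")
    correct = sum(predictions[pid] == feedback[pid] for pid in matched)
    return correct / len(matched)


# Three predictions were served; outcomes are known for two of them so far.
served = {"p1": 1, "p2": 0, "p3": 1}
observed = {"p1": 1, "p2": 1}
score = live_accuracy(served, observed)
```

Predictions without feedback are left out of the metric rather than counted as errors, so the figure reflects only what has actually been observed.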
@DataXDay
NEXT STEP
Improve and share best practices
@DataXDay
NEXT STEP
TOO MANY JOURNEYS
• How to maintain the momentum after a few teams have started the adventure?
• Every team experienced a different journey
• But every team finds a different path
30
DataXDay - A data scientist journey to industrialization of machine learning

 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 

DataXDay - A data scientist journey to industrialization of machine learning

  • 1. A DATA SCIENTIST JOURNEY TO INDUSTRIALIZATION OF MACHINE LEARNING MODELS DataXDay 2018 17th May 2018
  • 2. @DataXDay DATA SCIENCE: FOUNDATIONS FOR DATA SCIENCE AT AIR FRANCE. Adoption of Operations Research for crew scheduling; extension to other business domains (Revenue Management, Cargo, Ground services, …); adoption of Hadoop; focus on Machine Learning. Ops Research is now 120 engineers in Paris and Amsterdam. Adoption of data science within AFKL IT was favored by the existing Operations Research practice.
  • 3. DATA SCIENCE: MACHINE LEARNING, SPONSORED BY ORGANIZATION. Organization, through Customer Data Management, is one of the key sponsors of industrialized data science within AFKL. Customer Data Management: customer data strategy; customer knowledge; personalization; coordinates IT efforts.
  • 4. DATA SCIENCE: STARTING POINT FOR DATA SCIENCE PROJECT IS A POC LOGIC. Diagram labels: DWH (Historical Data, Business Intelligence), LOCAL (Data Sample, Proof of Concept).
  • 5. DATA SCIENCE: WHAT IS AN « INDUSTRIALIZED » ENGINE? Jupyter notebook, R → executable package. On my own → integrated within the AFKL IT live ecosystem. Manual launch or crontab → automated calibration and prediction. "I guess my code is flawless" → unit tested. Theoretical performance → live feedback on performance.
  • 6. DATA SCIENCE: FROM LOCAL STUDIES… TO A ROBUST LIVE DATA PRODUCT. Diagram labels: LOCAL (Data Sample, Proof of Concept), DWH (Historical Data, Business Intelligence), EXPLORATION (Historical Data, Proof of Concept), LIVE (Data feed), MODELS Repository, Predictions, DATA, API.
  • 7. DATA SCIENTISTS X DATA ENGINEERS: Fellowship
  • 8. DATA SCIENTISTS X DATA ENGINEERS: IT TAKES TWO TO BRING DATA PRODUCTS LIVE (AT LEAST). Stages: PoC; start of industrialization ("Help! How to ingest and expose data?"); live product V1. Data Scientist: translates business ideas into data science; stats, ML, AI. Data Engineer: dev, big data, project architecture.
  • 9. DATA SCIENTISTS X DATA ENGINEERS: KEEP THE FRONTIER LOOSE. Data scientist and data engineer are roles, not persons. Awareness of the data scientist role on live environments is key.
  • 10. DATA SCIENTISTS X DATA ENGINEERS: A LIVE ECOSYSTEM. Diagram labels: DWH (Historical Data, Business Intelligence), EXPLORATION (Historical Data, Proof of Concept), LIVE (Data feed), MODELS Repository, Predictions, DATA, API.
  • 12. PACKAGING DATA SCIENCE: WHAT DO YOU EXPECT? Setup: features engineering + algorithm → « Model ». Train: model + training data → trained model. Predict: trained model + prediction data → predictions. We are expecting two main functionalities: training and predicting.
  • 13. PACKAGING DATA SCIENCE: STANDARDIZATION WITH THE PIPELINE PATTERN. Pipeline: feature engineering (SQLTransformer, VectorAssembler), then model training (LogisticRegression, .fit(dataset)). Model: LogisticRegressionModel, .transform(dataset), which turns Dataset into Dataset + Predictions.
  • 14. PACKAGING DATA SCIENCE: PEX, JUST LIKE UBERJAR. Diagram labels: PEX (project package, company packages, external packages).
  • 15. PACKAGING DATA SCIENCE: A LIVE ECOSYSTEM. Diagram labels: DWH (Historical Data, Business Intelligence), EXPLORATION (Historical Data, Proof of Concept), LIVE (Data feed), MODELS Repository, Predictions, DATA, API.
  • 16. PACKAGING DATA SCIENCE: A LIVE ECOSYSTEM… BUT TRAINING DATA AND LIVE DATA ARE DIFFERENT. Diagram labels: DWH (Historical Data, Business Intelligence), EXPLORATION (Historical Data, Proof of Concept), LIVE (Data feed), MODELS Repository, Predictions, DATA, API.
  • 17. FROM DWH TO DATALAKE: A detour
  • 18. FROM DWH TO DATALAKE: TRAINING DATA MUST BE THE SAME AS PRODUCTION • The data warehouse holds full historical data • The production platform processes just what is needed from raw data for live apps • Data processing on the two sides is not identical • The production platform has to create its own full historical data
  • 19. FROM DWH TO DATALAKE: FROM A HISTORICAL/LIVE SYSTEM. Diagram labels: DWH (Historical Data, Business Intelligence), EXPLORATION (Historical Data, Proof of Concept), LIVE (Data feed), MODELS Repository, Predictions, DATA, API.
  • 20. FROM DWH TO DATALAKE: TO A FULL LIVE SYSTEM. Diagram labels: EXPLORATION (Historical Data, Proof of Concept), LIVE (Data feed, Historical Data), MODELS Repository, Predictions, DATA, API.
  • 22. CONTINUOUS IMPROVEMENT: FROM BUD TO FLOWER • Ease of deploying a new model • Ease of extracting a new feature • Ease of accessing new data • Stay innovative • Time to market
  • 24. CONTINUOUS IMPROVEMENT: CONTINUOUS INTEGRATION. Goal: make sure no code modification breaks anything. What to do? Regularly fetch sources, build the project and run tests. Needs: tools to automate all tedious and repetitive tasks, because we are lazy.
  • 25. CONTINUOUS IMPROVEMENT: DATA SCIENTIST - SOFTWARE FACTORY. Development: exploration. CI: build PEX. CD: expose PEX for other IT teams.
  • 26. CONTINUOUS IMPROVEMENT: TRACK MODEL VERSIONING • Calibration metadata: dataset used; timestamp + code version • Keep track between models and predictions: model used; unique ID of prediction; input dataset
  • 27. CONTINUOUS IMPROVEMENT: FEEDBACK LOOP. Diagram labels: EXPLORATION (Historical Data, Proof of Concept), LIVE (Data feed, Historical Data), MODELS Repository, Predictions, DATA, API, feedback, Metrics.
  • 28. NEXT STEP: Improve and share best practices
  • 29. NEXT STEP: TOO MANY JOURNEYS • How to maintain the momentum after a few teams have started the adventure? • Every team experienced a different journey • But every team finds a different path
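
The setup / train / predict contract from slides 12 and 13, together with the versioning metadata listed on slide 26, can be sketched in a few lines. This is only an illustration under stated assumptions: the slides name Spark ML components, whereas the sketch below uses the Python standard library alone so that it runs anywhere. The names `ThresholdEstimator`, `TrainedModel`, `fingerprint`, the feature `x` and the code version `"abc123"` are all hypothetical and do not come from the talk.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


def fingerprint(rows):
    """Short, stable hash of a dataset (a list of dicts)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class TrainedModel:
    """Result of the Train step, carrying its calibration metadata."""
    threshold: float
    model_id: str
    dataset_hash: str   # which dataset was used
    code_version: str   # which code produced the model
    trained_at: str     # calibration timestamp (UTC, ISO 8601)

    def transform(self, rows):
        """Predict step: return the input rows plus traceable predictions."""
        input_hash = fingerprint(rows)
        return [
            {
                **row,
                "prediction": int(row["x"] >= self.threshold),
                "prediction_id": uuid.uuid4().hex,  # unique ID of prediction
                "model_id": self.model_id,          # model used
                "input_hash": input_hash,           # input dataset
            }
            for row in rows
        ]


@dataclass(frozen=True)
class ThresholdEstimator:
    """Setup step: a toy algorithm that learns a cut-off on feature `x`."""
    code_version: str  # hypothetical: e.g. a commit hash supplied by the build

    def fit(self, rows):
        """Train step: calibrate on labelled rows, return a TrainedModel."""
        positives = [r["x"] for r in rows if r["label"] == 1]
        negatives = [r["x"] for r in rows if r["label"] == 0]
        return TrainedModel(
            threshold=(min(positives) + max(negatives)) / 2,
            model_id=uuid.uuid4().hex,
            dataset_hash=fingerprint(rows),
            code_version=self.code_version,
            trained_at=datetime.now(timezone.utc).isoformat(),
        )


training = [{"x": 1, "label": 0}, {"x": 2, "label": 0},
            {"x": 8, "label": 1}, {"x": 9, "label": 1}]
model = ThresholdEstimator(code_version="abc123").fit(training)
scored = model.transform([{"x": 3}, {"x": 7}])
print([row["prediction"] for row in scored])  # [0, 1]
```

Because every prediction row carries `model_id` and `input_hash`, and every model carries `dataset_hash`, `code_version` and `trained_at`, a prediction can be traced back to the model and data that produced it, which is the kind of link a feedback loop on live performance would need.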