Data Science for Business Managers
Akın Osman Kazakçı
MINES ParisTech
Balazs Kégl
Ecole Polytechnique, CNRS
[Diagram: External data, Database, X, Prediction engine, Visualisation, Automated actions, Notifications]
The value of data is revealed through prediction.
At the heart of the digital
transformation lies the “data”
Levels of transformation
through data
• reporting: what happened in the past? (reflection)
• dashboards and real time monitoring: what is
happening now? (reactivity)
• prediction: what will happen next? (pro-activity)
How can we accelerate a digital
transformation process by leveraging
data?
Building value-driven data projects
The following questions need to be answered in this order:
1. What knowledge would increase our profits?
2. What data do we need to collect?
3. What ML methods are appropriate?
• Are standard innovation methodologies fit for digital
transformation projects?
• (Can we CK this?)
Discussion
Two key aspects
• Do I have all the relevant data?
Strategic data watch: is there any new source of data I can use?
• Do I have the best predictive accuracy?
How do I make sure that I’m working with the best possible predictive models?
Data hunt
• Do I have all the relevant data?
Exercise: Data hunt
During your transition to predictive analytics, you may need to update your databases to include more variables with potential explanatory power.
• A travel IT systems company has some air traffic / passenger data.
• They are interested in predicting passenger flow between 20 airports in the US.
• Data for 720 days, for each pair of airports.
• So, “one” variable.
• How can we augment this dataset?
• Which variables can be added?
• Where can we find the data?
Potential sources for relevant factors
K1 Events
K2 Plane accidents
K3 Calendar
K4 Delay causes
K5 Alternative transportation
K6 Safety
K7 Data on airports
K8 Similar data
K9 Oil price
K10 Average domestic air fares
K11 Town’s population
K12 Town’s attractiveness
20+ participants (students), analysed by Yohann Sitruk
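As one illustration of augmenting the single passenger-count variable, calendar information (factor K3) can be derived from the date alone, without any external source. The following is a minimal sketch under stated assumptions: the feature names and the start date are hypothetical and do not come from the exercise data.

```python
from datetime import date, timedelta

def calendar_features(day: date) -> dict:
    """Derive simple calendar variables (factor K3) from a date."""
    return {
        "day_of_week": day.weekday(),            # 0 = Monday ... 6 = Sunday
        "is_weekend": day.weekday() >= 5,        # Saturday or Sunday
        "month": day.month,
        "day_of_year": day.timetuple().tm_yday,
    }

# Hypothetical start date; the exercise only states that 720 days are covered.
start = date(2012, 1, 1)
rows = [calendar_features(start + timedelta(days=i)) for i in range(720)]
```

Each row could then be joined to the passenger count for the same day and airport pair. Other factors (events, oil price, fares) would need external data sources.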
Model quality
• Do I have the best predictive accuracy?
Plan
1. Train & test paradigm
2. Prediction error and quality metrics
3. ROI in data science projects
Building a data science model
• …involves a great deal of trial and error
• little if any theory-based, model-based design
• even research (development of new algorithms) is (mostly) trial and error
• the data scientist’s best friend is a well-designed experimental studio for facilitating fast iterations of
• How can we control the quality of the ensuing model?
Train & test paradigm
• Data-driven predictors should work well on future (unseen) data
• use historical data to select and fit a model, then use the model to make predictions on new data
• but we only have historical data: how do we “simulate” past and future on existing data?
The data is split into a training set (Train) and a test set (Test):
1. Develop a model on the training set
2. Test the model on the test set
3. Change the test set
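The split into a training set and a test set can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name, the 25% test fraction, and the toy data are assumptions.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    shuffled = rows[:]                       # copy, so the input is untouched
    random.Random(seed).shuffle(shuffled)    # seeded for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))                      # toy stand-in for historical records
train, test = train_test_split(data)
```

The model is fitted on `train` only and scored on `test`, which plays the role of the "future" data. For time-ordered data, a chronological split (older records for training, newer ones for testing) is often preferred over a random shuffle.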
Cycling through the data in this manner is called cross-validation.
This is a powerful and important concept for building robust models.
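A minimal sketch of k-fold cross-validation follows. The "model" here is a deliberately trivial baseline (predict the mean of the training values), chosen only to keep the example self-contained; the function names and the toy values are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validate(y, k=5):
    """Mean absolute error of a 'predict the training mean' baseline, per fold."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        fold_error = sum(abs(y[i] - prediction) for i in test_idx) / len(test_idx)
        errors.append(fold_error)
    return errors

y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0, 6.0, 8.0, 7.0, 9.0]   # toy target values
fold_errors = cross_validate(y, k=5)
mean_error = sum(fold_errors) / len(fold_errors)
```

Averaging the per-fold errors gives a more stable estimate than a single train/test split, because every record is used for testing once.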
Question
• Assume your management has decided to outsource your
predictive model building activity.
• How would you evaluate various partners?
Back to classification (again)
Standard models
• A simple linear model: many red and blue items are misclassified
• A complex non-linear model: better separation of the data
What would be a suitable metric that characterises model performance in the above case?
Prediction error
Standard models [Figure: models M1 and M2]
Number of misclassified points (red or blue)?
According to this criterion, M1 seems worse than M2, assuming both models avoid over- and under-fitting (is this the case here?)
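Counting misclassified points can be sketched in a few lines. The labels and the predictions of M1 and M2 below are invented for illustration; they are not the points from the figure.

```python
def misclassified(y_true, y_pred):
    """Number of points whose predicted class differs from the true class."""
    return sum(1 for truth, pred in zip(y_true, y_pred) if truth != pred)

# Hypothetical labels and predictions, not taken from the slides.
y_true  = ["red", "red", "blue", "blue", "red", "blue"]
m1_pred = ["red", "blue", "blue", "red", "blue", "blue"]
m2_pred = ["red", "red", "blue", "blue", "blue", "blue"]

m1_errors = misclassified(y_true, m1_pred)
m2_errors = misclassified(y_true, m2_pred)
m1_error_rate = m1_errors / len(y_true)
m2_error_rate = m2_errors / len(y_true)
```

For the comparison to be meaningful, both counts should be computed on test data that neither model saw during training.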
Model performance
A list of metrics from scikit-learn (a widely used ML software library)
The choice of metric is important. Ideally, it should be tied to a business objective.
Model performance
- a simple case -
Two basic notions:
- False positives
- False negatives
Example:
1. the model predicts cancer for a patient who does not have cancer (false positive)
2. the model predicts that a patient does not have cancer while she actually does (false negative)
Note that the costs of these errors are not identical. This is true in most cases.
Can you give other examples?
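The two notions, and the idea that they carry different costs, can be sketched as follows. The labels, predictions, and unit costs are hypothetical values chosen only to show the mechanics.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1
        elif pred == positive:
            counts["fp"] += 1          # predicted positive, actually negative
        elif truth == positive:
            counts["fn"] += 1          # predicted negative, actually positive
        else:
            counts["tn"] += 1
    return counts

def total_cost(counts, cost_fp, cost_fn):
    """Weight each kind of error by its own (business-specific) cost."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn

# Hypothetical data: 1 = has the condition, 0 = does not.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
# Hypothetical costs: a missed case is assumed ten times worse than a false alarm.
cost = total_cost(counts, cost_fp=1, cost_fn=10)
```

With unequal costs, two models with the same number of errors can have very different total costs, which is why the metric should reflect the business objective.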
Calculating ROI for improving predictive accuracy
Think about ad targeting and the companies that rely on it. Assume, for the sake of example, the following (fictitious) figures.
The company monitors 100 million page loads per hour by internet users. Within the short duration of loading, the company should predict whether the user will click on an advertisement.
The company pays $0.10 for showing the advertisement in the dedicated zone of the page. It makes $0.17 if the user clicks on the ad. How does the model performance affect profitability?
Assume the model causes 5% false positives and 10% false negatives over 100 million predictions.
15 million wrong predictions - per hour!
Cost of the FPs: 100M x 0.05 x $0.10 = $500,000
Cost of the FNs: 100M x 0.10 x $0.07 = $700,000 (each missed click forgoes the $0.17 revenue minus the $0.10 display cost)
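The arithmetic of this example can be reproduced with a short sketch. The function and parameter names are hypothetical, and the "improved" error rates at the end are invented solely to show how a gain in accuracy translates into money; they are not figures from the slides.

```python
def error_costs(n_predictions, fp_rate, fn_rate, cost_per_display, revenue_per_click):
    """Hourly cost of false positives and false negatives for ad targeting."""
    n_fp = n_predictions * fp_rate        # ads shown that were not clicked
    n_fn = n_predictions * fn_rate        # clicks missed because no ad was shown
    fp_cost = n_fp * cost_per_display
    fn_cost = n_fn * (revenue_per_click - cost_per_display)   # lost margin
    return {"n_errors": n_fp + n_fn, "fp_cost": fp_cost,
            "fn_cost": fn_cost, "total": fp_cost + fn_cost}

# Figures from the (fictitious) example above.
baseline = error_costs(100_000_000, 0.05, 0.10, 0.10, 0.17)

# Hypothetical improved model (4% FP, 8% FN), for illustration only.
improved = error_costs(100_000_000, 0.04, 0.08, 0.10, 0.17)
hourly_saving = baseline["total"] - improved["total"]
```

Comparing `hourly_saving` with the cost of obtaining the better model gives a first estimate of the return on investment.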
The previous example was for (binary) classification.
What happens in the case of “regression”?
Example: predicting the remaining lifetime of devices
How to improve predictive accuracy?
How to reach the best predictive accuracy?
Customer Analytics
- Churn
- Pricing
- Lead scoring
- Credit scoring
- Up&cross-sales
Risk & Production
- Fraud / insurance
- Compliance
- Safety analysis
- Cyber-security
- Manufacturing
Operations
- Maintenance
- Fault analysis
- Logistics
- HR
- Procurement
Better Predictions = More Value
Integrating & increasing data science capabilities is hard
[Diagram: data science (DS) among company functions: Finance, Sales, Marketing, Engineering, Purchasing, HR, Accounting, Manufacturing, Planning, IT, R&D]
Main obstacles:
- Skill gap: shortage of data scientists, not enough skilled people, PhDs are expensive and in high demand (McKinsey, 2016), unawareness of the latest techniques and experimental methods
- Development gap: lack of adapted infrastructures and systems, limited resources and time, lack of management practices and appropriate experimental tools
- Deployment gap: it takes months to go from development to deployment; by the time a model is ready to be deployed in production, the world has changed (distribution shifts). 78% of companies have no automated procedures and 50% recode from scratch (Dataiku Production Survey Report)
Most companies operate with under-performing models
Ex. a 10% improvement in sales prediction = a 1% decrease in stock-outs = a 100M€ increase in sales for a retail giant
Developing a predictive model is an experimental process
ML algorithms
- Linear Regression
- Logistic Regression
- Decision Tree
- SVM
- Naive Bayes
- KNN
- K-Means
- Random Forest
- Dimensionality Reduction Algorithms
- Gradient Boost & Adaboost
- …
ML has produced a large variety of algorithms, each of which has tunable parameters.
The number of such (hyper)parameters can vary anywhere from 1 to ~100.
Trying every combination is not possible.
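Why exhaustive search breaks down can be shown with a small calculation: the number of combinations grows exponentially with the number of hyperparameters. In this sketch, the choice of 5 candidate values per hyperparameter and the names in the small grid are assumptions made for illustration.

```python
from itertools import product

def grid_size(values_per_parameter, n_parameters):
    """Number of models to train in a full grid search."""
    return values_per_parameter ** n_parameters

# Assumed: 5 candidate values for each hyperparameter.
sizes = {n: grid_size(5, n) for n in (1, 10, 100)}

# A tiny grid can still be enumerated explicitly (hypothetical names and values).
grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 1.0]}
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
```

With 10 hyperparameters the grid already holds close to ten million models, and with 100 it is far beyond any computing budget, which is why model search relies on experimentation rather than enumeration.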
B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
20% improvement over the baseline model used by physicists (from 3.2 to 3.8) in detecting Higgs particles
A particular instrument for extending the “search” for the best model is crowdsourcing
Hundreds of models produced and tested by the participants
RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP)
http://www.ramp.studio
Amazing improvement
- in just 3 days -
Some numbers
• 100+ participants, working on the same problem
• 411+ models, in just 3 days
• Starting kit scores: Combined = 0.131, Err = 0.090, Mare = 0.212
• Final best submission: Combined = 0.032 (75%), Err = 0.015 (80%), Mare = 0.065 (~70%)
• Blended model is even better: 0.023 on combined score (better than Saclay, Hooray!)
• These improvements are amazing
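The percentages quoted for the final best submission can be checked as relative reductions of each score. This sketch assumes that lower is better for all three scores and that the percentages are measured against the starting kit; the slide's figures appear to be rounded.

```python
def relative_reduction(start, final):
    """Fraction by which a score dropped relative to its starting value."""
    return (start - final) / start

# Scores quoted above.
starting = {"Combined": 0.131, "Err": 0.090, "Mare": 0.212}
best = {"Combined": 0.032, "Err": 0.015, "Mare": 0.065}

reductions = {name: relative_reduction(starting[name], best[name])
              for name in starting}
blended_reduction = relative_reduction(starting["Combined"], 0.023)

for name, value in reductions.items():
    print(f"{name}: {value:.1%} lower than the starting kit")
```

The same function applied to the blended model's combined score gives its improvement over the starting kit.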
Workshop
• Assume you are all working in various branches of the same group.
• The executive committee decides to run a company-wide initiative to elaborate a roadmap for accelerating the digital transition
• Steps:
• Split into 5 teams of 8 persons
• 30-45 min. Each group generates as many prediction problems as possible, with direct relevance to their work (any of the company branches)
• 60 min. Build a priority list, depending on:
• Availability or accessibility of the data required
• ROI and potential gain (it’s OK to be approximate, but try to come up with informed estimates)
• 30 min. Choose 3 applications, and report to the whole group (debriefing)
Data Science for Business Managers - An intro to ROI for predictive analytics

  • 1. Data Science for Business Managers. Akın Osman Kazakçı (MINES ParisTech), Balazs Kégl (Ecole Polytechnique, CNRS)
  • 2. At the heart of the digital transformation lies the “data”. The value of data is revealed through prediction. [Diagram labels: External Data, Database, X, Prediction Engine, Visualisation, Automated actions, Notifications]
  • 3. Levels of transformation through data • reporting: what happened in the past? (reflection) • dashboards and real time monitoring: what is happening now? (reactivity) • prediction: what will happen next? (pro-activity)
  • 4. How can we accelerate a digital transformation process by leveraging data?
  • 5. Building value-driven data projects. What knowledge would increase our profits? The following questions need to be answered in this order: What data do we need to collect? What ML methods are appropriate?
  • 6. Discussion • Are standard innovation methodologies fit for digital transformation projects? • (Can we CK this?)
  • 7. Two key aspects • Do I have all the relevant data? Strategic data watch: is there any new source of data I can use? • Do I have the best predictive accuracy? How do I make sure that I’m working with the best possible predictive models?
  • 8. Data hunt • Do I have all the relevant data?
  • 9. Exercise: Data hunt. During your transition to predictive analytics, you may need to update your databases to include more variables with potential explanatory power. • A travel IT systems company has some air traffic / passenger data. • They are interested in predicting passenger flow between 20 airports in the US. • Data for 720 days, for each pair of airports. • So, “one” variable. • How can we augment this dataset? • Which variables can be added? • Where can we find the data?
  • 10. Potential sources for relevant factors: K1 Events; K2 Plane accidents; K3 Calendar; K4 Causes of delays; K5 Alternative transportation; K6 Safety; K7 Data on airports; K8 Similar data; K9 Oil price; K10 Average domestic air fares; K11 Town’s population; K12 Town’s attractiveness. 20+ participants (students), analysed by Yohann Sitruk
  • 11. Model quality • Do I have the best predictive accuracy?
  • 12. Plan 1. Train & test paradigm 2. Prediction error and (quality) metrics 3. ROI in data science projects
  • 13. Building a data science model • …involves a great deal of trial and error • little, if any, theory-based or model-based design • even research (development of new algorithms) is (mostly) trial and error • the data scientist’s best friend is a well-designed experimental studio for facilitating fast iterations of • How can we control the quality of the ensuing model?
  • 14. Train & test paradigm • Data-driven predictors should work well on future (unseen) data • use historical data to select and fit a model, then use the model to make predictions on new data • but we only have historical data: how do we “simulate” past and future on existing data?
  • 15. Train & test paradigm [Diagram: Data split into Train and Test] Develop a model on the training set. Test the model on the test set. Change the test set.
  • 16. Train & test paradigm [Diagram: Data split into Train and Test] Cycling through the data in this manner is called cross-validation. This is a powerful and important concept for building robust models.
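The cycle described on slides 15 and 16 (hold out one part of the data, fit on the rest, then change the held-out part) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the "predict the training mean" baseline and the passenger counts are all assumptions made up for the example.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(values, n_folds):
    """Average test error of a 'predict the training mean' baseline."""
    fold_errors = []
    for train, test in k_fold_splits(len(values), n_folds):
        prediction = sum(values[i] for i in train) / len(train)
        error = sum(abs(values[i] - prediction) for i in test) / len(test)
        fold_errors.append(error)
    return sum(fold_errors) / len(fold_errors)


# Hypothetical daily passenger counts, invented for this sketch.
daily_passengers = [120, 135, 128, 150, 142, 138, 160, 155, 149, 170]
print(cross_validate(daily_passengers, n_folds=5))
```

Every observation ends up in the test set exactly once, so the averaged error reflects how the model behaves on data it was not fitted on.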
  • 17. Question • Assume your management has decided to outsource your predictive model building activity. • How would you evaluate various partners?
  • 18. Plan 1. Train & test paradigm 2. Prediction error and (quality) metrics 3. ROI in data science projects
  • 19. Back to classification. Standard models: a simple linear model (many red and blue items are misclassified) and a complex non-linear model (better separation of the data, again). What would be a suitable metric that characterises model performance in the above case?
  • 20. Prediction error. Standard models M1 and M2: number of misclassified points (red or blue)? According to this criterion, M1 seems worse than M2, assuming both models avoid over/under-fitting (is this the case here?)
  • 21. Model performance. A list of metrics from scikit-learn (a widely used ML software library). The choice of metric is important. Ideally, it should be tied to a business objective.
  • 22. Model performance - a simple case - Two basic notions: false positives and false negatives. Ex.: 1. the model predicts cancer for a patient who does not have cancer (false positive) 2. the model predicts that a patient does not have cancer while she actually has it (false negative). Note that the costs of these two errors are not identical. This is true in most cases. Can you give other examples?
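To make the two notions concrete, here is a small sketch that counts false positives and false negatives and then weights them by different costs. The function names, the labels and the cost values are hypothetical, chosen only to show that two models with the same number of errors can have very different total costs.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted and actual:
            counts["tp"] += 1
        elif predicted and not actual:
            counts["fp"] += 1  # predicted positive, actually negative
        elif not predicted and actual:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1
    return counts


def total_error_cost(counts, cost_fp, cost_fn):
    """Weight each kind of error by its own (business-defined) cost."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


# Made-up labels: 1 = has the condition, 0 = does not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
# Assumed costs: a missed case is taken to be ten times worse than a false alarm.
print(total_error_cost(counts, cost_fp=1, cost_fn=10))
```

With these assumed weights, a single false negative dominates the total, which is why the plain count of misclassified points can be a misleading metric.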
  • 23. Plan 1. Train & test paradigm 2. Prediction error and (quality) metrics 3. ROI in data science projects
  • 24. Calculating ROI for improving predictive accuracy. Think about ad-targeting companies. Assume, for the sake of example, the following (fictitious) figures. The company monitors 100 million page loads per hour by internet users. Within the short time it takes a page to load, the company must predict whether the user will click on an advertisement. The company pays $0.10 to show the advertisement in the dedicated zone of the page. It makes $0.17 if the user clicks on the ad. How does model performance affect profitability? Assume the model causes 5% false positives and 10% false negatives over 100 million predictions. That is 15 million wrong predictions, per hour! Cost of the FPs: 100M x 0.05 x $0.10 = $500,000. Cost of the FNs: 100M x 0.10 x $0.07 = $700,000, where $0.07 = $0.17 - $0.10 is the margin lost on each missed click.
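The arithmetic on this slide can be reproduced with a short script. It uses the slide's fictitious figures and works in cents to avoid floating-point rounding; the names are assumptions, and the scenario with 9% false negatives is a hypothetical added only to show how a gain in accuracy translates into money.

```python
PAGE_LOADS_PER_HOUR = 100_000_000
COST_PER_AD_SHOWN_CENTS = 10   # $0.10 paid each time the ad is displayed
REVENUE_PER_CLICK_CENTS = 17   # $0.17 earned when the user clicks


def hourly_error_cost_dollars(fp_percent, fn_percent):
    """Hourly cost of wrong predictions, given error rates in whole percent."""
    false_positives = PAGE_LOADS_PER_HOUR * fp_percent // 100
    false_negatives = PAGE_LOADS_PER_HOUR * fn_percent // 100
    # A false positive shows an ad that is not clicked: the display fee is lost.
    fp_cost = false_positives * COST_PER_AD_SHOWN_CENTS
    # A false negative skips an ad that would have been clicked: the margin is lost.
    fn_cost = false_negatives * (REVENUE_PER_CLICK_CENTS - COST_PER_AD_SHOWN_CENTS)
    return (fp_cost + fn_cost) // 100


current = hourly_error_cost_dollars(fp_percent=5, fn_percent=10)
improved = hourly_error_cost_dollars(fp_percent=5, fn_percent=9)  # hypothetical
print(current)             # 1200000
print(current - improved)  # 70000
```

Under these assumptions, one percentage point fewer false negatives is worth $70,000 per hour, which gives an upper bound on what it is worth spending to obtain that improvement.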
  • 25. Calculating ROI for improving predictive accuracy. The previous example was for (binary) classification. What happens in the case of “regression”? Example: predicting the remaining lifetime of devices.
  • 26. How to improve predictive accuracy?
  • 27. How to reach best predictive accuracy? Better Predictions = More Value. Customer Analytics: Churn, Pricing, Lead scoring, Credit scoring, Up- & cross-sales. Risk & Production: Fraud / insurance, Compliance, Safety analysis, Cyber-security, Manufacturing. Operations: Maintenance, Fault analysis, Logistics, HR, Procurement. Integrating & increasing data science capabilities is hard (Finance, Sales, Marketing, Engineering, Purchasing, HR, Accounting, Manufacturing, Planning, IT, DS, R&D). Main obstacles: - Skill gap: shortage of data scientists, not enough skilled people, PhDs are expensive and in high demand (McKinsey, 2016), unawareness of the latest techniques and experimental methods - Development gap: lack of adapted infrastructures and systems, limited resources and time, lack of management practices and appropriate experimental tools - Deployment gap: it takes months to go from development to deployment; by the time a model is ready to be deployed in production, the world has changed (distribution shifts); 78% of companies have no automated procedures and 50% recode from scratch (Dataiku Production Survey Report). Most companies operate with under-performing models. E.g., a 10% improvement in sales prediction = a 1% decrease in stock-outs = a €100M increase in sales for a retail giant.
  • 28. Developing a predictive model is an experimental process. ML algorithms: Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, KNN, K-Means, Random Forest, Dimensionality Reduction Algorithms, Gradient Boosting & AdaBoost, … ML has produced a large variety of algorithms, each of which has tunable parameters. The number of such (hyper)parameters can vary anywhere from 1 to ~100. Trying every combination is not possible.
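A quick count shows why trying every combination does not scale, and random sampling is one common workaround (it is not named on the slide; it is used here only as an illustration). The search space below is hypothetical: the parameter names and candidate values are invented for the sketch.

```python
import math
import random

# Hypothetical search space: four hyperparameters with a few candidate values each.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1, 1.0],
    "max_depth": [2, 4, 6, 8, 10],
    "n_estimators": [50, 100, 200, 400],
    "subsample": [0.5, 0.75, 1.0],
}


def grid_size(space):
    """Number of models an exhaustive grid search would have to train."""
    return math.prod(len(options) for options in space.values())


def sample_configurations(space, n_trials, seed=0):
    """Draw n_trials random configurations instead of enumerating the grid."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(options) for name, options in space.items()}
        for _ in range(n_trials)
    ]


print(grid_size(search_space))  # 240 combinations for only four parameters
for config in sample_configurations(search_space, n_trials=3):
    print(config)
```

The grid grows multiplicatively with each added parameter, so with dozens of hyperparameters only a sampled or guided search is feasible, and each sampled configuration still has to be scored with the train and test procedure.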
  • 29. A particular instrument for extending the “search” for the best model is crowdsourcing. Hundreds of models produced and tested by the participants. [Embedded slide: B. Kégl / AppStat@LAL, Learning to discover, “Classification for discovery”] 20% improvement over the baseline model used by physicists (from 3.2 to 3.8) in detecting Higgs particles.
  • 30. RAPID ANALYTICS AND MODEL PROTOTYPING (RAMP) http://www.ramp.studio
  • 34. Amazing improvement - in just 3 days -
  • 35. Some numbers • 100+ participants, working on the same problem • 411+ models, in just 3 days • Starting kit scores: Combined = 0.131, Err = 0.090, Mare = 0.212 • Final best submission: Combined = 0.032 (75%), Err = 0.015 (80%), Mare = 0.065 (~70%) • Blended model is even better: 0.023 on combined score (better than Saclay, Hooray!) • These improvements are amazing
  • 39. Workshop • Assume you are all working in various branches of the same group. • The executive committee decides to run a company-wide initiative to draw up a roadmap for accelerating the digital transition. • Steps: • Split into 5 teams of 8 people • 30-45 min: each group generates as many prediction problems as possible, with direct relevance to their work (any of the company branches) • 60 min: build a priority list based on: • availability or accessibility of the required data • ROI and potential gain (it’s OK to be approximate, but try to come up with informed estimates) • 30 min: choose 3 applications and report to the whole group (debriefing)