Machine learning in production
+case studies
Dmitrijs Lvovs
Outline
• Machine learning, Data Science, Artificial Intelligence
• Common algorithms
• Pipeline and common pitfalls
• Case studies
Machine learning
• Machine Learning
• Data Science
• Artificial Intelligence
Machine learning
• A process that enables a machine to perform a task at a level similar to or above that of a human
Algorithms
https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-choice
• Linear regression
• Logistic regression
• Decision tree
• Neural networks
• SVM
• K-means, other clustering
Algorithms
Lessman et al. https://doi.org/10.1016/j.ejor.2015.05.030
Algorithms
• In production: credit scoring since the 1950s
Pipeline & Pitfalls
• Get / clean the data
• Model & Evaluate
• Deploy
• Maintain
Pipeline & Pitfalls
• Get data: garbage in = garbage out
– Ensure all data will be available at the time of prediction
– Use sampling if necessary
– Use the same code to get data for analysis and prediction
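A minimal sketch of the "same code" rule, assuming hypothetical raw fields (`phone`, `signup_date`) and a hypothetical helper `build_features`; none of these names come from the talk. The `as_of` argument stands for the moment of prediction, so a record that would not yet exist at that time is rejected.

```python
from datetime import date

def build_features(raw: dict, as_of: date) -> dict:
    """Single feature builder shared by analysis and prediction.

    `raw` and its keys are hypothetical. `as_of` is the moment of
    prediction; only information known at that date may be used.
    """
    signup = raw["signup_date"]
    if signup > as_of:
        raise ValueError("record is not available at prediction time")
    return {
        "has_phone": 1 if raw.get("phone") else 0,
        "days_since_signup": (as_of - signup).days,
    }

# Offline analysis: build the row as of the historical decision date.
train_row = build_features(
    {"phone": "555-0100", "signup_date": date(2018, 1, 10)},
    as_of=date(2018, 3, 1),
)

# Online prediction: the very same function, called with the current date.
live_row = build_features(
    {"phone": None, "signup_date": date(2018, 5, 2)},
    as_of=date(2018, 5, 12),
)
```

Because both paths go through one function, a change to a feature definition cannot silently apply to training only.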
Pipeline and Pitfalls
• Get data
• Model & Evaluate
– Select the target with business in mind
– Start with simple things and set a benchmark
– Improve, write a notebook
– Test out of sample and out of time
[Diagram: the whole dataset along a time axis, split into "train + out of sample" followed by "out of time"]
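One way to sketch the out-of-sample and out-of-time test is shown here, assuming each row carries a `date` field (a hypothetical schema). The function name, the 20% holdout share and the cutoff date are illustrative choices, not values from the talk.

```python
import random
from datetime import date

def split_by_time_and_sample(rows, cutoff, holdout_share=0.2, seed=0):
    """Split rows into train, out-of-sample and out-of-time sets.

    rows: list of dicts with a 'date' key (hypothetical schema).
    cutoff: rows dated on or after it form the out-of-time set.
    holdout_share: share of the earlier rows held out at random.
    """
    earlier = [r for r in rows if r["date"] < cutoff]
    out_of_time = [r for r in rows if r["date"] >= cutoff]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(earlier)
    n_holdout = int(len(earlier) * holdout_share)
    out_of_sample = earlier[:n_holdout]
    train = earlier[n_holdout:]
    return train, out_of_sample, out_of_time

# Made-up data: 10 rows for each month of 2018.
rows = [{"id": i, "date": date(2018, 1 + i % 12, 1)} for i in range(120)]
train, oos, oot = split_by_time_and_sample(rows, cutoff=date(2018, 10, 1))
```

The out-of-sample set checks generalisation to unseen rows from the same period, while the out-of-time set checks whether the model still holds on later data.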
Pipeline & Pitfalls
• Get data
• Model & Evaluate
• Deploy
– Simpler algorithm = simpler deployment
– For regression: only the weights for the variables need to be deployed
– For more advanced models, usually a REST API (R tools shown):
• https://cran.r-project.org/web/packages/AzureML/index.html
• https://www.opencpu.org/
• https://tensorflow.rstudio.com/tools/tfdeploy/articles/introduction.html
• https://github.com/trestletech/plumber
• ...
• https://github.com/danaki/yshanka
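To illustrate why a regression model is simple to deploy, here is a sketch of scoring with nothing but an intercept and weights. The weights, the feature names and the choice of logistic regression are assumptions made for the example.

```python
import math

# Hypothetical values exported from a fitted logistic regression.
INTERCEPT = -1.5
WEIGHTS = {"has_phone": 0.8, "days_since_signup": -0.01}

def score(features: dict) -> float:
    """Return the predicted probability from the intercept and weights alone."""
    z = INTERCEPT + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

p = score({"has_phone": 1, "days_since_signup": 50})
```

The whole model fits in a lookup table, so it can be reimplemented in whatever language the production system already uses.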
Pipeline & Pitfalls
• Deploy
– OR a training + hosting tool, if the budget allows ($$$) or cloud is not an issue
– Budget: 10 000s - 100 000s
Pipeline & Pitfalls
• Get data
• Model & Evaluate
• Deploy
• Maintain
– Test data -> the population must be the same as the one the model was built on!
– Test model -> track output and performance
– Challenge model -> update the model and challenge it
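A common way to check that the population has stayed the same is the population stability index (PSI). The talk does not name a specific test, so PSI is swapped in here as an assumption, and the bucket shares below are made up.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two lists of bucket shares that each sum to 1."""
    if len(expected) != len(actual):
        raise ValueError("bucket lists must have the same length")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against log(0)
        psi += (a - e) * math.log(a / e)
    return psi

train_shares = [0.49, 0.20, 0.12, 0.08, 0.11]  # hypothetical shares at build time
live_shares = [0.40, 0.22, 0.14, 0.10, 0.14]   # hypothetical shares in production
psi = population_stability_index(train_shares, live_shares)
```

A PSI of zero means identical distributions; how large a value should trigger a review is a judgment call for each model.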
ML in production
https://en.wikipedia.org/wiki/Voyager_1#/media/File:Voyager_spacecraft.jpg
Case studies
Case: a call centre
Setup:
• A company that connects short-term employees with employers
• Data on several thousand calls provided, mainly contact data and an indication of whether the person accepted the employment offer
• Q: Who do we call?
Result:
• A model with AUC 0.8
model output   took the job rate   calls base
0              8%                  49%
10             16%                 20%
20             20%                 12%
30             28%                 8%
40             36%                 5%
50             42%                 3%
60             50%                 2%
70             62%                 1%
80             67%                 0%
90             71%                 0%
Total          16%                 100%
[Chart: accepted offer (0% to 80%) by model score (0 to 90)]
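The question "Who do we call?" can be worked through with the numbers from the call centre table. The sketch below is illustrative arithmetic, not the author's code: `call_above` is a hypothetical helper and the cutoff of 20 is an arbitrary example. The shares are rounded to whole percents in the source, so buckets 80 and 90 contribute nothing here.

```python
# (score bucket, took-the-job rate, share of the calls base) from the table.
BUCKETS = [
    (0, 0.08, 0.49), (10, 0.16, 0.20), (20, 0.20, 0.12), (30, 0.28, 0.08),
    (40, 0.36, 0.05), (50, 0.42, 0.03), (60, 0.50, 0.02), (70, 0.62, 0.01),
    (80, 0.67, 0.00), (90, 0.71, 0.00),
]

def call_above(cutoff):
    """Return (share of base called, acceptance rate among those called)."""
    kept = [(rate, share) for score, rate, share in BUCKETS if score >= cutoff]
    called = sum(share for _, share in kept)
    accepted = sum(rate * share for rate, share in kept)
    return called, (accepted / called if called else 0.0)

everyone = call_above(0)   # call the whole base
top = call_above(20)       # call only scores of 20 and above
```

Calling everyone reproduces the overall rate of about 16%, while calling only scores of 20 and above reaches about 31% of the base with an acceptance rate of about 30%.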
Case: student performance review
Setup:
• A company that records and keeps all student marks throughout the year
• Data on several thousand marks provided
• The idea is for the model to predict the year’s final mark for each subject
• Q: What will the year-end mark be for each student and subject?
Result:
• A very simple model, 5% MAE
[Chart: predicted vs. actual marks (y-axis 0 to 10, x-axis 1 to 32)]
month   prediction error
1       12%
2       11%
3       10%
4       8%
5       8%
6       7%
7       7%
8       6%
9       5%
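For reference, mean absolute error (MAE) can be computed as in the sketch below. The marks are made up, and expressing the error as a percentage of a 10-point scale is an assumption; the slide does not say how its percentage was derived.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [8, 6, 9, 7]           # made-up year-end marks
predicted = [7.5, 6.5, 9, 6]    # made-up model predictions
mae = mean_absolute_error(actual, predicted)   # in mark points
mae_pct = 100 * mae / 10        # assumption: marks on a 10-point scale
```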
Case: credit scoring model for online lender
Setup:
• A company that issues loans in an EU country
• Data on several thousand loans provided
• Q: Will a customer default?
Result:
• An advanced ensemble machine learning pipeline yielded a mere 2% gain over a logistic regression model
[Chart: AUC (axis 0.5 to 0.85) for ensemble machine learning vs. logistic regression]
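AUC is the metric used to compare models in these case studies. A minimal sketch of what it measures follows: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The labels and scores are made up, and the pairwise loop is written for clarity rather than speed.

```python
def auc(labels, scores):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 0, 1]                 # 1 = defaulted (made-up data)
scores = [0.9, 0.3, 0.6, 0.6, 0.2, 0.4]     # made-up model scores
value = auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the comparison chart starts its axis at 0.5.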
Case: Will a customer deposit funds?
Setup:
• A company that trades currency
• Data on several hundred thousand user agent strings provided
• Q: Will a customer deposit funds to their account?
Result:
• An ensemble machine learning model learned to separate those who will deposit:
Score   Deposited   Count total
0       3%          7110
10      13%         800
20      16%         341
30      25%         159
40      25%         80
50      43%         23
60      40%         10
70      67%         3
80      100%        2
90      100%        1
Case: Is a transaction fraudulent?
Setup:
• A Kaggle dataset with fraudulent transactions from https://www.kaggle.com/dalpozz/creditcardfraud
• Epistatica’s learning pipeline
• Q: Can we build an unsupervised model?
Result:
• AUC 0.75 on Kaggle data (0.6 hit rate with 0.3 FP rate)
Validated:
• On data from one of the top consulting companies (AUC 0.7)
• On payment provider data (AUC 0.8)
group size   fraud rate
67%          0.1%
33%          0.3%
fraud rate difference: 302%

“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epistatica at Machine Learning focused 62nd DevClub.lv
