From Zero to Production
Deploying Machine Learning Models in a Legacy Banking Environment
From Zero to Production 22.01.2019
Why are you here?
• You can't believe Sparkasse banks are in Data Analytics (a topic reserved for sexy fintechs and software companies)
• You are curious about the words "machine learning" (ML) and "production"
• You are hoping to find the holy grail for your ML and production problems
Dataiku Meetup
https://commons.wikimedia.org/wiki/File:Holy-grail-round-table-bnf-ms_fr-116F-f610v-15th-detail.jpg
Evrard d'Espinques [Public domain], via Wikimedia Commons
What does "production" mean anyway?
https://stackoverflow.com/questions/490289/what-exactly-defines-production
S Rating und Risikosysteme GmbH (SR): We Data Analytics
• Founded in 2004 with a focus on providing market, regulatory, operational and credit risk frameworks
• > 250 employees
• Team Data Analytics started 1.5 years ago
• Quantitative folks and product managers (20 folks in total)
• > 30 machine learning models in production
• "Made in Berlin" (Spittelmarkt)
https://www.berliner-sparkasse.de/de/home/200jahre.html?n=true
Savings Banks Finance Group (SFG)
• 383 independent Sparkasse commercial/retail banks
• Decentralized structure (regional principle)
• Central IT service partner (Finanz Informatik)
• OneSystemPlus = core banking system for all institutions
• S Rating und Risikosysteme GmbH central Data Analytics partner
The SFG (decentralized) data treasure chest
• 50 million customers
• 118 million banking accounts
• 2.1 billion online banking visits (per year)
• 114 billion payment transactions (per year)
Example use cases of ML in Banking and Financial Services
Customer Experience | Operational Efficiency | Sales and Marketing | Risk and Fraud
• Chat-bots and robo-advisors
• Natural Language Processing (NLP) to decipher call logs and customer feedback
• Optimizing operational expenses such as call center staff and tellers
• Optimizing sales and marketing expenses
• Optimizing operational efficiency
Getting more with your score
Preparation
• What is your target group?
Expert advice
• Target group based on expert knowledge
Data Analytics
• Target group based on predictive analytics
[Figure: expert segments by Age (18-35, 35-75) and Income (0-1000 €, 1000-10000 €) compared with a Product Score]
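The contrast between the two ways of picking a target group can be sketched in a few lines. This is only an illustration: the field names (`id`, `age`, `income`, `score`), the chosen expert segment and the `top_n` cut-off are assumptions made for the example, not details from the talk.

```python
def expert_target_group(customers):
    """Expert advice: a fixed segment rule (hypothetical: age 18-35, income 1000-10000 EUR)."""
    return [
        c["id"]
        for c in customers
        if 18 <= c["age"] <= 35 and 1000 <= c["income"] <= 10000
    ]


def score_target_group(customers, top_n):
    """Data analytics: rank customers by a model's product score and keep the best top_n."""
    ranked = sorted(customers, key=lambda c: c["score"], reverse=True)
    return [c["id"] for c in ranked[:top_n]]


# Made-up customers; the scores stand in for the output of a trained model.
CUSTOMERS = [
    {"id": "A", "age": 25, "income": 2000, "score": 0.10},
    {"id": "B", "age": 50, "income": 3000, "score": 0.80},
    {"id": "C", "age": 30, "income": 500, "score": 0.60},
    {"id": "D", "age": 33, "income": 1500, "score": 0.05},
]
```

With this toy data the rule selects A and D, while the score-based selection with `top_n=2` selects B and C, which is the kind of difference the slide points at.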
A model data pipeline
Structured Data → Ingest → Transform → Model → Deploy
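A minimal sketch of the four stages as plain functions chained together. The column names, the scoring rule and the CSV output format are hypothetical placeholders; the slide only names the stages.

```python
import csv
import io


def ingest(raw_csv):
    """Ingest: parse structured (CSV) text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows):
    """Transform: cast text fields to numeric features."""
    return [
        {"customer_id": r["customer_id"], "age": int(r["age"]), "income": float(r["income"])}
        for r in rows
    ]


def model(rows):
    """Model: placeholder rule standing in for a trained model (hypothetical)."""
    return [
        {
            "customer_id": r["customer_id"],
            "score": 1.0 if r["age"] < 35 and r["income"] > 1000 else 0.0,
        }
        for r in rows
    ]


def deploy(scored):
    """Deploy: hand the batch of scores over as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["customer_id", "score"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(scored)
    return out.getvalue()


def run_pipeline(raw_csv):
    """Chain the stages in the order shown on the slide."""
    return deploy(model(transform(ingest(raw_csv))))
```

Keeping each stage a separate function makes it possible to test and monitor the steps one by one.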
Data Analytics closed loop
• Prototype & develop model pipelines
• Train model pipeline
• (Batch) Deploy models
• Serve request
• Monitor service
• Get feedback
• Update pipelines
Challenges
• I have time constraints: the model must run fast enough
• We need to play well with others:
  • other systems
  • other teams
• Need to be robust and just work
• Need to integrate into business processes
• Does it increase profits?
• Live ML doesn't always work the way I expect…
Working well with other teams and systems? (1/3)
SR Data Scientist: "AUC looks alright, hyperparameters tuned. Time to deploy!"
(wall of confusion: DevOps + Dev)
FI Mainframe + Java application developer: "What the **** is alpha and beta?"
(wall of confusion: Business)
Sparkasse teller: "I want to be there for my clients! Target variable what?"
Person icons made by monkik from www.flaticon.com
Understand the business processes! (2/3)
SR Data Scientist and Sparkasse teller (Business + Dev)
• Business processes generate data: understand every single step
• Work together on the "ground truth" (the reality you want to predict)
• Does it generalize?
• Verify every sub-result with practitioners
• Lack of domain knowledge is a barrier you can overcome
Understand the IT architecture! (3/3)
SR Data Scientist and FI Mainframe + Java application developer (DevOps)
Deploy model parameters → Scoring engine in SAS → Ready for production, yeah!
101 - Decision tree classifier (1/2)
XGBoost: A Scalable Tree Boosting System, Tianqi Chen, Conference Paper, 2016
• Flowchart structure starting at root node
• Simple IF-ELSE questions in child nodes
• CART (classification and regression tree) algorithm uses binary trees
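The flowchart idea can be sketched as a small binary tree that is walked from the root with one IF-ELSE question per node. The features, thresholds and leaf labels below are made up for illustration and are not taken from any model in the talk.

```python
# Hypothetical binary tree: inner nodes ask "feature < threshold?", leaves carry a label.
TREE = {
    "feature": "age",
    "threshold": 35,
    "left": {"leaf": "offer"},
    "right": {
        "feature": "income",
        "threshold": 1000,
        "left": {"leaf": "no offer"},
        "right": {"leaf": "offer"},
    },
}


def classify(node, row):
    """Start at the root and follow IF-ELSE decisions until a leaf is reached."""
    while "leaf" not in node:
        if row[node["feature"]] < node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["leaf"]
```

For example, `classify(TREE, {"age": 50, "income": 800})` takes the right branch at the root (age is not below 35) and then the left branch (income below 1000).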
101 - Ensemble prediction (2/2)
Tree 1 → Score 1
Tree 2 → Score 2
Tree … → Score …
Sum of all tree scores → Score
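A minimal sketch of the summation step, assuming each tree is a function that maps a customer to a leaf score. The two toy trees and their values are hypothetical. The optional logistic transform at the end is an assumption about the training objective and is not stated on the slide.

```python
import math


def tree_1(row):
    """Hypothetical tree: one split on age."""
    return 0.0033 if row["age"] < 45 else -0.0017


def tree_2(row):
    """Hypothetical tree: one split on income."""
    return -0.0100 if row["income"] < 1000 else 0.0436


def ensemble_score(trees, row):
    """Ensemble prediction: add up the score each tree assigns to the row."""
    return sum(tree(row) for tree in trees)


def to_probability(raw_score):
    """Optional: squash a raw summed score into (0, 1) with the logistic function."""
    return 1.0 / (1.0 + math.exp(-raw_score))
```

Usage: `ensemble_score([tree_1, tree_2], {"age": 30, "income": 2500})` adds the two leaf scores reached by that customer.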
Exporting model parameters
Tree 1, Tree 2, Tree …
Where do I need to split the input variable?
TREE_NR  INPUT_VAR  TREE_SPLT_VAR_NR  TREE_SPLT_VALUE
1        Income     1                 11.000
1        Age        2                 45
1        Occupied   3                 1
…        …          …                 …
Which score do I need to assign to each node?
TREE_NR  TREE_NODE_NR  TREE_NODE_SCORE
1        1             0.00331848000000
1        2             -0.00174424000000
1        3             0.04362040000000
1        4             0.00302040000000
…        …             …
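To show how two such tables are enough to score a customer, here is a sketch under loudly stated assumptions. The slide does not say how splits and nodes are wired together, so the layout (split 1 at the root, split 2 on its left branch, split 3 on its right branch, nodes 1 to 4 as leaves from left to right), the "less than" comparison, and reading the value 11.000 as 11000 are all assumptions made for this example.

```python
# Split table rows for tree 1, ordered by TREE_SPLT_VAR_NR: (INPUT_VAR, TREE_SPLT_VALUE).
# Assumption: "11.000" on the slide is read as 11000.
SPLITS = {1: [("Income", 11000.0), ("Age", 45.0), ("Occupied", 1.0)]}

# Node score table for tree 1, ordered by TREE_NODE_NR.
NODE_SCORES = {1: [0.00331848, -0.00174424, 0.04362040, 0.00302040]}


def score_tree(tree_nr, row):
    """Look up the leaf score of one depth-2 tree (assumed layout, see lead-in)."""
    root, left, right = SPLITS[tree_nr]
    if row[root[0]] < root[1]:
        var, value = left
        node_nr = 1 if row[var] < value else 2
    else:
        var, value = right
        node_nr = 3 if row[var] < value else 4
    return NODE_SCORES[tree_nr][node_nr - 1]


def score_customer(row):
    """Sum the leaf scores over all exported trees."""
    return sum(score_tree(tree_nr, row) for tree_nr in SPLITS)
```

The point of the table form is that the scoring side needs no ML library: it only compares input values with split values and looks up a score.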
SAS score engine
Export model parameters as CSV file → Import model parameters
Inputs:
• Model parameters
• Input data
SAS score engine: "Give me the model parameters and the input data for every customer and I will tell you the score!"
"Save the results, please!"
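The CSV handover can be illustrated with a round trip. The scoring engine in the talk is written in SAS; the Python below is only a stand-in to show the export and import steps, and the function names are hypothetical. The column names follow the split table shown on the export slide.

```python
import csv
import io

SPLIT_COLUMNS = ["TREE_NR", "INPUT_VAR", "TREE_SPLT_VAR_NR", "TREE_SPLT_VALUE"]


def export_splits(rows):
    """Export: write the split table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SPLIT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def import_splits(csv_text):
    """Import: read the CSV back and restore the numeric column types."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "TREE_NR": int(r["TREE_NR"]),
            "INPUT_VAR": r["INPUT_VAR"],
            "TREE_SPLT_VAR_NR": int(r["TREE_SPLT_VAR_NR"]),
            "TREE_SPLT_VALUE": float(r["TREE_SPLT_VALUE"]),
        }
        for r in reader
    ]
```

A round-trip check like `import_splits(export_splits(rows)) == rows` is a cheap test that nothing is lost between the data science side and the scoring side.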
Monitoring requests in production
• AUC (area under the curve): a drop in AUC can indicate that some business processes have changed
• Correlation between scores and input variables
• Descriptive statistics (mean, max, min, count) of input variables
• "Acid" test: ratio of scores regarding target variable
• Performance (scores/min)
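Two of these checks are easy to sketch without any ML library. The AUC below uses the pairwise definition (the share of positive/negative pairs in which the positive case gets the higher score, ties counting half), which is fine for small monitoring samples but quadratic in the sample size. The function names are illustrative, not from the talk.

```python
from statistics import mean


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def describe(values):
    """Descriptive statistics of one input variable for a monitoring report."""
    return {
        "mean": mean(values),
        "max": max(values),
        "min": min(values),
        "count": len(values),
    }
```

Comparing these numbers between the training data and each production batch is one simple way to notice a changed business process early.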
Wrapping things up
Measure, measure and measure
• Monitor every single step of your pipeline
• Data quality is the holy grail
Data Scientists = translators
• Learn the "language" (not only programming) of other teams
• Build bridges
• What business problem do you want to solve?
Start your production pipeline simple
• Understand the IT system architecture
• Talk with your IT folks and business people
Only production code is good code
• A Data Scientist should know programming principles
• Performance counts in real-world applications
• Code quality beats model prediction quality to some extent
Data First Folks!
Thanks for having me
22.01.2019 Dataiku Meetup
Marco Bahrs, Data Scientist
Get in touch with me via
Disclaimer: This presentation is intended for educational purposes only and does not replace independent professional judgment. Statements of fact and opinions expressed are those of the participants individually and, unless expressly stated to the contrary, are not the opinion or position of the Sparkasse Rating and Risikosysteme GmbH or the Finanz Informatik. The Sparkasse Rating and Risikosysteme GmbH does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented.
Bonus material - Ensemble
XGBoost: A Scalable Tree Boosting System, Tianqi Chen, Conference Paper, 2016
• Combining many weak learners (many trees = forest)
Bonus material - Gradient Boosting training
Training data (target: Personal Loan)
Age  Balance  Employed  …  Personal Loan
37   2560€    1            1
29   1726€    1            0
22   460€     0            0
…    …        …            …
Tree 1 output
Probability  Error
0.87         -0.13
0.19         0.19
0.05         0.05
…            …
Training data for Tree 2 (target: Error of Tree 1)
Age  Balance  Employed  …  Error
37   2560€    1            -0.13
29   1726€    1            0.19
22   460€     0            0.05
…    …        …            …
Tree 2 output
Prediction  Error
0.17        0.30
0.24        0.05
0.08        0.03
…           …
…
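The idea that each new tree is trained on the error left by the previous ones can be sketched with one-split trees ("stumps") on a single feature. This is a simplified squared-error variant for illustration, not XGBoost's actual algorithm. Note the sign convention: the code uses residual = label minus prediction, whereas the table above shows prediction minus label.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression tree: returns (threshold, left_value, right_value)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x < threshold]
        right = [t for x, t in zip(xs, targets) if x >= threshold]
        if not left or not right:
            continue
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = sum((t - left_value) ** 2 for t in left) + sum(
            (t - right_value) ** 2 for t in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    if best is None:  # no valid split: predict the mean everywhere
        avg = sum(targets) / len(targets)
        return (xs[0], avg, avg)
    return best[1:]


def predict_stump(stump, x):
    """Evaluate a fitted stump for one feature value."""
    threshold, left_value, right_value = stump
    return left_value if x < threshold else right_value


def boost(xs, ys, n_trees=3, learning_rate=1.0):
    """Start from the mean label, then fit each stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, preds
```

Called with the three ages and Personal Loan labels from the table, `boost([37, 29, 22], [1, 0, 0])` fits the first stump on the residuals of the mean prediction, and later stumps only correct what is still left over.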
