With an eye on gaining a competitive edge in the marketplace, banks intend to drive customer engagement with data analytics. By analyzing their clients' economic activities, banks can detect patterns and behavior and offer personalized, tailor-made financial products that enhance customer satisfaction. However, the banks' legacy systems can impede their ability to unlock the value of the data they already hold. In his talk, Marco presented a real-world example of predictive modeling in banking. He highlighted the modeling practices used and gave practical advice on deploying models in a production banking environment. Additionally, Marco explored best-practice techniques for tackling data science projects.
2. From Zero to Production 22.01.2019
Why are you here?
• You can't believe Sparkasse banks are into data analytics (a topic reserved for sexy fintechs and software companies)
• You are curious about the words "machine learning" (ML) and "production"
• You are hoping to find the holy grail for your ML and production problems
Dataiku Meetup
https://commons.wikimedia.org/wiki/File:Holy-grail-round-table-bnf-ms_fr-116F-f610v-15th-detail.jpg
Evrard d'Espinques [Public domain], via Wikimedia Commons
What does "production" mean anyway?
https://stackoverflow.com/questions/490289/what-exactly-defines-production
S Rating und Risikosysteme GmbH (SR): We ♥ Data Analytics
• Founded in 2004 with a focus on providing market, regulatory, operational and credit risk frameworks
• > 250 employees
• Data Analytics team started 1.5 years ago
• Quantitative folks and product managers (20 folks in total)
• > 30 machine learning models in production
• "Made in Berlin" (Spittelmarkt)
https://www.berliner-sparkasse.de/de/home/200jahre.html?n=true
Savings Banks Finance Group (SFG)
• 383 independent Sparkasse commercial/retail banks
• Decentralized structure (regional principle)
• Central IT service partner (Finanz Informatik)
• OneSystemPlus = core banking system for all institutions
• S Rating und Risikosysteme GmbH = central Data Analytics partner
The SFG (decentralized) data treasure chest
• 50 million customers
• 118 million bank accounts
• 2.1 billion online banking visits (per year)
• 114 billion payment transactions (per year)
Example use cases of ML in Banking and Financial Services
Customer Experience | Operational Efficiency | Sales and Marketing | Risk and Fraud
• Chatbots and robo-advisors
• Natural Language Processing (NLP) to decipher call logs and customer feedback
• Optimizing operational expenses such as call center staff and tellers
• Optimizing sales and marketing expenses
• Optimizing operational efficiency
Getting more with your score
Preparation
• What is your target group?
Expert advice
• Target group based on expert knowledge
Data Analytics
• Target group based on predictive analytics
(Figure: expert segments by age (18-35, 35-75) and income (0-1000 €, 1000-10000 €) versus a product score)
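To make the contrast on this slide concrete, here is a minimal Python sketch, assuming a customer table with age, income and a precomputed product score. The expert segment (age 18-35, income 1000-10000 €) and all customer values are hypothetical; the point is only that the expert rule selects by fixed boundaries, while the predictive approach ranks every customer by score and takes the top k.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    age: int
    income: float         # monthly income in EUR (assumption)
    product_score: float  # model output: higher = more likely to buy


def expert_target_group(customers):
    """Expert rule: one fixed segment (here age 18-35 and income 1000-10000 EUR)."""
    return [c.customer_id for c in customers
            if 18 <= c.age <= 35 and 1000 <= c.income <= 10000]


def score_target_group(customers, k):
    """Predictive analytics: the k customers with the highest product score."""
    ranked = sorted(customers, key=lambda c: c.product_score, reverse=True)
    return [c.customer_id for c in ranked[:k]]


# Hypothetical toy customers, not real data.
customers = [
    Customer(1, 25, 2500.0, 0.10),
    Customer(2, 52, 4200.0, 0.80),
    Customer(3, 30, 800.0, 0.65),
    Customer(4, 33, 1500.0, 0.40),
]
print(expert_target_group(customers))    # [1, 4]
print(score_target_group(customers, 2))  # [2, 3]
```

The rule contacts customers 1 and 4 because they fall into the segment; the score picks 2 and 3, two customers the segment would never reach.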
A model data pipeline
Structured Data
Ingest → Transform → Model → Deploy
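The four stages can be sketched as plain functions chained together. This is a toy illustration, not SR's pipeline: the CSV columns reuse the example rows from the bonus slide, and the linear `model` step with its `weights` and `bias` is a hypothetical stand-in for a trained model.

```python
import csv
import io

# Structured input data (illustrative rows).
RAW = """customer_id,age,balance,employed
1,37,2560,1
2,29,1726,1
3,22,460,0
"""


def ingest(text):
    """Ingest: read structured (tabular) data into a list of records."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and build the feature vector for each customer."""
    return [
        {"customer_id": int(r["customer_id"]),
         "features": [float(r["age"]), float(r["balance"]), float(r["employed"])]}
        for r in rows
    ]


def model(records, weights, bias):
    """Model: a placeholder linear score standing in for the trained model."""
    for r in records:
        r["score"] = bias + sum(w * x for w, x in zip(weights, r["features"]))
    return records


def deploy(records):
    """Deploy: hand the scores to the consuming system (here simply a dict)."""
    return {r["customer_id"]: r["score"] for r in records}


scores = deploy(model(transform(ingest(RAW)), weights=[0.0, 0.001, 0.5], bias=-1.0))
print(scores)
```

Keeping each stage a separate function makes it possible to check every step on its own before wiring it into production.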
Data Analytics closed loop
Prototype & develop model pipelines
→ Train model pipeline
→ (Batch) deploy models
→ Serve requests
→ Monitor service
→ Get feedback
→ Update pipelines
→ (and around again)
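One turn of this loop can be sketched as follows, assuming that feedback arrives as observed 0/1 outcomes for the served batch. The threshold `min_auc`, the toy data and the `retrain` function are hypothetical; a real update step would rerun the training pipeline rather than return a hand-written model.

```python
def auc(scores, labels):
    """AUC = probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def closed_loop_step(model, batch, feedback, train, min_auc=0.7):
    """One turn of the loop: serve the batch, monitor with feedback, update if needed."""
    scores = [model(x) for x in batch]      # (batch) serve requests
    current_auc = auc(scores, feedback)     # monitor the service using feedback labels
    if current_auc < min_auc:               # quality dropped below the agreed level
        model = train(batch, feedback)      # update the pipeline (retrain)
    return model, current_auc


# Hypothetical example: the old model ranks the customers in the wrong order.
def old_model(x):
    return x[0]


def retrain(xs, ys):
    # Stand-in for a real training job.
    return lambda x: 1.0 - x[0]


batch = [[0.9], [0.8], [0.2], [0.1]]
feedback = [0, 0, 1, 1]
new_model, measured_auc = closed_loop_step(old_model, batch, feedback, retrain)
print(measured_auc)  # 0.0
```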
Challenges
• Time constraints: it has to run fast enough
• We need to play well with others:
• other systems
• other teams
• Need to be robust and just work
• Need to integrate into business processes
• Does it increase profits?
• Live ML doesn't always work the way I expect…
Working well with other teams and systems? (1/3)
Wall of confusion (DevOps + Dev)
SR Data Scientist: "AUC looks alright, hyperparameters tuned. Time to deploy!"
FI Mainframe + Java application developer: "What the **** is alpha and beta?"
Wall of confusion (Business)
Sparkasse teller: "I want to be there for my clients! Target variable what?"
Person icons made by monkik from www.flaticon.com
Understand the business processes! (2/3)
SR Data Scientist + Sparkasse teller (Business + Dev)
• Business processes generate data: understand every single step
• Work together on the "ground truth" (the reality you want to predict)
• Does it generalize?
• Verify every sub-result with practitioners
• Lack of domain knowledge is a barrier you can overcome
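Agreeing on the ground truth usually means agreeing on a target definition. A minimal sketch, assuming the target is "customer opened the product within `window_days` after a snapshot date"; the function name, the 90-day window and the contract data are hypothetical choices that would have to be settled with practitioners.

```python
from datetime import date, timedelta


def label_customers(snapshot, contracts, customer_ids, window_days=90):
    """Target = 1 if the customer opened the product within `window_days` after `snapshot`."""
    end = snapshot + timedelta(days=window_days)
    buyers = {cid for cid, opened in contracts if snapshot < opened <= end}
    return {cid: int(cid in buyers) for cid in customer_ids}


# Hypothetical contract openings: (customer_id, opening date).
contracts = [(1, date(2018, 7, 15)), (2, date(2018, 12, 1)), (3, date(2018, 5, 2))]
labels = label_customers(date(2018, 6, 30), contracts, [1, 2, 3, 4])
print(labels)  # {1: 1, 2: 0, 3: 0, 4: 0}
# Customer 3 already held the product before the snapshot: labelling them 0,
# or dropping them from the training data, is a decision to make with the business.
```

Each parameter here (snapshot date, window length, treatment of existing holders) changes what the model learns, which is why these choices belong in the conversation with the business side.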
Understand the IT architecture! (3/3)
SR Data Scientist, FI Mainframe + Java application developer (DevOps)
Deploy model parameters → Scoring engine in SAS
"Ready for production, yeah!"
101: Decision tree classifier (1/2)
XGBoost: A Scalable Tree Boosting System
Tianqi Chen, Conference Paper, 2016
• Flowchart structure starting at the root node
• Simple IF-ELSE questions at every split node
• The CART (classification and regression tree) algorithm builds binary trees
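A fitted tree is nothing more than nested IF-ELSE questions. The sketch below writes one depth-2 binary tree by hand in Python; the variable names follow the later export example, but the thresholds and leaf scores are illustrative only.

```python
def tree_1(customer):
    """A depth-2 binary tree as nested IF-ELSE questions (illustrative values)."""
    if customer["income"] < 11000:      # root node: first question
        if customer["age"] < 45:        # child node: follow-up question
            return 0.0033               # leaf: score
        return -0.0017
    if customer["occupied"] < 1:        # child node on the other branch
        return 0.0436
    return 0.0030


print(tree_1({"income": 5000, "age": 30, "occupied": 1}))   # 0.0033
print(tree_1({"income": 20000, "age": 50, "occupied": 0}))  # 0.0436
```

Every question has exactly two answers, so each customer follows one path from the root to exactly one leaf.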
101: Ensemble prediction (2/2)
Tree 1 → Score 1
Tree 2 → Score 2
Tree … → Score …
Sum of all tree scores → Score
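The ensemble step amounts to adding up the leaf scores of all trees. A minimal sketch, assuming each tree is a function from a customer record to a score; with a logistic objective (as XGBoost uses for binary classification), the sum is a log-odds margin that a sigmoid turns into a probability. The stumps and the `base_margin` of 0 are assumptions for illustration.

```python
import math


def ensemble_score(trees, customer, base_margin=0.0, logistic=True):
    """Sum the scores of all trees; optionally map the sum to a probability."""
    margin = base_margin + sum(tree(customer) for tree in trees)
    if logistic:
        return 1.0 / (1.0 + math.exp(-margin))  # sigmoid: log-odds -> probability
    return margin


# Three hypothetical one-split trees (stumps); values are illustrative only.
trees = [
    lambda c: 0.4 if c["income"] < 11000 else -0.2,
    lambda c: 0.3 if c["age"] < 45 else -0.1,
    lambda c: -0.5 if c["occupied"] < 1 else 0.1,
]
customer = {"income": 5000, "age": 30, "occupied": 1}
print(ensemble_score(trees, customer, logistic=False))  # ~0.8 (0.4 + 0.3 + 0.1)
print(ensemble_score(trees, customer))                  # ~0.69
```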
Exporting model parameters
Tree 1, Tree 2, Tree …
Where do I need to split the input variable?
TREE_NR INPUT_VAR TREE_SPLT_VAR_NR TREE_SPLT_VALUE
1 Income 1 11.000
1 Age 2 45
1 Occupied 3 1
… … … …
Which score do I need to assign to each node?
TREE_NR TREE_NODE_NR TREE_NODE_SCORE
1 1 0.00331848000000
1 2 -0.00174424000000
1 3 0.04362040000000
1 4 0.00302040000000
… … …
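The two tables can be produced by walking over the trees and writing one row per split and one row per leaf. This sketch assumes every tree is a complete binary tree whose splits are numbered breadth-first and whose leaves are numbered left to right; that is one plausible reading of TREE_SPLT_VAR_NR and TREE_NODE_NR, not a confirmed one. The in-memory `trees` structure is hypothetical, and the split value 11000 is an illustrative reading of the slide's 11.000.

```python
import csv
import io

# Hypothetical in-memory model: each tree is a complete binary tree whose
# splits are listed breadth-first and whose leaves run left to right.
trees = [
    {"splits": [("Income", 11000), ("Age", 45), ("Occupied", 1)],
     "leaves": [0.00331848, -0.00174424, 0.0436204, 0.0030204]},
]


def export_parameters(trees):
    """Flatten the trees into a split table and a node-score table (CSV text)."""
    split_buf, score_buf = io.StringIO(), io.StringIO()
    splits = csv.writer(split_buf, lineterminator="\n")
    scores = csv.writer(score_buf, lineterminator="\n")
    splits.writerow(["TREE_NR", "INPUT_VAR", "TREE_SPLT_VAR_NR", "TREE_SPLT_VALUE"])
    scores.writerow(["TREE_NR", "TREE_NODE_NR", "TREE_NODE_SCORE"])
    for tree_nr, tree in enumerate(trees, start=1):
        for split_nr, (var, value) in enumerate(tree["splits"], start=1):
            splits.writerow([tree_nr, var, split_nr, value])
        for node_nr, score in enumerate(tree["leaves"], start=1):
            scores.writerow([tree_nr, node_nr, f"{score:.14f}"])  # 14 decimals as on the slide
    return split_buf.getvalue(), score_buf.getvalue()


split_csv, score_csv = export_parameters(trees)
print(split_csv)
print(score_csv)
```

Plain CSV with fixed column names is a deliberately low-tech interface: any system that can read a table can consume the model, without knowing anything about the training library.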
SAS score engine
Export model parameters as CSV → Import model parameters
Inputs to the score engine:
• Model parameters
• Input data
Score engine: "Give me the model parameters and the input data for every customer and I'll tell you the score!"
"Save the results, please!"
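The SAS score engine itself is not shown; the following Python sketch only mirrors the idea: import the two parameter tables, walk each tree for each customer and sum the leaf scores. It assumes complete binary trees in heap numbering (the children of split k are splits 2k and 2k+1, leaves numbered left to right) and that a value below the threshold goes left, which matches XGBoost's `<` convention. Table contents and customer records are illustrative.

```python
import csv
import io
from collections import defaultdict

# Illustrative parameter tables in the layout of the export slide.
SPLITS_CSV = """TREE_NR,INPUT_VAR,TREE_SPLT_VAR_NR,TREE_SPLT_VALUE
1,Income,1,11000
1,Age,2,45
1,Occupied,3,1
"""
SCORES_CSV = """TREE_NR,TREE_NODE_NR,TREE_NODE_SCORE
1,1,0.00331848
1,2,-0.00174424
1,3,0.0436204
1,4,0.0030204
"""


def load_model(splits_csv, scores_csv):
    """Import the model parameters into {tree_nr: (splits, leaf_scores)}."""
    splits, leaves = defaultdict(dict), defaultdict(dict)
    for row in csv.DictReader(io.StringIO(splits_csv)):
        splits[int(row["TREE_NR"])][int(row["TREE_SPLT_VAR_NR"])] = (
            row["INPUT_VAR"], float(row["TREE_SPLT_VALUE"]))
    for row in csv.DictReader(io.StringIO(scores_csv)):
        leaves[int(row["TREE_NR"])][int(row["TREE_NODE_NR"])] = float(row["TREE_NODE_SCORE"])
    return {tree_nr: (splits[tree_nr], leaves[tree_nr]) for tree_nr in splits}


def score_customer(model, customer):
    """Walk every tree from the root and sum the scores of the leaves reached."""
    total = 0.0
    for tree_splits, tree_leaves in model.values():
        k = 1                                   # start at the root split
        while k in tree_splits:
            var, value = tree_splits[k]
            k = 2 * k if customer[var] < value else 2 * k + 1
        # In a complete tree with n splits, heap index k maps to leaf number k - n.
        total += tree_leaves[k - len(tree_splits)]
    return total


model = load_model(SPLITS_CSV, SCORES_CSV)
customers = {101: {"Income": 5000, "Age": 30, "Occupied": 1},
             102: {"Income": 20000, "Age": 50, "Occupied": 0}}
results = {cid: score_customer(model, c) for cid, c in customers.items()}  # to be saved
print(results)
```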
Monitoring requests in production
• AUC (area under the ROC curve), in case some business processes change (= drop in AUC)
• Correlation between scores and input variables
• Descriptive statistics (mean, max, min, count) of input variables
• "Acid" test: ratio of scores with respect to the target variable
• Performance (scores/min)
Wrapping things up
Measure, measure and measure
• Monitor every single step of your pipeline
• Data quality is the holy grail
Data Scientists = translators
• Learn the "language" (not only programming) of other teams
• Build bridges
• What business problem do you want to solve?
Start simple with your production pipeline
• Understand the IT system architecture
• Talk with your IT folks and business people
Only production code is good code
• A Data Scientist should know programming principles
• Performance counts in real-world applications
• Code quality beats model prediction quality to some extent
Data First Folks!
Thanks for having me
22.01.2019 Dataiku Meetup
Marco Bahrs, Data Scientist
Get in touch with me via
Disclaimer: This presentation is intended for educational purposes only and does not replace independent professional judgment. Statements of fact and opinions expressed are those of the participants individually and, unless expressly stated to the contrary, are not the opinion or position of the Sparkasse Rating and Risikosysteme GmbH or the Finanz Informatik. The Sparkasse Rating and Risikosysteme GmbH does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented.
Bonus material: Ensemble
XGBoost: A Scalable Tree Boosting System
Tianqi Chen, Conference Paper, 2016
• Combining many weak learners (many trees = forest)
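Bonus material: Gradient Boosting training
Tree 1 is trained on the original target:
Age | Balance | Employed | … | Personal Loan | Probability (Tree 1) | Error
37 | 2560 € | 1 | … | 1 | 0.87 | -0.13
29 | 1726 € | 1 | … | 0 | 0.19 | 0.19
22 | 460 € | 0 | … | 0 | 0.05 | 0.05
… | … | … | … | … | … | …
Tree 2 is trained on the error of Tree 1:
Age | Balance | Employed | … | Error | Prediction (Tree 2) | Error
37 | 2560 € | 1 | … | -0.13 | 0.17 | 0.30
29 | 1726 € | 1 | … | 0.19 | 0.24 | 0.05
22 | 460 € | 0 | … | 0.05 | 0.08 | 0.03
… | … | … | … | … | … | …
…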
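The idea of this slide, each new tree learning the error of the trees before it, can be sketched with one-split regression trees. Note the sign convention: the slide's Error is prediction minus target (0.87 - 1 = -0.13 for Tree 1, 0.17 - (-0.13) = 0.30 for Tree 2), so the residual a new tree fits here is its negative. This is a simplification using squared error; XGBoost's logistic objective works with first- and second-order gradients of the log loss instead. `n_trees` and `learning_rate` are arbitrary choices.

```python
def fit_stump(rows, targets):
    """Fit a one-split regression tree that minimises squared error on `targets`."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] < threshold]
            right = [t for r, t in zip(rows, targets) if r[f] >= threshold]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lv, rv)
    if best is None:  # every feature is constant: predict the mean
        mean = sum(targets) / len(targets)
        return lambda row: mean
    _, f, threshold, lv, rv = best
    return lambda row: lv if row[f] < threshold else rv


def gradient_boost(rows, y, n_trees=3, learning_rate=0.5):
    """Squared-error gradient boosting: every new tree is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # = minus the slide's "Error"
        tree = fit_stump(rows, residuals)
        trees.append(tree)
        pred = [pi + learning_rate * tree(r) for pi, r in zip(pred, rows)]

    def predict(row):
        return base + sum(learning_rate * t(row) for t in trees)

    return predict, pred


# The three visible rows from the slide: Age, Balance (EUR), Employed -> Personal Loan.
rows = [[37, 2560, 1], [29, 1726, 1], [22, 460, 0]]
y = [1, 0, 0]
predict, train_pred = gradient_boost(rows, y)
print([round(p, 3) for p in train_pred])  # [0.917, 0.042, 0.042]
```

With only three rows every stump can separate the loan customer from the others, so each round halves the remaining error; on real data the trees would be deeper and regularized.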