This document summarizes a presentation by Milan Berka and Jakub Mašek on Moneta Money Bank's agile approach to analytics in the cloud. Some key points:
- Moneta is a major Czech bank undergoing a digital transformation and migration to the cloud. They created a "Data Squad" team of 3 people from Moneta and 2.5 people from DataSentics to build out their analytical platform.
- The team set up an analytical environment in AWS using services like S3, Glue, IAM and Databricks to build a cloud-based platform fully utilizing cloud resources.
- They developed several use cases like utilizing online marketing and customer data, optimizing
2. Who is presenting today …
Milan Berka
- Machine learning engineer at DataSentics, working for Moneta’s DataSquad
- Spark-certified developer
- Roles:
- Building the analytical platform
- Productionizing the use cases
- Evangelizing Spark across the company
milan.berka@datasentics.com
www.linkedin.com/in/milan-berka/
Jakub Mašek
- Leader of DataSquad at MONETA
- Experienced data science manager
- Roles:
- Partnering with the different departments across the bank
- Helping them find ML opportunities
- Managing the process
jakub.masek@moneta.cz
www.linkedin.com/in/jakub-mašek-19631155
3. Agenda
Background:
• Who is MONETA Money Bank and what is the role of DataSentics
• Moneta’s journey into the cloud
• Creation of Data Squad
Building the analytical platform:
• Setting up an analytical environment in the cloud fully utilizing AWS and Databricks
• Hurdles along the way
Use-cases:
• Utilizing online data in digital marketing and customer value management
• Optimization of branches/ATM
Next steps, Q&A
4. Moneta Money Bank - Czech bank for Czech people
§ Major Czech banking institution
§ 4th in size, 1st in innovation
§ 1 million clients; 181 branches; 650 ATMs
§ 3,000 employees
§ Undergoing digital transformation
§ Collecting innovation awards
§ Smart Banka (mobile app)
§ Digital products
§ Migration to the cloud
6. DataSentics - European Data Science Center of Excellence based in Prague
Make data science and machine learning have a real impact on organizations across the world - demystify the hype and black magic surrounding AI/ML and bring to life transparent production-level data science solutions and products delivering tangible impact and innovation.
- Machine learning and cloud data engineering boutique
- Helping customers build end-to-end data solutions in the cloud
- Incubator of ML-based products
- 50 specialists (data science, data/software engineering)
- Partner of Databricks & Microsoft
7. Moneta and its journey to the cloud
Timeline: 2018, 2019, 2020, 2021
Stages: 10% cloud-based; 30+% cloud-based; 50+% cloud-based; optimal cloud hosting; growing Platform as a Service
• Primary Datacenter migration
• Cloud design & initiation
• First set of applications migrated to Amazon Cloud
• PaaS, SaaS and Containers
• Automation embedded into the key processes
• Second Datacenter migration
• AS400 refresh/hosting
• Software and Infrastructure harmonization
• Platform as a Service, implemented for the selected capabilities
• Use the most optimal hosting strategy for each application
• Further infrastructure and application optimization
• Hosted fixed telephony
• Software as a Service implemented for the selected capabilities
8. Birth of Datasquad as a new analytical DNA supporting the cloud journey and making „digital“ real
Old analytical world
- Tools:
- On-premise Oracle data warehouse with limited computational power
- On-premise SAS for modelling
- Data: Mainly offline (transactions, …)
New analytical world
- Tools:
- Cloud-based, elastic and scalable resources
- Data in Datalake
- Spark, Python, R
- Data:
- offline (internal data)
- online (web-browsing data, digital marketing data, …)
9. Datasquad is pioneering the new analytical world
DATALAKE PLATFORM
DATA TEAM EVANGELIZATION & SERVICE
DATA SCIENCE SOLUTIONS
- POC; MVP
- Products
- Frameworks
- Onboarding
- Evangelize Spark and new technologies
10. Main goal: utilize cloud services as much as possible
Technology:
§ Storage: AWS S3 with auto-encryption
§ ETL: AWS Glue
§ Access Management: AWS IAM + ADFS
§ Analytical service: Databricks
§ Security measures: AWS S3 auto-encryption, AWS EBS auto-encryption, Databricks SSO, Databricks without internet access, hashing of all sensitive data
Building the analytical platform
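The slides list "hashing of all sensitive data" as a security measure but do not say which scheme was used. As a minimal sketch, assuming a keyed SHA-256 hash (HMAC) applied to normalized identifiers before they reach the datalake, it could look like this. The function name `pseudonymize`, the normalization rule and the demo key are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of a sensitive value as a hex string.

    Assumption: values are trimmed and lower-cased first so that the same
    identifier always maps to the same hash. The real platform may differ.
    """
    normalized = value.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; a real key would come from a secret store.
demo_key = b"demo-key-not-for-production"

hashed_a = pseudonymize("Jan.Novak@example.com", demo_key)
hashed_b = pseudonymize("  jan.novak@example.com ", demo_key)
other_key_hash = pseudonymize("Jan.Novak@example.com", b"another-key")
```

Using a keyed hash rather than a bare digest means the mapping cannot be rebuilt by someone who only knows the input format; the stable output still allows joins across datasets.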
12. Datalake structure
Data:
§ Adform data (terabytes)
§ Web data (terabytes)
§ Geo-data (gigabytes)
§ Branches/ATM data (gigabytes)
§ Onboarding/fraud data (gigabytes)
§ Transactions (terabytes)
13. Use-cases
“Online” data: web analytics data (AdobeAnalytics/GA); campaign data (Adform); real estate market data
“Offline” data: Branch/ATM performance; sales data; onboarding data; CVM data
Feature Store
Stories: CVM STORY; DIGITAL STORY; RISK STORY; BRANCH / ATM STORY; FRAUD / AML STORY
15. If we look at a typical customer journey for a consumer loan, we see a relevant touchpoint gap, an opportunity for us to address … and we already have a plan in motion to address this opportunity
Digital Story
(1) Digital marketing cost analysis
(2) Moneta Ad Quality
(3) Ad Targeting users in „think“ phase
(4) „Think“ phase predictors in CVM campaigns
17. USE CASE 1: Digital marketing cost analysis
→ WE HAVE PROVEN THAT DISPLAY ADS DRIVE SALES INDIRECTLY
THERE IS OBVIOUS POTENTIAL IN THE „THINK“ PHASE
DATA WE USED
• Advertising data (which user saw or interacted with our ads, on which specific website/page/context, for how long, and at what cost)
• Moneta Website behavior
• Marketing costs
WHAT WE DID
• We implemented an attribution model to prove how online ad impressions (not clicks!) drive sales. An attribution model shows how each marketing channel drives conversions. Here we wanted to see what contribution each channel makes to closing consumer loans.
NEXT STEPS
• Incrementally start to reallocate more budget to Online Ads (upper funnel – think phase) and evaluate the impact on efficiency
BUSINESS CASE
• Increase digital sales for the same media spend through a better split between Online Ads and Search
Marketing channel            Costs (units)   Cost efficiency
Performance - Adform         1               11.3
Brand - Adform               17              6.6
Performance - remarketing    23              2.4
Performance - display        26              1.2
Performance - search [1]     115             1
Performance - social         0.75            0.5
Brand - YouTube              0.4             0.18
[1] Performance - search was chosen as the reference channel, with a cost efficiency ratio of 1.
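The slides do not say which attribution model was used or how the cost efficiency ratio was computed. As a rough sketch under stated assumptions, the example below uses linear attribution (each conversion is split equally across the channels in the customer's journey) and then expresses attributed conversions per unit of cost relative to a reference channel, so that the reference scores 1 as in the table. The journeys and cost figures are made up for illustration and are not Moneta data.

```python
from collections import defaultdict


def linear_attribution(journeys):
    """Split each converting journey's credit equally across its channels."""
    credit = defaultdict(float)
    for channels in journeys:
        share = 1.0 / len(channels)
        for channel in channels:
            credit[channel] += share
    return dict(credit)


def relative_efficiency(credit, costs, reference):
    """Attributed conversions per cost unit, scaled so the reference channel equals 1."""
    per_cost = {ch: credit.get(ch, 0.0) / cost for ch, cost in costs.items()}
    ref_value = per_cost[reference]
    return {ch: value / ref_value for ch, value in per_cost.items()}


# Hypothetical converting journeys: the channels a customer touched before a loan.
journeys = [
    ["display", "search"],
    ["search"],
    ["display", "social", "search"],
    ["display"],
]
# Hypothetical channel costs in arbitrary units.
costs = {"display": 2.0, "search": 4.0, "social": 1.0}

credit = linear_attribution(journeys)
efficiency = relative_efficiency(credit, costs, reference="search")
```

In this toy data, display earns the same attributed credit as search at half the cost, so its relative efficiency comes out at 2 while search stays at 1.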
18. USE CASE 2 – Moneta Ad Quality
→ DIFFERENT COST PER VISIBLE MINUTE ACROSS DIFFERENT WEBSITES
WE CAN INCREASE AD VISIBILITY TO USERS IN THINK PHASE
DATA WE USED
• Advertising data (which user saw or interacted with our ads, on which specific website/page/context, for how long, and at what cost)
WHAT WE DID
• We see an ENORMOUS difference in the visible time of online ads. The cost per visible minute online ranges from 15 to 35 CZK in
NEXT STEPS
• Create an engine to optimize Online Ads buying (buy more visible ads)
BUSINESS CASE
• We should be able to buy at least 20% more media time for the same budget
Analytical output - Cost per visible minute
→ ADJUSTING ADFORM BY DISADVANTAGING DOMAINS WITH EXPENSIVE VISIBLE MINUTES
Adform implementation – multipliers
autoweb.cz 0.75
autozine.cz 0.8
autozive.cz 0.9
avizo.cz 0.85
babinet.cz 0.95
babyweb.cz 0.65
banger.cz 0.85
banky.cz 0.85
bazarbox.cz 0.7
behani.cz 0.85
bejvavalo.cz 0.85
bezrealitky.cz 0.65
biatlonmag.cz 0.8
biginzerce.cz 0.7
bike-mania.cz 0.85
...
[Diagram labels: Quality model, API]
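The slides show per-domain multipliers but not how they were derived. One possible way to turn cost per visible minute into such multipliers is sketched below, assuming a hypothetical rule: divide the median cost per visible minute by the domain's own cost and clamp the result to a fixed range. The domains, spend and visibility figures are invented; only the 15 to 35 CZK range echoes the slide.

```python
from statistics import median


def cost_per_visible_minute(spend_czk: float, visible_seconds: float) -> float:
    """Cost in CZK for one minute during which the ad was actually visible."""
    return spend_czk / (visible_seconds / 60.0)


def bid_multipliers(cpvm_by_domain, floor=0.5, cap=1.0):
    """Hypothetical rule: domains pricier than the median get a multiplier below 1."""
    reference = median(cpvm_by_domain.values())
    return {
        domain: round(min(cap, max(floor, reference / cpvm)), 2)
        for domain, cpvm in cpvm_by_domain.items()
    }


# Invented example data: (spend in CZK, total visible seconds) per domain.
raw = {
    "example-a.cz": (300.0, 1200.0),  # 15 CZK per visible minute
    "example-b.cz": (500.0, 1200.0),  # 25 CZK per visible minute
    "example-c.cz": (700.0, 1200.0),  # 35 CZK per visible minute
}

cpvm = {domain: cost_per_visible_minute(*values) for domain, values in raw.items()}
multipliers = bid_multipliers(cpvm)
```

Capping at 1.0 mirrors the slide, where every listed multiplier is below 1; the actual quality model behind the API may use entirely different inputs.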
19. Branch Story
Moneta needs to independently evaluate every single locality or branch network across the country …
Assumption: Locality (L) attractiveness is given by the surrounding points of interest (within 200 meters of L)
Target variable: MONETA wants to compare localities in terms of a business KPI - possible bank performance
Approach: To measure attractiveness, the weights of the individual points of interest need to be set
• Total attractiveness of the measured point is given by the sum of the partial weights
• Two possible scenarios for setting the weights:
(1) By expert (e.g. Bank 50; Bus station 15 …), giving a dimensionless index
(2) Data science approach (machine learning) - using internal data to set the KPI and giving interpretable results
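The expert-weight scenario above can be sketched as a sum of weights over the points of interest that fall within 200 meters of a locality. The weights for "bank" and "bus_station" are the slide's own examples; the coordinates, the haversine distance and the function names are assumptions made for illustration.

```python
import math

# Expert weights taken from the slide's example (Bank 50; Bus station 15).
EXPERT_WEIGHTS = {"bank": 50, "bus_station": 15}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def attractiveness(locality, pois, weights, radius_m=200.0):
    """Sum the weights of all points of interest within radius_m of the locality."""
    lat, lon = locality
    return sum(
        weights.get(kind, 0)
        for kind, poi_lat, poi_lon in pois
        if haversine_m(lat, lon, poi_lat, poi_lon) <= radius_m
    )


# Hypothetical locality and points of interest (kind, latitude, longitude).
locality = (50.0800, 14.4300)
pois = [
    ("bank", 50.0805, 14.4300),         # roughly 56 m away
    ("bus_station", 50.0800, 14.4310),  # roughly 71 m away
    ("bank", 50.0900, 14.4300),         # roughly 1.1 km away, outside the radius
]

score = attractiveness(locality, pois, EXPERT_WEIGHTS)
```

In the data science scenario the same sum would apply, with the weights estimated from internal performance data instead of being set by hand.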
20. Branch Story use case
→ PRAGUE – EXPOSED AREAS BY PREDICTED PERFORMANCE INDEX
WE CAN PREDICT PERFORMANCE IN ANY LOCALITY IN CZ
DATA WE USED
• Geospatial data - points of interest
• Population statistics
• Internal data – performance of our existing branches; costs; # FTEs; ATM performance
WHAT WE DID
• We wanted to evaluate every single location in CZ in terms of footfall. The closest equivalent to footfall is the visitor rate, which is measured for only 15% of our network. However, the visitor rate is strongly correlated with a business KPI, the performance rate, which we therefore used as a proxy target variable for our model. We are now able to predict the possible banking performance of any observed location.
MODEL VARIABLES
• # of transportation points of interest within 200 m
• # of food points of interest within 200 m
• # of competitors and highly exposed areas
• City population
21. Use-case deep dive: DSID = Enabler for the Digital attribution model
Problem: a person/client has many identifiers (internal id, phone, website cookie, Adform cookie) that show up at different times in different places. How do we connect all of these into a single ID?
[Diagram: identifier nodes I1-I3, W1-W5, A1-A3]
22. Use-case deep dive: DSID = Enabler for the Digital attribution model
Answer: GraphFrames!
24. Use-case deep dive: DSID = Enabler for the Digital attribution model
src dst
W1 I1
W2 I1
W3 NULL
W1 A1
W2 A2
W3 A3
I3 019645
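from graphframes import GraphFrame
# df: one row per observed link between two identifiers (columns src, dst)
vertices = (df.selectExpr("src AS id")
    .union(df.selectExpr("dst AS id"))
    .where("id IS NOT NULL").distinct())
# drop rows without a partner id (e.g. W3 / NULL) so every edge has two endpoints
edges = df.where("dst IS NOT NULL")
# connectedComponents() requires a Spark checkpoint directory to be set beforehand
g = GraphFrame(vertices, edges)
df_connected = g.connectedComponents()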
25. Use-case deep dive: DSID = Enabler for the Digital attribution model
id Component
W1 1
W2 1
W3 2
I3 3
I1 1
A1 1
A2 1
A3 2
019645 3
plus further adjustments:
• filter out business clients
• split groups that contain two or more internal ids
• …
= DSID
Statistics:
- Number of vertices (ids): 14 969 170
- Number of edges: 30 029 363
- Running time: ~20 min
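To make the result table concrete, the connected-components step can be reproduced on the small example edge list without Spark. The sketch below uses a plain union-find structure; it is an illustration of what the graph step computes, not the GraphFrames algorithm itself, and the function name is hypothetical.

```python
def connected_components(edges):
    """Group identifiers that are linked directly or transitively (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for src, dst in edges:
        if src is None or dst is None:
            # Register the known endpoint, but there is nothing to link it to.
            for node in (src, dst):
                if node is not None:
                    find(node)
            continue
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


# Edge list from the slide (None stands for the NULL partner of W3).
edges = [
    ("W1", "I1"), ("W2", "I1"), ("W3", None),
    ("W1", "A1"), ("W2", "A2"), ("W3", "A3"),
    ("I3", "019645"),
]
components = connected_components(edges)
```

The three groups match components 1, 2 and 3 in the table above. At the scale quoted in the statistics, a distributed implementation such as GraphFrames is the practical choice.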
26. Next steps
- Major goal: continue democratizing the platform; the ultimate goal is a self-service data platform
- Continue with the use-cases and moving them to production
- Implement company-wide feature store
- Employ new technologies (in particular - Spark Structured Streaming)
29. Wrap up
Even with a small team you can do big things …
To achieve this, you need a supportive environment, and you need to be disruptive to drive change and show the added value, proving that:
… „data is really the new oil for your company“
Safety always first
Data science is about data AND science. Doing science always involves blind alleys, so be patient and keep going!