Challenges in building a churn prediction model in different industries, presented by Jelena Pekez from Comtrade System Integration. The talk focuses on real-life use-case experience.
Solving churn challenge in Big Data environment - Jelena Pekez
1. Solving Churn challenge in Big data
environment
Jelena Pekez
Principal Business Consultant, Lead Data Scientist
Comtrade System Integration
2. CHURN IS A CHALLENGE IN EVERY INDUSTRY, THE
DIFFERENCE IS HOW IT IS MANAGED
Examine how this model will be used
Business
focus
High margin
customers
Strategy
relevant
segments
At Risk
customers
How fresh does the data need to be?
Monthly/Daily/Real time?
3. Outcome
Identified Key Patterns of Behavior that
Lead to Churn
Enabled pro-active outreach to save
profitable at-risk customers
The Goal
Enhance churn prediction through multi-
channel customer behavior analytics and
find an incremental number of high risk
churners compared to the traditional
statistical models
BUSINESS GOAL AND OUTCOMES
The goal of retention strategy is to keep churn under control.
4. • Domain specific features
• Special models outputs as
new features
• Balancing techniques
Evaluation
Data
Deployment
Modeling
Data
preparation
Data
understanding
Business
understanding
CRISP METHODOLOGY
5. BUSINESS UNDERSTANDING
What is the goal?
Relevant segment
Challenge the business
definition
Formal definition is often
not relevant for targeting
purposes
e.g. Churn 90 days 10
days
Are there already
tested campaigns
Results, take rate
e.g. do we have integrated
campaigns results,
experiences
Existing reports
review
Reduces data analysis and
understanding
Helps to set expectations
and feasibility of prediction
Trend and seasonality
understanding
Which population is
relevant
Exclude inactive customers
Find irrelevant groups and
black list
Examine existing
segments / behavior
groups
Define metrics
for success
Evaluation metrics and
expectations
What will be product
offering
Target list size for
campaign
Frequency
6.
Set objectives
Produce Project Plan
Business success criteria/DS success
criteria
Assess the current situation
Risks, assumptions, constraints and
contingencies
Terminology
Cost and benefits
BUSINESS
UNDERSTANDING
POTENTIAL
ANALYSIS / MODELS
Churn prediction model creation
Sequence of impacting events
Content Categorization
Competitor calls recognition
CEI – Experience Index
Social Network Analytics (SNA)
Behavior clusters
Offer optimization
7.
DATA PREPARATION
PHASE
1. Data understanding
2. Data integration:
Data Integration from different data sources
Data quality report
3. Data preparation:
Deriving new attributes and trend variables
Balancing data set
Handling nulls and outliers
Normalization and standardization of data
Data reduction techniques
4. Feature selection
5. Create Event Tables
Generating and investigating events
Creating Event History Table
Fine-tuning of event definitions based on their correlation
with churn
6. Create Event Sequences
Generating event sequences from event table
Generating subpaths from event sequences
Analyzing temporal churn effects of event paths
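The event-sequence steps above can be sketched in a few lines. This is a minimal illustration only: the event names, the `(customer_id, timestamp, event_name)` row layout, and the helper names `build_sequences` and `subpaths` are assumptions, not part of the original project.

```python
from collections import defaultdict

# Hypothetical event table rows: (customer_id, timestamp, event_name).
EVENTS = [
    ("A", 3, "complaint"),
    ("A", 1, "bill_shock"),
    ("A", 2, "call_center"),
    ("B", 1, "shop_visit"),
    ("B", 2, "complaint"),
]

def build_sequences(events):
    """Group events by customer and order them by timestamp."""
    by_customer = defaultdict(list)
    for customer_id, ts, name in events:
        by_customer[customer_id].append((ts, name))
    return {c: [name for _, name in sorted(rows)] for c, rows in by_customer.items()}

def subpaths(sequence, length):
    """Return all contiguous subpaths of the given length."""
    return [tuple(sequence[i:i + length]) for i in range(len(sequence) - length + 1)]

sequences = build_sequences(EVENTS)
paths_a = subpaths(sequences["A"], 2)
```

Counting how often each subpath appears among churners versus non-churners is one possible way to study the temporal churn effect of an event path.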
8. Features from different data sources
DWH
Lifecycle stage (near contract
termination indicator)
Drop calls and Silent calls
Products and Discounts
Spending and profitability
Device info
Contract history
NPS score
Close friend churned (based on
freq. calls)
Network KPI-s
Calls to Competition
CRM
Shop visits
Handset service
Campaigns available to the customer
(Upsell, NBA, X-sell)
Previous termination requests
Call Center Activity
IVR
Call logs (frequency, recency,
duration, branch)
Text mining, text segmentation
Complaints (network, device,
contract…)
Web/ App usage
Web/App Categories browsing
Voice, data, SMS usage and limits
Bill shock
Web/App keyword search
9. CHURN IMPACT THROUGH „BIG DATA“ FEED
Non traditional data
(raw) CDR:
Competitor CC, Poaching
calls, Usage change…
Market Research:
Satisfaction surveys,
competitor new offer
(conjoint)…
Call Center:
Compliance, compliance
path, operator data, …
POS:
Visits, inquiries…
Web:
Self Service portal,
browsing behavior, …
CRM:
Campaign/Response
history, Opt In/Out,
Customer data change, …
Provisioning:
Not successful activations,
…
Network:
All bad network events
(malfunction, dropped calls,
silent calls…)
Network External
Process Interaction
Event triggers
10. SPECIAL FEATURES EXTRACTION
Customer email
Call record
Call summary note
SN comments
Define key words Recognize intent
CHURN
NON
CHURN
Web crawled numbers
to all POS and agents
of direct competition
Find trend of calls
Find sequence of calls and SMS to
these numbers for
relevant groups
NON
CHURN
CHURN
Content Categorization
Like: Tariff, product,
service, competitor,..
Competitors calls recognition
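Competitor call recognition, as outlined above, amounts to matching called numbers against a list of competitor numbers and tracking the trend. A minimal sketch, assuming a hypothetical CDR layout of `(customer_id, month, called_number)` and made-up phone numbers:

```python
from collections import Counter

# Hypothetical competitor numbers (e.g. collected by crawling POS and agent pages).
COMPETITOR_NUMBERS = {"0800111", "0800222"}

# Hypothetical CDR rows: (customer_id, month, called_number).
CDR = [
    ("A", 1, "0800111"),
    ("A", 2, "0800111"),
    ("A", 2, "0800222"),
    ("A", 2, "0641234"),
    ("B", 1, "0645678"),
]

def competitor_calls_per_month(cdr, competitor_numbers):
    """Count calls to competitor numbers per (customer, month)."""
    counts = Counter()
    for customer_id, month, number in cdr:
        if number in competitor_numbers:
            counts[(customer_id, month)] += 1
    return counts

def trend(counts, customer_id, prev_month, curr_month):
    """Change in competitor calls between two months (positive = increasing)."""
    return counts[(customer_id, curr_month)] - counts[(customer_id, prev_month)]

counts = competitor_calls_per_month(CDR, COMPETITOR_NUMBERS)
```

Both the monthly count and the trend could then be passed to the churn model as features.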
11. AGGREGATED CUSTOMER EXPERIENCE INDEX
CEI = N + P + C + U
The individual Customer Experience Index varies from 0 to 1 and is determined by the following parameters:
N: Network Exp. Measured by number of drops & failures
P: Product Exp. Is measured by number of attempts to search competitors' products or sites
C: Call center Exp. Is measured by voice sentiments (positive, neutral, negative)
U: Usage Exp. Is measured by apps usage
Can take any value between 0 and 1, where 1 is Excellent and 0 is Awful
Calculated DAILY
At SUBSCRIBER level
Benchmarked against AVERAGE CEM score
With ALARMS if score suddenly drops
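A daily index with a drop alarm might look like the sketch below. The slide does not say how the four components are weighted or what counts as a sudden drop, so the equal weights and the 0.2 threshold are assumptions, as are the function names.

```python
def experience_index(network, product, call_center, usage):
    """Equal-weight average of four component scores, each in [0, 1] (assumed weighting)."""
    parts = (network, product, call_center, usage)
    if any(not 0.0 <= p <= 1.0 for p in parts):
        raise ValueError("component scores must lie in [0, 1]")
    return sum(parts) / len(parts)

def sudden_drop(previous, current, threshold=0.2):
    """Flag an alarm when the daily index falls by more than `threshold` (assumed value)."""
    return (previous - current) > threshold

# Hypothetical component scores for one subscriber on two consecutive days.
yesterday = experience_index(0.9, 0.8, 0.7, 0.8)
today = experience_index(0.4, 0.8, 0.3, 0.7)
alarm = sudden_drop(yesterday, today)
```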
12. Using CDR data and modern tools for data integration, we can create a graph of customer interactions and calculate different relationship metrics.
Combine social groups with Geo-location calculations
Features for model
• Size of network
(number of nodes)
• Number of links+
• unique links
• Leadership score
• Role in community
• Community shape
• Centrality
• Density
SOCIAL NETWORK ANALYTICS FEATURES
• Who contacts whom?
• How often?
• How long?
• Both directions?
Identify the social network
• Who influences whom?
• Who works together?
• Close people
Identify important people, calling
circles
SNA: graph analysis where nodes are metrics
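Some of the listed graph features (size of network, number of unique links, density, centrality) can be derived from call records with plain Python. This is a sketch under assumptions: the `(caller, callee)` row layout is hypothetical, and the graph is treated as undirected.

```python
from collections import defaultdict

# Hypothetical CDR rows: (caller, callee). Repeated rows mean repeated calls.
CALLS = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "B"), ("D", "A")]

def build_graph(calls):
    """Undirected contact graph: node -> set of distinct neighbours."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph

def density(graph):
    """Share of possible undirected links that actually exist."""
    n = len(graph)
    links = sum(len(neigh) for neigh in graph.values()) // 2
    return links / (n * (n - 1) / 2) if n > 1 else 0.0

def degree_centrality(graph, node):
    """Fraction of the other nodes that `node` is directly linked to."""
    return len(graph[node]) / (len(graph) - 1)

graph = build_graph(CALLS)
```

In practice a graph library or a distributed graph engine would likely replace these helpers; the sketch only shows what the metrics mean.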
13. CUSTOMER PROFILE – E.G. GAMER
Network:
- Capabilities
- Access
- Bandwidth
- …
Social:
- Social Media
- Gaming forum
- Social Network
- Multi-Gaming
identity
- …
Consumption:
- Data volume
- Messaging
- VoIP
- …
Devices:
- Multi vs Single
- Online / Offline
- …
Sources of Experience:
- Profile
(demographics)
- Behavior (Usage,
CDRs)
- Interaction (CRM)
- Price plan, add-on
services
- History
- …
Traditional
sources:
Areas of importance: (AoI)
- MMO1 vs. Single player
- Online vs. Offline - / multi-screen
- VoIP
- Game communication
- Data volume, Latency
- Access method
- Gaming forum, youtube channels
- …
Experience:
1 MMO – Massively-Multiplayer Online Game
14. DATA INTEGRATION
From Analytical Data Mart to training table
DWH
Training table
Evaluation table
Scoring table
Features
Engineering
15. The metric trap: if any of the values is zero, the model is
biased
The goal is to get a curve like this
[Bar chart: class counts, Non-churn (0) vs. Churn (1)]
The share of churn in the relevant population is less than 10% in the
majority of cases, and even less than 1% in some cases
TYPICAL CHALLENGE IS HIGHLY IMBALANCED DATASET
Confusion Matrix
96%
ACCURACY
                Predicted Class
Observed Class  No        Yes
No              114700    0
Yes             4334      0
[Gain Chart: % of events vs. % of data sets]
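The metric trap can be checked directly from the confusion matrix counts on this slide: a model that predicts "No" for everyone scores high accuracy while finding no churners. A minimal sketch (the helper names are illustrative):

```python
# Counts from the confusion matrix on this slide (observed class x predicted class).
TN, FP = 114700, 0   # observed No:  predicted No / predicted Yes
FN, TP = 4334, 0     # observed Yes: predicted No / predicted Yes

def accuracy(tn, fp, fn, tp):
    """Share of all customers classified correctly."""
    return (tn + tp) / (tn + fp + fn + tp)

def recall(fn, tp):
    """Share of actual churners the model finds."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(fp, tp):
    """Share of predicted churners that really churn (0.0 if none are predicted)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

acc = accuracy(TN, FP, FN, TP)
rec = recall(FN, TP)
prec = precision(FP, TP)
```

Recall and precision on the churn class are both zero here, which accuracy alone hides.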
16. RESAMPLING TECHNIQUES
UNDERSAMPLING
Removing samples from the
majority class
https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets
OVERSAMPLING
Adding more examples from
the minority class
Weaknesses:
1. Loss of information
2. Overfitting
Useful only with big
enough data sets.
When 1% is actually
more than 10
thousand units.
Tools:
• SQL / Python
• imbalanced-learn
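Both resampling strategies can be illustrated without any library. The sketch below is a pure-Python illustration of the idea, not the imbalanced-learn implementation; the function names, the toy data, and the 5:1 target ratio are assumptions.

```python
import random

def random_undersample(majority, minority, ratio, seed=0):
    """Keep a random subset of the majority so that majority:minority is ratio:1."""
    rng = random.Random(seed)
    keep = min(len(majority), int(ratio * len(minority)))
    return rng.sample(majority, keep), list(minority)

def random_oversample(majority, minority, ratio, seed=0):
    """Duplicate random minority rows until majority:minority is at most ratio:1."""
    rng = random.Random(seed)
    target = -(-len(majority) // ratio)  # ceiling division, ratio must be an int
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    return list(majority), list(minority) + extra

# Hypothetical data set with a 100:1 class ratio.
majority = [("non_churn", i) for i in range(1000)]
minority = [("churn", i) for i in range(10)]

under_major, under_minor = random_undersample(majority, minority, ratio=5)
over_major, over_minor = random_oversample(majority, minority, ratio=5)
```

Undersampling discards 950 majority rows here (loss of information), while oversampling repeats the same 10 churners many times (risk of overfitting).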
17. OVER-SAMPLING FOLLOWED BY UNDER-SAMPLING
SMOTE and ADASYN synthesize elements for the minority class, based on
those that already exist, using the nearest neighbors:
• Randomly pick a point from the minority class
• Compute the k-nearest neighbors for this point
• Add synthetic points between the chosen point and its neighbors
• Add small random values to the points
• TOMEK LINKS are pairs of very close instances of opposite classes.
Removing the majority-class instance of each pair increases the space
between the two classes, facilitating the classification process.
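The interpolation step and the Tomek-link cleaning can be sketched as follows. This is a simplified pure-Python illustration on 2-D toy points, not the imbalanced-learn implementation; the helper names and the choice of `k=2` are assumptions.

```python
import math
import random

def nearest(point, candidates, k):
    """The k candidates closest to `point` (Euclidean distance), excluding itself."""
    others = [c for c in candidates if c != point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(nearest(base, minority, k))
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

def tomek_links(majority, minority):
    """Pairs (majority, minority) that are each other's nearest neighbour."""
    everything = majority + minority
    links = []
    for m in minority:
        closest = nearest(m, everything, 1)[0]
        if closest in majority and nearest(closest, everything, 1)[0] == m:
            links.append((closest, m))
    return links

# Hypothetical 2-D points; one majority point sits inside the minority cluster.
majority = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.1)]
minority = [(1.0, 1.0), (1.2, 1.0), (1.1, 1.3)]

new_points = smote(minority, n_new=4)
links = tomek_links(majority, minority)
to_remove = {maj for maj, _ in links}
cleaned_majority = [p for p in majority if p not in to_remove]
```

Only the majority member of each link is dropped, so the minority class keeps all of its original points.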
18. EXAMPLE OF BALANCING TECHNIQUES COMBINATION
0 1
12 months
historical data
10:1 ratio
Boost minority
class with
SMOTE
Eliminate similar
points with
TomekLinks
0 1
5:1 ratio
More
balanced
training set
SMOTETomek
19. XGBoost offers fast computing speed
combined with explainable results with
regards to ranking feature importances.
Compatible with the SHAP framework offering
even more in-depth explanations of model
predictions
MODEL DEVELOPMENT USING XGBoost ALGORITHM
IS THE BEST PRACTICE FOR IMBALANCED DATASET
XGBoost
Regularization for
avoiding
Overfitting
(both Lasso and Ridge)
Efficient handling
of missing data
(?)
Cache awareness
and out-of-core
computing
Parallelized
processing
In-built
cross-validation
capability
Tree pruning
using depth-first
approach
Sequential learning algorithm that is based on function approximation by
optimizing specific loss functions as well as applying several regularization
techniques.
20. MODEL INTERPRETATION IS VITAL FOR
FINE TUNING OF OFFERING
1. Overall interpretation
Understanding the most important features with feature
importance plot.
2. Local interpretation:
1. understand for an individual case the reasons of the
prediction.
2. understand on a filtered population the most frequent
reasons of their prediction
SHAP summary plot
3 variables with most contribution
ID        Probability  Class      1st variable     2nd variable     3rd variable
          to churn     predicted  Name    Impact   Name    Impact   Name     Impact
12098321  95%          1          Reb_1   +34      Bill_3  +19%     Lat_2    +8%
12098322  88%          1          Bill_1  +25      NPS_2   +14%     Sill_c3  +13%
12098323  35%          0          Inf_7   -27      Lat_2   -23%     Reb_1    -12%
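A table of the three strongest contributors per customer can be produced by ranking per-feature impacts by absolute value. The sketch assumes the impacts are already available as a dictionary (for example from a SHAP explainer); the numeric values and the helper name are illustrative only.

```python
def top_contributions(contributions, n=3):
    """Return the n features with the largest absolute impact on one prediction."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]

# Hypothetical per-feature impacts for one customer (values are illustrative).
customer = {"Reb_1": 0.34, "Bill_3": 0.19, "Lat_2": 0.08, "NPS_2": -0.02, "Inf_7": 0.01}
top3 = top_contributions(customer)
```

Ranking by absolute value keeps strong negative contributors visible, as in the third row of the table, where all three impacts reduce the churn probability.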
21.
ANALYTICAL
OBJECTIVE
MODEL PERFORMANCE
EVALUATION
Lift on top 1%, 10%, and 20% most likely
churners
Campaign performance evaluation (A/B
testing):
• Churn rate in different model
percentiles
• Churn rate DNC vs. TGT
• Offer response rate DNC vs. TGT
• Churn rate old vs. BD approach
• Offer response rate old vs. BD
approach
• Monthly level measurement
Assign a churn score to all customers in the eligible
segment
Automatically target top X% of customers with high
probability with special offer
The score should be recalculated on a daily level
New events should trigger near real-time scoring
Optimize offer type and price for individual customer
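Lift on the top X% of scored customers, the analytical metric listed on this slide, is the churn rate in that top group divided by the overall churn rate. A minimal sketch with made-up scores and outcomes:

```python
def lift_at(scores, labels, top_share):
    """Churn rate in the top-scored share of customers divided by the overall churn rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * top_share))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical scores and outcomes for 10 customers (1 = churned).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

lift_top_20 = lift_at(scores, labels, 0.20)
```

A lift of 1 means the model targets no better than random selection; the same function can be evaluated at 1%, 10%, and 20%.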
23. BENEFITS OF BIG DATA PLATFORM
1. Include better granularity of specific features.
2. Quickly calculate daily attributes and longer history from more data sources
3. Combine results of different analytical models faster to optimize process and value
4. Recompute score in real-time based on the latest customer activity / event
5. Efficient monitoring of model performance and execution
from imblearn.combine import SMOTETomek
Synthetic Minority Oversampling Technique
XGBoost4j on Scala-Spark
Early stopping may still contain bugs
Real time triggers – Examples:
1. Bill shock + call to Call Center
The customer has a bill shock, and the call to the CC triggers real-time scoring, so the agent can see the new score for that customer during the call
2. Complaint + low NPS score
The customer submits a complaint and gives a low NPS score, which triggers real-time rescoring for that customer
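The two trigger examples above can be expressed as simple rules over a customer's recent events. This is a sketch only: the rule names, event names, and the set-based matching are assumptions, not the deployed logic.

```python
# Hypothetical trigger rules: each rule is a set of events that must all be
# present for a customer before a rescoring request is raised.
TRIGGER_RULES = {
    "bill_shock_and_cc_call": {"bill_shock", "call_center_call"},
    "complaint_and_low_nps": {"complaint", "low_nps"},
}

def fired_triggers(recent_events, rules=TRIGGER_RULES):
    """Names of the rules whose required events all appear in recent_events."""
    seen = set(recent_events)
    return sorted(name for name, required in rules.items() if required <= seen)

def needs_rescoring(recent_events):
    """True when at least one trigger rule fires."""
    return bool(fired_triggers(recent_events))
```

For example, `needs_rescoring(["bill_shock"])` stays false until a call center call arrives for the same customer.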