Three case studies are discussed, each of which includes cluster analysis as a component.
1) Customer description for a credit card attrition model: use clusters to decide how to talk to customers.
2) Hotel price optimization: use clusters to find subsets of hotels with similar behavior, then optimize prices within each cluster, with a neural net as the objective function.
3) Retail supply chain: plan replenishment from 52-week demand curves, using thousands of seasonal "profiles" or clusters.
2. Customer Description – CC Company – “Who” vs. “How” to Talk to Customers
Hotel Price Optimization – Using Clusters as Non-Linear Constraints
Retail Supply Chain – Planning Replenishment for 52 Week Demand Curves
3. Context:
◦ Major credit card company
◦ South American Market
◦ Repeat for Argentina, Brazil… and “dollar countries”
Objectives or Problem:
◦ How to best manage the customer population
◦ Develop a software system that can be rerun across geographies and over time
◦ How to AUTOMATE understanding?
How to automate naming the clusters?
4. Solution, 3 projects for each customer base
◦ “WHO” to talk to…
Customer Attrition Model – Neural Network (5 algorithms tested)
Decrease in spending over time
Basic vs. Supplemental Cards
By 7 categories
Challenge: Double-digit inflation in some countries (1990s)
Standardize by monthly spending
Mining Factoid: Credit Card Digit 11 was predictive
Billing cycles? Monthly salaries + high inflation
Customer Profitability – Net Present Value
◦ “HOW” to talk to them…
Cluster Analysis
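The customer profitability measure named on this slide, net present value, can be sketched as follows. This is only an illustration of the standard formula: the function name, the monthly cash-flow layout, and the 20% discount rate are assumptions, not details from the project.

```python
def net_present_value(monthly_profits, annual_rate):
    """Discount a list of expected monthly profits back to today.

    monthly_profits[0] is assumed to arrive one month from now.
    """
    # Convert the annual discount rate to an equivalent monthly rate.
    monthly_rate = (1.0 + annual_rate) ** (1.0 / 12.0) - 1.0
    return sum(
        profit / (1.0 + monthly_rate) ** month
        for month, profit in enumerate(monthly_profits, start=1)
    )


# Hypothetical customer: 10 currency units of profit per month for one year.
npv = net_present_value([10.0] * 12, annual_rate=0.20)
```

With a positive discount rate the result is always below the undiscounted total (here, below 120).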
6. Select the 5-15% customers “highest” in the spike
[Chart: Total Profit / cell and Cumulative Profit by 5% Customer Groups (1 to 19), for Attrition and Profitability; series: Tree-Net vs. Random]
83% of Attrition Profit was Lost in top 15%
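A chart of this kind can be produced by ranking customers on a score, cutting the ranking into 5% groups, and accumulating the profit at risk. The sketch below assumes the score is attrition probability times profitability (the "(Attrition)x(Profitability)" selection named later in the deck); the function name and the sample data are hypothetical.

```python
def cumulative_profit_by_group(customers, n_groups=20):
    """customers: list of (attrition_probability, profitability) pairs.

    Returns the cumulative share of expected profit at risk captured
    after each 5% group, with groups ordered from highest score down.
    Assumes the total score is positive.
    """
    scores = sorted((p * v for p, v in customers), reverse=True)
    total = sum(scores)
    group_size = len(scores) / n_groups
    shares = []
    for g in range(1, n_groups + 1):
        cutoff = round(g * group_size)
        shares.append(sum(scores[:cutoff]) / total)
    return shares


# Hypothetical base: 10 risky, valuable customers and 90 stable, modest ones.
customers = [(0.8, 500.0)] * 10 + [(0.05, 100.0)] * 90
shares = cumulative_profit_by_group(customers)
top_15_percent_share = shares[2]  # after the first three 5% groups
```

In this made-up data most of the profit at risk sits in the first three groups, which is the shape of "spike" the slide describes.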
7. How to design the cluster analysis?
◦ Select top fields from neural network
Sensitivity Analysis on the NN
% spending by category
Restaurant, Retail, Grocery, Hotel, Air, Auto, …
Trend over time (slope, expected future value)
Decide to create 8 – 12 clusters or customer segments to communicate to marketers
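The two kinds of cluster inputs listed on this slide, % spending by category and trend over time, can be computed per customer roughly as below. This is a sketch: the function names and sample numbers are hypothetical, and the trend is an ordinary least-squares slope, which is one plausible reading of "slope".

```python
def category_shares(spend_by_category):
    """Fraction of a customer's spending in each category.

    Shares are scale-free, so they are not distorted by inflation.
    """
    total = sum(spend_by_category.values())
    return {cat: amount / total for cat, amount in spend_by_category.items()}


def trend_slope(monthly_totals):
    """Least-squares slope of spending against month index 0, 1, 2, ..."""
    n = len(monthly_totals)
    mean_x = (n - 1) / 2.0
    mean_y = sum(monthly_totals) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_totals))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


shares = category_shares({"Restaurant": 50.0, "Grocery": 30.0, "Air": 20.0})
slope = trend_slope([100.0, 90.0, 80.0, 70.0])  # spending falls 10 per month
```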
9. Consider Scalability
◦ 100k – 500k customers
◦ Some cluster methods are O(n), others O(n²)
◦ Use k-means to create 100 clusters, O(n)
◦ Then use O(n²) methods to reduce from 100 clusters down to 8-12 clusters
◦ This uses all the data scalably, and still allows a more sophisticated hierarchical cluster search
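The two-stage approach on this slide can be sketched in plain Python. The k-means pass is linear in the number of rows; the pairwise merge search is quadratic, but only over the stage-one centroids. The merge rule used here (join the two closest centroids, weighted by size) is an assumption standing in for whichever hierarchical method was actually used, and the demo data is synthetic.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iterations=10, seed=0):
    """Stage 1: each pass touches every point once, so cost grows linearly in n."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a group ends up empty.
        centroids = [mean_point(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, [len(g) for g in groups]


def merge_down(centroids, sizes, target):
    """Stage 2: repeatedly merge the two closest centroids (quadratic search),
    affordable on ~100 centroids but not on hundreds of thousands of rows."""
    clusters = [(list(c), s) for c, s in zip(centroids, sizes) if s > 0]
    while len(clusters) > target:
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: sq_dist(clusters[ab[0]][0], clusters[ab[1]][0]),
        )
        (ci, si), (cj, sj) = clusters[i], clusters[j]
        merged = [(x * si + y * sj) / (si + sj) for x, y in zip(ci, cj)]
        clusters[i] = (merged, si + sj)
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Synthetic demo: 300 points around three centers, 12 micro-clusters, 3 segments.
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [[cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)] for cx, cy in centers for _ in range(100)]
centroids, sizes = kmeans(points, k=12)
segments = merge_down(centroids, sizes, target=3)
```

On the real problem the same shape would apply with k=100 in stage one and a target of 8-12 in stage two.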
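11. Clusters
[Figure: cluster summary grid. Columns: ALL (100%), then clusters 1, 2, … N, ordered from most customers (36%, 22%, …) to least (5%). Rows: fields, ordered by importance. Legend: min to MAX.]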
12. Most: Var X, Y, Z
Least: Var A, B, C
May have 12 clusters, 36 variables
Then each cluster may have 6 attributes to use in naming (its 3 highest and 3 lowest fields)
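One way to automate the "Most / Least" description of a cluster is to rank fields by how far the cluster mean sits from the overall mean. The sketch below uses standard-deviation units for that comparison, which is an assumption; the field names and numbers are hypothetical.

```python
def describe_cluster(cluster_means, overall_means, overall_stdevs, top_n=3):
    """Return the fields where the cluster is most above and most below
    the overall population, measured in standard deviations."""
    z = {
        field: (cluster_means[field] - overall_means[field]) / overall_stdevs[field]
        for field in cluster_means
    }
    ranked = sorted(z, key=z.get, reverse=True)
    # "least" is listed from the most extreme low value upward.
    return {"most": ranked[:top_n], "least": ranked[-top_n:][::-1]}


overall_means = {"pct_restaurant": 0.15, "pct_air": 0.10, "pct_grocery": 0.30, "spend_trend": 0.0}
overall_stdevs = {"pct_restaurant": 0.05, "pct_air": 0.05, "pct_grocery": 0.10, "spend_trend": 1.0}
cluster_means = {"pct_restaurant": 0.25, "pct_air": 0.25, "pct_grocery": 0.10, "spend_trend": -1.5}

summary = describe_cluster(cluster_means, overall_means, overall_stdevs, top_n=2)
label = "high " + ", ".join(summary["most"]) + "; low " + ", ".join(summary["least"])
```

The resulting label is only raw material for a name; a marketer would still turn it into something readable.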
13. Select “WHO” with (Attrition)x(Profitability)
Select “HOW” with Cluster Segments
◦ Given the variable selection, only a few clusters matched most of the 15% subset of customers to manage
Marketers could clearly understand the different audiences and their reasons for attrition, and could write better copy for communication
About 50 executives walked around with the one-page cluster summary in their pocket, and used it frequently to plan customer strategies
14. [Figure: Analysis Type vs. CRM Behavior. Labels: Media, Message, $$$, Best Customers, Upgrade/Downgrade, Loyal, Loyalty, Cross-Sell, Prospect, Segment, Reactivation, Attrition, Retention, Fraud]
15. Customer Description – CC Company – “Who” vs. “How” to Talk to Customers
Hotel Price Optimization – Using Clusters as Non-Linear Constraints
Retail Supply Chain – Planning Replenishment for 52 Week Demand Curves
16. Objective:
◦ Optimize pricing for hotel rooms
◦ Take into account geography & use
weekend, vacation, business, conference, …
Seasons of the year as they relate to demand
The hotel company owns many brands (chains) focused on different audiences
◦ Different price tiers, target audiences, …
◦ Hotel, motel, extended stay, …
◦ Which “lessons learned” carry across brands?
17. Revenue Management is a general process used to
◦ optimize profit
◦ given the remaining (plane seat or hotel room) inventory
◦ and the remaining time until the inventory is gone
Operations Research
◦ Linear or Non-Linear Programming
Linear or non-linear in either the constraints or the objective function
◦ Need an objective function to optimize
Train predictive models to forecast price, given conditions
18. Data Mining and Operations Research Design
◦ When training predictive models, it helps to learn behavior “in the same ball park” with the same model.
◦ If the underlying thought process is fairly different, subdivide the data into different subsets and train different models. For example:
Attrition: checking, credit card, line of credit, mortgage
In Mortgage Bond Pricing: monthly prepayment of none vs. 100’s vs. 1,000’s vs. a full refinance
19. How do we group or divide individual hotels, given all the attributes?
◦ Brand, location, % utilization weekday or weekend, …
Find bottom-up clusters, rather than top-down assertions on the data
For cluster variables – use the best variables from the pricing predictive models (sound familiar?)
20. Solution:
◦ 1) Build an initial predictive model predicting pricing. Find the most important variables.
◦ 2) Create 8-16 clusters, using those variables
◦ 3) Within each cluster
A) Train a predictive model for use as the OR objective function
B) Run a LINEAR OR price optimization, on the data subset
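Step 3 can be illustrated with a deliberately simplified stand-in. In the sketch below each cluster gets its own demand model, and a grid search over candidate prices picks the one with the highest predicted revenue under a room-capacity limit. The linear demand functions, cluster names, capacities, and the grid search itself are all assumptions for illustration; the project itself describes a trained predictive model and an OR solver.

```python
def best_price(predict_demand, capacity, candidate_prices):
    """Pick the candidate price with the highest predicted revenue,
    never selling more rooms than the cluster's capacity."""
    def revenue(price):
        rooms = min(max(predict_demand(price), 0.0), capacity)
        return price * rooms
    return max(candidate_prices, key=revenue)


# Hypothetical per-cluster (demand model, capacity) pairs.
# Each lambda stands in for a model trained on that cluster's data only.
cluster_models = {
    "business_weekday": (lambda p: 200.0 - 0.8 * p, 120),
    "leisure_weekend": (lambda p: 150.0 - 1.0 * p, 60),
}

prices = range(50, 251, 5)
plan = {
    name: best_price(model, capacity, prices)
    for name, (model, capacity) in cluster_models.items()
}
```

In the second cluster the capacity limit is binding, so the chosen price is higher than the unconstrained revenue maximum would suggest.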
21. Customer Description – CC Company – “Who” vs. “How” to Talk to Customers
Hotel Price Optimization – Using Clusters as Non-Linear Constraints
Retail Supply Chain – Planning Replenishment for 52 Week Demand Curves
22. The “Retail Supply Chain” runs from
◦ the manufacturer to
◦ distribution center to
◦ warehouse to
◦ store to consumer
Replenishment means re-supplying products on the shelves
◦ Minimize overstock and understock
◦ Heavy understock causes LOSS OF SALES
◦ Heavy overstock causes 30% end of season liquidation
23. 4,000 stores
100,000 products/SKUs (stock keeping units)
◦ 400 million store-product combinations
52 weeks per year
◦ 20.8 billion store-product-week combinations
Not the smallest problem in the mid-1990s
Holidays shift in week number from year to year, so the curves need to be adjusted
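The problem size quoted on this slide follows directly from the three counts:

```python
stores = 4_000
skus = 100_000
weeks_per_year = 52

store_sku = stores * skus                      # 400,000,000 combinations
store_sku_week = store_sku * weeks_per_year    # 20,800,000,000 combinations
```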
25. End up creating 2,000+ “profiles” or centroids
Assign new store-SKUs to an existing profile
If it doesn’t match (within a radius)…
◦ Re-run cluster analysis
◦ Lock existing centroids
◦ Create new centroids for data points outside the radius
◦ Add to the “profile library”
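The assign-or-extend logic above can be sketched as follows. Existing profiles are never modified ("locked"); a curve that is farther than the radius from every profile becomes a new profile. Seeding the new profile with the outlying curve itself is a simplification of re-running the cluster analysis, and the normalization, distance measure, radius, and 4-week demo curves (instead of 52 weeks) are all assumptions.

```python
import math


def normalize(curve):
    """Scale a weekly demand curve to sum to 1, keeping shape, not volume."""
    total = sum(curve)
    return [v / total for v in curve]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_or_extend(curves, library, radius):
    """Assign each curve to its nearest profile in `library`.

    Curves with no profile within `radius` are appended to the library
    as new profiles; existing profiles are left untouched.
    """
    assignments = []
    for curve in curves:
        shape = normalize(curve)
        best = min(
            range(len(library)),
            key=lambda i: distance(shape, library[i]),
            default=None,
        )
        if best is None or distance(shape, library[best]) > radius:
            library.append(shape)
            best = len(library) - 1
        assignments.append(best)
    return assignments


# Demo with 4-week curves: a flat profile and a late-peaking profile.
library = [normalize([1, 1, 1, 1]), normalize([0, 0, 1, 3])]
new_curves = [[10, 11, 9, 10], [0, 1, 5, 14], [8, 1, 1, 0]]
assignments = assign_or_extend(new_curves, library, radius=0.2)
```

The third curve peaks early, matches neither existing profile, and is added to the library as profile 2.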
26. Bottom-up findings (after the fact)
◦ Buying hunting-related items as the ducks migrate north