Beyond Churn Prediction: An Introduction to Uplift Modelling

These slides are from a talk I gave at the PAPIs conference in Boston in 2016. The main subject is uplift modelling. Starting from a churn model built for an e-gaming company, we introduce when to apply uplift methods, how to model them mathematically, and finally how to evaluate them.

I tried to bridge the gap between causal inference theory and uplift theory, especially concerning how to properly cross-validate the results. The notation used is the one from uplift modelling.

  1. Beyond Churn Prediction: An Introduction to Uplift Modelling, by Pierre Gutierrez
  2. Plan
     •  Introduction / Client situation
     •  Uplift Use Cases
     •  Global Uplift Strategy
     •  Machine Learning for Uplift
     •  Uplift Evaluation
     •  Conclusion
     Material
     •  Complete project: http://gallery.dataiku.com/projects/DKU_UPLIFT/
     •  Notebooks & data: https://github.com/PGuti/Uplift.git
  3. Dataiku
     •  Founded in 2013
     •  60+ employees
     •  Paris, New York, London, San Francisco
     •  Data science software: editor of Dataiku DSS
        •  Design: Prepare (load and prepare your data), Model (build your models), Analyse (visualize and share your work)
        •  Production: Automate (re-execute your workflow at ease), Monitor (follow your production environment), Score (get predictions in real time)
  4. Motivations
  5. Client situation
     •  Client: French online gaming company (MMORPG)
     •  Users are leaving (the game is more than 10 years old)
     •  Let's do a churn prediction model!
     •  Target: no come-back in 14 or 28 days (sketched in code below)
        (14 missing days -> 80% chance of not coming back; 28 missing days -> 90% chance of not coming back)
     •  Features:
        •  Connection features:
           •  Time played in the last 1, 7, 15, 30, … days
           •  Time since last connection
           •  Connection frequency
           •  Days of week / hours of day played
        •  Equivalent features for payments and subscriptions
        •  Age, sex, country
        •  Number of accounts, is a bot, …
        •  No in-game features (no data)
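The churn target above can be derived from a last-connection timestamp. Below is a minimal, hypothetical sketch: the column names, dates and reference date are invented for illustration, not the client's actual schema.

```python
import pandas as pd

# Label a player as a churner if they have not come back within 14 days
# of the reference date (hypothetical data and column names).
logs = pd.DataFrame({
    "player_id": [1, 2, 3],
    "last_connection": pd.to_datetime(["2016-01-02", "2016-02-20", "2016-03-05"]),
})
reference_date = pd.Timestamp("2016-03-10")

days_missing = (reference_date - logs["last_connection"]).dt.days
logs["churn_14d"] = (days_missing > 14).astype(int)
print(logs[["player_id", "churn_14d"]])
```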
  6. Client situation
     •  Model results:
        •  AUC 0.88
        •  Very stable model over time
     •  Marketing actions:
        •  7 different actions based on customer segmentation (offers, promotions, …)
        •  A/B test -> -5% churn for people contacted by email
     •  Going further:
        •  Feature engineering: guilds, close network, in-game actions, …
        •  Study long-term churn, …
  7. Uplift Definition
     •  But wait!
     •  Strong hypothesis: target the people who are most likely to churn
  8. Uplift Definition
     •  But wait!
     •  Strong hypothesis: target the people who are most likely to churn
     •  What is the gain per person for an action?
        •  Cost of the action: $c$
        •  Fixed value of customer $i$: $v_i$
        •  Independent variables: $X$
        •  "Treated" population $T$ and "control" population $C$
        •  $Y = \begin{cases} 1 & \text{if the customer churns} \\ 0 & \text{otherwise} \end{cases}$
     •  Value with action: $E_T(V_i) = v_i(1 - P_T(Y=1|X)) - c$
     •  Value without action: $E_C(V_i) = v_i(1 - P_C(Y=1|X))$
     •  Gain: $E(G_i) = v_i(P_C(Y=1|X) - P_T(Y=1|X)) - c$
        (hypothesis: $v_i$ is independent of the action)
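To make the gain formula concrete, here is a small numeric illustration; the customer value, action cost and churn probabilities are made up for the example.

```python
# Made-up numbers illustrating E(G_i) = v_i * (P_C(Y=1|X) - P_T(Y=1|X)) - c.
v_i = 50.0   # assumed fixed value of the customer
c = 1.0      # assumed cost of the action (e.g. sending an offer)
p_c = 0.40   # P_C(Y=1|X): churn probability without the action
p_t = 0.30   # P_T(Y=1|X): churn probability with the action

expected_gain = v_i * (p_c - p_t) - c
print(expected_gain)  # 4.0 -> the action is worth taking for this customer
```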
  9. Uplift Definition
     •  But wait!
     •  Strong hypothesis: target the people who are most likely to churn
     •  What is the gain per person for an action?
        $E(G_i) = v_i(P_C(Y=1|X) - P_T(Y=1|X)) - c$
     •  Real target: the people who are most likely to change their behavior positively if there is an action
     •  Uplift = a model of the probability difference $\Delta P$
  10. Uplift Definition
      •  Gain to maximize: $E(G_i) = v_i(P_C(Y=1|X) - P_T(Y=1|X)) - c$
      •  Targeting churners:
         •  Does not optimize the difference!
         •  Is only good if the treatment is good.
      •  Intuitive examples:
         •  $P_C(Y=1) < P_T(Y=1)$: the action is expected to make the situation worse. Spam?
         •  $P_C(Y=1) \approx P_T(Y=1)$: the user does not care
  11. Uplift Definition
      Response if treated vs. response if not treated:
      •  Positive if treated, positive if not treated: SURE THINGS (unnecessary costs)
      •  Positive if treated, negative if not treated: PERSUADABLES (the people we want to target)
      •  Negative if treated, positive if not treated: SLEEPING DOGS (negative impact)
      •  Negative if treated, negative if not treated: LOST CAUSES (unnecessary costs)
  12. Uplift Use Cases
  13. Uplift Use Cases
      •  Healthcare:
         •  Typical medical trial:
            •  Treatment group: gets the treatment
            •  Control group: gets a placebo (or another treatment)
         •  A statistical test shows whether the treatment works globally
         •  With uplift modeling we can find out for whom the treatment works best
            •  Personalized medicine
            •  Ex: what is the gain in survival probability? -> a classification/uplift problem
  14. Uplift Use Cases
      •  Churn:
         •  E-gaming
         •  Other example: Coyote
      •  Retail:
         •  Compare the effect of coupon campaigns
      •  Marketing / CRM:
         •  Churn
         •  E-mailing
  15. Example
      •  Mailing: Hillstrom challenge
      •  2 campaigns:
         •  one men's email
         •  one women's email
      •  Question: who are the people to target / who have the best response rate?
  16. Uplift vs. Causal Inference Methods
      •  Causal inference is closer to econometrics
      •  Uplift is closer to ML, more practical:
         •  Evaluation based on cross-validation
         •  Use of classical ML models
         •  Sometimes a lack of theory
      •  Different people who don't really talk to each other:
         •  Different notations (sorry). Today is uplift's
         •  Different evaluation functions
         •  Different models? Not really!
  17. Global Uplift Strategy
  18. Uplift as a natural evolution
      •  Step 1: train a (churn) model (train data -> training -> churn model)
  19. Uplift as a natural evolution
      •  Step 2: A/B test the model (score the test data with the churn model, run the A/B test on the scored dataset)
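A possible way to set up step 2 in code: randomly assign the scored population to a contacted (treatment) and a non-contacted (control) group, keeping the assignment independent of the score so the resulting data can later feed the uplift model. The 50/50 split, the synthetic data and the column names are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

# Random treatment/control assignment over a scored dataset (synthetic here).
rng = np.random.default_rng(0)
scored = pd.DataFrame({
    "customer_id": np.arange(1000),
    "churn_score": rng.uniform(size=1000),   # output of the churn model
})
# The assignment must not depend on the score or the features.
scored["group"] = rng.choice(["treatment", "control"], size=len(scored))
print(scored["group"].value_counts())
```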
  20. Uplift as a natural evolution
      •  Step 3: train your uplift model (on the scored, A/B-tested dataset)
  21. Uplift as a natural evolution
      •  Step 4: deploy (use the trained uplift model to produce a new scoring on new test data)
  22. Uplift as a natural evolution
      •  Capitalize on your A/B test data!
      •  Today's focus: the uplift model trained on the A/B test data
  23. Machine Learning Model
  24. Uplift modeling
      •  Three main methods in the uplift literature:
         •  Two-model approach
         •  Class variable modification
         •  Modification of existing machine learning models (tree-based methods, out of today's scope)
      •  Generalization: the causal inference approach
      •  Main assumption (unconfoundedness): belonging to the control or treatment group should be independent of the response
  25. Uplift modeling: two-model approach
      •  Build a model on the treatment group to get $P_T(Y|X)$
      •  Build a model on the control group to get $P_C(Y|X)$
      •  Set: $\Delta P = P_T(Y|X) - P_C(Y|X)$
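A minimal sketch of the two-model approach with scikit-learn; the random forest choice and the variable names (X_t, y_t for the treatment group, X_c, y_c for control) are assumptions, and any pair of probabilistic classifiers would do.

```python
from sklearn.ensemble import RandomForestClassifier

def two_model_uplift(X_t, y_t, X_c, y_c, X_new):
    """Fit one model per group and return Delta P = P_T(Y=1|X) - P_C(Y=1|X)."""
    model_t = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_t, y_t)
    model_c = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_c, y_c)
    return (model_t.predict_proba(X_new)[:, 1]
            - model_c.predict_proba(X_new)[:, 1])
```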
  26. Uplift modeling: two-model approach
      •  Advantages:
         •  Standard ML models can be used
         •  In theory, two good estimators -> a good uplift model
         •  Works well in practice
         •  Generalizes easily to regression and multi-treatment settings
      •  Drawbacks:
         •  The difference of two estimators is probably not the best estimator of the difference
         •  The two classifiers can ignore the weaker uplift signal (since it is not their target)
         •  An algorithm focusing on estimating the difference should perform better
  27. Uplift modeling: class variable transformation
      •  Introduced in Jaskowski & Jaroszewicz (2012)
      •  Allows any classifier to be adapted to uplift modeling
      •  Let $G \in \{T, C\}$ denote the group membership (treatment or control)
      •  Define the new target variable:
         $Z = \begin{cases} 1 & \text{if } G = T \text{ and } Y = 1 \\ 1 & \text{if } G = C \text{ and } Y = 0 \\ 0 & \text{otherwise} \end{cases}$
      •  This corresponds to flipping the target in the control dataset
  28. Uplift modeling: class variable transformation
      •  Why does it work?
      •  By design (A/B test warning!), $G$ should be independent of $X$:
         $P(Z=1|X) = P_T(Y=1|X)\,P(G=T|X) + P_C(Y=0|X)\,P(G=C|X)$
         $P(Z=1|X) = P_T(Y=1|X)\,P(G=T) + P_C(Y=0|X)\,P(G=C)$
      •  Possibly after reweighting the datasets, we should have $P(G=T) = P(G=C) = 1/2$, thus
         $2P(Z=1|X) = P_T(Y=1|X) + P_C(Y=0|X)$
  29. Uplift modeling: class variable transformation
      •  Why does it work?
         $2P(Z=1|X) = P_T(Y=1|X) + P_C(Y=0|X) = P_T(Y=1|X) + 1 - P_C(Y=1|X)$
      •  Thus $\Delta P = 2P(Z=1|X) - 1$
      •  And sorting by $P(Z=1|X)$ is the same as sorting by $\Delta P$ (a quick numeric check follows)
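A quick numeric check of the identity, with made-up probabilities and $P(G=T) = P(G=C) = 1/2$:

```python
p_t = 0.30                           # P_T(Y=1|X)
p_c = 0.40                           # P_C(Y=1|X)

p_z = 0.5 * p_t + 0.5 * (1 - p_c)    # P(Z=1|X) for balanced groups
print(2 * p_z - 1)                   # -0.10
print(p_t - p_c)                     # -0.10 -> same value, hence same ordering
```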
  30. Uplift modeling: class variable transformation
      •  Summary (a code sketch follows this slide):
         •  Flip the class for the control dataset
         •  Concatenate the treatment and control datasets
         •  Build a classifier
         •  Target the users with the highest predicted probability
      •  Advantages:
         •  Any classifier can be used
         •  Directly predicts uplift (and not each class separately)
         •  A single model on a larger dataset (instead of two smaller ones)
      •  Drawbacks:
         •  Complex decision surface -> the model can perform poorly
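The summary above translates into a few lines of code. This sketch assumes balanced treatment/control groups ($P(G=T) = P(G=C) = 1/2$; reweight otherwise), uses logistic regression as an arbitrary example classifier, and reuses the X_t, y_t, X_c, y_c naming from the two-model sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def class_transform_uplift(X_t, y_t, X_c, y_c, X_new):
    """Flip the control target, fit one classifier, return 2*P(Z=1|X) - 1."""
    X = np.vstack([X_t, X_c])
    z = np.concatenate([y_t, 1 - np.asarray(y_c)])    # Z=1 iff (T, Y=1) or (C, Y=0)
    model = LogisticRegression(max_iter=1000).fit(X, z)
    return 2 * model.predict_proba(X_new)[:, 1] - 1   # estimated Delta P
```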
  31. Generalization
      •  From Athey: let $e(X) = P(G = 1|X)$ and
         $Y^* = Y \cdot \dfrac{G - e(X)}{e(X)(1 - e(X))}$
      •  Then (unconfoundedness): $E(Y^*|X) = \Delta P$
      •  Any classical estimator can be used
      •  Generalizes to more advanced A/B test schemes
      •  Specific estimators can be derived (see paper)
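A sketch of this transformed-outcome idea, assuming g is coded 1 for treatment and 0 for control and that e is the propensity $P(G=1|X)$ (a constant 0.5 for a balanced A/B test); the random forest regressor is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def transformed_outcome_uplift(X, y, g, X_new, e=0.5):
    """Regress Y* = Y * (G - e) / (e * (1 - e)) on X; predictions estimate Delta P."""
    y_star = np.asarray(y) * (np.asarray(g) - e) / (e * (1 - e))
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y_star)
    return model.predict(X_new)
```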
  32. Uplift modeling: other methods
      •  Based on decision trees:
         •  Rzepakowski & Jaroszewicz (2012): a new decision tree split criterion based on information theory
         •  Soltys, Rzepakowski & Jaroszewicz (2013): ensemble methods for uplift modeling (out of today's scope)
  33. Model Evaluation
  34. Evaluation
      •  Problem:
         •  We don't have a clear 0/1 target
         •  We would need to know, for each customer:
            •  the response to treatment
            •  the response to control -> not possible
      •  Cross-validation:
         •  Train and validation split
         •  Stratified on the treatment/control variable (see the sketch below)
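A possible implementation of the split, on synthetic data, stratifying on the treatment/control indicator so both groups keep the same proportion in the train and validation folds:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))        # features (synthetic)
g = rng.integers(0, 2, size=1000)     # 1 = treatment, 0 = control
y = rng.integers(0, 2, size=1000)     # response

X_tr, X_val, y_tr, y_val, g_tr, g_val = train_test_split(
    X, y, g, test_size=0.3, stratify=g, random_state=0)
print(g_tr.mean(), g_val.mean())      # similar treatment share in both splits
```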
  35. Evaluation: uplift deciles / bins
      •  Uplift bins:
         •  Sort the dataset by predicted uplift, descending
         •  Compute the uplift per bin:
            $U = \dfrac{Y_T}{N_T} - \dfrac{Y_C}{N_C}$
            with $Y_T$ the number of positives in the treated group, $Y_C$ the number of positives in the control group, $N_T$ the number of treated, $N_C$ the number in control
         •  Hard to compare models
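A sketch of the per-bin computation; uplift_pred, y and g are assumed to be the validation-set predicted uplift, response and treatment indicator (1 = treatment, 0 = control).

```python
import pandas as pd

def uplift_by_bin(uplift_pred, y, g, n_bins=10):
    """U = Y_T/N_T - Y_C/N_C inside each bin of predicted uplift (deciles by default)."""
    df = pd.DataFrame({"uplift": uplift_pred, "y": y, "g": g})
    df = df.sort_values("uplift", ascending=False).reset_index(drop=True)
    df["bin"] = pd.qcut(df.index, n_bins, labels=False)
    per_bin = df.groupby(["bin", "g"])["y"].mean().unstack("g")
    return per_bin[1] - per_bin[0]    # treated response rate minus control rate, per bin
```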
  36. Evaluation: uplift deciles / bins
      •  Cumulative uplift bins:
         •  Sort the dataset by predicted uplift, descending
         •  Compute the uplift on all preceding bins
      •  Cumulative uplift gain bins:
         •  Sort the dataset by predicted uplift, descending
         •  Compute the uplift on all preceding bins
         •  Multiply by the number of instances
  37. Evaluation: uplift curve
      •  Generalization of the previous (cumulative uplift gain) bins: a parametric curve over every cut-off
      •  Similar to a lift / ROC curve
      •  Models can be compared! AUC
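A sketch of the curve; the slide's exact formula did not survive the export, so this follows the usual cumulative uplift gain definition, $(Y_T(t)/N_T(t) - Y_C(t)/N_C(t)) \cdot t$, over the population sorted by predicted uplift.

```python
import numpy as np

def uplift_curve(uplift_pred, y, g):
    """Cumulative uplift gain at every cut-off t (descending predicted uplift)."""
    order = np.argsort(-np.asarray(uplift_pred))
    y, g = np.asarray(y)[order], np.asarray(g)[order]
    n_t, n_c = np.cumsum(g == 1), np.cumsum(g == 0)
    y_t, y_c = np.cumsum(y * (g == 1)), np.cumsum(y * (g == 0))
    t = np.arange(1, len(y) + 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        gain = (y_t / n_t - y_c / n_c) * t
    return t, gain
```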
  38. Evaluation: Qini
      •  Introduced in Radcliffe
      •  Parametric curve defined by: $f(t) = Y_T(t) - Y_C(t) \cdot N_T(t) / N_C(t)$, with $t$ the number of observations targeted
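The same kind of sketch for the Qini curve defined above; again uplift_pred, y and g are the validation-set predictions, responses and treatment indicator.

```python
import numpy as np

def qini_curve(uplift_pred, y, g):
    """f(t) = Y_T(t) - Y_C(t) * N_T(t) / N_C(t), sorted by descending predicted uplift."""
    order = np.argsort(-np.asarray(uplift_pred))
    y, g = np.asarray(y)[order], np.asarray(g)[order]
    y_t, y_c = np.cumsum(y * (g == 1)), np.cumsum(y * (g == 0))
    n_t, n_c = np.cumsum(g == 1), np.cumsum(g == 0)
    t = np.arange(1, len(y) + 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        qini = y_t - y_c * n_t / n_c
    return t, qini
```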
  39. Evaluation: Qini
      •  Best model:
         •  Takes first all the positives in the treatment group and last all the positives in the control group
      •  No theoretical best model:
         •  it depends on the possibility of a negative effect
         •  displayed here for no negative effect
      •  Random model:
         •  corresponds to the global effect of the treatment
      •  Hillstrom dataset:
         •  For women, the models are comparable and useful
         •  For men, there are no clear individuals to target
  40. Evaluation: Qini (figure: Qini curves; x-axis: t, observations)
  41. Evaluation: Qini (figure: Qini curves; x-axis: t, observations)
  42. Conclusion
      •  Uplift modeling:
         •  Surprisingly little literature / few examples
         •  The theory is rather easy to test:
            •  Two models
            •  Class modification
         •  The intuition and the evaluation are not easy to grasp
      •  On the client side:
         •  A good lead to select the best offer for a customer -> can lead to more customer personalization
      •  Applications:
         •  Churn, mailing, retail couponing, personalized medicine, …
  43. Thank you for your attention!
  44. A few references
      •  Data:
         •  Churn in gaming: WOWAH dataset
         •  Uplift for healthcare: Colon dataset
         •  Uplift in mailing: Hillstrom data challenge
         •  Uplift in general: simulated data, available on gallery.dataiku.com
      •  Demo:
         •  http://gallery.dataiku.com/projects/DKU_UPLIFT/
  45. A few references
      •  Applications:
         •  Uplift modeling for clinical trial data (Jaskowski, Jaroszewicz)
         •  Uplift Modeling in Direct Marketing (Rzepakowski, Jaroszewicz)
      •  Modeling techniques:
         •  Rzepakowski & Jaroszewicz 2011 (decision trees)
         •  Soltys, Rzepakowski & Jaroszewicz 2013 (ensembles for uplift)
         •  Jaskowski & Jaroszewicz 2012 (class modification model)
      •  Evaluation:
         •  Using Control Groups to Target on Predicted Lift (Radcliffe)
         •  Testing a New Metric for Uplift Models (Mesalles Naranjo)
  46. A few references
      •  Causal inference:
         •  Machine Learning Methods for Estimating Heterogeneous Causal Effects (Athey, Imbens 2015)
         •  Introduction to Causal Inference (Spirtes 2010)
         •  Causal inference in statistics: An overview (Pearl 2009)
