Data Mining Project: Prospect Identification from a Credit Database using Regression, Decision Trees, and Neural Network

By: Akanksha Jain
Project Goals

•  Goal: Using the historical credit data set SMALL, develop a model that can predict whether a prospect will respond to a future marketing campaign
•  Scope: SMALL data set
–  145 variables
–  8,000 observations
–  Dependent variable: RESP_FLG (binary)
•  Responder: 1
•  Non-Responder: 0
Tools

•  SAS Enterprise Miner Workstation 7.1
•  SAS 9.3_M1
Variable Definition

Variable         Definition                      Type
AAL01 – AAL17    All Types                       Char
AAU01 – AAU07    Auto                            Char
ABK01 – ABK15    Bankcard                        Char
ACE01 – ACE03    Cust Elim                       Char
ACL02 – ACL12    Collection                      Char
ADI01 – ADI09    Derog By Ind                    Char
AEQ01 – AEQ07    Home Equity                     Char
AHI01 – AHI05    Historical                      Char
AIN01 – AIN15    Installment                     Char
AIQ01 – AIQ05    Inquiries                       Char
ALE01 – ALE07    Lease                           Char
ALN01 – ALN07    LN Finance                      Char
AMG01 – AMG07    Mortgage                        Char
APR17 – APR21    Public REC                      Char
ART01 – ART15    Retail                          Char
ARV01 – ARV15    Revolving                       Char
CUS04            Customer Data                   Char
SCORE01          FICO                            Num
SCORE02          MDS (Market DeriveD SignalS)    Num
RESP_FLG         Responder Flag                  Num
Data Cleaning

•  Data set SMALL has missing values for variables SCORE01 (FICO) and SCORE02 (MDS)

data mylib.small_clean mylib.small_bad;
  set mylib.small;
  if score01 = . or score02 = . then output mylib.small_bad;
  else output mylib.small_clean;
run;

LOG:
NOTE: There were 8000 observations read from the data set MYLIB.SMALL.
NOTE: The data set MYLIB.SMALL_CLEAN has 5782 observations and 145 variables.
NOTE: The data set MYLIB.SMALL_BAD has 2218 observations and 145 variables.

•  Going forward, we will use data set SMALL_CLEAN
•  Investigate separately why 2218 observations had missing values for SCORE01 and SCORE02
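The SAS data step above can be sketched in plain Python for readers outside SAS; `None` plays the role of the SAS missing value `.`, and the example records are invented for illustration:

```python
# Split records with a missing SCORE01 or SCORE02 into a "bad" set and keep
# the rest as "clean", mirroring the SAS data step's two OUTPUT targets.

def split_clean_bad(records):
    """Partition records on missing score01/score02 (None stands in for SAS '.')."""
    clean, bad = [], []
    for rec in records:
        if rec["score01"] is None or rec["score02"] is None:
            bad.append(rec)
        else:
            clean.append(rec)
    return clean, bad

records = [
    {"score01": 712, "score02": 0.4},
    {"score01": None, "score02": 0.1},   # missing FICO -> goes to "bad"
    {"score01": 655, "score02": None},   # missing MDS  -> goes to "bad"
]
clean, bad = split_clean_bad(records)
print(len(clean), len(bad))  # 1 2
```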
  
	
  
	
  

	
  
Diagram	
  
Data Source

•  Rejected variables (have more than 20 categories):
–  ACE03
–  ACL10
•  Variable RESP_FLG
–  Change Role to TARGET
–  Change Order to DESCENDING
•  Set prior probabilities
–  Non-Responder / event = “0”: 0.99
–  Responder / event = “1”: 0.01
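Setting priors matters because posteriors fitted on a sample whose event rate differs from the 1% population prior must be rescaled. A minimal sketch of the standard prior-correction formula (the 20% sample event rate below is an assumed illustration value, not from the deck):

```python
# Rescale a model posterior from the sample event rate to the stated prior,
# as modeling tools do when decision priors are supplied.

def adjust_posterior(p, prior_event, sample_event):
    """Prior-corrected posterior for the event class."""
    num = p * prior_event / sample_event
    den = num + (1 - p) * (1 - prior_event) / (1 - sample_event)
    return num / den

# A raw posterior of 0.5 under a 20% sample rate shrinks sharply once the
# 1% responder prior is imposed.
adj = adjust_posterior(0.5, prior_event=0.01, sample_event=0.20)
print(round(adj, 4))  # 0.0388
```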
  
Data Partition

•  Train - 55%
•  Validate - 35%
•  Test - 10%
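A minimal sketch of the 55/35/10 split; note that Enterprise Miner's Data Partition node stratifies on the target by default, which this plain random split omits:

```python
import random

def partition(ids, seed=42):
    """Random 55/35/10 train/validate/test split of record ids."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = round(n * 0.55)
    n_valid = round(n * 0.35)
    return ids[:n_train], ids[n_train:n_train + n_valid], ids[n_train + n_valid:]

train, valid, test = partition(range(1000))
print(len(train), len(valid), len(test))  # 550 350 100
```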
  
Model: Maximum CHAID

•  Nominal Criterion: ProbChiSq
•  Significance Level: 0.2
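With the ProbChiSq criterion, CHAID scores a candidate split by the chi-square test between the split and the target, rejecting splits whose p-value exceeds the 0.2 significance level. A hand-rolled sketch for a 2x2 table (1 degree of freedom), with invented responder counts:

```python
import math

def chi2_p_2x2(a, b, c, d):
    """p-value of the chi-square independence test for table [[a, b], [c, d]].
    For 1 degree of freedom, p = erfc(sqrt(chi2 / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2))

# Responders vs non-responders on either side of a hypothetical split:
p = chi2_p_2x2(30, 70, 10, 90)
print(p < 0.2)  # True: this split would be accepted
```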
  
Model: Maximum CHAID

On the left side of the tree, the percentage of 1’s (i.e., respondents) is higher; hence people with a FICO score < 700.5 who fall in the (0, 4, 5, Missing) category of RETAIL: BAL > 0 IN 6 MNTHS, ALL will respond to the marketing campaign
Maximum CHAID: Cumulative LIFT

Maximum CHAID: Final Variables
Model: Pruned CHAID

•  Nominal Criterion: ProbChiSq
•  Significance Level: 0.2
•  Leaf Size: 120
•  Split Size: 300
•  Maximum Depth: 3
Model: Pruned CHAID

Pruned CHAID: Cumulative LIFT

Pruned CHAID: Final Variables
Model: CART

•  Nominal Criterion: Gini
•  Significance Level: 0.2
•  Leaf Size: 120
•  Split Size: 300
•  Maximum Depth: 3
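CART's Gini criterion scores a split by the weighted impurity of the two children; the class counts below are invented for illustration:

```python
def gini(counts):
    """Gini impurity of a node given class counts, e.g. [responders, non-responders]."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_gini(left, right):
    """Weighted Gini impurity of a binary split's children."""
    n = sum(left) + sum(right)
    return (sum(left) / n) * gini(left) + (sum(right) / n) * gini(right)

print(gini([50, 50]))  # 0.5 -- a maximally impure binary node
print(split_gini([40, 10], [10, 40]) < gini([50, 50]))  # True: the split helps
```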
  
CART: Tree

CART: Cumulative LIFT

CART: Final Variables
Model: C4.5

•  Nominal Criterion: Entropy
•  Significance Level: 0.2
•  Leaf Size: 120
•  Split Size: 300
•  Maximum Depth: 3
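C4.5 differs from the CART run only in the criterion: entropy (information gain) instead of Gini. Sketch:

```python
import math

def entropy(counts):
    """Entropy in bits of a node given class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

print(entropy([50, 50]))  # 1.0 bit: maximum uncertainty for a binary target
print(entropy([99, 1]) < entropy([50, 50]))  # True: a purer node scores lower
```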
  
C4.5: Tree

C4.5: Cumulative LIFT

C4.5: Final Variables
Variable Comparison

Maximum CHAID: SCORE01, AMG07, ABK10, ART11, AAL04, AIQ04, ACE02, AAU03, AAL14, AEQ07, AEQ01, AIN03, ABK14, AIN10

Pruned CHAID: SCORE01, AMG07, ALN01

CART: SCORE01, AMG07, ABK10, SCORE02, AMG06, AMG01, AMG03, ARV10, ARV03, ARV01
Transform Variables

(Distribution plots of SCORE01 and SCORE02)

•  Skewed SCORE01 and SCORE02
•  Transform function - LOG
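The point of the LOG transform is that it pulls in a long right tail. A quick check with moment skewness on invented, right-skewed values:

```python
import math

def skewness(xs):
    """Moment coefficient of skewness: m3 / m2^1.5."""
    n, mean = len(xs), sum(xs) / len(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 2, 3, 3, 4, 50]              # heavy right tail
logged = [math.log(x) for x in raw]
print(skewness(raw) > skewness(logged))   # True: log reduces the skew
```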
  

	
  
Impute

•  Default Input Methods
–  For interval variables - Median
–  For class variables - Count
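The Impute node's defaults, sketched in plain Python: the median for interval (numeric) variables and the most frequent level ("count") for class variables. The values below are invented:

```python
import statistics

def impute(values, kind):
    """Fill None with the median ("interval") or the modal level ("class")."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed) if kind == "interval" else statistics.mode(observed)
    return [fill if v is None else v for v in values]

print(impute([660, None, 720, 700], "interval"))  # [660, 700, 720, 700]
print(impute(["A", "B", None, "A"], "class"))     # ['A', 'B', 'A', 'A']
```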
  
Model: Event ‘0’

Model: Full Model Regression

•  Input Coding - GLM
•  MODEL SELECTION - None
•  OPTIMIZATION OPTIONS - TECHNIQUE - Default
•  OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
•  OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
•  OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
Full Model Regression: Cumulative LIFT
Model: Forward Regression

•  Input Coding - GLM
•  MODEL SELECTION - Forward
•  SELECTION CRITERION - Akaike Information Criterion
•  USE SELECTION DEFAULTS - YES
•  OPTIMIZATION OPTIONS - TECHNIQUE - Default
•  OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
•  OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
•  OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
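Forward selection with AIC, sketched abstractly: grow the model by adding any candidate whose refitted model lowers the current best AIC, and stop when no addition improves it (a simplified greedy variant). `fit` is a stand-in for refitting the logistic regression; the log-likelihoods are invented for illustration:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def forward_select(candidates, fit):
    selected, best = [], aic(fit([]), 1)  # intercept-only baseline
    improved = True
    while improved:
        improved = False
        for var in [v for v in candidates if v not in selected]:
            trial = selected + [var]
            score = aic(fit(trial), 1 + len(trial))
            if score < best:                     # keep any AIC-lowering addition
                best, selected, improved = score, trial, True
    return selected

# Toy fit: log-likelihood improves a lot for LOG_SCORE01, barely for AMG07.
loglik = {(): -100.0, ("LOG_SCORE01",): -80.0, ("AMG07",): -98.0,
          ("LOG_SCORE01", "AMG07"): -79.9}
chosen = forward_select(["LOG_SCORE01", "AMG07"], lambda vs: loglik[tuple(vs)])
print(chosen)  # ['LOG_SCORE01']: adding AMG07 would raise AIC, so it stops
```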
  
Forward Regression: Cumulative LIFT

Forward Regression: Cumulative % Captured Response

Forward Regression: Final Variables
  
Model: Backward Regression

•  Input Coding - GLM
•  MODEL SELECTION - Backward
•  SELECTION CRITERION - Akaike Information Criterion
•  USE SELECTION DEFAULTS - YES
•  OPTIMIZATION OPTIONS - TECHNIQUE - Default
•  OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
•  OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
•  OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
  
Backward Regression: Cumulative LIFT

Backward Regression: Cumulative % Captured Response

Backward Regression: Final Variables
  
Model: Stepwise Regression

•  Input Coding - GLM
•  MODEL SELECTION - Stepwise
•  SELECTION CRITERION - Akaike Information Criterion
•  USE SELECTION DEFAULTS - No
•  MODEL SELECTION - SELECTION OPTIONS
–  ENTRY SIGNIFICANCE LEVEL = 0.15
–  STAY SIGNIFICANCE LEVEL = 0.05
–  MAXIMUM NUMBER OF STEPS = 300
•  OPTIMIZATION OPTIONS - TECHNIQUE - Default
•  OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
•  OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
•  OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
  
Stepwise Regression: Cumulative LIFT

Stepwise Regression: Cumulative % Captured Response

Stepwise Regression: Final Variables
  
Variable Comparison

Forward          Backward         Stepwise
AAL11            ACE01            AAL11
ACE01            AEQ01            ACE01
AEQ01            AEQ07            AEQ01
AEQ07            AHI01            AEQ07
AHI01            ALN01            AHI01
ALN01            AMG01            ALN01
AMG01            AMG07            AMG01
AMG07            APR20            AMG07
APR20            LOG_SCORE01      APR20
ART11            AEQ03            ART11
LOG_SCORE01      AEQ04            LOG_SCORE01
AEQ02            ALE01            ALE02
  
Interaction Terms

•  log_score01 * log_score01
•  log_score01 * ace01
•  log_score01 * amg01
•  log_score01 * ahi01
•  log_score01 * log_score02
•  log_score02 * log_score02
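Each interaction term is just the product of two (log-transformed) inputs, added as a new column. A sketch with invented record values:

```python
import math

def with_interactions(rec):
    """Return a copy of the record extended with two of the interaction terms above."""
    out = dict(rec)
    out["log_score01*log_score01"] = rec["log_score01"] ** 2
    out["log_score01*log_score02"] = rec["log_score01"] * rec["log_score02"]
    return out

rec = {"log_score01": math.log(700), "log_score02": math.log(2.0)}
expanded = with_interactions(rec)
print(sorted(expanded))
```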
  
Model: Forward Reg Interaction

•  EQUATION - USER TERMS - YES
•  EQUATION - TERM EDITOR - Enter Interaction Terms
•  Input Coding - GLM
•  MODEL SELECTION - Forward
•  SELECTION CRITERION - Akaike Information Criterion
•  USE SELECTION DEFAULTS - YES
•  OPTIMIZATION OPTIONS - TECHNIQUE - Default
•  OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
•  OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
•  OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
  
Forward Reg Interaction: Cumulative LIFT

Forward Reg Interaction: Cumulative % Captured Response

Forward Reg Interaction: Final Variables
  
Model: Backward Reg Interaction

•  EQUATION - USER TERMS - YES
•  EQUATION - TERM EDITOR - Enter Interaction Terms
•  Input Coding - GLM
•  MODEL SELECTION - Backward
•  SELECTION CRITERION - Akaike Information Criterion
•  USE SELECTION DEFAULTS - YES
•  OPTIMIZATION OPTIONS - TECHNIQUE - Default
•  OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
•  OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
•  OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
  
Backward Reg Interaction: Cumulative LIFT

Backward Reg Interaction: Cumulative % Captured Response

Backward Reg Interaction: Final Variables
  
Model: Stepwise Reg Interaction

•  EQUATION - USER TERMS - YES
•  EQUATION - TERM EDITOR - Enter Interaction Terms
•  Input Coding - GLM
•  MODEL SELECTION - Stepwise
•  SELECTION CRITERION - Akaike Information Criterion
•  USE SELECTION DEFAULTS - No
•  MODEL SELECTION - SELECTION OPTIONS
–  ENTRY SIGNIFICANCE LEVEL = 0.15
–  STAY SIGNIFICANCE LEVEL = 0.05
–  MAXIMUM NUMBER OF STEPS = 300
•  OPTIMIZATION OPTIONS - TECHNIQUE - Default
•  OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
•  OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
•  OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
  
Stepwise Reg Interaction: Cumulative LIFT

Stepwise Reg Interaction: Cumulative % Captured Response

Stepwise Reg Interaction: Final Variables
  
Variable Comparison

Forward_Interaction      Backward_Interaction         Stepwise_Interaction
LOG_SCORE01*ACE01        AEQ01                        LOG_SCORE01*ACE01
LOG_SCORE02*AMG01        AEQ07                        LOG_SCORE02*AMG01
AAL11                    AHI01                        AAL11
AEQ01                    ALN01                        AEQ01
AEQ07                    AMG07                        AEQ07
AHI01                    APR20                        AHI01
ALN01                    LOG_SCORE01*LOG_SCORE01      ALN01
AMG07                    LOG_SCORE01*AHI01            AMG07
APR20                    ACE01                        APR20
ART11                    AEQ03                        ART11
AEQ02                    AEQ04                        ALE01
ALE02                    AMG01                        LOG_SCORE01
  
Model: Neural Network

•  NETWORK - DIRECT CONNECTION = Yes
•  OPTIMIZATION - PRELIMINARY TRAINING - ENABLE = No
•  OPTIMIZATION - Maximum Iterations = 50
•  OPTIMIZATION - PRELIMINARY TRAINING - Number of Runs = 10
•  MODEL SELECTION CRITERION - Misclassification
  
Neural Network: Cumulative LIFT

Neural Network: Cumulative % Captured Response

Neural Network: Average Square Error
  
If we increase the number of iterations, the average squared error keeps decreasing on the TRAIN set but increases on the VALIDATE set
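That pattern is classic overfitting, and the usual remedy (effectively what validation-based model selection does) is to keep the weights from the iteration with the best validation error rather than the last one. The error curves below are invented:

```python
def best_iteration(valid_ase):
    """Index of the training iteration with the lowest validation ASE."""
    return min(range(len(valid_ase)), key=lambda i: valid_ase[i])

train_ase = [0.20, 0.15, 0.12, 0.10, 0.08]   # keeps falling with iterations
valid_ase = [0.21, 0.17, 0.16, 0.18, 0.22]   # bottoms out, then rises
print(best_iteration(valid_ase))  # 2
```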
  

	
  

	
  
	
  
Ensemble Node

Select the model that performs best in:
–  Decision Trees
–  Regression
–  Regression with Interaction Terms

Build an Ensemble node on:
–  Pruned CHAID
–  Forward Regression
–  Forward Reg Interaction
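The Ensemble node combines the component models; for a class target its usual default is to average their posterior probabilities. A sketch with made-up posteriors from the three selected models:

```python
def ensemble_posterior(posteriors):
    """Average the component models' posterior probabilities for one prospect."""
    return sum(posteriors) / len(posteriors)

p_tree, p_forward, p_interaction = 0.12, 0.30, 0.18  # invented scores
print(round(ensemble_posterior([p_tree, p_forward, p_interaction]), 6))  # 0.2
```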
  
Ensemble Node: Cumulative LIFT

Ensemble Node: Cumulative % Captured Response
  
Model Comparison

•  ASSESSMENT REPORTS - NUMBER OF BINS = 50
•  MODEL SELECTION - SELECTION STATISTIC = MISCLASSIFICATION RATE
•  Comparing LIFT at top 20%
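Cumulative lift at depth 20%, as used in the comparison: rank prospects by model score, take the top 20%, and divide their response rate by the overall response rate. The scores and responses below are synthetic:

```python
def cumulative_lift(scores, responses, depth=0.2):
    """Response rate in the top `depth` fraction, relative to the overall rate."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    k = int(len(scores) * depth)
    top_rate = sum(responses[i] for i in order[:k]) / k
    overall = sum(responses) / len(responses)
    return top_rate / overall

scores    = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responses = [1,   1,   0,   0,   1,   0,   0,   0,   0,   0]
print(round(cumulative_lift(scores, responses), 2))  # 3.33: top 20% is 3.3x the base rate
```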
  
Model Comparison: ROC

Model Comparison: Cumulative LIFT (Train)

Model Comparison: Cumulative LIFT (Validate)

Model Comparison: Cumulative LIFT (Test)
  
Model Comparison: Conclusion

•  TRAIN:
–  Ensemble works best, followed by Forward Regression
–  Check the Validate and Test results to finalize the model
•  VALIDATE and TEST:
–  Forward Regression works better than Ensemble
Final Model

•  Forward Regression
•  List of variables:
–  AAL11
–  ACE01
–  AEQ01
–  AEQ07
–  AHI01
–  ALN01
–  AMG01
–  AMG07
–  APR20
–  ART11
–  LOG_SCORE01
–  AEQ02
SCORE

•  In the Model Comparison node: SCORE -> SELECTION EDITOR
•  Set YES for Forward Regression and NO for Stepwise Reg Interaction (which was selected by default)
•  Connect Model Comparison with SCORE, and run it
•  Get the optimized SAS code
Model Performance

•  PROC RANK
–  Rank 2: top 1/3rd responders
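A sketch of what PROC RANK with a GROUPS=3 setting produces: each scored prospect is assigned to a tertile of the predicted probability, so rank 2 is the top third, the group to target first. The probabilities below are synthetic, and ties are ignored for simplicity:

```python
def tertiles(probs):
    """Rank 0/1/2 by ascending probability; 2 = top third (as in PROC RANK GROUPS=3)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0] * len(probs)
    for pos, i in enumerate(order):
        ranks[i] = pos * 3 // len(probs)
    return ranks

print(tertiles([0.9, 0.1, 0.5, 0.8, 0.2, 0.4]))  # [2, 0, 1, 2, 0, 1]
```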
  
	
  
	
  
Model Performance

Thank You

Questions???

Contenu connexe

En vedette

Ed 289 mining issue
Ed 289 mining issueEd 289 mining issue
Ed 289 mining issue
49072013
 
Random forest
Random forestRandom forest
Random forest
Ujjawal
 
Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease prediction
Sivagowry Shathesh
 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar report
mayurik19
 
Kotler chapter 17: Designing and Managing Integrated Marketing Communication
Kotler chapter 17: Designing and Managing Integrated Marketing CommunicationKotler chapter 17: Designing and Managing Integrated Marketing Communication
Kotler chapter 17: Designing and Managing Integrated Marketing Communication
rianparulan
 
Openpit Design Fundas
Openpit Design FundasOpenpit Design Fundas
Openpit Design Fundas
VR M
 

En vedette (19)

Ed 289 mining issue
Ed 289 mining issueEd 289 mining issue
Ed 289 mining issue
 
Random forest
Random forestRandom forest
Random forest
 
Survey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease predictionSurvey on data mining techniques in heart disease prediction
Survey on data mining techniques in heart disease prediction
 
Review of scheduling algorithms in Open Pit Mining
Review of scheduling algorithms in Open Pit MiningReview of scheduling algorithms in Open Pit Mining
Review of scheduling algorithms in Open Pit Mining
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar report
 
Open pit mining
Open pit miningOpen pit mining
Open pit mining
 
Copy of open pit mining
Copy of open pit miningCopy of open pit mining
Copy of open pit mining
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Random forest
Random forestRandom forest
Random forest
 
2 surface mine planning
2 surface mine planning2 surface mine planning
2 surface mine planning
 
Understanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to PracticeUnderstanding Random Forests: From Theory to Practice
Understanding Random Forests: From Theory to Practice
 
Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 
Surface mining
Surface miningSurface mining
Surface mining
 
Mining (1)
Mining (1)Mining (1)
Mining (1)
 
Classification with Naive Bayes
Classification with Naive BayesClassification with Naive Bayes
Classification with Naive Bayes
 
Naive bayes
Naive bayesNaive bayes
Naive bayes
 
Kotler chapter 17: Designing and Managing Integrated Marketing Communication
Kotler chapter 17: Designing and Managing Integrated Marketing CommunicationKotler chapter 17: Designing and Managing Integrated Marketing Communication
Kotler chapter 17: Designing and Managing Integrated Marketing Communication
 
Openpit Design Fundas
Openpit Design FundasOpenpit Design Fundas
Openpit Design Fundas
 

Similaire à Prospect Identification from a Credit Database using Regression, Decision Trees, And Neural Network

Automated Change Impact Analysis between SysML Models of Requirements and Design
Automated Change Impact Analysis between SysML Models of Requirements and DesignAutomated Change Impact Analysis between SysML Models of Requirements and Design
Automated Change Impact Analysis between SysML Models of Requirements and Design
Lionel Briand
 
Automating sap testing with qtp10 & qc10
Automating sap testing with qtp10 & qc10Automating sap testing with qtp10 & qc10
Automating sap testing with qtp10 & qc10
Patrick Sun
 
Automated Test Suite Generation for Time-Continuous Simulink Models
Automated Test Suite Generation for Time-Continuous Simulink ModelsAutomated Test Suite Generation for Time-Continuous Simulink Models
Automated Test Suite Generation for Time-Continuous Simulink Models
Lionel Briand
 
385125794-SAP-New-Asset-Accounting-Training-Document17-26-pdf.pdf
385125794-SAP-New-Asset-Accounting-Training-Document17-26-pdf.pdf385125794-SAP-New-Asset-Accounting-Training-Document17-26-pdf.pdf
385125794-SAP-New-Asset-Accounting-Training-Document17-26-pdf.pdf
ssuserad3af4
 
10 chapter05 counters_fa14
10 chapter05 counters_fa1410 chapter05 counters_fa14
10 chapter05 counters_fa14
John Todora
 

Similaire à Prospect Identification from a Credit Database using Regression, Decision Trees, And Neural Network (20)

Topic 4 Digital Technique Data conversion
Topic 4  Digital Technique Data conversionTopic 4  Digital Technique Data conversion
Topic 4 Digital Technique Data conversion
 
Automated Change Impact Analysis between SysML Models of Requirements and Design
Automated Change Impact Analysis between SysML Models of Requirements and DesignAutomated Change Impact Analysis between SysML Models of Requirements and Design
Automated Change Impact Analysis between SysML Models of Requirements and Design
 
Interesting and Useful Features of the DeltaV PID Controller
Interesting and Useful Features of the DeltaV PID ControllerInteresting and Useful Features of the DeltaV PID Controller
Interesting and Useful Features of the DeltaV PID Controller
 
Six sigma11
Six sigma11Six sigma11
Six sigma11
 
Wayfair-Data Science Project
Wayfair-Data Science ProjectWayfair-Data Science Project
Wayfair-Data Science Project
 
Interesting and Useful Features of the DeltaV PID, Ratio and Bias/Gain Contro...
Interesting and Useful Features of the DeltaV PID, Ratio and Bias/Gain Contro...Interesting and Useful Features of the DeltaV PID, Ratio and Bias/Gain Contro...
Interesting and Useful Features of the DeltaV PID, Ratio and Bias/Gain Contro...
 
Chapter 6 - Introduction to 8085 Instructions
Chapter 6 - Introduction to 8085 InstructionsChapter 6 - Introduction to 8085 Instructions
Chapter 6 - Introduction to 8085 Instructions
 
Adaptive Query Optimization
Adaptive Query OptimizationAdaptive Query Optimization
Adaptive Query Optimization
 
Automating sap testing with qtp10 & qc10
Automating sap testing with qtp10 & qc10Automating sap testing with qtp10 & qc10
Automating sap testing with qtp10 & qc10
 
Mechatronics and the Injection Moulding Machine
Mechatronics and the Injection Moulding MachineMechatronics and the Injection Moulding Machine
Mechatronics and the Injection Moulding Machine
 
Operations research
Operations researchOperations research
Operations research
 
Airline scheduling and pricing using a genetic algorithm
Airline scheduling and pricing using a genetic algorithmAirline scheduling and pricing using a genetic algorithm
Airline scheduling and pricing using a genetic algorithm
 
Automated Test Suite Generation for Time-Continuous Simulink Models
Automated Test Suite Generation for Time-Continuous Simulink ModelsAutomated Test Suite Generation for Time-Continuous Simulink Models
Automated Test Suite Generation for Time-Continuous Simulink Models
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilities
 
385125794-SAP-New-Asset-Accounting-Training-Document17-26-pdf.pdf
385125794-SAP-New-Asset-Accounting-Training-Document17-26-pdf.pdf385125794-SAP-New-Asset-Accounting-Training-Document17-26-pdf.pdf
385125794-SAP-New-Asset-Accounting-Training-Document17-26-pdf.pdf
 
Make streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQLMake streaming processing towards ANSI SQL
Make streaming processing towards ANSI SQL
 
JF608: Quality Control - Unit 4
JF608: Quality Control - Unit 4JF608: Quality Control - Unit 4
JF608: Quality Control - Unit 4
 
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
Intelligent Production: Deploying IoT and cloud-based machine learning to opt...
 
10 chapter05 counters_fa14
10 chapter05 counters_fa1410 chapter05 counters_fa14
10 chapter05 counters_fa14
 
TQM
TQMTQM
TQM
 

Dernier

Riding the Wave of AI Disruption - Navigating the AI Fear Cycle in Marketing ...
Riding the Wave of AI Disruption - Navigating the AI Fear Cycle in Marketing ...Riding the Wave of AI Disruption - Navigating the AI Fear Cycle in Marketing ...
Riding the Wave of AI Disruption - Navigating the AI Fear Cycle in Marketing ...
DigiMarCon - Digital Marketing, Media and Advertising Conferences & Exhibitions
 
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
dollysharma2066
 

Dernier (20)

Cash payment girl 9257726604 Hand ✋ to Hand over girl
Cash payment girl 9257726604 Hand ✋ to Hand over girlCash payment girl 9257726604 Hand ✋ to Hand over girl
Cash payment girl 9257726604 Hand ✋ to Hand over girl
 
Riding the Wave of AI Disruption - Navigating the AI Fear Cycle in Marketing ...
Riding the Wave of AI Disruption - Navigating the AI Fear Cycle in Marketing ...Riding the Wave of AI Disruption - Navigating the AI Fear Cycle in Marketing ...
Riding the Wave of AI Disruption - Navigating the AI Fear Cycle in Marketing ...
 
Social media, ppt. Features, characteristics
Social media, ppt. Features, characteristicsSocial media, ppt. Features, characteristics
Social media, ppt. Features, characteristics
 
The Future of Brands on LinkedIn - Alison Kaltman
The Future of Brands on LinkedIn - Alison KaltmanThe Future of Brands on LinkedIn - Alison Kaltman
The Future of Brands on LinkedIn - Alison Kaltman
 
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu.Ka.Tilla Delhi Contact Us 8377877756
 
Digital-Marketing-Into-by-Zoraiz-Ahmad.pptx
Digital-Marketing-Into-by-Zoraiz-Ahmad.pptxDigital-Marketing-Into-by-Zoraiz-Ahmad.pptx
Digital-Marketing-Into-by-Zoraiz-Ahmad.pptx
 
How to utilize calculated properties in your HubSpot setups
How to utilize calculated properties in your HubSpot setupsHow to utilize calculated properties in your HubSpot setups
How to utilize calculated properties in your HubSpot setups
 
Uncover Insightful User Journey Secrets Using GA4 Reports
Uncover Insightful User Journey Secrets Using GA4 ReportsUncover Insightful User Journey Secrets Using GA4 Reports
Uncover Insightful User Journey Secrets Using GA4 Reports
 
A.I. and The Social Media Shift - Mohit Rajhans
A.I. and The Social Media Shift - Mohit RajhansA.I. and The Social Media Shift - Mohit Rajhans
A.I. and The Social Media Shift - Mohit Rajhans
 
What is Google Search Console and What is it provide?
What is Google Search Console and What is it provide?What is Google Search Console and What is it provide?
What is Google Search Console and What is it provide?
 
SEO Master Class - Steve Wiideman, Wiideman Consulting Group
SEO Master Class - Steve Wiideman, Wiideman Consulting GroupSEO Master Class - Steve Wiideman, Wiideman Consulting Group
SEO Master Class - Steve Wiideman, Wiideman Consulting Group
 
Google 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
Google 3rd-Party Cookie Deprecation [Update] + 5 Best StrategiesGoogle 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
Google 3rd-Party Cookie Deprecation [Update] + 5 Best Strategies
 
Navigating the SEO of Tomorrow, Competitive Benchmarking, China as an e-Comme...
Navigating the SEO of Tomorrow, Competitive Benchmarking, China as an e-Comme...Navigating the SEO of Tomorrow, Competitive Benchmarking, China as an e-Comme...
Navigating the SEO of Tomorrow, Competitive Benchmarking, China as an e-Comme...
 
Creator Influencer Strategy Master Class - Corinne Rose Guirgis
Creator Influencer Strategy Master Class - Corinne Rose GuirgisCreator Influencer Strategy Master Class - Corinne Rose Guirgis
Creator Influencer Strategy Master Class - Corinne Rose Guirgis
 
Generative AI Content Creation - Andrew Jenkins
Generative AI Content Creation - Andrew JenkinsGenerative AI Content Creation - Andrew Jenkins
Generative AI Content Creation - Andrew Jenkins
 
The Science of Landing Page Messaging.pdf
The Science of Landing Page Messaging.pdfThe Science of Landing Page Messaging.pdf
The Science of Landing Page Messaging.pdf
 
Unraveling the Mystery of the Hinterkaifeck Murders.pptx
Unraveling the Mystery of the Hinterkaifeck Murders.pptxUnraveling the Mystery of the Hinterkaifeck Murders.pptx
Unraveling the Mystery of the Hinterkaifeck Murders.pptx
 
Pillar-Based Marketing Master Class - Ryan Brock
Pillar-Based Marketing Master Class - Ryan BrockPillar-Based Marketing Master Class - Ryan Brock
Pillar-Based Marketing Master Class - Ryan Brock
 
Foundation First - Why Your Website and Content Matters - David Pisarek
Foundation First - Why Your Website and Content Matters - David PisarekFoundation First - Why Your Website and Content Matters - David Pisarek
Foundation First - Why Your Website and Content Matters - David Pisarek
 
Major SEO Trends in 2024 - Banyanbrain Digital
Major SEO Trends in 2024 - Banyanbrain DigitalMajor SEO Trends in 2024 - Banyanbrain Digital
Major SEO Trends in 2024 - Banyanbrain Digital
 

Prospect Identification from a Credit Database using Regression, Decision Trees, And Neural Network

  • 1. Data  Mining  Project   By:  Akanksha  Jain  
  • 2. Project  Goals   •  Goal  :  Using  historical  credit  data  set  SMALL,  develop  a  model  which  can   predict  whether  the  prospect  will  respond  to  a  markeBng  campaign  in   future   •  Scope:  SMALL  data  set   –  145  Variables   –  8000  observaBons   –  Dependent  Variable:  RESP_FLG  (Binary)   •  Responder:  1   •  Non-­‐Responder:  0  
  • 3. Tools     •  SAS  Enterprise  Miner  WorkstaBon  7.1   •  SAS  9.3_M1  
  • 4. Variable   Variable  DefiniBon   Defini,on   Type   AAl01  –  AAL17   All  Types   Char   AAU01  -­‐  AAU07   Auto   Char   ABK01  -­‐  ABK15   Bankcard   Char   ACE01  -­‐  ACE03   Cust  Elim   Char   ACL02  –  ACL12   CollecBon   Char   ADI01  –  ADI09   Derog  By  Ind   Char   AEQ01  -­‐  AEQ07   Home  Equity   Char   AHI01  -­‐  AHI05   Historical   Char   AIN01  -­‐  AIN15   Installmnt   Char   AIQ01  -­‐  AIQ05   Inquiries   Char   ALE01  -­‐  ALE07   Lease   Char   ALN01  -­‐  ALN07   LN  Finance   Char   AMG01  -­‐  AMG07   Mortgage   Char   APR17  -­‐  APR21   Public  REC   Char   ART01  -­‐  ART15   Retail   Char   ARV01  -­‐  ARV15   Revolving   Char   CUS04   Customer  Data   Char   SCORE01   FICO   Num   SCORE02   MDS  (Market  DeriveD  SignalS)   Num   RESP_FLG   Responder  Flag   Num  
  • 5. Data  Cleaning   •  Dataset  SMALL  has  missing  values  for  Variables  SCORE01  (FICO)  and   SCORE02  (MDS)       data  mylib.small_clean  mylib.small_bad;    set  mylib.small;    if  score01  =  .  or  score02  =  .  then  output  mylib.small_bad;    else  output  mylib.small_clean;   run;     LOG:     NOTE:  There  were  8000  observaBons  read  from  the  data  set  MYLIB.SMALL.   NOTE:  The  data  set  MYLIB.SMALL_CLEAN  has  5782  observa:ons  and  145  variables.   NOTE:  The  data  set  MYLIB.SMALL_BAD  has  2218  observa:ons  and  145  variables.   •  Going  forward,  will  use  dataset  SMALL_CLEAN   •  InvesBgate  separately  why  2218  observaBons  had  missing  values  for   SCORE01  and  SCORE02        
  • 7. Data  Source   •  Rejected  Variables  (have  more  than  20  categories):   –  ACE03   –  ACL10   •  Variable  RESP_FLG   –  Change  Role  to  TARGET   –  Change  Order  to  DESCENDING   •  Set  Prior  ProbabiliBes     –  Non  –  Responder/  event  =  “0”:  0.99   –  Responder/  event  =  “1”:  0.01  
  • 8. Data  ParBBon   •  Train  –  55%   •  Validate  –  35%   •  Test  –  10%  
  • 9. Model:  Maximum  CHAID   •  Nominal  Criterion:  ProbChiSq   •  Significance  Level:  0.2  
  • 10. Model:  Maximum  CHAID   On  the  Lel  side,  the  percentage  of  1’s  i.e.  Respondents  is  higher,  and  hence  people  with  FICO   score  <  700.5,  in  (0,  4,  5,  Missing)  category  of  RETAIL:  BAL  >  0  IN  6  MNTHS,  ALL  will  respond  to   the  markeBng  campaign  
  • 12. Maximum  CHAID:  Final  Variables  
  • 13. Model:  Pruned  CHAID   •  •  •  •  •    Nominal  Criterion:  ProbChiSq   Significance  Level:  0.2   Leaf  Size:  120   Split  Size:  300   Maximum  Depth:  3  
  • 16. Pruned  CHAID:  Final  Variables    
  • 17. Model:  CART   •  •  •  •  •      Nominal  Criterion:  Gini   Significance  Level:  0.2   Leaf  Size:  120   Split  Size:  300   Maximum  Depth:  3  
• 21. Model: C4.5
• Nominal Criterion: Entropy
• Significance Level: 0.2
• Leaf Size: 120
• Split Size: 300
• Maximum Depth: 3
• 25. Variable Comparison
  Maximum CHAID: SCORE01, AMG07, ABK10, ART11, AAL04, AIQ04, ACE02, AAU03, AAL14, AEQ07, AEQ01, AIN03, ABK14, AIN10
  Pruned CHAID: SCORE01, AMG07, ALN01
  CART: SCORE01, AMG07, ABK10, SCORE02, AMG06, AMG01, AMG03, ARV10, ARV03, ARV01
• 27. Transform Variables
• SCORE01 and SCORE02 are skewed
• Transform function: LOG
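The LOG transform itself is a one-liner; a minimal Python sketch with made-up score values (the Transform Variables node applies the same function to SCORE01 and SCORE02):

```python
import math

def log_transform(values):
    """Natural-log transform for right-skewed interval inputs;
    compresses the high tail while preserving the ordering."""
    return [math.log(v) for v in values]

# hypothetical FICO scores
log_score01 = log_transform([620.0, 700.0, 820.0])
```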
• 28. Impute
• Default Input Methods:
  – For Interval Variables – Median
  – For Class Variables – Count
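The two default input methods can be sketched in Python as follows (illustrative only; `None` stands in for a SAS missing value):

```python
from collections import Counter
from statistics import median

def impute(values, kind):
    """Fill missing entries (None): median for interval variables,
    the most frequent level ("count") for class variables."""
    present = [v for v in values if v is not None]
    if kind == "interval":
        fill = median(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

scores = impute([700, None, 650, 820], "interval")   # median fill
levels = impute(["A", None, "A", "B"], "class")      # modal fill
```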
• 30. Model: Full Model Regression
• Input Coding - GLM
• MODEL SELECTION - None
• OPTIMIZATION OPTIONS - TECHNIQUE - Default
• OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
• OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
• OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
• 31. Full Model Regression: Cumulative LIFT
• 32. Model: Forward Regression
• Input Coding - GLM
• MODEL SELECTION - Forward
• SELECTION CRITERION - Akaike Information Criterion
• USE SELECTION DEFAULTS - YES
• OPTIMIZATION OPTIONS - TECHNIQUE - Default
• OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
• OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
• OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
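Forward selection with the AIC criterion adds, at each step, the candidate variable that lowers AIC the most, and stops when no candidate improves it. A Python sketch of the loop over a toy AIC function (the per-variable AIC drops and the noise variable AAL99 are invented for illustration, not fitted values):

```python
def forward_select(candidates, aic_of):
    """Greedy forward selection: each step adds the candidate that
    lowers AIC the most, stopping when no candidate improves it."""
    selected = []
    best = aic_of(selected)
    while True:
        trials = [(aic_of(selected + [c]), c)
                  for c in candidates if c not in selected]
        if not trials or min(trials)[0] >= best:
            break
        best, winner = min(trials)
        selected.append(winner)
    return selected

# toy AIC surface: useful variables cut AIC, noise adds a penalty
drop = {"LOG_SCORE01": 40, "AMG07": 25, "AEQ01": 10}
aic_of = lambda feats: 1000 - sum(drop.get(f, -5) for f in feats)
chosen = forward_select(["LOG_SCORE01", "AMG07", "AEQ01", "AAL99"], aic_of)
```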
• 34. Forward Regression: Cumulative % Captured Response
• 36. Model: Backward Regression
• Input Coding - GLM
• MODEL SELECTION - Backward
• SELECTION CRITERION - Akaike Information Criterion
• USE SELECTION DEFAULTS - YES
• OPTIMIZATION OPTIONS - TECHNIQUE - Default
• OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
• OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
• OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
• 38. Backward Regression: Cumulative % Captured Response
• 40. Model: Stepwise Regression
• Input Coding - GLM
• MODEL SELECTION - Stepwise
• SELECTION CRITERION - Akaike Information Criterion
• USE SELECTION DEFAULTS - No
• MODEL SELECTION - SELECTION OPTIONS
  – ENTRY SIGNIFICANCE LEVEL = 0.15
  – STAY SIGNIFICANCE LEVEL = 0.05
  – MAXIMUM NUMBER OF STEPS = 300
• OPTIMIZATION OPTIONS - TECHNIQUE - Default
• OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
• OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
• OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
• 42. Stepwise Regression: Cumulative % Captured Response
• 44. Variable Comparison
  Forward: AAL11, ACE01, AEQ01, AEQ07, AHI01, ALN01, AMG01, AMG07, APR20, ART11, LOG_SCORE01, AEQ02
  Backward: ACE01, AEQ01, AEQ07, AHI01, ALN01, AMG01, AMG07, APR20, LOG_SCORE01, AEQ03, AEQ04, ALE01
  Stepwise: AAL11, ACE01, AEQ01, AEQ07, AHI01, ALN01, AMG01, AMG07, APR20, ART11, LOG_SCORE01, ALE02
• 45. Interaction Terms
• log_score01 * log_score01
• log_score01 * ace01
• log_score01 * amg01
• log_score01 * ahi01
• log_score01 * log_score02
• log_score02 * log_score02
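Each interaction term is just a product column, and a squared term pairs a variable with itself. A Python sketch on one hypothetical observation (the values are invented):

```python
def add_interactions(row, pairs):
    """Append product terms such as log_score01*ace01; a squared
    term is a variable paired with itself."""
    out = dict(row)
    for a, b in pairs:
        out[f"{a}*{b}"] = row[a] * row[b]
    return out

# hypothetical transformed values for one observation
row = add_interactions(
    {"log_score01": 6.55, "log_score02": 5.30, "ace01": 2.0},
    [("log_score01", "log_score01"), ("log_score01", "ace01")])
```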
• 46. Model: Forward Reg Interaction
• EQUATION - USER TERMS - YES
• EQUATION - TERM EDITOR - Enter Interaction Terms
• Input Coding - GLM
• MODEL SELECTION - Forward
• SELECTION CRITERION - Akaike Information Criterion
• USE SELECTION DEFAULTS - YES
• OPTIMIZATION OPTIONS - TECHNIQUE - Default
• OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
• OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
• OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
• 47. Forward Reg Interaction: Cumulative LIFT
• 48. Forward Reg Interaction: Cumulative % Captured Response
• 49. Forward Reg Interaction: Final Variables
• 50. Model: Backward Reg Interaction
• EQUATION - USER TERMS - YES
• EQUATION - TERM EDITOR - Enter Interaction Terms
• Input Coding - GLM
• MODEL SELECTION - Backward
• SELECTION CRITERION - Akaike Information Criterion
• USE SELECTION DEFAULTS - YES
• OPTIMIZATION OPTIONS - TECHNIQUE - Default
• OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
• OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
• OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
• 51. Backward Reg Interaction: Cumulative LIFT
• 52. Backward Reg Interaction: Cumulative % Captured Response
• 53. Backward Reg Interaction: Final Variables
• 54. Model: Stepwise Reg Interaction
• EQUATION - USER TERMS - YES
• EQUATION - TERM EDITOR - Enter Interaction Terms
• Input Coding - GLM
• MODEL SELECTION - Stepwise
• SELECTION CRITERION - Akaike Information Criterion
• USE SELECTION DEFAULTS - No
• MODEL SELECTION - SELECTION OPTIONS
  – ENTRY SIGNIFICANCE LEVEL = 0.15
  – STAY SIGNIFICANCE LEVEL = 0.05
  – MAXIMUM NUMBER OF STEPS = 300
• OPTIMIZATION OPTIONS - TECHNIQUE - Default
• OPTIMIZATION OPTIONS - DEFAULT OPTIMIZATION - No
• OPTIMIZATION OPTIONS - MAX ITERATIONS - 20
• OPTIMIZATION OPTIONS - MAX FUNCTION CALLS - 10
• 55. Stepwise Reg Interaction: Cumulative LIFT
• 56. Stepwise Reg Interaction: Cumulative % Captured Response
• 57. Stepwise Reg Interaction: Final Variables
• 58. Variable Comparison
  Forward_Interaction: LOG_SCORE01*ACE01, LOG_SCORE02*AMG01, AAL11, AEQ01, AEQ07, AHI01, ALN01, AMG07, APR20, ART11, AEQ02, ALE02
  Backward_Interaction: AEQ01, AEQ07, AHI01, ALN01, AMG07, APR20, LOG_SCORE01*LOG_SCORE01, LOG_SCORE01*AHI01, ACE01, AEQ03, AEQ04, AMG01
  Stepwise_Interaction: LOG_SCORE01*ACE01, LOG_SCORE02*AMG01, AAL11, AEQ01, AEQ07, AHI01, ALN01, AMG07, APR20, ART11, ALE01, LOG_SCORE01
• 59. Model: Neural Network
• NETWORK - DIRECT CONNECTION = Yes
• OPTIMIZATION - PRELIMINARY TRAINING - ENABLE = No
• OPTIMIZATION - Maximum Iterations = 50
• OPTIMIZATION - PRELIMINARY TRAINING - Number of Runs = 10
• MODEL SELECTION CRITERION - Misclassification
• 61. Neural Network: Cumulative % Captured Response
• 62. Neural Network: Average Square Error
If we increase the number of iterations, the average square error keeps decreasing for TRAIN but increases for VALIDATE, i.e. the network starts to overfit
• 63. Ensemble Node
• Select the model that performs best in:
  – Decision Trees
  – Regression
  – Regression with Interaction Terms
• Build an Ensemble node on:
  – Pruned CHAID
  – Forward Regression
  – Forward Reg Interaction
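The Ensemble node combines its component models by averaging their posterior probabilities observation by observation (assuming the node's default Average posterior function). A Python sketch with hypothetical scores:

```python
def ensemble_posterior(model_scores):
    """Average the component models' posterior probabilities
    observation by observation."""
    return [sum(ps) / len(ps) for ps in zip(*model_scores)]

# hypothetical posteriors from Pruned CHAID, Forward Regression,
# and Forward Reg Interaction for two observations
avg = ensemble_posterior([[0.02, 0.10], [0.04, 0.20], [0.03, 0.30]])
```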
• 65. Ensemble Node: Cumulative % Captured Response
• 66. Model Comparison
• ASSESSMENT REPORTS - NUMBER OF BINS = 50
• MODEL SELECTION - SELECTION STATISTIC = MISCLASSIFICATION RATE
• Comparing LIFT at top 20%
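Cumulative lift at top 20% is the responder rate among the highest-scored 20% of prospects divided by the overall responder rate. A Python sketch on ten hypothetical scored observations:

```python
def cumulative_lift(probs, actuals, depth=0.20):
    """Responder rate in the top-scored `depth` fraction divided by
    the overall responder rate."""
    ranked = sorted(zip(probs, actuals), reverse=True)
    n_top = int(len(ranked) * depth)
    top_rate = sum(a for _, a in ranked[:n_top]) / n_top
    return top_rate / (sum(actuals) / len(actuals))

# ten hypothetical scored prospects, two of them responders
lift = cumulative_lift(
    [0.90, 0.80, 0.70, 0.20, 0.10, 0.05, 0.04, 0.03, 0.02, 0.01],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
```

Here the top 20% captures both responders, so the model does five times better than random at that depth.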
• 68. Model Comparison: Cumulative LIFT (Train)
• 69. Model Comparison: Cumulative LIFT (Validate)
• 70. Model Comparison: Cumulative LIFT (Test)
• 71. Model Comparison: Conclusion
• TRAIN:
  – Ensemble works best, followed by Forward Regression
  – Check the Validate and Test results to finalize the model
• VALIDATE and TEST:
  – Forward Regression works better than Ensemble
• 72. Final Model
• Forward Regression
• List of Variables:
  – AAL11
  – ACE01
  – AEQ01
  – AEQ07
  – AHI01
  – ALN01
  – AMG01
  – AMG07
  – APR20
  – ART11
  – LOG_SCORE01
  – AEQ02
• 73. SCORE
• In the Model Comparison node: SCORE -> SELECTION EDITOR
• Set Forward Regression to YES and Stepwise Reg Interaction (which was selected by default) to NO
• Connect Model Comparison to SCORE and run it
• Get the optimized SAS code
• 74. Model Performance
• PROC RANK
  – Rank 2: top 1/3rd responders
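PROC RANK with a groups=3 option bins scored prospects into tertiles, so rank 2 marks the top third. A Python sketch of that binning (ties are ignored in this simplification):

```python
def tertile_rank(scores):
    """Tertile grouping in the spirit of PROC RANK with groups=3:
    rank 0 for the bottom third of scores, 1 for the middle third,
    2 for the top third."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for pos, i in enumerate(order):
        ranks[i] = pos * 3 // len(scores)
    return ranks

# hypothetical posterior probabilities for six prospects
ranks = tertile_rank([0.9, 0.1, 0.5, 0.8, 0.2, 0.6])
```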