This document describes the analysis of a donor data set to predict whether a prospect will donate. It covers the project goals, tools used, data preprocessing steps, and the predictive models tested: CHAID and forward, backward, and stepwise regression. The models are compared on how well they capture the top donors, using cumulative lift charts. Forward regression performed best at capturing the top 20% of donors, while backward regression performed best at the top 30%. Adding one additional correlated variable to forward regression did not significantly improve performance.
2. Project Goals
• Goal: Using the historical data set DONOR_RAW, develop a model that predicts whether a prospect will donate or not donate
• Scope: DONOR_RAW data set
  • 50 variables
  • 19,372 observations
• Dependent variable: TARGET_B (binary)
  • Responder: 1
  • Non-Responder: 0
6. Data Source
• Reject variables:
  • TARGET_D (TARGET_B is used as the target)
  • ID (an ID number)
  • WEALTH_RATING (large number of missing values)
• Variable TARGET_B:
  • Change Role to TARGET
  • Change Order to DESCENDING
• Select the complete data set as the sample
• Set prior probabilities:
  • Responder: 0.05
  • Non-Responder: 0.95
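Setting prior probabilities matters when the share of responders in the modeling sample differs from the share expected in the population. The sketch below shows one standard way to rescale a predicted probability from the sample to the stated priors (0.05 / 0.95). The function name `adjust_for_priors` and the sample responder share of 0.25 are assumptions for illustration; the slides do not state the sample share or how the tool applies the priors internally.

```python
def adjust_for_priors(p_sample, sample_rate, prior_rate):
    """Rescale a responder probability estimated on a sample.

    p_sample    -- probability of responding, as predicted on the sample
    sample_rate -- share of responders in the sample (assumed value below)
    prior_rate  -- share of responders expected in the population (0.05 on the slide)
    """
    # Weight each class by the ratio of its prior to its sample share.
    responder_part = p_sample * prior_rate / sample_rate
    non_responder_part = (1.0 - p_sample) * (1.0 - prior_rate) / (1.0 - sample_rate)
    return responder_part / (responder_part + non_responder_part)


# Hypothetical sample responder share; not taken from the slides.
ASSUMED_SAMPLE_RATE = 0.25
PRIOR_RESPONDER = 0.05  # from the slide

# A prospect scored at 0.50 on the sample scale, expressed on the prior scale.
adjusted = adjust_for_priors(0.50, ASSUMED_SAMPLE_RATE, PRIOR_RESPONDER)
print(round(adjusted, 4))
```

A prospect scored exactly at the sample responder share maps back to the prior itself, which is a quick sanity check on the adjustment.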
8. Variable Transformation
Taking log transformations to reduce skewness:
• LIFETIME_GIFT_RANGE
• LIFETIME_MAX_GIFT_AMT
• LIFETIME_MIN_GIFT_AMT
• MOR_HIT_RATE
• FILE_AVG_GIFT
• LIFETIME_AVG_GIFT_AMT
• PCT_ATTRIBUTE1
• LAST_GIFT_AMT
• RECENT_AVG_GIFT_AMT
Keep all variables, both the originals and their log transformations
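As a rough illustration of why a log transformation reduces skewness, the sketch below computes skewness before and after transforming a small set of gift amounts. The amounts are made up for the example, and the use of `log(1 + x)` (so that zero values stay defined) is an assumption; the slides do not say which exact form of the log was applied.

```python
import math


def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Return log(1 + v) for each value; an assumed form of the transformation."""
    return [math.log1p(v) for v in values]


# Hypothetical gift amounts with a long right tail (not from DONOR_RAW).
gifts = [5, 10, 10, 15, 20, 25, 50, 100, 500]
logged = log_transform(gifts)

print("skewness before:", round(skewness(gifts), 2))
print("skewness after: ", round(skewness(logged), 2))
```

Keeping both the original and the transformed column, as the slide suggests, lets the variable selection step decide which version is more useful.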
9. Model: CHAID
• Nominal criterion: Chi-square
• Significance level: 0.1
• Minimum number of observations in a leaf = 25
• Observations required for a split search = 55
• Model assessment measure: Total Leaf Impurity (Gini Index)
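The two quantities named in these settings can be sketched in a few lines: a Pearson chi-square statistic for judging a candidate split, and a Gini-based leaf impurity for assessing the resulting tree. This is a minimal sketch with hypothetical leaf counts. It treats total leaf impurity as the size-weighted average Gini index of the leaves, which is one common definition and not necessarily the exact computation the modeling tool performs.

```python
def gini(responders, total):
    """Gini index of one node with a binary target."""
    p = responders / total
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def total_leaf_impurity(leaves):
    """Size-weighted average Gini index over leaves given as (responders, total)."""
    n = sum(total for _, total in leaves)
    return sum(total / n * gini(resp, total) for resp, total in leaves)


def chi_square(leaves):
    """Pearson chi-square statistic of the leaf-by-target contingency table."""
    n = sum(total for _, total in leaves)
    overall_rate = sum(resp for resp, _ in leaves) / n
    stat = 0.0
    for resp, total in leaves:
        cells = ((resp, overall_rate), (total - resp, 1.0 - overall_rate))
        for observed, rate in cells:
            expected = total * rate
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical split of 200 cases into two leaves of 100 (not from the slides).
leaves = [(56, 100), (21, 100)]
MIN_LEAF_SIZE = 25  # from the slide

assert all(total >= MIN_LEAF_SIZE for _, total in leaves)
print("chi-square:", round(chi_square(leaves), 2))
print("impurity before split:", round(gini(77, 200), 4))
print("impurity after split: ", round(total_leaf_impurity(leaves), 4))
```

For a two-leaf split with a binary target the statistic has one degree of freedom, so it would be compared against the critical value at the 0.1 significance level listed above.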
11. Model: CHAID (cont'd)
Inference:
• FREQUENCY_STATUS_97NK = 3 or 4; MONTHS_SINCE_LAST_GIFT < 8.5
  Responders (1) = 56%
  Less marketing effort is needed, as they will most likely donate anyway.
• FREQUENCY_STATUS_97NK = 3 or 4; MONTHS_SINCE_LAST_GIFT >= 8.5; NUMBER_PROM_12 < 11.5
  Responders (1) = 43%
  They will also donate, but the company should be careful not to send them too many promotions.
• FREQUENCY_STATUS_97NK = 3 or 4; MONTHS_SINCE_LAST_GIFT >= 8.5; NUMBER_PROM_12 >= 11.5
  Responders (1) = 30%
  They are getting too many promotions, so the company should cut back on sending them promotions.
• FREQUENCY_STATUS_97NK = 1, 2 or Missing
  Responders (1) = 21%
  Study them more closely to understand why they are not donating and what other factors are responsible, then decide how to design a marketing campaign for them.
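The four leaves above can be written as a small rule function, which makes the split order explicit. The thresholds and responder shares come from the slide; the function name `chaid_segment`, the segment labels, and the use of `None` for a missing frequency status are assumptions for illustration.

```python
def chaid_segment(frequency_status, months_since_last_gift, number_prom_12):
    """Return (label, responder share) for a prospect, following the slide's leaves.

    frequency_status is 1-4, or None when missing (an assumed encoding).
    Labels are hypothetical; shares are the percentages reported on the slide.
    """
    if frequency_status in (3, 4):
        if months_since_last_gift < 8.5:
            return ("frequent, recent gift", 0.56)
        if number_prom_12 < 11.5:
            return ("frequent, older gift, fewer promotions", 0.43)
        return ("frequent, older gift, many promotions", 0.30)
    # Frequency status 1, 2 or missing falls into a single leaf.
    return ("infrequent or missing", 0.21)


# Hypothetical prospects.
print(chaid_segment(4, 6, 14))
print(chaid_segment(3, 12, 9))
print(chaid_segment(3, 12, 15))
print(chaid_segment(None, 3, 5))
```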
30. Model Comparison (Validation): Cumulative Lift
Inference:
• Capture top 20% of the market -> For_1Extra
• Capture top 30% of the market -> BACKWARD
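Cumulative lift at a given depth compares the response rate among the highest-scored cases with the overall response rate. The sketch below shows the calculation on a tiny made-up validation set; the scores, targets, and the function name `cumulative_lift` are hypothetical and do not reproduce the models compared on the slide.

```python
def cumulative_lift(scores, targets, depth):
    """Lift of the top `depth` fraction of cases when ranked by descending score."""
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * depth))
    top_rate = sum(target for _, target in ranked[:cutoff]) / cutoff
    base_rate = sum(targets) / len(targets)
    return top_rate / base_rate


# Hypothetical validation cases: model scores and actual TARGET_B values.
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for depth in (0.2, 0.3, 1.0):
    print(depth, round(cumulative_lift(scores, targets, depth), 2))
```

Comparing models at a fixed depth, such as the top 20% or top 30%, then amounts to computing this value for each model's scores on the same validation data and picking the largest.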
39. Model Comparison (Validation): Cumulative Lift
Inference:
• Capture top 20% of the market -> FOR_1E_INT
• Capture top 30% of the market -> FOR_1EXTRA
44. Model Comparison (Validation): Cumulative Lift
Inference:
• Capture top 20% of the market -> FOR_1E_INT
• Capture top 30% of the market -> FOR_UNION