One of the most powerful ways to apply advanced analytics is by putting them to work in operational systems. Using analytics to improve the way every transaction, every customer, every website visitor is handled is tremendously effective. The multiplicative effect means that even small analytic improvements add up to real business benefit.
This is the slide deck from the webinar. James Taylor, CEO of Decision Management Solutions, and Dean Abbott of Abbott Analytics discuss 10 best practices to make sure you can effectively build and deploy analytic models into your operational systems. Webinar recording available here: https://decisionmanagement.omnovia.com/archives/70931
11. Be Flexible: Data Mining is Not a Series of Recipes
Data Mining Project Entry Points:
1) Business Understanding
2) Data Understanding
Data Mining Project Next Steps:
1) Data Understanding
2) Modeling, then Data Preparation
3) Data Preparation, then Data Understanding, then Modeling
(Diagram: the CRISP-DM cycle of Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment)
12. Avoid The Three Biggest Data Preparation Mistakes
1. Don't blindly use data mining software defaults
– Missing data
  Is the record with missing values in one of the fields kept at all?
  What value is filled in? What effect will this have?
– Exploding categorical variables with large numbers of values – what happens to the models?
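The "exploding categorical variables" problem above can be seen directly in a one-hot encoding sketch; the field and values here are made up:

```python
def one_hot(values):
    """One-hot encode a categorical column; a high-cardinality field
    explodes into one new input column per distinct value."""
    levels = sorted(set(values))
    return levels, [[1 if v == lvl else 0 for lvl in levels] for v in values]

# A zip-code field with thousands of distinct values would become
# thousands of sparse columns -- here just three for illustration.
zips = ["94110", "10001", "60614", "94110"]
levels, encoded = one_hot(zips)
print(len(levels), "new columns for", len(zips), "records")
```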
13. Some Software Fills Missing Values Automatically
Common automated missing value imputation: 0, mid-point, mean, or listwise deletion
Example at upper right has 5300+ records, 17 missing values encoded as "0"
After fixing the model with mean imputation, R^2 rises from 0.597 to 0.657
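The mean-imputation fix described on this slide can be sketched in a few lines, assuming missing values were encoded as 0 in an otherwise non-zero field (the data here is made up):

```python
def mean_impute(values, missing_code=0):
    """Replace missing-coded entries with the mean of the observed values."""
    observed = [v for v in values if v != missing_code]
    mean = sum(observed) / len(observed)
    return [mean if v == missing_code else v for v in values]

# Two records carry the missing code 0; they receive the observed mean
# instead of dragging the model toward an artificial 0.
ages = [34, 0, 51, 47, 0, 29]
print(mean_impute(ages))
</```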
14. Avoid The Three Biggest Data Preparation Mistakes
2. Don't forget some algorithms assume particular distributions for the data
– Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers
15. How Non-normality Affects Regression Models
Regression model "fit" is worse with skewed (non-normal) data
– In the example at right, simply applying the log transform improves performance from R^2=0.566 to 0.597
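The log transform on this slide can be sketched as follows; `log1p` (log of 1+x) is one common variant because it tolerates zeros (the data is illustrative):

```python
import math

def log_transform(values):
    """Compress a right-skewed, non-negative variable toward normality."""
    return [math.log1p(v) for v in values]

# A heavily right-skewed income-like variable: the transform pulls the
# long tail in so a linear fit is no longer dominated by the outlier.
incomes = [10, 20, 30, 40, 10000]
print(log_transform(incomes))
```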
16. Avoid The Three Biggest Data Preparation Mistakes
2. Don't forget some algorithms assume particular distributions for the data
– Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers
– Distance-based algorithms are strongly influenced by outliers and skewed distributions: k-Nearest Neighbor, k-Means, the above algorithms
17. Avoid The Three Biggest Data Preparation Mistakes
2. Don't forget some algorithms assume particular distributions for the data
– Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers
– Distance-based algorithms are strongly influenced by outliers and skewed distributions: k-Nearest Neighbor, k-Means, the above algorithms
– Some algorithms require categorical data (rather than numeric): Naïve Bayes, CHAID, Apriori
18. Avoid The Three Biggest Data Preparation Mistakes
3. Don't assume algorithms can "figure out" patterns on their own
– Features fix data distribution problems
– Features present data (information) to modeling algorithms in ways they perhaps can never identify themselves
  Interactions, record-connecting and temporal features, non-linear transformations
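The kinds of features named above, interactions, temporal features, and non-linear transformations, can be sketched as hand-built derived fields; all field names here are hypothetical:

```python
import math

def engineer_features(record):
    """Derive the three kinds of features the slide names from raw fields."""
    return {
        "balance_x_visits": record["balance"] * record["visits"],      # interaction
        "days_since_last": record["today"] - record["last_purchase"],  # temporal
        "log_income": math.log1p(record["income"]),                    # non-linear
    }

row = {"balance": 120.0, "visits": 4, "today": 200, "last_purchase": 185,
       "income": 54000}
print(engineer_features(row))
```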
19. What are Model Ensembles?
Combining outputs from multiple models into a single decision
Models can be created using the same algorithm, or several different algorithms
(Diagram: predictions from multiple models feed decision logic, which produces the ensemble prediction)
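The "decision logic" box is often just averaging or voting; a minimal sketch (the model outputs below are made up):

```python
def average_ensemble(probabilities):
    """Average the predicted probabilities from several models."""
    return sum(probabilities) / len(probabilities)

def vote_ensemble(class_predictions):
    """Majority vote over class labels from several models."""
    return max(set(class_predictions), key=class_predictions.count)

# Three models score the same record:
print(average_ensemble([0.62, 0.55, 0.70]))  # single ensemble probability
print(vote_ensemble([1, 0, 1]))              # majority class
```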
20. Motivation for Ensembles
Performance, performance, performance
A single model sometimes provides insufficient accuracy
– Neural networks become stuck in local minima
– Decision trees run out of data
– Single algorithms keep pushing performance using the same ideas (basis function / algorithm), and are incapable of "thinking outside of their box"
Often, different algorithms achieve the same level of accuracy but on different cases; they identify different ways to get the same level of accuracy
21. Four Keys to Effective Ensembling
Diversity of opinion
Independence
Decentralization
Aggregation
From The Wisdom of Crowds, James Surowiecki
22. Bagging
Bagging Method
– Create many data sets by bootstrapping (can also do this with cross validation)
– Create one decision tree for each data set
– Combine decision trees by averaging (or voting) final decisions
– Primarily reduces model variance rather than bias
Results
– On average, better than any individual tree
(Diagram: individual tree decisions are averaged into a final answer)
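The bagging steps above can be sketched end-to-end; the one-split "stump" below is a simplified stand-in for the decision trees on the slide, and the 1-D data is a toy example:

```python
import random

def fit_stump(data):
    """A one-split classifier: threshold x at the midpoint of the class means."""
    mean1 = sum(x for x, y in data if y == 1) / max(1, sum(1 for _, y in data if y == 1))
    mean0 = sum(x for x, y in data if y == 0) / max(1, sum(1 for _, y in data if y == 0))
    threshold = (mean0 + mean1) / 2
    return lambda x: 1 if x >= threshold else 0

def bagged_model(data, n_models=25, seed=0):
    """Bagging: bootstrap the data, fit one model per sample, majority vote."""
    rng = random.Random(seed)
    models = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]
    def predict(x):
        votes = sum(m(x) for m in models)
        return 1 if votes * 2 >= len(models) else 0
    return predict

# Toy 1-D data: class 0 clusters low, class 1 clusters high.
data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
predict = bagged_model(data)
print(predict(2), predict(8))
```

Voting over many bootstrap-trained models smooths out the occasional bad sample, which is the variance reduction the slide describes.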
23. Boosting (Adaboost)
Boosting Method
– Create a tree using the training data set
– Score each data point, indicating where each incorrect decision is made (errors)
– Retrain, giving rows with incorrect decisions more weight. Repeat
– Final prediction is a weighted sum of all models -> model regularization
– Best to create "weak" models: simple models (just a few splits for a decision tree) and let the boosting iterations find the complexity
– Often used with trees or Naïve Bayes
Results
– Usually better than an individual tree or Bagging
(Diagram: reweight examples where classification is incorrect; combine models via weighted sum)
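The boosting loop above can be sketched with threshold stumps as the weak learners. This is a simplified AdaBoost on toy 1-D data with labels in {-1, +1}, not production code:

```python
import math

def adaboost(data, n_rounds=5):
    """AdaBoost sketch: reweight errors each round, combine by weighted sum."""
    n = len(data)
    weights = [1.0 / n] * n
    models = []  # (threshold, direction, alpha)
    for _ in range(n_rounds):
        # Pick the stump sign(direction * (x - t)) with lowest weighted error.
        best = None
        for t in sorted({x for x, _ in data}):
            for direction in (1, -1):
                err = sum(w for (x, y), w in zip(data, weights)
                          if (1 if direction * (x - t) >= 0 else -1) != y)
                if best is None or err < best[0]:
                    best = (err, t, direction)
        err, t, direction = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # model weight
        models.append((t, direction, alpha))
        # Reweight: rows this stump got wrong gain weight for the next round.
        weights = [w * math.exp(-alpha * y * (1 if direction * (x - t) >= 0 else -1))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    def predict(x):
        score = sum(alpha * (1 if direction * (x - t) >= 0 else -1)
                    for t, direction, alpha in models)
        return 1 if score >= 0 else -1
    return predict

data = [(1, -1), (2, -1), (3, -1), (6, 1), (7, 1), (8, 1)]
predict = adaboost(data)
print(predict(2), predict(7))
```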
24. Random Forest Ensembles
Random Forest (RF) Method
– Exact same methodology as Bagging, but with a twist
– At each split, rather than using the entire set of candidate inputs, use a random subset of candidate inputs
– Generates diversity of samples and inputs (splits)
Results
– On average, better than any individual tree, Bagging, or even Boosting
(Diagram: tree predictions are averaged into a final answer)
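The Random Forest "twist", a random subset of candidate inputs at each split, can be isolated in a few lines; the feature names are made up:

```python
import random

def random_feature_subset(feature_names, m, rng):
    """At each split, consider only a random subset of m candidate inputs
    instead of the full set -- the twist that distinguishes RF from Bagging."""
    return rng.sample(feature_names, m)

rng = random.Random(42)
features = ["age", "income", "tenure", "region", "balance", "visits"]
# A common default is m near sqrt(number of features).
for split in range(3):
    print("split", split, "considers:", random_feature_subset(features, 3, rng))
```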
25. Model Ensembles: The Good and the Bad
Pro
– Can significantly reduce model error
– Can be easy to automate -- already has been done in many commercial tools using Boosting, Bagging, ARCing, RF
Con
– Model interpretability is lost (if there was any)
– If not done automatically, can be very time consuming to generate dozens of models to combine
26. Ensembles of Trees: Smoothers
Ensembles smooth jagged decision boundaries
Picture from T.G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, Cagliari, Italy, 2000.
27. Heterogeneous Model Ensembles on Glass Data
Model prediction diversity obtained by using different algorithms: tree, NN, RBF, Gaussian, Regression, k-NN
Combining 3-5 models is on average better than the best single model
Combining all 6 models is not best (best is a 3- or 4-model combination), but is close
This is an example of reducing model variance through ensembles, but not model bias
28. The Conflict with Data Mining Algorithm Objectives
Algorithm Objectives
– Linear Regression and Neural networks minimize squared error
– C5 minimizes entropy
– CART minimizes the Gini index
– Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy)
– Nearest neighbor minimizes Euclidean distance
29. The Conflict with Data Mining Algorithm Objectives
Algorithm Objectives
– Linear Regression and Neural networks minimize squared error
– C5 minimizes entropy
– CART minimizes the Gini index
– Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy)
– Nearest neighbor minimizes Euclidean distance
Business Objectives
– Maximize net revenue
– Achieve cumulative response rate of 13%
– Maximize responders subject to a budget of $100,000
– Maximize savings from identifying customers likely to churn
– Maximize collected revenue by identifying the next best case to collect
– Minimize false alarms in the top 100 hits
– Maximize hits subject to a false alarm rate of 1 in 1,000,000
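One of these business objectives, maximizing responders subject to a budget, can be computed directly from model scores. A hedged sketch; the scores, contact cost, and budget are illustrative, not from the deck:

```python
def responders_within_budget(scored_records, cost_per_contact, budget):
    """Contact the highest-scoring records first until the budget is spent;
    return the expected number of responders reached."""
    ranked = sorted(scored_records, reverse=True)  # model score, descending
    n_contacts = int(budget // cost_per_contact)
    return sum(ranked[:n_contacts])  # sum of response probabilities

scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]  # model-estimated response probabilities
print(responders_within_budget(scores, cost_per_contact=2.0, budget=8.0))
```

Ranking candidate models by a business measure like this, rather than by squared error or entropy, is exactly the mismatch the slide is pointing at.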
30. Possible Solutions to the Business Objective / Data Mining Objective Mismatch
Model Ranking Metric, with the Model Building Consideration for each:
1. Rank models by algorithm objectives, ignoring business objectives, and hope the models do a good enough job
   (Consideration: force the data into the algorithm box, and hope the winner does a good job in reality)
2. Use optimization algorithms to maximize/minimize the business objective directly
   (Consideration: throw away the very nice theory of data mining algorithms, and hope the optimization algorithms converge well)
3. Build models normally, but rank models by business objectives, ignoring their "natural" algorithm score, hoping that some algorithms do well enough at scoring by business objective
   (Consideration: take your lumps with algorithms not quite doing what we want them to do, but take advantage of the power and efficiency of algorithms)
31. Model Comparison Example: Rankings Tell Different Stories
Top RMS model is 9th in AUC; 2nd test RMS rank is 42nd in AUC
Correlation between rankings:
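A rank correlation such as Spearman's is one way to quantify how differently two metrics order the same models; the rankings below are illustrative, not the ones from the slide:

```python
def spearman_rank_correlation(rank_a, rank_b):
    """Spearman correlation between two rankings of the same models
    (rankings given as rank positions, no ties)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Five models ranked by RMS error vs. by AUC (made-up ranks):
by_rms = [1, 2, 3, 4, 5]
by_auc = [3, 1, 2, 5, 4]
print(spearman_rank_correlation(by_rms, by_auc))
```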
32. Model Deployment Methods
In the data mining software application itself
– Pro: Easy -- same processing done as in building the model
– Con: Slowest method of implementation with large data
In a database or real-time system
– Model encoded in Predictive Model Markup Language (PMML) -- http://www.dmg.org/
  A database becomes the run-time engine
  Typically for the model only, though PMML supports data preparation and cleansing functions as well
– SQL code
– Model encoded in a "wrapper", run via calls from the database, transaction system, or operating system
  Batch run or source code
Run-time engine
– Often part of the data mining software package itself
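The "SQL code" deployment option can be illustrated by generating a scoring expression from fitted model coefficients; the table name, columns, and coefficients below are hypothetical:

```python
def logistic_to_sql(intercept, coefficients, table="customers"):
    """Emit a SQL SELECT that scores a logistic regression in-database,
    so the database itself becomes the run-time engine."""
    linear = " + ".join(f"{coef} * {col}" for col, coef in coefficients.items())
    return (f"SELECT id, 1.0 / (1.0 + EXP(-({intercept} + {linear}))) AS score "
            f"FROM {table}")

sql = logistic_to_sql(-2.0, {"age": 0.03, "balance": 0.0001})
print(sql)
```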
34. Typical Predictive Model Deployment Processing Flow
Import/Select Data to Score -> Select Fields Needed -> Clean Data (missing, recodes, …) -> Re-create Derived Variables -> Score* -> Decile** -> Scored Data
The key: reproduce all data pre-processing done to build the models
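The Decile step in this flow can be sketched as a rank-based assignment; the scores are made up:

```python
def assign_deciles(scores):
    """Assign decile 1 (best) through 10 (worst) by rank of model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    deciles = [0] * len(scores)
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // len(scores) + 1
    return deciles

scores = [0.95, 0.10, 0.80, 0.40, 0.60, 0.20, 0.90, 0.30, 0.70, 0.50]
print(assign_deciles(scores))
```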
52. Thank you!
James Taylor, CEO
james@decisionmanagementsolutions.com
www.decisionmangementsolutions.com/learnmore
Editor's notes
Webinar: 10 best practices in operational analytics

One of the most powerful ways to apply advanced analytics is by putting them to work in operational systems. Using analytics to improve the way every transaction, every customer, every website visitor is handled is tremendously effective. The multiplicative effect means that even small analytic improvements add up to real business benefit. In this session James Taylor and Dean Abbott will provide you with 10 best practices to make sure you can effectively build and deploy analytic models into your operational systems.

<Quick overview on operational decisions and the value of analytics in operational systems>

1) Be flexible; data mining is not a set of rules! (though the results may be) I need to work this point out more
2) Avoid three key data preparation and modeling mistakes: blindly using data mining software defaults; forgetting some algorithms assume particular distributions of data; assuming powerful algorithms can "figure out" the model
3) Diversity is strength: build lots of models. Algorithms have strengths and weaknesses; leverage multiple families of algorithms to improve understanding of the data. Model ensembles can provide significant improvements in model accuracy
4) Pick the right metric to assess models. The metric dictates which model will be selected; the metric should match the business objective, not how algorithms view the models
5) Have deployment in mind when building models. Different approaches are necessary for real-time deployment vs. batch deployment or offline deployment. Biggest problem: moving all data preparation from the data mining tool environment to the database
6) Focus on actions. Knowing is not enough; you must act. Make sure you understand the options, how the model helps you select between them, and what the regulations and policies are
7) The three-legged stool. Operational decisions have to work for three different groups: business, IT and analytics. Like a three-legged stool, it will only stay up if all three groups are working together. Collaboration across the groups is key
8) Focus on explicability. Business people understand their business, IT people understand their systems. The models, and the actions taken in response to them, must be explicable. Operational decisions are often regulated. Consider model representations like scorecards, decision trees, rules and an implementation platform like a BRMS to ensure explicability
9) Build in decision analysis. No decision is static; no decision remains good over time. Models too age and degrade. Any decision implementation must therefore monitor results, to see if it is degrading, and constantly challenge itself with new approaches, new models, new rules to see if it could be improved. Test and learn
10) BWTDIM: Begin with the decision in mind
As we are talking about decisions it is worth remembering that all decisions matter, as Peter Drucker noted. Not just the big, strategic decisions of your executives but the day to day decisions that drive your business.
Models make predictions but predictions alone will not help much – you must ACT based on those predictions. When you are thinking about smarter systems, taking action means having the system take action in a way that uses the predictions you made. You need to make a decision based on those predictions and this means combining the models with rules about how and when to act. Let's take our retention example from earlier. Knowing that a customer is a retention risk is interesting; acting appropriately and in time to prevent them leaving is useful. Grovel index story
Story about powerpoint model. Risks of models that are done separately and the need to put them to work. Predictive models don't DO anything, they just make predictions. Rules make them actionable. Taking the rules, for instance, that represent a segmentation and deploying them into a decision makes them actionable.
Remember – decisions are where the business, analytics and IT all come together
Once deployed, analytics cannot be a "black box"; we must understand analytic performance. Obviously you need a 'hold out sample' or business-as-usual random group to compare to. You need to understand what's working and what's the next challenge – which segments are being retained, for instance. You must understand operational negation. You need to track input variables, scores, decisions or actions taken (a classic example is in collections, where a strategy may dictate a 'do nothing' strategy, but the collections manager overrides the decision and puts the accounts into a calling queue) and the operational data that fed the decision. Both analysts and business users must think about what they can do to improve decision making, which is the foundation of adaptive control. In our retention example I need to have some customers I don't attempt to retain or that I don't spend any money retaining. I have to capture what the call center representative ACTUALLY offered and what was actually accepted (if anything), not just what SHOULD have been offered, and I have to be able to show the results to my business users in terms they understand.

When decisions have to be compliant, and many do, or when decisions might have to be explained or justified in court or even in the court of public opinion, automated systems can be a challenge. Where a judge or journalist can talk to people who made decisions and review company policy documents, they don't do so well talking to computers or reviewing math and code. If a decision is automated it must be possible to log how the decision was made, how predictions were calculated, what actions were taken and why. This must be something that can be reviewed, even made public.
Business rules and models like decision trees and scorecards are particularly helpful in this respect. You need models that are good at explaining their actions - scorecards and decision trees/strategies for example – and the ability to trace these decisions historically and document them. Retention offers may not seem like they have a big compliance issue, but what if a particular group of customers argues they are being discriminated against because they always seem to get worse offers than another group? Could your business users explain exactly how it was done? Could you show a judge and a jury that your approach was fair and reasonable?
Analytics improve decision making. Find problem areas and improve. Suggest rules to close the gaps. Enhance data with predictive analytics.
Begin! Identify your decisions: hidden decisions, transactional decisions, customer decisions; decisions buried in complex processes; decisions that are the difference between two processes. Consider who takes them now and what drives changes in them. Assess change readiness. Consider organizational change. Adopt decisioning technology: adopt a business rules approach and technology, investigate data mining and predictive analytics, and think about adaptive control.
Decision Management Solutions can help you find the right decisions to apply business rules and analytics, implement a decision management blueprint, and define a strategy for business rule or analytic adoption. You are welcome to email me directly, james at decision management solutions.com, or you can go to decision management solutions.com / learn more. There you'll find links to contact me, check out the blog and find more resources for learning about Decision Management.