One of the most powerful ways to apply advanced analytics is by putting them to work in operational systems. Using analytics to improve the way every transaction, every customer, every website visitor is handled is tremendously effective. The multiplicative effect means that even small analytic improvements add up to real business benefit.
This is the slide deck from the webinar. James Taylor, CEO of Decision Management Solutions, and Dean Abbott of Abbott Analytics discuss 10 best practices to make sure you can effectively build and deploy analytic models into your operational systems. Webinar recording available here: https://decisionmanagement.omnovia.com/archives/70931
11. Be Flexible: Data Mining is Not a Series of Recipes
Data Mining Project Entry Points:
1) Business Understanding
2) Data Understanding
Data Mining Project Next Steps:
1) Data Understanding
2) Modeling, then Data Preparation
3) Data Preparation, then Data Understanding, then Modeling
(Diagram: the CRISP-DM cycle of Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment)
12. Avoid The Three Biggest Data Preparation Mistakes
1. Don't blindly use data mining software defaults
– Missing data
  Is the record with missing values in one of the fields kept at all?
  What value is filled in? What effect will this have?
– Exploding categorical variables with large numbers of values – what happens to the models?
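The "exploding categorical variables" problem above can be seen directly in a one-hot encoding sketch; the field and values here are made up:

```python
def one_hot(values):
    """One-hot encode a categorical column; a high-cardinality field
    explodes into one new input column per distinct value."""
    levels = sorted(set(values))
    return levels, [[1 if v == lvl else 0 for lvl in levels] for v in values]

# A zip-code field with thousands of distinct values would become
# thousands of sparse columns -- here just three for illustration.
zips = ["94110", "10001", "60614", "94110"]
levels, encoded = one_hot(zips)
print(len(levels), "new columns for", len(zips), "records")
```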
13. Some Software Fills Missing Values Automatically
Common automated missing value imputation: 0, mid-point, mean, or listwise deletion
Example at upper right has 5300+ records, 17 missing values encoded as "0"
After fixing the model with mean imputation, R^2 rises from 0.597 to 0.657
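The mean-imputation fix described on this slide can be sketched in a few lines, assuming missing values were encoded as 0 in an otherwise non-zero field (the data here is made up):

```python
def mean_impute(values, missing_code=0):
    """Replace missing-coded entries with the mean of the observed values."""
    observed = [v for v in values if v != missing_code]
    mean = sum(observed) / len(observed)
    return [mean if v == missing_code else v for v in values]

# Two records carry the missing code 0; they receive the observed mean
# instead of dragging the model toward an artificial 0.
ages = [34, 0, 51, 47, 0, 29]
print(mean_impute(ages))
</```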
14. Avoid The Three Biggest Data Preparation Mistakes
2. Don't forget some algorithms assume particular distributions for the data
– Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers
15. How Non-normality Affects Regression Models
Regression model "fit" is worse with skewed (non-normal) data
– In the example at right, simply applying the log transform improves performance from R^2=0.566 to 0.597
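The log transform on this slide can be sketched as follows; `log1p` (log of 1+x) is one common variant because it tolerates zeros (the data is illustrative):

```python
import math

def log_transform(values):
    """Compress a right-skewed, non-negative variable toward normality."""
    return [math.log1p(v) for v in values]

# A heavily right-skewed income-like variable: the transform pulls the
# long tail in so a linear fit is no longer dominated by the outlier.
incomes = [10, 20, 30, 40, 10000]
print(log_transform(incomes))
```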
16. Avoid The Three Biggest Data Preparation Mistakes
2. Don't forget some algorithms assume particular distributions for the data
– Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers
– Distance-based algorithms are strongly influenced by outliers and skewed distributions: k-Nearest Neighbor, k-Means, the above algorithms
17. Avoid The Three Biggest Data Preparation Mistakes
2. Don't forget some algorithms assume particular distributions for the data
– Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers
– Distance-based algorithms are strongly influenced by outliers and skewed distributions: k-Nearest Neighbor, k-Means, the above algorithms
– Some algorithms require categorical data (rather than numeric): Naïve Bayes, CHAID, Apriori
18. Avoid The Three Biggest Data Preparation Mistakes
3. Don't assume algorithms can "figure out" patterns on their own
– Features fix data distribution problems
– Features present data (information) to modeling algorithms in ways they perhaps can never identify themselves
  Interactions, record-connecting and temporal features, non-linear transformations
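The kinds of features named above, interactions, temporal features, and non-linear transformations, can be sketched as hand-built derived fields; all field names here are hypothetical:

```python
import math

def engineer_features(record):
    """Derive the three kinds of features the slide names from raw fields."""
    return {
        "balance_x_visits": record["balance"] * record["visits"],      # interaction
        "days_since_last": record["today"] - record["last_purchase"],  # temporal
        "log_income": math.log1p(record["income"]),                    # non-linear
    }

row = {"balance": 120.0, "visits": 4, "today": 200, "last_purchase": 185,
       "income": 54000}
print(engineer_features(row))
```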
19. What are Model Ensembles?
Combining outputs from multiple models into a single decision
Models can be created using the same algorithm, or several different algorithms
(Diagram: predictions from multiple models feed decision logic, which produces the ensemble prediction)
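The "decision logic" box is often just averaging or voting; a minimal sketch (the model outputs below are made up):

```python
def average_ensemble(probabilities):
    """Average the predicted probabilities from several models."""
    return sum(probabilities) / len(probabilities)

def vote_ensemble(class_predictions):
    """Majority vote over class labels from several models."""
    return max(set(class_predictions), key=class_predictions.count)

# Three models score the same record:
print(average_ensemble([0.62, 0.55, 0.70]))  # single ensemble probability
print(vote_ensemble([1, 0, 1]))              # majority class
```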
20. Motivation for Ensembles
Performance, performance, performance
A single model sometimes provides insufficient accuracy
– Neural networks become stuck in local minima
– Decision trees run out of data
– Single algorithms keep pushing performance using the same ideas (basis function / algorithm), and are incapable of "thinking outside of their box"
Often, different algorithms achieve the same level of accuracy but on different cases; they identify different ways to get the same level of accuracy
21. Four Keys to Effective Ensembling
Diversity of opinion
Independence
Decentralization
Aggregation
From The Wisdom of Crowds, James Surowiecki
22. Bagging
Bagging Method
– Create many data sets by bootstrapping (can also do this with cross validation)
– Create one decision tree for each data set
– Combine decision trees by averaging (or voting) final decisions
– Primarily reduces model variance rather than bias
Results
– On average, better than any individual tree
(Diagram: individual tree decisions are averaged into a final answer)
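The bagging steps above can be sketched end-to-end; the one-split "stump" below is a simplified stand-in for the decision trees on the slide, and the 1-D data is a toy example:

```python
import random

def fit_stump(data):
    """A one-split classifier: threshold x at the midpoint of the class means."""
    mean1 = sum(x for x, y in data if y == 1) / max(1, sum(1 for _, y in data if y == 1))
    mean0 = sum(x for x, y in data if y == 0) / max(1, sum(1 for _, y in data if y == 0))
    threshold = (mean0 + mean1) / 2
    return lambda x: 1 if x >= threshold else 0

def bagged_model(data, n_models=25, seed=0):
    """Bagging: bootstrap the data, fit one model per sample, majority vote."""
    rng = random.Random(seed)
    models = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]
    def predict(x):
        votes = sum(m(x) for m in models)
        return 1 if votes * 2 >= len(models) else 0
    return predict

# Toy 1-D data: class 0 clusters low, class 1 clusters high.
data = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
predict = bagged_model(data)
print(predict(2), predict(8))
```

Voting over many bootstrap-trained models smooths out the occasional bad sample, which is the variance reduction the slide describes.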
23. Boosting (Adaboost)
Boosting Method
– Create a tree using the training data set
– Score each data point, indicating where each incorrect decision is made (errors)
– Retrain, giving rows with incorrect decisions more weight. Repeat
– Final prediction is a weighted sum of all models -> model regularization
– Best to create "weak" models: simple models (just a few splits for a decision tree) and let the boosting iterations find the complexity
– Often used with trees or Naïve Bayes
Results
– Usually better than an individual tree or Bagging
(Diagram: reweight examples where classification is incorrect; combine models via weighted sum)
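The boosting loop above can be sketched with threshold stumps as the weak learners. This is a simplified AdaBoost on toy 1-D data with labels in {-1, +1}, not production code:

```python
import math

def adaboost(data, n_rounds=5):
    """AdaBoost sketch: reweight errors each round, combine by weighted sum."""
    n = len(data)
    weights = [1.0 / n] * n
    models = []  # (threshold, direction, alpha)
    for _ in range(n_rounds):
        # Pick the stump sign(direction * (x - t)) with lowest weighted error.
        best = None
        for t in sorted({x for x, _ in data}):
            for direction in (1, -1):
                err = sum(w for (x, y), w in zip(data, weights)
                          if (1 if direction * (x - t) >= 0 else -1) != y)
                if best is None or err < best[0]:
                    best = (err, t, direction)
        err, t, direction = best
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # model weight
        models.append((t, direction, alpha))
        # Reweight: rows this stump got wrong gain weight for the next round.
        weights = [w * math.exp(-alpha * y * (1 if direction * (x - t) >= 0 else -1))
                   for (x, y), w in zip(data, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    def predict(x):
        score = sum(alpha * (1 if direction * (x - t) >= 0 else -1)
                    for t, direction, alpha in models)
        return 1 if score >= 0 else -1
    return predict

data = [(1, -1), (2, -1), (3, -1), (6, 1), (7, 1), (8, 1)]
predict = adaboost(data)
print(predict(2), predict(7))
```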
24. Random Forest Ensembles
Random Forest (RF) Method
– Exact same methodology as Bagging, but with a twist
– At each split, rather than using the entire set of candidate inputs, use a random subset of candidate inputs
– Generates diversity of samples and inputs (splits)
Results
– On average, better than any individual tree, Bagging, or even Boosting
(Diagram: tree predictions are averaged into a final answer)
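The Random Forest "twist", a random subset of candidate inputs at each split, can be isolated in a few lines; the feature names are made up:

```python
import random

def random_feature_subset(feature_names, m, rng):
    """At each split, consider only a random subset of m candidate inputs
    instead of the full set -- the twist that distinguishes RF from Bagging."""
    return rng.sample(feature_names, m)

rng = random.Random(42)
features = ["age", "income", "tenure", "region", "balance", "visits"]
# A common default is m near sqrt(number of features).
for split in range(3):
    print("split", split, "considers:", random_feature_subset(features, 3, rng))
```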
25. Model Ensembles: The Good and the Bad
Pro
– Can significantly reduce model error
– Can be easy to automate -- already has been done in many commercial tools using Boosting, Bagging, ARCing, RF
Con
– Model interpretability is lost (if there was any)
– If not done automatically, can be very time consuming to generate dozens of models to combine
26. Ensembles of Trees: Smoothers
Ensembles smooth jagged decision boundaries
Picture from T.G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, Cagliari, Italy, 2000.
27. Heterogeneous Model Ensembles on Glass Data
Model prediction diversity obtained by using different algorithms: tree, NN, RBF, Gaussian, Regression, k-NN
Combining 3-5 models is on average better than the best single model
Combining all 6 models is not best (best is a 3- or 4-model combination), but is close
This is an example of reducing model variance through ensembles, but not model bias
28. The Conflict with Data Mining Algorithm Objectives
Algorithm Objectives
– Linear Regression and Neural networks minimize squared error
– C5 minimizes entropy
– CART minimizes the Gini index
– Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy)
– Nearest neighbor minimizes Euclidean distance
29. The Conflict with Data Mining Algorithm Objectives
Algorithm Objectives
– Linear Regression and Neural networks minimize squared error
– C5 minimizes entropy
– CART minimizes the Gini index
– Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy)
– Nearest neighbor minimizes Euclidean distance
Business Objectives
– Maximize net revenue
– Achieve cumulative response rate of 13%
– Maximize responders subject to a budget of $100,000
– Maximize savings from identifying customers likely to churn
– Maximize collected revenue by identifying the next best case to collect
– Minimize false alarms in the top 100 hits
– Maximize hits subject to a false alarm rate of 1 in 1,000,000
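One of these business objectives, maximizing responders subject to a budget, can be computed directly from model scores. A hedged sketch; the scores, contact cost, and budget are illustrative, not from the deck:

```python
def responders_within_budget(scored_records, cost_per_contact, budget):
    """Contact the highest-scoring records first until the budget is spent;
    return the expected number of responders reached."""
    ranked = sorted(scored_records, reverse=True)  # model score, descending
    n_contacts = int(budget // cost_per_contact)
    return sum(ranked[:n_contacts])  # sum of response probabilities

scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]  # model-estimated response probabilities
print(responders_within_budget(scores, cost_per_contact=2.0, budget=8.0))
```

Ranking candidate models by a business measure like this, rather than by squared error or entropy, is exactly the mismatch the slide is pointing at.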
30. Possible Solutions to the Business Objective / Data Mining Objective Mismatch
Model Ranking Metric, with the Model Building Consideration for each:
1. Rank models by algorithm objectives, ignoring business objectives, and hope the models do a good enough job
   (Consideration: force the data into the algorithm box, and hope the winner does a good job in reality)
2. Use optimization algorithms to maximize/minimize the business objective directly
   (Consideration: throw away the very nice theory of data mining algorithms, and hope the optimization algorithms converge well)
3. Build models normally, but rank models by business objectives, ignoring their "natural" algorithm score, hoping that some algorithms do well enough at scoring by business objective
   (Consideration: take your lumps with algorithms not quite doing what we want them to do, but take advantage of the power and efficiency of algorithms)
31. Model Comparison Example: Rankings Tell Different Stories
Top RMS model is 9th in AUC; 2nd test RMS rank is 42nd in AUC
Correlation between rankings:
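A rank correlation such as Spearman's is one way to quantify how differently two metrics order the same models; the rankings below are illustrative, not the ones from the slide:

```python
def spearman_rank_correlation(rank_a, rank_b):
    """Spearman correlation between two rankings of the same models
    (rankings given as rank positions, no ties)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Five models ranked by RMS error vs. by AUC (made-up ranks):
by_rms = [1, 2, 3, 4, 5]
by_auc = [3, 1, 2, 5, 4]
print(spearman_rank_correlation(by_rms, by_auc))
```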
32. Model Deployment Methods
In the data mining software application itself
– Pro: Easy -- same processing done as in building the model
– Con: Slowest method of implementation with large data
In a database or real-time system
– Model encoded in Predictive Model Markup Language (PMML) -- http://www.dmg.org/
  A database becomes the run-time engine
  Typically for the model only, though PMML supports data preparation and cleansing functions as well
– SQL code
– Model encoded in a "wrapper", run via calls from the database, transaction system, or operating system
  Batch run or source code
Run-time engine
– Often part of the data mining software package itself
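The "SQL code" deployment option can be illustrated by generating a scoring expression from fitted model coefficients; the table name, columns, and coefficients below are hypothetical:

```python
def logistic_to_sql(intercept, coefficients, table="customers"):
    """Emit a SQL SELECT that scores a logistic regression in-database,
    so the database itself becomes the run-time engine."""
    linear = " + ".join(f"{coef} * {col}" for col, coef in coefficients.items())
    return (f"SELECT id, 1.0 / (1.0 + EXP(-({intercept} + {linear}))) AS score "
            f"FROM {table}")

sql = logistic_to_sql(-2.0, {"age": 0.03, "balance": 0.0001})
print(sql)
```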
34. Typical Predictive Model Deployment Processing Flow
Import/Select Data to Score -> Select Fields Needed -> Clean Data (missing, recodes, …) -> Re-create Derived Variables -> Score* -> Decile** -> Scored Data
The key: reproduce all data pre-processing done to build the models
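The Decile step in this flow can be sketched as a rank-based assignment; the scores are made up:

```python
def assign_deciles(scores):
    """Assign decile 1 (best) through 10 (worst) by rank of model score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    deciles = [0] * len(scores)
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // len(scores) + 1
    return deciles

scores = [0.95, 0.10, 0.80, 0.40, 0.60, 0.20, 0.90, 0.30, 0.70, 0.50]
print(assign_deciles(scores))
```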
52. Thank you!
James Taylor, CEO
james@decisionmanagementsolutions.com
www.decisionmangementsolutions.com/learnmore
Editor's notes
Webinar: 10 best practices in operational analytics

One of the most powerful ways to apply advanced analytics is by putting them to work in operational systems. Using analytics to improve the way every transaction, every customer, every website visitor is handled is tremendously effective. The multiplicative effect means that even small analytic improvements add up to real business benefit. In this session James Taylor and Dean Abbott will provide you with 10 best practices to make sure you can effectively build and deploy analytic models into your operational systems.

<Quick overview on operational decisions and the value of analytics in operational systems>

1) Be flexible; data mining is not a set of rules! (though the results may be) I need to work this point out more
2) Avoid three key data preparation and modeling mistakes: blindly using data mining software defaults; forgetting some algorithms assume particular distributions of data; assuming powerful algorithms can "figure out" the model
3) Diversity is strength: build lots of models. Algorithms have strengths and weaknesses; leverage multiple families of algorithms to improve understanding of the data. Model ensembles can provide significant improvements in model accuracy
4) Pick the right metric to assess models. The metric dictates which model will be selected; the metric should match the business objective, not how algorithms view the models
5) Have deployment in mind when building models. Different approaches are necessary for real-time deployment vs. batch deployment or offline deployment. Biggest problem: moving all data preparation from the data mining tool environment to the database
6) Focus on actions. Knowing is not enough; you must act. Make sure you understand the options, how the model helps you select between them, and what the regulations and policies are
7) The three-legged stool. Operational decisions have to work for three different groups: business, IT and analytics. Like a three-legged stool, it will only stay up if all three groups are working together. Collaboration across the groups is key
8) Focus on explicability. Business people understand their business, IT people understand their systems. The models, and the actions taken in response to them, must be explicable. Operational decisions are often regulated. Consider model representations like scorecards, decision trees, rules and an implementation platform like a BRMS to ensure explicability
9) Build in decision analysis. No decision is static; no decision remains good over time. Models too age and degrade. Any decision implementation must therefore monitor results, to see if it is degrading, and constantly challenge itself with new approaches, new models, new rules to see if it could be improved. Test and learn
10) BWTDIM: Begin with the decision in mind
As we are talking about decisions it is worth remembering that all decisions matter, as Peter Drucker noted. Not just the big, strategic decisions of your executives but the day to day decisions that drive your business.
Models make predictions but predictions alone will not help much – you must ACT based on those predictions. When you are thinking about smarter systems, taking action means having the system take action in a way that uses the predictions you made. You need to make a decision based on those predictions and this means combining the models with rules about how and when to act. Let's take our retention example from earlier. Knowing that a customer is a retention risk is interesting; acting appropriately and in time to prevent them leaving is useful. Grovel index story
Story about powerpoint model. Risks of models that are done separately and the need to put them to work. Predictive models don't DO anything, they just make predictions. Rules make them actionable. Taking the rules, for instance, that represent a segmentation and deploying them into a decision makes them actionable.
Remember – decisions are where the business, analytics and IT all come together
Once deployed, analytics cannot be a "black box"; we must understand analytic performance. Obviously you need a 'hold out sample' or business-as-usual random group to compare to. You need to understand what's working and what's the next challenge – which segments are being retained, for instance. You must understand operational negation. You need to track input variables, scores, decisions or actions taken (a classic example is in collections, where a strategy may dictate a 'do nothing' strategy, but the collections manager overrides the decision and puts the accounts into a calling queue) and the operational data that fed the decision. Both analysts and business users must think about what they can do to improve decision making, which is the foundation of adaptive control. In our retention example I need to have some customers I don't attempt to retain or that I don't spend any money retaining. I have to capture what the call center representative ACTUALLY offered and what was actually accepted (if anything), not just what SHOULD have been offered, and I have to be able to show the results to my business users in terms they understand.

When decisions have to be compliant, and many do, or when decisions might have to be explained or justified in court or even in the court of public opinion, automated systems can be a challenge. Where a judge or journalist can talk to people who made decisions and review company policy documents, they don't do so well talking to computers or reviewing math and code. If a decision is automated it must be possible to log how the decision was made, how predictions were calculated, what actions were taken and why. This must be something that can be reviewed, even made public.
Business rules and models like decision trees and scorecards are particularly helpful in this respect. You need models that are good at explaining their actions - scorecards and decision trees/strategies for example – and the ability to trace these decisions historically and document them. Retention offers may not seem like they have a big compliance issue, but what if a particular group of customers argues they are being discriminated against because they always seem to get worse offers than another group? Could your business users explain exactly how it was done? Could you show a judge and a jury that your approach was fair and reasonable?
Analytics improve decision making. Find problem areas and improve. Suggest rules to close the gaps. Enhance data with predictive analytics.
Begin! Identify your decisions: hidden decisions, transactional decisions, customer decisions; decisions buried in complex processes; decisions that are the difference between two processes. Consider who takes them now and what drives changes in them. Assess change readiness. Consider organizational change. Adopt decisioning technology: adopt a business rules approach and technology, investigate data mining and predictive analytics, and think about adaptive control.
Decision Management Solutions can help you find the right decisions to apply business rules and analytics, implement a decision management blueprint, and define a strategy for business rule or analytic adoption. You are welcome to email me directly, james at decision management solutions.com, or you can go to decision management solutions.com / learn more. There you'll find links to contact me, check out the blog and find more resources for learning about Decision Management.