Webinar: 10 best practices in operational analytics
James Taylor, CEO
Your presenters
 James Taylor
   CEO of Decision Management Solutions. James works with
   clients to improve their business by applying analytics and
   analytic technology to automate and improve decisions. He
   has spent the last 8 years developing the concept of Decision
   Management and has 20 years of experience in all aspects of
   software.

 Dean Abbott
   Owner of Abbott Analytics. Dean has applied Data Mining and
   Predictive Analytics for 22 years and provides mentoring,
   coaching, and solutions for Web Analytics, Compliance,
   Fraud Detection, Survey Analysis, Text Mining, Marketing and
   CRM analytics and more. Dean has partnerships with the
   largest predictive analytics organizations in the US.

                                         ©2011 Decision Management Solutions   2
AGENDA

         1   Introducing Operational Analytics
         2   The 10 Best Practices
         3   Wrap up
The 10 Best Practices
1.    Be flexible; data mining is not a set of rules!
2.    Avoid 3 key data preparation, modeling mistakes
3.    Diversity is strength: build lots of models
4.    Pick the right metric to assess models
5.    Have deployment in mind when building models
6.    Focus on actions
7.    The three legged stool
8.    Focus on explicability
9.    Build in decision analysis
10.   BWTDIM (Begin With The Decision In Mind)
Introducing
Operational Analytics




Analytics have power




        Online Conversion
        Acquisition Rates
        Campaign Response
        Fraud
        Risk
        Customer Churn
And that power is operational
 How do I…
  prevent this customer from churning?
  convert this visitor?
  acquire this prospect?
  make this offer compelling to this person?
  identify this claim as fraudulent?
  correctly estimate the risk of this loan?

 It’s not about “aha” moments
 It’s about making better operational decisions
Multiplying the power of analytics
[Figure: decision type (Strategy, Tactics, Operations) against economic impact (Low to High)]
Operational decisions matter


  “Most discussions of decision making assume that only senior executives make decisions or that only senior executives’ decisions matter. This is a dangerous mistake.”
                 Peter Drucker
10 Best Practices




Be Flexible: Data Mining is Not a Series of Recipes
Data Mining Project Entry Points:
1) Business Understanding
2) Data Understanding
Data Mining Project Next Steps:
1) Data Understanding
2) Modeling, then Data Preparation
3) Data Preparation, then Data Understanding, then Modeling
[Figure: process cycle with the stages Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, arranged around Data]
Avoid The Three Biggest
Data Preparation Mistakes
1.   Don’t blindly use data mining software
     defaults
     –   Missing data
            Is the record with missing values in one of the fields
             kept at all?
            What value is filled in? What effect will this have?
     –   Exploding categorical variables with large
         numbers of values – what happens to the
         models?


Some Software Fills Missing Values Automatically
 Common automated missing value handling:
  – fill with 0, the mid-point, or the mean, or drop the record (listwise deletion)
 Example at upper right has 5300+ records, with 17 missing values encoded as "0"
 After fixing the model with mean imputation, R^2 rises from 0.597 to 0.657
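To make the effect of a fill-in default concrete, here is a minimal sketch comparing zero-filling with mean imputation. The column values and the names `impute` and `ages` are made up for illustration; this is not the data set behind the slide's R^2 figures.

```python
from statistics import mean

def impute(values, strategy="mean"):
    """Fill None entries using the chosen strategy ('zero' or 'mean')."""
    observed = [v for v in values if v is not None]
    fill = 0.0 if strategy == "zero" else mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [34, None, 51, 46, None, 29]

zero_filled = impute(ages, "zero")
mean_filled = impute(ages, "mean")

print(mean(zero_filled))  # pulled toward 0 by the artificial zeros
print(mean(mean_filled))  # stays at the mean of the observed values
```

Zero-filling shifts the column's distribution toward a value no real record has, which is one way a silent software default can degrade a model.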
Avoid The Three Biggest
Data Preparation Mistakes
2.   Don’t forget some algorithms assume the
     distributions for data
     –   Some algorithms assume normally distributed
         data: linear regression, Bayes and Nearest Mean
         classifiers




How Non-normality affects Regression Models

Regression models' "fit" is worse with skewed (non-normal) data
    – In the example at right, simply applying the log transform improves performance from R^2=0.566 to 0.597
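As a small illustration of what the log transform does to a skewed input, the sketch below measures skewness before and after. The `amounts` values and the `skewness` helper are assumptions chosen for illustration, not the slide's data.

```python
import math

def skewness(xs):
    """Population skewness: the third standardized moment."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return sum((x - mu) ** 3 for x in xs) / n / var ** 1.5

# Hypothetical right-skewed values (for example, purchase amounts).
amounts = [1, 2, 2, 3, 4, 5, 8, 15, 40, 200]
logged = [math.log10(x) for x in amounts]

print(round(skewness(amounts), 2))  # strongly right-skewed
print(round(skewness(logged), 2))   # noticeably less skewed
```

The transform only works for strictly positive values; a shifted log or another transform would be needed otherwise.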
Avoid The Three Biggest
Data Preparation Mistakes
2.   Don’t forget some algorithms assume the
     distributions for data
     –   Some algorithms assume normally distributed
         data: linear regression, Bayes and Nearest Mean
         classifiers
     –   Distance-based algorithms are strongly influenced
         by outliers and skewed distributions: k-Nearest
         Neighbor, k-Means, the above algorithms
     –   Some algorithms require categorical data (rather
         than numeric): Naïve Bayes, CHAID, Apriori

Avoid The Three Biggest Data Preparation Mistakes
3.   Don’t assume algorithms can "figure out" patterns on their own
     –   Features fix data distribution problems
     –   Features present data (information) to modeling algorithms in ways they perhaps can never identify themselves
            Interactions, record-connecting and temporal features, non-linear transformations
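A minimal sketch of the interaction case, under the assumption of a toy target that depends only on the product of two inputs. The data and the `pearson` helper are hypothetical. Neither input is linearly related to the target on its own, but the derived feature is.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: the target is the product of the two inputs.
x1 = [-1, -1, 1, 1] * 5
x2 = [-1, 1, -1, 1] * 5
y = [a * b for a, b in zip(x1, x2)]

# Derived interaction feature supplied by the modeler.
interaction = [a * b for a, b in zip(x1, x2)]

print(pearson(x1, y), pearson(x2, y), pearson(interaction, y))
```

A linear model given only `x1` and `x2` has nothing to work with here, while the same model given `interaction` can fit the target exactly.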
What are Model Ensembles?

   Combining outputs from multiple models into single
    decision
   Models can be created using the same algorithm, or
    several different algorithms




[Figure labels: Decision Logic, Ensemble Prediction]
Motivation for Ensembles

   Performance, performance, performance
   A single model sometimes provides insufficient accuracy
    – Neural networks become stuck in local minima
    – Decision trees run out of data
    – Single algorithms keep pushing performance using the same ideas (basis function / algorithm), and are incapable of "thinking outside of their box"
   Often, different algorithms achieve the same level of accuracy but on different cases: they identify different ways to get the same level of accuracy
Four Keys to Effective
Ensembling

   Diversity of opinion
   Independence
   Decentralization
   Aggregation

   From The Wisdom of Crowds, James
    Surowiecki
Bagging

   Bagging Method
     – Create many data sets by bootstrapping (can also do this with cross validation)
     – Create one decision tree for each data set
     – Combine decision trees by averaging (or voting) final decisions
     – Primarily reduces model variance rather than bias
   Results
     – On average, better than any individual tree
[Figure label: Final Answer (average)]
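The bagging steps can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: each "tree" is a one-split regression stump on a single input, and the data, `fit_stump`, and `bagged_stumps` are hypothetical names introduced only for illustration.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on a single numeric input."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x identical: predict the mean everywhere
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def bagged_stumps(xs, ys, n_models=25, seed=0):
    """Train one stump per bootstrap sample; predict by averaging."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

# Hypothetical toy data: a noisy step from about 0 to about 10 at x = 5.
xs = list(range(10))
ys = [0.2, -0.1, 0.3, 0.0, 0.1, 9.8, 10.1, 10.3, 9.9, 10.0]

predict = bagged_stumps(xs, ys)
print(predict(1), predict(8))
```

Because each stump sees a different bootstrap sample, the split points differ slightly, and averaging them smooths the prediction near the step.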
Boosting (Adaboost)

 Boosting Method
  – Create a tree using the training data set
  – Score each data point, indicating when each incorrect decision is made (errors)
  – Retrain, giving rows with incorrect decisions more weight (reweight examples where classification is incorrect). Repeat
  – Final prediction is a weighted average of all models (combine models via weighted sum) -> model regularization
  – Best to create "weak" models, that is, simple models (just a few splits for a decision tree), and let the boosting iterations find the complexity
  – Often used with trees or Naïve Bayes
 Results
  – Usually better than individual tree or Bagging
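The reweight-and-combine loop can be illustrated with a small discrete AdaBoost sketch using one-split stumps as the weak models. The data set and the function names `fit_weighted_stump` and `adaboost` are assumptions made for this example; labels are coded as +1 and -1.

```python
import math

def fit_weighted_stump(xs, ys, w):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            pred = [sign if x >= t else -sign for x in xs]
            err = sum(wi for wi, p, y in zip(w, pred, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds=3):
    """Discrete AdaBoost with stumps; labels must be +1 or -1."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, sign)
    for _ in range(rounds):
        err, t, sign = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, sign))
        # Misclassified rows gain weight, correct rows lose weight.
        w = [wi * math.exp(-alpha * y * (sign if x >= t else -sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        score = sum(a * (s if x >= th else -s) for a, th, s in ensemble)
        return 1 if score >= 0 else -1
    return predict

# Hypothetical 1-D data with a + - + pattern that no single stump separates.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1]

predict = adaboost(xs, ys, rounds=3)
print([predict(x) for x in xs])
```

On this toy pattern the best single stump still misclassifies three of the nine rows, while the weighted sum of three stumps classifies all of them.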
Random Forest Ensembles

 Random Forest (RF) Method
   – Exact same methodology as Bagging, but with a twist
   – At each split, rather than using the entire set of candidate inputs, use a random subset of candidate inputs
   – Generates diversity of samples and inputs (splits)
 Results
   – On average, better than any individual tree, Bagging, or even Boosting
[Figure label: Final Answer (average)]
Model Ensembles:
 The Good and the Bad

 Pro
  – Can significantly reduce model error
  – Can be easy to automate -- already has been done
    in many commercial tools using Boosting, Bagging,
    ARCing, RF
 Con
  – Model interpretability is lost (if there was any)
  – If not done automatically, can be very time
    consuming to generate dozens of models to combine


Ensembles of Trees: Smoothers

Ensembles smooth jagged decision boundaries

Picture from T.G. Dietterich. Ensemble methods in machine learning. In Multiple Classifier Systems, Cagliari, Italy, 2000.
Heterogeneous Model Ensembles on Glass Data

   Model prediction diversity obtained by using different algorithms: tree, NN, RBF, Gaussian, Regression, k-NN
   Combining 3-5 models is on average better than the best single model
   Combining all 6 models is not best (best is 3&4 model combination), but is close
   This is an example of reducing model variance through ensembles, but not model bias
The Conflict with Data Mining Algorithm Objectives

 Algorithm Objectives
  – Linear Regression and Neural networks minimize squared error
  – C5 minimizes entropy
  – CART minimizes Gini index
  – Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy)
  – Nearest neighbor minimizes Euclidean distance
 Business Objectives
  – Maximize net revenue
  – Achieve cumulative response rate of 13%
  – Maximize responders subject to a budget of $100,000
  – Maximize savings from identifying customers likely to churn
  – Maximize collected revenue by identifying next best case to collect
  – Minimize false alarms in top 100 hits
  – Maximize hits subject to a false alarm rate of 1 in 1,000,000
Possible Solutions to Business Objective / Data Mining Objective Mismatch

1.    Model Ranking Metric: Rank models by algorithm objectives, ignoring business objectives, and hope the models do a good enough job
      Model Building Considerations: Force the data into the algorithm box, and hope the winner does a good job in reality

2.    Model Ranking Metric: Use optimization algorithms to maximize/minimize directly the business objective
      Model Building Considerations: Throw away very nice theory of data mining algorithms, and hope the optimization algorithms converge well

3.    Model Ranking Metric: Build models normally, but rank models by business objectives, ignoring their "natural" algorithm score, hoping that some algorithms do well enough at scoring by business objective
      Model Building Considerations: Take your lumps with algorithms not quite doing what we want them to do, but take advantage of the power and efficiency of algorithms
Model Comparison Example: Rankings Tell Different Stories

 Top RMS model is 9th in AUC, 2nd Test RMS rank is 42nd in AUC
 Correlation between rankings: [table not recovered from the slide]
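One way to check how far two metrics disagree is to rank the candidate models under each and compute a rank correlation. The sketch below uses Spearman's formula for tie-free rankings; the five metric values are invented for illustration and are not the results from the slide.

```python
def ranks(values, reverse=False):
    """Rank 1 = best. Lower is better unless reverse=True. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(r1, r2):
    """Spearman rank correlation for two tie-free rankings."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical test-set metrics for five candidate models.
rmse = [0.310, 0.305, 0.322, 0.318, 0.301]  # lower is better
auc = [0.74, 0.69, 0.77, 0.71, 0.70]        # higher is better

rmse_rank = ranks(rmse)
auc_rank = ranks(auc, reverse=True)
print(rmse_rank, auc_rank, spearman(rmse_rank, auc_rank))
```

A correlation near zero or below means that picking the winner by the algorithm's own error measure says little about which model is best on the business metric.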
Model Deployment Methods

   In data mining software application itself
     – Pro: Easy, same processing done as in building model
     – Con: Slowest method of implementation with large data
   In database or real-time system
     – Model encoded in Predictive Model Markup Language (PMML): http://www.dmg.org/
            A database becomes the run-time engine
            Typically for model only, though PMML supports data preparation and cleansing functions as well
     – SQL code
     – Model encoded in "wrapper", run via calls from database, transaction system, or operating system
          Batch run or source code
   Run-time engine
     – Often part of data mining software package itself
Sample PMML Code
[Slide image of sample PMML code not recovered]
Typical Predictive Model Deployment Processing Flow

 1. Import/Select Data to Score
 2. Select Fields Needed
 3. Clean Data (missing, recodes, …)
 4. Re-create Derived Variables
 5. Score* Data
 6. Decile** Scored Data
 Output: Scored Data

 The key: reproduce all data pre-processing done to build the models
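The flow can be sketched as a chain of small functions, one per step. Everything specific here is an assumption for illustration: the field names, the saved training means, the derived variable, and the made-up linear model coefficients.

```python
import math

FIELDS = ["income", "visits"]                      # hypothetical model inputs
TRAIN_MEANS = {"income": 50000.0, "visits": 3.0}   # saved at model-build time

def clean(row):
    """Keep only the needed fields; fill missing values as done in training."""
    return {f: row.get(f) if row.get(f) is not None else TRAIN_MEANS[f]
            for f in FIELDS}

def derive(row):
    """Re-create the derived variables used when the model was built."""
    row = dict(row)
    row["log_income"] = math.log10(row["income"])
    return row

def score(row):
    """Hypothetical linear model; the coefficients are made up."""
    return 0.5 * row["log_income"] + 0.1 * row["visits"]

def deciles(scores):
    """Assign decile 1 (highest scores) through 10 (lowest scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    out = [0] * len(scores)
    for pos, i in enumerate(order):
        out[i] = pos * 10 // len(scores) + 1
    return out

# Hypothetical data to score, with one missing value.
raw = [{"id": k, "income": 20000 + 4000 * k, "visits": k % 4} for k in range(20)]
raw[3]["income"] = None

scores = [score(derive(clean(r))) for r in raw]
print(deciles(scores))
```

The design point is that `clean` and `derive` reuse values fixed at build time (here `TRAIN_MEANS`) rather than recomputing them on the scoring data, so deployment reproduces the training-time pre-processing.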
Knowing is not enough



     Those who know first, win

      Those who ACT first, win
    Provided they act intelligently


Avoid the insight-to-action gap




Analytic insights must drive action




Business rules drive decisions



[Figure labels: Decision, Policy, Regulations, History, Experience, Legacy Applications]
Three legged stools need three legs




Operational decisions at the center


                  Business




Monitoring and compliance




Scorecards are a powerful tool
 Characteristic                  Attribute       Points
 Years Under Contract            1               0
                                 2               5
                                 More than 2     10
 Number of Contract Changes      0               0
                                 1               5
                                 More than 1     10
 Value Rating of Current Plan    Poor            0
                                 Good            10
                                 Excellent       20
 Score (worked example on the slide): 30
 Source: Fig 5.4, Smart (Enough) Systems, Prentice Hall, June 2007.
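A scorecard like this is simple to express as a lookup table. The sketch below uses the point values from the slide; the customer record and the reason-code rule (report the characteristics that fall furthest below their maximum points) are assumptions for illustration, since the slide does not show which rows produce the score of 30.

```python
SCORECARD = {
    "years_under_contract": {"1": 0, "2": 5, "more than 2": 10},
    "contract_changes": {"0": 0, "1": 5, "more than 1": 10},
    "plan_value_rating": {"poor": 0, "good": 10, "excellent": 20},
}

def score_with_reasons(customer, top_n=2):
    """Return (total score, characteristics that lost the most points)."""
    total, shortfalls = 0, []
    for characteristic, bins in SCORECARD.items():
        points = bins[customer[characteristic]]
        total += points
        shortfalls.append((max(bins.values()) - points, characteristic))
    # Assumed reason-code rule: largest shortfall from the maximum first.
    shortfalls.sort(key=lambda s: -s[0])
    reasons = [name for lost, name in shortfalls if lost > 0][:top_n]
    return total, reasons

customer = {  # hypothetical customer
    "years_under_contract": "2",
    "contract_changes": "1",
    "plan_value_rating": "excellent",
}
print(score_with_reasons(customer))
```

Because the whole model is one small table, every score can be logged together with the rows that produced it.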
Why use a scorecard?
Reason Codes
•Return the most important reason(s) for a score
•Explaining results
Simplicity
•Easy to use and explain
•Easy to implement
•Although not necessarily easy to build
Transparency
•It is really clear how a scorecard got its result
•The complete workings of a scorecard can be logged
Compact
•One scorecard can often replace many rules and tables
•One artifact for one prediction
Compliance
•Easy to enforce rules about use of specific attributes
•Easy to remove rough edges
Familiar
•Analytic teams are used to developing scorecards
•Regulators and business owners are used to reviewing them
Continuous improvement




Don’t start by focusing on the data


[Figure labels: Available data, Derived information, Analytic insight, Better decision]
Start by focusing on the value


[Figure labels: Better decision, Analytic insight, Derived information, Available data]
Wrap Up
The 10 Best Practices
1.   Be flexible; data mining is not a set of rules!
2.   Avoid 3 key data preparation, modeling
     mistakes
3.   Diversity is strength: build lots of models
4.   Pick the right metric to assess models
5.   Have deployment in mind when building
     models
6.   Focus on actions
7.   The three legged stool
8.   Focus on explicability
9.   Build in decision analysis
Action Plan

              Identify your decisions before analytics
              Adopt business rules to implement analytics
              Bring business, analytic and IT people together
Let us know if we can help
 Decision Management Solutions can help you
  Focus on the right decisions
  Implement a blueprint
  Define a strategy
  http://www.decisionmanagementsolutions.com

 Abbott Analytics can help you
  Find the right software
  Define a strategy
  Learn the ropes
  http://www.abbottanalytics.com
Thank you!




                      James Taylor, CEO
   james@decisionmanagementsolutions.com
www.decisionmangementsolutions.com/learnmore

Contenu connexe

Tendances

840 plenary elder_using his laptop
840 plenary elder_using his laptop840 plenary elder_using his laptop
840 plenary elder_using his laptopRising Media, Inc.
 
1555 track 1 huang_using his mac
1555 track 1 huang_using his mac1555 track 1 huang_using his mac
1555 track 1 huang_using his macRising Media, Inc.
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteRoger Barga
 
1225 lunchlearn shekhar_using his mac
1225 lunchlearn shekhar_using his mac1225 lunchlearn shekhar_using his mac
1225 lunchlearn shekhar_using his macRising Media, Inc.
 
Putting data science in your business a first utility feedback
Putting data science in your business a first utility feedbackPutting data science in your business a first utility feedback
Putting data science in your business a first utility feedbackPeculium Crypto
 
Solution Architecture US healthcare
Solution Architecture US healthcare Solution Architecture US healthcare
Solution Architecture US healthcare sumiteshkr
 
Predictive analytics in action real-world examples and advice
Predictive analytics in action real-world examples and advicePredictive analytics in action real-world examples and advice
Predictive analytics in action real-world examples and adviceThe Marketing Distillery
 
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino Data Lab
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera, Inc.
 
1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptopRising Media, Inc.
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationScientificRevenue
 
1120 track 3 prendki_using our laptop
1120 track 3 prendki_using our laptop1120 track 3 prendki_using our laptop
1120 track 3 prendki_using our laptopRising Media, Inc.
 
Blended Analytics for IT Unknown Unknowns
Blended Analytics for IT Unknown UnknownsBlended Analytics for IT Unknown Unknowns
Blended Analytics for IT Unknown UnknownsEvolven Software
 
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryPredictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryMatouš Havlena
 
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...Value Amplify Consulting
 

Tendances (20)

840 plenary elder_using his laptop
840 plenary elder_using his laptop840 plenary elder_using his laptop
840 plenary elder_using his laptop
 
1555 track 1 huang_using his mac
1555 track 1 huang_using his mac1555 track 1 huang_using his mac
1555 track 1 huang_using his mac
 
0940 diamondsponsor de
0940 diamondsponsor de0940 diamondsponsor de
0940 diamondsponsor de
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
 
1225 lunchlearn shekhar_using his mac
1225 lunchlearn shekhar_using his mac1225 lunchlearn shekhar_using his mac
1225 lunchlearn shekhar_using his mac
 
Putting data science in your business a first utility feedback
Putting data science in your business a first utility feedbackPutting data science in your business a first utility feedback
Putting data science in your business a first utility feedback
 
Solution Architecture US healthcare
Solution Architecture US healthcare Solution Architecture US healthcare
Solution Architecture US healthcare
 
Predictive analytics in action real-world examples and advice
Predictive analytics in action real-world examples and advicePredictive analytics in action real-world examples and advice
Predictive analytics in action real-world examples and advice
 
Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...Domino and AWS: collaborative analytics and model governance at financial ser...
Domino and AWS: collaborative analytics and model governance at financial ser...
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
940 diamond sponsor sengupta
940 diamond sponsor sengupta940 diamond sponsor sengupta
940 diamond sponsor sengupta
 
Cloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learningCloudera Fast Forward Labs: Accelerate machine learning
Cloudera Fast Forward Labs: Accelerate machine learning
 
1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop1645 track 1 bress_using his laptop
1645 track 1 bress_using his laptop
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous Optimization
 
1120 track 3 prendki_using our laptop
1120 track 3 prendki_using our laptop1120 track 3 prendki_using our laptop
1120 track 3 prendki_using our laptop
 
Data Mining Technique - SEMMA
Data Mining Technique - SEMMAData Mining Technique - SEMMA
Data Mining Technique - SEMMA
 
Blended Analytics for IT Unknown Unknowns
Blended Analytics for IT Unknown UnknownsBlended Analytics for IT Unknown Unknowns
Blended Analytics for IT Unknown Unknowns
 
Data Science for Retail Broking
Data Science for Retail BrokingData Science for Retail Broking
Data Science for Retail Broking
 
Predictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive IndustryPredictive Analytics Project in Automotive Industry
Predictive Analytics Project in Automotive Industry
 
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
 

Similaire à 10 best practices in operational analytics

Egypt hackathon 2014 analytics & spss session
Egypt hackathon 2014   analytics & spss sessionEgypt hackathon 2014   analytics & spss session
Egypt hackathon 2014 analytics & spss sessionM Baddar
 
Introduction to data mining
Introduction to data miningIntroduction to data mining
Introduction to data miningUjjawal
 
Data mining 2012 generalwithmethods
Data mining  2012 generalwithmethodsData mining  2012 generalwithmethods
Data mining 2012 generalwithmethodsMichael Gilman
 
Data science lecture4_doaa_mohey
Data science lecture4_doaa_moheyData science lecture4_doaa_mohey
Data science lecture4_doaa_moheyDoaa Mohey Eldin
 
Statistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreStatistics in the age of data science, issues you can not ignore
Statistics in the age of data science, issues you can not ignoreTuri, Inc.
 
Deep neural networks and tabular data
Deep neural networks and tabular dataDeep neural networks and tabular data
Deep neural networks and tabular dataJimmyLiang20
 
Nss power point_machine_learning
Nss power point_machine_learningNss power point_machine_learning
Nss power point_machine_learningGauravsd2014
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10Roger Barga
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchjim
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Researchbutest
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
TreeNet Overview - Updated October 2012
TreeNet Overview  - Updated October 2012TreeNet Overview  - Updated October 2012
TreeNet Overview - Updated October 2012Salford Systems
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Kun Le
 
An Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsAn Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsShouvic Banik0139
 
Datascience101presentation4
Datascience101presentation4Datascience101presentation4
Datascience101presentation4Salford Systems
 

10 best practices in operational analytics

  • 1. Webinar: 10 best practices in operational analytics. James Taylor, CEO
  • 2. Your presenters. James Taylor: CEO of Decision Management Solutions. James works with clients to improve their business by applying analytics and analytic technology to automate and improve decisions. He has spent the last 8 years developing the concept of Decision Management and has 20 years' experience in all aspects of software. Dean Abbott: Owner of Abbott Analytics. Dean has applied Data Mining and Predictive Analytics for 22 years and provides mentoring, coaching, and solutions for Web Analytics, Compliance, Fraud Detection, Survey Analysis, Text Mining, Marketing and CRM analytics and more. Dean has partnerships with the largest predictive analytics organizations in the US. ©2011 Decision Management Solutions
  • 3. Agenda: 1. Introducing Operational Analytics; 2. The 10 Best Practices; 3. Wrap up
  • 4. The 10 Best Practices: 1. Be flexible; data mining is not a set of rules! 2. Avoid 3 key data preparation, modeling mistakes 3. Diversity is strength: build lots of models 4. Pick the right metric to assess models 5. Have deployment in mind when building models 6. Focus on actions 7. The three legged stool 8. Focus on explicability 9. Build in decision analysis 10. BWTDIM
  • 5. Introducing Operational Analytics
  • 6. Analytics have power: Online Conversion, Acquisition Rates, Campaign Response, Fraud, Risk, Customer Churn
  • 7. And that power is operational. How do I… prevent this customer from churning? Convert this visitor? Acquire this prospect? Make this offer compelling to this person? Identify this claim as fraudulent? Correctly estimate the risk of this loan? It's not about "aha" moments. It's about making better operational decisions.
  • 8. Multiplying the power of analytics [Figure: decision types (Strategy, Tactics, Operations) against economic impact, low to high]
  • 9. Operational decisions matter. "Most discussions of decision making assume that only senior executives make decisions or that only senior executives' decisions matter. This is a dangerous mistake." Peter Drucker
  • 10. 10 Best Practices
  • 11. Be Flexible: Data Mining is Not a Series of Recipes. Data mining project entry points: 1) Business Understanding; 2) Data Understanding. Data mining project next steps: 1) Data Understanding; 2) Modeling, then Data Preparation; 3) Data Preparation, then Data Understanding, then Modeling. [Figure: process cycle around the data: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment]
  • 12. Avoid The Three Biggest Data Preparation Mistakes. 1. Don't blindly use data mining software defaults. Missing data: is a record with missing values in one of its fields kept at all? What value is filled in, and what effect will this have? Exploding categorical variables with large numbers of values: what happens to the models?
  • 13. Some Software Fills Missing Values Automatically. Common automated missing value imputation: 0, mid-point, mean, or listwise deletion. The example at upper right has 5300+ records, with 17 missing values encoded as "0". After fixing the model with mean imputation, R^2 rises from 0.597 to 0.657.
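To make the effect of the fill value concrete, here is a minimal sketch of mean imputation for a field where missing values were stored as a code such as 0. The function name `impute_mean`, the `missing_code` parameter, and the sample values are hypothetical illustrations, not the data behind the slide's R^2 figures.

```python
from statistics import mean

def impute_mean(values, missing_code=0):
    """Replace missing entries (None or the missing code) with the mean of the observed values.

    Assumption: missing_code never occurs as a legitimate value in this field.
    """
    observed = [v for v in values if v is not None and v != missing_code]
    fill = mean(observed)
    return [fill if (v is None or v == missing_code) else v for v in values]

raw = [10, 0, 20, None]      # 0 and None both stand for "missing" here
filled = impute_mean(raw)    # the observed values 10 and 20 average to 15
```

If 0 can be a genuine value in the field, the missing code has to be chosen differently; that is exactly the kind of default worth checking before modeling.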
  • 14. Avoid The Three Biggest Data Preparation Mistakes. 2. Don't forget that some algorithms assume a distribution for the data. Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers.
  • 15. How Non-normality Affects Regression Models. Regression model "fit" is worse with skewed (non-normal) data. In the example at right, simply applying the log transform improves performance from R^2=0.566 to 0.597.
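A small sketch of why the log transform helps with skewed inputs: it measures the skewness of a long-tailed variable before and after taking base-10 logs. The sample values and the helper name `skewness` are assumptions chosen for illustration; they are not the data from the slide.

```python
import math

def skewness(xs):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

values = [1, 10, 100, 1000, 10000]          # hypothetical long-tailed variable
logged = [math.log10(v) for v in values]    # becomes 0, 1, 2, 3, 4

skew_before = skewness(values)   # strongly positive: the right tail dominates
skew_after = skewness(logged)    # approximately zero: symmetric after the transform
```

A skewness near zero does not guarantee a better model, but it is a quick check on whether a transform has tamed the tail.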
  • 16. Avoid The Three Biggest Data Preparation Mistakes. 2. Don't forget that some algorithms assume a distribution for the data. Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers. Distance-based algorithms are strongly influenced by outliers and skewed distributions: k-Nearest Neighbor, k-Means, and the algorithms above.
  • 17. Avoid The Three Biggest Data Preparation Mistakes. 2. Don't forget that some algorithms assume a distribution for the data. Some algorithms assume normally distributed data: linear regression, Bayes and Nearest Mean classifiers. Distance-based algorithms are strongly influenced by outliers and skewed distributions: k-Nearest Neighbor, k-Means, and the algorithms above. Some algorithms require categorical data (rather than numeric): Naïve Bayes, CHAID, Apriori.
  • 18. Avoid The Three Biggest Data Preparation Mistakes. 3. Don't assume algorithms can "figure out" patterns on their own. Features fix data distribution problems. Features present data (information) to modeling algorithms in ways they perhaps can never identify themselves: interactions, record-connecting and temporal features, non-linear transformations.
  • 19. What are Model Ensembles? Combining outputs from multiple models into a single decision. Models can be created using the same algorithm, or several different algorithms. [Figure: model outputs pass through decision logic to produce the ensemble prediction]
  • 20. Motivation for Ensembles. Performance, performance, performance. A single model sometimes provides insufficient accuracy: neural networks become stuck in local minima; decision trees run out of data; single algorithms keep pushing performance using the same ideas (basis function / algorithm), and are incapable of "thinking outside of their box". Often, different algorithms achieve the same level of accuracy but on different cases; they identify different ways to get the same level of accuracy.
  • 21. Four Keys to Effective Ensembling: diversity of opinion, independence, decentralization, aggregation. From The Wisdom of Crowds, James Surowiecki.
  • 22. Bagging. Method: create many data sets by bootstrapping (can also do this with cross validation); create one decision tree for each data set; combine decision trees by averaging (or voting) final decisions. Primarily reduces model variance rather than bias. Results: on average, better than any individual tree. [Figure: final answer is the average of the trees]
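The bagging steps can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: each "tree" is a one-split regression stump, the data set is invented, and the names `fit_stump` and `bag_stumps` are hypothetical rather than taken from any tool mentioned in the webinar.

```python
import random
from statistics import mean

def fit_stump(points):
    """Fit a one-split regression tree (a stump) to a list of (x, y) pairs."""
    xs = sorted({x for x, _ in points})
    overall = mean(y for _, y in points)
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2                      # candidate split between neighbours
        left = [y for x, y in points if x < t]
        right = [y for x, y in points if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    if t is None:                              # sample held a single x value
        return lambda x: overall
    return lambda x: ml if x < t else mr

def bag_stumps(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]   # draw with replacement
        models.append(fit_stump(sample))
    return lambda x: mean(m(x) for m in models)

points = [(0, 1.0), (1, 0.5), (2, 1.5), (3, 1.0), (4, 0.8),
          (5, 9.0), (6, 10.5), (7, 9.5), (8, 10.0), (9, 10.2)]
bagged = bag_stumps(points)
predictions = [bagged(x) for x in range(10)]
```

Because each stump sees a different bootstrap sample, the split point moves from model to model, and the average is smoother than any single stump.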
  • 23. Boosting (AdaBoost). Method: create a tree using the training data set; score each data point, indicating where each incorrect decision is made (errors); retrain, giving rows with incorrect decisions more weight; repeat. The final prediction is a weighted average of all models -> model regularization. Best to create "weak" models: simple models (just a few splits for a decision tree), letting the boosting iterations find the complexity. Often used with trees or Naïve Bayes. Results: usually better than an individual tree or Bagging. [Figure: reweight examples where classification is incorrect; combine models via weighted sum]
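The reweight-and-combine loop can be illustrated with a small AdaBoost sketch over one-split stumps. The ten-point data set is a common textbook-style toy example, not data from the webinar, and the function names are hypothetical. No single stump classifies this data perfectly, but three boosted stumps do.

```python
import math

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, sign) for the best one-split classifier.

    A stump predicts `sign` when x < threshold and `-sign` otherwise.
    """
    values = sorted(set(xs))
    thresholds = ([values[0] - 1]
                  + [(a + b) / 2 for a, b in zip(values, values[1:])]
                  + [values[-1] + 1])
    best = None
    for t in thresholds:
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (sign if x < t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds):
    """Train `rounds` weighted stumps, reweighting the examples after each one."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, sign = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)        # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # this model's vote weight
        ensemble.append((alpha, t, sign))
        # Misclassified rows get more weight, correctly classified rows less.
        weights = [w * math.exp(-alpha * y * (sign if x < t else -sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    total = sum(alpha * (sign if x < t else -sign) for alpha, t, sign in ensemble)
    return 1 if total >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(xs, ys, rounds=3)
boosted = [predict(model, x) for x in xs]
```

The best single stump misclassifies three of the ten points; each later round concentrates weight on whatever the earlier stumps got wrong.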
  • 24. Random Forest Ensembles. Random Forest (RF) method: exact same methodology as Bagging, but with a twist. At each split, rather than using the entire set of candidate inputs, use a random subset of candidate inputs. Generates diversity of samples and inputs (splits). Results: on average, better than any individual tree, Bagging, or even Boosting. [Figure: final answer is the average of the trees]
  • 25. Model Ensembles: The Good and the Bad. Pro: can significantly reduce model error; can be easy to automate, as has already been done in many commercial tools using Boosting, Bagging, ARCing, RF. Con: model interpretability is lost (if there was any); if not done automatically, it can be very time consuming to generate dozens of models to combine.
  • 26. Ensembles of Trees: Smoothers. Ensembles smooth jagged decision boundaries. Picture from T.G. Dietterich, Ensemble methods in machine learning. In Multiple Classifier Systems, Cagliari, Italy, 2000.
  • 27. Heterogeneous Model Ensembles on Glass Data. Model prediction diversity is obtained by using different algorithms: tree, NN, RBF, Gaussian, Regression, k-NN. Combining 3-5 models is on average better than the best single model. Combining all 6 models is not best (the best are the 3- and 4-model combinations), but is close. This is an example of reducing model variance through ensembles, but not model bias.
  • 28. The Conflict with Data Mining Algorithm Objectives. Algorithm objectives: Linear Regression and Neural networks minimize squared error; C5 minimizes entropy; CART minimizes Gini index; Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy); Nearest neighbor minimizes Euclidean distance.
  • 29. The Conflict with Data Mining Algorithm Objectives. Algorithm objectives: Linear Regression and Neural networks minimize squared error; C5 minimizes entropy; CART minimizes Gini index; Logistic regression maximizes the log of the odds of the probability the record belongs to class "1" (classification accuracy); Nearest neighbor minimizes Euclidean distance. Business objectives: maximize net revenue; achieve cumulative response rate of 13%; maximize responders subject to a budget of $100,000; maximize savings from identifying customers likely to churn; maximize collected revenue by identifying next best case to collect; minimize false alarms in top 100 hits; maximize hits subject to a false alarm rate of 1 in 1,000,000.
  • 30. Possible Solutions to Business Objective / Data Mining Objective Mismatch. Each model ranking metric is paired with its model building consideration. 1. Rank models by algorithm objectives, ignoring business objectives, and hope the models do a good enough job. Consideration: force the data into the algorithm box, and hope the winner does a good job in reality. 2. Use optimization algorithms to maximize/minimize the business objective directly. Consideration: throw away the very nice theory of data mining algorithms, and hope the optimization algorithms converge well. 3. Build models normally, but rank models by business objectives, ignoring their "natural" algorithm score, hoping that some algorithms do well enough at scoring by business objective. Consideration: take your lumps with algorithms not quite doing what we want them to do, but take advantage of the power and efficiency of algorithms.
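The third option, building models normally but ranking them by a business objective, can be sketched as follows. The two score vectors, the outcomes, and the choice of "responders among the top k" as the business metric are all invented for illustration. Model A wins on squared error while model B wins on the business metric.

```python
from math import sqrt

def rmse(scores, outcomes):
    """Root mean squared error: the 'natural' score for squared-error algorithms."""
    return sqrt(sum((s - y) ** 2 for s, y in zip(scores, outcomes)) / len(outcomes))

def hits_in_top_k(scores, outcomes, k):
    """Business-style metric: responders among the k highest-scored records."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k])

outcomes = [1, 0, 1, 0, 0, 0]                      # 1 = responder (hypothetical)
model_a = [0.90, 0.92, 0.60, 0.00, 0.00, 0.00]     # close to outcomes, wrong order on top
model_b = [0.30, 0.20, 0.29, 0.20, 0.20, 0.20]     # flat scores, right order on top

by_error = sorted(["A", "B"], key=lambda m: rmse({"A": model_a, "B": model_b}[m], outcomes))
by_hits = sorted(["A", "B"],
                 key=lambda m: hits_in_top_k({"A": model_a, "B": model_b}[m], outcomes, 2),
                 reverse=True)
```

With a budget that only covers the top two records, `by_error` puts A first and `by_hits` puts B first, which is the mismatch the slide describes.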
  • 31. Model Comparison Example: Rankings Tell Different Stories. The top RMS model is 9th in AUC; the model ranked 2nd on test RMS is 42nd in AUC. Correlation between rankings: [table not captured in this transcript]
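One way to quantify how much two model rankings agree is Spearman's rank correlation. The sketch below uses the standard no-ties formula; the rank lists are made up, and whether the webinar used this particular coefficient is an assumption.

```python
def spearman(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings without ties.

    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference.
    """
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rms_rank = [1, 2, 3, 4]        # hypothetical model ranks by test RMS
auc_rank = [4, 3, 2, 1]        # hypothetical ranks of the same models by AUC

same = spearman(rms_rank, rms_rank)        # identical rankings
opposite = spearman(rms_rank, auc_rank)    # fully reversed rankings
```

Values near 1 mean the two metrics would pick the same models; values near 0 or below mean the choice of metric changes the winner.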
  • 32. Model Deployment Methods. In the data mining software application itself: Pro: easy, since the same processing is done as in building the model; Con: slowest method of implementation with large data. In a database or real-time system: model encoded in Predictive Model Markup Language (PMML), http://www.dmg.org/ (a database becomes the run-time engine; typically for the model only, though PMML supports data preparation and cleansing functions as well); SQL code; model encoded in a "wrapper", run via calls from a database, transaction system, or operating system (batch run or source code). Run-time engine: often part of the data mining software package itself.
  • 34. Typical Predictive Model Deployment Processing Flow: Import/Select Data to Score -> Select Fields Needed -> Clean Data (missing, recodes, …) -> Re-create Derived Variables -> Score Data -> Decile Scored Data -> Scored Data. The key: reproduce all data pre-processing done to build the models.
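A minimal sketch of that flow, assuming a hypothetical two-field record, a saved training-time mean for imputation, and an invented linear scoring formula. None of these names or coefficients come from the webinar; the point is that scoring reuses the preparation parameters fixed at model-building time.

```python
import math

# Hypothetical parameters saved when the model was built.
TRAINING_PARAMS = {"income_mean": 52000.0}

def prepare(record, params):
    """Select the needed fields, clean missing values, re-create derived variables."""
    income = record.get("income")
    if income is None:
        income = params["income_mean"]     # same fill value as at training time
    return {"log_income": math.log10(income), "tenure": record["tenure_years"]}

def score(features):
    """Hypothetical linear model applied to the prepared features."""
    return 0.5 * features["log_income"] + 0.1 * features["tenure"]

def assign_deciles(scores):
    """Decile 1 holds the highest-scoring tenth of the records."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    deciles = [0] * len(scores)
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // len(scores) + 1
    return deciles

records = [{"income": 100000.0, "tenure_years": 5},
           {"income": None, "tenure_years": 1}]
scored = [score(prepare(r, TRAINING_PARAMS)) for r in records]
deciles = assign_deciles(scored)
```

Recomputing the imputation mean from the scoring data instead of reusing `TRAINING_PARAMS` would quietly change the model's inputs, which is the failure the slide warns about.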
  • 35. Knowing is not enough. Those who know first, win. Those who ACT first, win. Provided they act intelligently.
  • 36. Avoid the insight-to-action gap
  • 37. Analytic insights must drive action
  • 38. Business rules drive decisions [Figure: a decision fed by regulations, policy, history, experience, and legacy applications]
  • 39. Three legged stools need three legs ©2011 Decision Management Solutions 39
  • 40. Operational decisions at the center Business ©2011 Decision Management Solutions 40
  • 41. Monitoring and compliance ©2011 Decision Management Solutions 41
  • 42. Scorecards are a powerful tool. Years Under Contract: 1 → 0, 2 → 5, More than 2 → 10. Number of Contract Changes: 0 → 0, 1 → 5, More than 1 → 10. Value Rating of Current Plan: Poor → 0, Good → 10, Excellent → 20. Score: 30. Fig 5.4, Smart (Enough) Systems, Prentice Hall, June 2007.
  • 43. Why use a scorecard? Reason Codes: return the most important reason(s) for a score; explaining results. Simplicity: easy to use and explain; easy to implement, although not necessarily easy to build. Transparency: it is really clear how a scorecard got its result; the complete workings of a scorecard can be logged. Compact: one scorecard can often replace many rules and tables; one artifact for one prediction. Compliance: easy to enforce rules about use of specific attributes; easy to remove rough edges. Familiar: analytic teams are used to developing scorecards; regulators and business owners are used to reviewing them.
  • 44. Continuous improvement
  • 45. Continuous improvement
  • 46. Don’t start by focusing on the data. Layers, top to bottom: Better decision; Analytic insight; Derived information; Available data.
  • 47. Start by focusing on the value. Layers, top to bottom: Better decision; Analytic insight; Derived information; Available data.
  • 49. The 10 Best Practices: 1. Be flexible; data mining is not a set of rules! 2. Avoid 3 key data preparation, modeling mistakes 3. Diversity is strength: build lots of models 4. Pick the right metric to assess models 5. Have deployment in mind when building models 6. Focus on actions 7. The three legged stool 8. Focus on explicability 9. Build in decision analysis
  • 50. Action Plan: Identify your decisions before analytics. Adopt business rules to implement analytics. Bring business, analytic and IT people together.
  • 51. Let us know if we can help. Decision Management Solutions can help you: focus on the right decisions; implement a blueprint; define a strategy (http://www.decisionmanagementsolutions.com). Abbott Analytics can help you: find the right software; define a strategy; learn the ropes (http://www.abbottanalytics.com).
  • 52. Thank you! James Taylor, CEO, james@decisionmanagementsolutions.com, www.decisionmangementsolutions.com/learnmore
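
Slides 42 and 43 describe a scorecard and the reason codes it can return. The following is a minimal Python sketch of that idea, using the point values shown on slide 42. The names `SCORECARD` and `score_customer`, the bin labels, and the rule for picking reason codes (the characteristics that fall furthest below their maximum points) are assumptions made for illustration; the slides do not specify them.

```python
# Point values taken from slide 42; keys and bin labels are hypothetical names.
SCORECARD = {
    "years_under_contract": {"1": 0, "2": 5, "more_than_2": 10},
    "contract_changes": {"0": 0, "1": 5, "more_than_1": 10},
    "plan_value_rating": {"poor": 0, "good": 10, "excellent": 20},
}


def score_customer(bins, max_reasons=2):
    """Return (total score, points per characteristic, reason codes).

    `bins` maps each characteristic to the bin the customer falls into.
    """
    # Look up the points awarded for each characteristic.
    points = {name: SCORECARD[name][bins[name]] for name in SCORECARD}
    total = sum(points.values())

    # Assumed reason-code rule: report the characteristics that lost the
    # most points relative to the best bin for that characteristic.
    shortfall = {
        name: max(SCORECARD[name].values()) - pts
        for name, pts in points.items()
    }
    ranked = sorted(shortfall.items(), key=lambda item: -item[1])
    reasons = [name for name, lost in ranked if lost > 0][:max_reasons]
    return total, points, reasons


# Example customer (made-up values, not from the slides).
total, points, reasons = score_customer({
    "years_under_contract": "more_than_2",
    "contract_changes": "1",
    "plan_value_rating": "good",
})
print(total, points, reasons)
```

Because every point contribution is kept in `points`, the complete working of the score can be logged alongside the decision, which is the transparency property slide 43 highlights.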

Editor's notes

  1. Webinar: 10 best practices in operational analytics. One of the most powerful ways to apply advanced analytics is by putting them to work in operational systems. Using analytics to improve the way every transaction, every customer, every website visitor is handled is tremendously effective. The multiplicative effect means that even small analytic improvements add up to real business benefit. In this session James Taylor and Dean Abbott will provide you with 10 best practices to make sure you can effectively build and deploy analytic models into your operational systems.
     <Quick overview on operational decisions and the value of analytics in operational systems>
     1) Be flexible; data mining is not a set of rules! (though the results may be). I need to work this point out more.
     2) Avoid three key data preparation and modeling mistakes: blindly using data mining software defaults; forgetting that some algorithms assume particular distributions of data; assuming powerful algorithms can "figure out" the model.
     3) Diversity is strength: build lots of models. Algorithms have strengths and weaknesses; leverage multiple families of algorithms to improve understanding of the data. Model ensembles can provide significant improvements in model accuracy.
     4) Pick the right metric to assess models. The metric dictates which model will be selected. The metric should match the business objective, not how algorithms view the models.
     5) Have deployment in mind when building models. Different approaches are necessary for real-time deployment vs. batch deployment or offline deployment. Biggest problem: moving all data preparation from the data mining tool environment to the database.
     6) Focus on actions. Knowing is not enough; you must act. Make sure you understand the options, how the model helps you select between them, and what the regulations and policies are.
     7) The three legged stool. Operational decisions have to work for three different groups: business, IT and analytics. Like a three legged stool, it will only stay up if all three groups are working together. Collaboration across the groups is key.
     8) Focus on explicability. Business people understand their business, IT people understand their systems. The models, and the actions taken in response to them, must be explicable. Operational decisions are often regulated. Consider model representations like scorecards, decision trees and rules, and an implementation platform like a BRMS, to ensure explicability.
     9) Build in decision analysis. No decision is static; no decision remains good over time. Models too age and degrade. Any decision implementation must therefore monitor results to see if it is degrading, and constantly challenge itself with new approaches, new models and new rules to see if it could be improved. Test and learn.
     10) BWTDIM: Begin with the decision in mind.
  2. As we are talking about decisions it is worth remembering that all decisions matter, as Peter Drucker noted. Not just the big, strategic decisions of your executives but the day to day decisions that drive your business.
  3. Models make predictions, but predictions alone will not help much; you must ACT based on those predictions. When you are thinking about smarter systems, taking action means having the system take action in a way that uses the predictions you made. You need to make a decision based on those predictions, and this means combining the models with rules about how and when to act. Let’s take our retention example from earlier. Knowing that a customer is a retention risk is interesting; acting appropriately and in time to prevent them leaving is useful. Grovel index story.
  4. Story about PowerPoint model. Risks of models that are done separately and the need to put them to work. Predictive models don’t DO anything, they just make predictions. Rules make them actionable. Taking the rules that represent a segmentation, for instance, and deploying them into a decision makes them actionable.
  5. Remember – decisions are where the business, analytics and IT all come together
  6. Once deployed, analytics cannot be a “black box”; we must understand analytic performance. Obviously you need a 'hold out sample' or business-as-usual random group to compare to. You need to understand what's working and what's the next challenge, for instance which segments are being retained. You must understand operational negation. You need to track input variables, scores, decisions or actions taken (the classic example is in collections, where a strategy may dictate 'do nothing' but the collections manager overrides the decision and puts the accounts into a calling queue) and the operational data that fed the decision. Both analysts and business users must think about what they can do to improve decision making, which is the foundation of adaptive control. In our retention example I need to have some customers I don’t attempt to retain or that I don’t spend any money retaining. I have to capture what the call center representative ACTUALLY offered and what was actually accepted (if anything), not just what SHOULD have been offered, and I have to be able to show the results to my business users in terms they understand. When decisions have to be compliant, and many do, or when decisions might have to be explained or justified in court or even in the court of public opinion, automated systems can be a challenge. Where a judge or journalist can talk to people who made decisions and review company policy documents, they don’t do so well talking to computers or reviewing math and code. If a decision is automated it must be possible to log how the decision was made, how predictions were calculated, what actions were taken and why. This must be something that can be reviewed, even made public. Business rules and models like decision trees and scorecards are particularly helpful in this respect. You need models that are good at explaining their actions (scorecards and decision trees/strategies, for example) and the ability to trace these decisions historically and document them. Retention offers may not seem like they have a big compliance issue, but what if a particular group of customers argues they are being discriminated against because they always seem to get worse offers than another group? Could your business users explain exactly how it was done? Could you show a judge and a jury that your approach was fair and reasonable?
  7. Analytics improve decision making. Find problem areas and improve. Suggest rules to close the gaps. Enhance data with predictive analytics.
  8. Begin! Identify your decisions: hidden decisions, transactional decisions, customer decisions, decisions buried in complex processes, decisions that are the difference between two processes. Consider who takes them now and what drives changes in them. Assess change readiness. Consider organizational change. Adopt decisioning technology. Adopt a business rules approach and technology. Investigate data mining and predictive analytics. Think about adaptive control.
  9. Decision Management Solutions can help you: find the right decisions to apply business rules and analytics; implement a decision management blueprint; define a strategy for business rule or analytic adoption. You are welcome to email me directly, james at decision management solutions.com, or you can go to decision management solutions.com / learn more. There you’ll find links to contact me, check out the blog and find more resources for learning about Decision Management.
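
Note 6 asks for a hold-out group, and for a log of input variables, scores, the action that should have been taken, and the action that actually was taken. The following is a minimal Python sketch of such a log and two simple checks on it. The record fields, the names `DecisionRecord`, `override_rate` and `retention_by_group`, and the sample data are all hypothetical; they illustrate the idea rather than any particular product or the presenters' own implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DecisionRecord:
    customer_id: str
    inputs: dict       # input variables that fed the decision
    score: float       # model score at decision time
    recommended: str   # action the decision logic recommended
    actual: str        # action actually taken (may be an override)
    group: str         # "treated" or "holdout" (business as usual)
    retained: bool     # observed outcome


def override_rate(log):
    """Share of decisions where the action taken differed from the recommendation."""
    if not log:
        return 0.0
    return sum(r.recommended != r.actual for r in log) / len(log)


def retention_by_group(log):
    """Observed retention rate per group, to compare treated vs hold-out."""
    counts = defaultdict(lambda: [0, 0])  # group -> [retained, total]
    for r in log:
        counts[r.group][0] += int(r.retained)
        counts[r.group][1] += 1
    return {group: kept / total for group, (kept, total) in counts.items()}


# Made-up sample log for illustration only.
log = [
    DecisionRecord("c1", {"tenure": 3}, 0.82, "discount", "discount", "treated", True),
    DecisionRecord("c2", {"tenure": 1}, 0.40, "no_offer", "discount", "treated", True),
    DecisionRecord("c3", {"tenure": 2}, 0.77, "no_offer", "no_offer", "holdout", False),
    DecisionRecord("c4", {"tenure": 5}, 0.35, "no_offer", "no_offer", "holdout", True),
]

print(override_rate(log))
print(retention_by_group(log))
```

Keeping `recommended` and `actual` as separate fields is what makes overrides visible, such as the collections manager in note 6 who moves a 'do nothing' account into a calling queue.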