Galit Shmueli
Israel Statistical Association & Tel Aviv University
July 9, 2012


To Explain or To Predict?
Points for discussion: goo.gl/gcjlN

Twitter: #explainpredict
Road Map
Definitions
Explanatory-dominated social sciences
Explanatory ≠ predictive modeling
 Why?
 Different modeling paths
 Explanatory vs. predictive power


So what?
Definitions

Explanatory modeling:
Theory-based, statistical testing of
causal hypotheses

Explanatory power:
Strength of relationship in statistical
model
Definitions

Predictive modeling:
Empirical method for predicting new
observations

Predictive power:
Ability to accurately predict new
observations
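
To make the two definitions concrete, here is a minimal sketch that computes one common measure of each on the same fitted model: in-sample R² as explanatory power and holdout RMSE as predictive power. The simulated data, the coefficients, and the 150/50 split are assumptions chosen only for illustration; they do not come from the talk.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys captured by the fitted line."""
    mean_y = statistics.fmean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def rmse(xs, ys, intercept, slope):
    """Root mean squared error of the fitted line on (xs, ys)."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return statistics.fmean(squared) ** 0.5


# Simulated data (an assumption for illustration): y = 2 + 0.5 x + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Fit on the first 150 observations, keep the last 50 as "new" observations.
train_x, train_y = xs[:150], ys[:150]
new_x, new_y = xs[150:], ys[150:]

intercept, slope = fit_line(train_x, train_y)
in_sample_r2 = r_squared(train_x, train_y, intercept, slope)  # explanatory power
holdout_rmse = rmse(new_x, new_y, intercept, slope)           # predictive power

print(f"in-sample R^2: {in_sample_r2:.3f}")
print(f"holdout RMSE:  {holdout_rmse:.3f}")
```

The two numbers answer different questions: R² describes the strength of the fitted relationship on the data used for estimation, while the holdout RMSE describes accuracy on observations the model has not seen.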
Statistical modeling in social science research

Purpose: test causal theory (“explain”)
Association-based statistical models
Prediction nearly absent
Explanatory modeling à la social sciences
1. Start with a causal theory
2. Generate causal hypotheses on constructs
3. Operationalize constructs → Measurable variables
4. Fit statistical model
5. Statistical inference → Causal conclusions
In the social sciences, data analysis is mainly used for testing causal theory.

“If it explains, it predicts”
“Empirical prediction alone is un-scientific”

Some statisticians share this view:

“The two goals in analyzing data... I prefer to describe as ‘management’ and ‘science’. Management seeks profit... Science seeks truth.”
Parzen, Statistical Science, 2001
52 “predictive” articles among 1,072
in Information Systems top journals
Why Predict? for Scientific Research
          new theory
          develop measures
          compare theories
          improve theory
          assess relevance
          predictability

Shmueli & Koppius, “Predictive Analytics in IS Research”
(MISQ, 2011)
“A good explanatory model will also
predict well”
“You must understand the underlying
causes in order to predict”
Philosophy of Science

“Explanation and prediction have the same logical structure”
Hempel & Oppenheim, 1948

“It becomes pertinent to investigate the possibilities of predictive procedures autonomous of those used for explanation”
Helmer & Rescher, 1959

“Theories of social and human behavior address themselves to two distinct goals of science: (1) prediction and (2) understanding”
Dubin, Theory Building, 1969
Why statistical explanatory modeling differs from predictive modeling
Shmueli (2010), Statistical Science
Theory vs. its manifestation




                     ?
Notation

Theoretical constructs: 𝒳, 𝒴
Causal theoretical model: 𝒴 = ℱ(𝒳)
Measurable variables: X, Y
Statistical model: E(Y) = f(X)
Four aspects
Causal theoretical model: 𝒴 = ℱ(𝒳)
Statistical model: E(Y) = f(X)
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
“The goal of finding models that are
predictively accurate differs from the
goal of finding models that are true.”
Point #1
Best explanatory model ≠ Best predictive model
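
One way to see how the bias–variance aspect can lead to Point #1 is a small simulation. The sketch below compares a correctly specified regression (y on x1 and x2) with an underspecified one (y on x1 only) by their average squared error on new observations. Every setting here (sample size of 10, correlation of 0.9, a weak second coefficient of 0.1, unit noise, 3000 replications) is an assumption picked to make the effect visible; it is an illustration, not a result from the talk.

```python
import math
import random


def simulate(n_train=10, n_test=20, n_reps=3000, beta1=1.0, beta2=0.1,
             rho=0.9, sigma=1.0, seed=0):
    """Return (mse_full, mse_reduced) averaged over simulated holdout sets."""
    rng = random.Random(seed)

    def draw(n):
        # True data-generating process: y = beta1*x1 + beta2*x2 + noise,
        # with x1 and x2 correlated at rho.
        rows = []
        for _ in range(n):
            x1 = rng.gauss(0, 1)
            x2 = rho * x1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
            y = beta1 * x1 + beta2 * x2 + rng.gauss(0, sigma)
            rows.append((x1, x2, y))
        return rows

    se_full = se_reduced = 0.0
    for _ in range(n_reps):
        train, test = draw(n_train), draw(n_test)
        s11 = sum(x1 * x1 for x1, _, _ in train)
        s12 = sum(x1 * x2 for x1, x2, _ in train)
        s22 = sum(x2 * x2 for _, x2, _ in train)
        t1 = sum(x1 * y for x1, _, y in train)
        t2 = sum(x2 * y for _, x2, y in train)
        # Correctly specified model: y ~ x1 + x2 (solve the 2x2 normal equations).
        det = s11 * s22 - s12 * s12
        b1 = (s22 * t1 - s12 * t2) / det
        b2 = (s11 * t2 - s12 * t1) / det
        # Underspecified model: y ~ x1 only (x2 is wrongly omitted).
        c1 = t1 / s11
        for x1, x2, y in test:
            se_full += (y - (b1 * x1 + b2 * x2)) ** 2
            se_reduced += (y - c1 * x1) ** 2
    total = n_reps * n_test
    return se_full / total, se_reduced / total


mse_full, mse_reduced = simulate()
print(f"holdout MSE, correctly specified model: {mse_full:.3f}")
print(f"holdout MSE, underspecified model:      {mse_reduced:.3f}")
```

Under these assumed settings the underspecified model carries a small bias but much less estimation variance, so it tends to predict new observations better even though it is the "wrong" model for explanation. With a larger sample or a stronger second coefficient the ordering can reverse.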
Four aspects
Causal theoretical model: 𝒴 = ℱ(𝒳)
Statistical model: E(Y) = f(X)
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
Predict ≠ Explain
“we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However… they could not help at all for improving the [predictive] accuracy.”
Bell et al., 2008
Predict ≠ Explain
The FDA considers two products
bioequivalent if the 90% CI of the
relative mean of the generic to brand
formulation is within 80%-125%




“We are planning to… develop predictive models for bioavailability and bioequivalence”
Lester M. Crawford, Acting Commissioner of Food & Drugs, 2005
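
The bioequivalence rule quoted above is an inferential (confidence-interval) criterion rather than a predictive one. A simplified sketch of how such a check can be computed is shown below. It assumes a paired design, works on the log scale, and uses a normal quantile in place of a t quantile; actual regulatory analyses are more involved. The function names and the simulated exposure values are hypothetical.

```python
import math
import random
import statistics


def ratio_ci_90(test_vals, ref_vals):
    """Approximate 90% CI for the test/reference geometric mean ratio.

    Assumes paired measurements and a large-sample normal approximation.
    """
    diffs = [math.log(t) - math.log(r) for t, r in zip(test_vals, ref_vals)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    z = statistics.NormalDist().inv_cdf(0.95)  # two-sided 90% interval
    return math.exp(mean - z * se), math.exp(mean + z * se)


def within_limits(ci, lower=0.80, upper=1.25):
    """True if the whole interval lies inside the 80%-125% limits."""
    return lower <= ci[0] and ci[1] <= upper


# Simulated exposure measurements for 40 subjects (illustrative only).
rng = random.Random(1)
reference = [math.exp(rng.gauss(4.0, 0.3)) for _ in range(40)]
generic = [r * math.exp(rng.gauss(0.0, 0.15)) for r in reference]

ci = ratio_ci_90(generic, reference)
print(f"90% CI for ratio: ({ci[0]:.3f}, {ci[1]:.3f})")
print("within 80%-125%:", within_limits(ci))
```

The decision depends only on the estimated mean ratio and its uncertainty; nothing in the calculation assesses how well individual responses would be predicted.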
Goal Definition → Design & Data Collection → Data Preparation → EDA → Variables? Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
Study design & data collection
Observational or experiment?
Primary or secondary data?
Instrument (reliability + validity vs. measurement accuracy)
How much data?
How to sample?
Hierarchical data
Data Preprocessing
Missing values
Reduced-feature models
Partitioning
Data exploration & reduction
Interactive visualization
PCA
SVD
Which Variables?
Causation vs. associations
Endogeneity
Ex-post availability
Multicollinearity?
A, B, A*B?
Methods / Models
Blackbox / interpretable
Mapping to theory
Variance vs. bias
Shrinkage models
Ensembles
Evaluation, Validation & Model Selection
Model fit ≠ Validation
Explanatory power: Theoretical model, Empirical model, Data
Predictive power: Empirical model, Training data, Holdout data, Over-fitting analysis
Model Use
Explanatory: test causal theory (inference, null hypothesis)
Predictive: new theory, develop measures, compare theories, improve theory, assess relevance, predictability (predictive performance, naïve/baseline, over-fitting analysis)
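
The predictive side of model use (holdout performance, comparison with a naïve baseline, over-fitting analysis) can be sketched as follows. The example uses a k-nearest-neighbour regressor written from scratch on simulated data. The data-generating function, the values of k, and the 200/1000 split are all assumptions for illustration.

```python
import math
import random
import statistics


def knn_predict(train, x, k):
    """Average the outcomes of the k training points whose x is closest."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return statistics.fmean([y for _, y in nearest])


def rmse(errors):
    """Root mean squared error of a list of prediction errors."""
    return math.sqrt(statistics.fmean([e * e for e in errors]))


# Simulated data (illustrative): y = sin(x) + noise.
rng = random.Random(42)
data = []
for _ in range(1200):
    x = rng.uniform(0, 6)
    data.append((x, math.sin(x) + rng.gauss(0, 0.3)))
train, holdout = data[:200], data[200:]

# Naive baseline: always predict the training mean.
naive = statistics.fmean([y for _, y in train])
naive_rmse = rmse([y - naive for _, y in holdout])
print(f"naive baseline  holdout RMSE: {naive_rmse:.3f}")

train_rmse, holdout_rmse = {}, {}
for k in (1, 15):
    train_rmse[k] = rmse([y - knn_predict(train, x, k) for x, y in train])
    holdout_rmse[k] = rmse([y - knn_predict(train, x, k) for x, y in holdout])
    print(f"k={k:<2} training RMSE: {train_rmse[k]:.3f}  "
          f"holdout RMSE: {holdout_rmse[k]:.3f}")
```

With k=1 each training point is its own nearest neighbour, so the training error is zero while the holdout error is not. That gap between training and holdout performance is what an over-fitting analysis looks for, and the naïve baseline gives the holdout error a reference point.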
Point #2
Explanatory Power ≠ Predictive Power
Cannot infer one from the other
Performance Metrics
interpretation
p-values
R²
goodness-of-fit
type I, II errors
out-of-sample
prediction accuracy
costs
training vs. holdout
over-fitting
[Figure: Predictive Power vs. Explanatory Power]
The predictive power of an
explanatory model has important
scientific value


Relevance, reality check, predictability
In “explanatory” fields
Prediction underappreciated
Distinction blurred
Unfamiliar with predictive modeling/assessment

“While the value of scientific prediction… is beyond question… the inexact sciences [do not] have… the use of predictive expertise well in hand.”
Helmer & Rescher, 1959
How does all this impact
   Scientific Research?
What can be done?
Acknowledge
Incorporate prediction into curriculum
What happens in other fields?
Epidemiology
Engineering
Life sciences

What about “predictive only” fields?
http://goo.gl/gcjlN
Shmueli (2010), “To Explain or To Predict?”, Statistical Science
Shmueli & Koppius (2011), “Predictive analytics in IS research”, MISQ
