On InfoQ and PSE: A brief introduction

Ron S. Kenett
KPA Ltd., Raanana, Israel and University of Torino, Torino, Italy
ron@kpa.co.il

Introduction
This presentation is about doing the right research with
statistical methods, the right way - we call it Quality
Research. Research is a critical activity leading to
knowledge acquisition and formulation of policies and
management decisions.

By effective research we mean research that produces an
impact, as intended by decision makers. One measure of
effective research is Information Quality (InfoQ), an
approach developed by Kenett and Shmueli (2009).
Practical Statistical Efficiency (PSE) assesses the level
of implementation of the research recommendations
(Kenett, Coleman and Stewardson, 2003).

Information Quality (InfoQ)

Are we doing the right research?

Information Quality (InfoQ)
[Figure: diagram relating Information Quality to Knowledge, Goals, Data Quality and Analysis Quality. Data sources are Primary Data (experimental or observational) and Secondary Data (experimental or observational).]

Kenett, R. and Shmueli, G., “On Information Quality”, http://ssrn.com/abstract=1464444, 2009.

Practical Statistical Efficiency (PSE)

Are our research recommendations having an impact?

Practical Statistical Efficiency (PSE)

    PSE = E{R} x T{I} x P{I} x V{PS} x P{S} x V{P} x V{M} x V{D}

    • V{D} = value of the data actually collected
    • V{M} = value of the statistical method employed
    • V{P} = value of the problem to be solved
    • P{S} = probability that the problem actually gets solved
    • V{PS} = value of the problem being solved
    • P{I} = probability the solution is actually implemented
    • T{I} = time the solution stays implemented
    • E{R} = expected number of replications
Kenett, R.S., Coleman, S.Y. and Stewardson, D. (2003), “Statistical Efficiency: The
Practical Perspective”, Quality and Reliability Engineering International, 19: 265-272.
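
As a rough illustration of how the PSE product behaves, the following Python sketch multiplies the eight components for a single project. The `PSEComponents` name, the example values and their scales (probabilities in [0, 1], values on an arbitrary 1 to 5 scale, time in years) are assumptions made for this example; the slides do not prescribe units.

```python
from dataclasses import astuple, dataclass
from math import prod


@dataclass
class PSEComponents:
    """Hypothetical container for the eight PSE components of one project."""
    v_d: float   # V{D}: value of the data actually collected
    v_m: float   # V{M}: value of the statistical method employed
    v_p: float   # V{P}: value of the problem to be solved
    p_s: float   # P{S}: probability that the problem actually gets solved
    v_ps: float  # V{PS}: value of the problem being solved
    p_i: float   # P{I}: probability the solution is actually implemented
    t_i: float   # T{I}: time the solution stays implemented
    e_r: float   # E{R}: expected number of replications

    def pse(self) -> float:
        # PSE is the plain product of the eight components.
        return prod(astuple(self))


# Illustrative values only (assumed, not taken from any case study).
project = PSEComponents(v_d=4, v_m=3, v_p=5, p_s=0.8, v_ps=4, p_i=0.5, t_i=2, e_r=3)
never_implemented = PSEComponents(v_d=4, v_m=3, v_p=5, p_s=0.8, v_ps=4, p_i=0.0, t_i=2, e_r=3)

print(project.pse())
print(never_implemented.pse())
```

Because the components multiply, a zero in any one of them (for example P{I} = 0, a solution that is never implemented) drives the whole PSE to zero.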
Information Quality (InfoQ)

1.   Data resolution
2.   Data structure
3.   Data integration
4.   Temporal relevance
5.   Sampling bias
6.   Chronology of data and goal
7.   Concept operationalization
8.   Communication and data visualization

The InfoQ Swiss Cheese Model
[Figure: Swiss cheese diagram labeled with the eight InfoQ dimensions: data resolution, data structure, data integration, temporal relevance, chronology of data and goal, sampling bias, concept operationalization, and communication and data visualization.]

InfoQ1: Data Resolution
• Two aspects of data resolution are measurement
  scale and data aggregation.
• The measurement scale of the data must be
  adequate for the purpose of the study.
• The level of aggregation of the data must match the
  task at hand. For example, consider data on daily
  purchases of over-the-counter medications at a large
  pharmacy. If the goal of the analysis is to forecast
  future inventory levels of different medications, and
  re-stocking is done on a weekly basis, then we would
  prefer weekly aggregate data to daily aggregate
  data.
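
The pharmacy example can be sketched in a few lines of Python: daily purchase counts are rolled up to the weekly level at which re-stocking decisions are made. The sales figures and the `aggregate_weekly` helper are hypothetical and serve only to illustrate the change of resolution.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical daily unit sales of one medication (date -> units sold).
start = date(2024, 1, 1)  # a Monday
daily_sales = {start + timedelta(days=i): 10 + (i % 7) for i in range(14)}


def aggregate_weekly(daily):
    """Sum daily values into ISO weeks, keyed by (ISO year, ISO week)."""
    weekly = defaultdict(int)
    for day, units in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += units
    return dict(weekly)


weekly_sales = aggregate_weekly(daily_sales)
print(weekly_sales)
```

The weekly totals, not the fourteen daily values, are the series a weekly re-stocking forecast would be fitted to.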

InfoQ2: Data Structure
• The data can combine structured quantitative
  data with unstructured, semantic-based data.
• For example, in assessing the reputation of an
  organization one might combine data derived
  from balance sheets with data mined from text
  such as newspaper archives or press reports.




InfoQ3: Data Integration
• Knowledge is often spread out across multiple
  data sources.
• Hence, identifying the relevant sources,
  collecting the relevant data, and integrating
  the data all directly affect information
  quality.




InfoQ4: Temporal Relevance
• A data set contains information collected during a
  certain period of time. The degree of relevance of the
  data to the current goal at hand must be assessed.
• For instance, in order to learn about current online
  shopping behaviors, a dataset that records online
  purchase behavior (such as Comscore data
  (www.comscore.com)) can be irrelevant if it is even
  a few years old, because of the fast-changing
  online shopping environment.


InfoQ5: Chronology of Data and Goal
• A data set contains daily weather information for a particular
  city for a certain period as well as information on the Air
  Quality Index (AQI) on those days.
• For the United States such data are publicly available from
  the National Oceanic and Atmospheric Administration website
  (http://www.noaa.gov). To assess the quality of the
  information contained in this data set, we must consider the
  purpose of the analysis.
• Although AQI is widely used (for instance, for issuing a “code
  red” day), how it is computed is not easy to figure out. One
  analysis goal might therefore be to find out how AQI is
  computed from weather data (by reverse-engineering). For
  such a purpose, these data are likely to contain high-quality
  information. In contrast, if the goal is to predict future AQI
  levels, then the data on past temperatures contain low-quality
  information.

InfoQ6: Sampling Bias
• A clear definition of the population of interest and how the
  sample relates to that population is necessary in both primary
  and secondary analyses.
• Dealing with sampling bias can be proactive or retroactive. In
  studies where there is control over the design (e.g., surveys),
  sampling schemes are selected to reduce bias. Such
  methods do not apply to retrospective studies. However,
  retroactive measures such as post-stratification weighting,
  which are often used in survey analysis, can be useful in
  secondary studies as well.
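
A minimal sketch of post-stratification weighting, assuming the population share of each stratum is known from an external source. The strata, responses and shares below are made up for illustration; each observation is weighted by (population share) / (sample share) of its stratum.

```python
from collections import Counter

# Hypothetical sample of (stratum, response) pairs; "young" is over-represented.
sample = ([("young", 1)] * 6 + [("young", 0)] * 2
          + [("old", 1)] * 1 + [("old", 0)] * 1)

# Assumed known population shares of each stratum.
population_share = {"young": 0.5, "old": 0.5}


def post_stratified_mean(sample, population_share):
    """Weighted mean with weights = population share / sample share."""
    counts = Counter(stratum for stratum, _ in sample)
    n = len(sample)
    weights = {s: population_share[s] / (counts[s] / n) for s in counts}
    numerator = sum(weights[s] * y for s, y in sample)
    denominator = sum(weights[s] for s, _ in sample)
    return numerator / denominator


raw_mean = sum(y for _, y in sample) / len(sample)
adjusted_mean = post_stratified_mean(sample, population_share)
print(raw_mean, adjusted_mean)
```

In this toy sample the unweighted mean is pulled toward the over-represented stratum, and the weighting moves it back toward the value implied by the assumed population mix.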




InfoQ7: Concept Operationalization
• Observable data are an operationalization of
  underlying concepts. “Anger” can be measured via a
  questionnaire or by measuring blood pressure;
  “economic prosperity” can be measured via income
  or by unemployment rate; and “length” can be
  measured in centimeters or in inches.
• The role of concept operationalization is different for
  explanatory, predictive, and descriptive goals.



InfoQ8: Communication and Data Visualization
• If crucial information does not reach the right
  person at the right time, then the quality of
  information becomes poor.
• Data visualization is also directly related to the
  quality of information. Poor visualization can
  lead to degradation of the information
  contained in the data.


The InfoQ Score
     For each measure $Y_i(x)$, a univariate desirability function $d_i(Y_i)$
     is defined, which assigns numbers between 0 and 1 to the possible values
     of $Y_i$, with $d_i(Y_i) = 0$ representing a completely undesirable value
     of $Y_i$ and $d_i(Y_i) = 1$ representing a completely desirable or ideal
     response value. The individual desirabilities are then combined into an
     overall desirability index using the geometric mean of the individual
     desirabilities:
          Desirability Function = $[d_1(Y_1) \times d_2(Y_2) \times \cdots \times d_k(Y_k)]^{1/k}$
     with $k$ denoting the number of measures. Notice that if any response
     $Y_i$ is completely undesirable ($d_i(Y_i) = 0$), then the overall
     desirability is zero. We use the Desirability Function to compute an
     InfoQ Score based on an assessment of indicators reflecting the 8 InfoQ
     dimensions.


Derringer, G. and Suich, R. (1980), “Simultaneous Optimization of Several Response
Variables”, Journal of Quality Technology, 12(4): 214-219.
Harrington, E. C. (1965), “The Desirability Function”, Industrial Quality Control, 21: 494-498.
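
The geometric-mean combination described on this slide can be sketched in Python as follows. The `larger_is_better` function is one common one-sided desirability form in the style of Derringer and Suich; the 1 to 5 rating scale, the example ratings and the choice to treat every dimension as "the higher the better" are assumptions made for this illustration, not the authors' scoring procedure.

```python
from math import prod


def larger_is_better(y, low, high, shape=1.0):
    """One-sided desirability: 0 at or below `low`, 1 at or above `high`."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** shape


def overall_desirability(desirabilities):
    """Geometric mean of individual desirabilities, each in [0, 1]."""
    ds = list(desirabilities)
    if not ds:
        raise ValueError("at least one desirability is required")
    if any(d < 0.0 or d > 1.0 for d in ds):
        raise ValueError("desirabilities must lie in [0, 1]")
    return prod(ds) ** (1.0 / len(ds))


# Hypothetical ratings (1 = poor, 5 = excellent) for the eight dimensions.
ratings = {
    "data resolution": 4,
    "data structure": 3,
    "data integration": 4,
    "temporal relevance": 5,
    "sampling bias": 3,
    "chronology of data and goal": 4,
    "concept operationalization": 2,
    "communication and data visualization": 4,
}

desirabilities = [larger_is_better(r, low=1, high=5) for r in ratings.values()]
infoq_score = overall_desirability(desirabilities)
print(round(infoq_score, 3))
```

Using a geometric rather than an arithmetic mean means that a single dimension rated at the bottom of the scale cannot be compensated for by high ratings elsewhere.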
The InfoQ Score
       InfoQ Score = $[d_1(Y_1) \times d_2(Y_2) \times \cdots \times d_8(Y_8)]^{1/8}$

1.   Data resolution
2.   Data structure
3.   Data integration
4.   Temporal relevance
5.   Sampling bias
6.   Chronology of data and goal
7.   Concept operationalization
8.   Communication and data visualization

[Figure: desirability function shapes labeled “The lower the better”, “On target” and “The higher the better”, annotated with the dimension numbers 1 to 8.]

Practical Statistical Efficiency (PSE)

V{D} = value of the data actually collected

Readily accessible data are like observations made under the
lamppost, where there is light: not necessarily where you lost
your key or where the answer to your problem lies.

V{M} = value of the statistical method employed

A mathematical definition of statistical efficiency is given by:
Relative Efficiency of Test A versus Test B = ratio of the sample
size for test A to the sample size for test B, where sample sizes
are determined so that both tests reach a certain power against
the same alternative.
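
To make this definition concrete, the sketch below computes approximate sample sizes for two one-sided tests of a location shift under normal data, and then takes their ratio in the direction stated on the slide (test A over test B). The choice of tests (A = sign test, B = z-test with known standard deviation), the normal-approximation sample-size formulas, and the values of `delta`, `sigma`, `alpha` and `power` are all assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def n_z_test(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a one-sided z-test to detect a mean shift `delta`."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


def n_sign_test(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a one-sided sign test (normal approximation)."""
    # Probability that an observation exceeds the null median under the shift.
    p = STD_NORMAL.cdf(delta / sigma)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((0.5 * z_alpha + z_beta * sqrt(p * (1 - p))) / (p - 0.5)) ** 2)


n_a = n_sign_test(delta=0.5, sigma=1.0)  # test A: sign test
n_b = n_z_test(delta=0.5, sigma=1.0)     # test B: z-test
relative_efficiency = n_a / n_b
print(n_a, n_b, relative_efficiency)
```

With the ratio taken as A over B, a value above 1 means that test A needs more observations than test B to reach the same power against the same alternative.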
V{P} = value of the problem to be solved

Statisticians too often forget this part of the equation. We
frequently choose problems to be solved on the basis of their
statistical interest rather than the value of solving them.

P{S} = probability that the problem actually gets solved

Usually no one method or attempt actually solves the entire
problem, only part of it. So this part of the equation could be
expressed as a fraction.

V{PS} = value of the problem being solved

This is both a statistical question and a management question.
Did the method work and lead to a solution that worked, and were
the data, information and resources available to solve the
problem?

P{I} = probability the solution is actually implemented

Here is the non-statistical part of the equation that is often
the most difficult to evaluate. Implementing the solution may be
far harder than just coming up with the solution.

T{I} = time the solution stays implemented

Problems have the tendency not to stay solved. This is why we
need to put much emphasis on holding the gains in any process
improvement.

E{R} = expected number of replications

This is the part most often missed in companies. If the basic
idea of the solution could be replicated in other areas of the
company, the savings could be enormous.

The Quality Ladder: Matching Management Approach with Statistical Methods

Management approach        Statistical method
Quality by Design          Design of Experiments
Process Improvement        Statistical Process Control
Inspection                 Sampling
Fire Fighting              Data Accumulation

Kenett, R. and Zacks, S., Modern Industrial Statistics: Design and Control of Quality
and Reliability, Duxbury Press, San Francisco, 1998, Spanish edition
2002, 2nd paperback edition 2002, Chinese edition 2004.

The Statistical Efficiency Conjecture
      Let PSE = the PSE of a specific project and L = the maturity level of an
      organisation on the Quality Ladder (L = 1, …, 4).
      PSE is a random variable with specific realisations for individual projects.
      E{PSE} = the expected value of PSE in a given organisation over all
      projects.
      The Statistical Efficiency Conjecture links Expected Practical Statistical
      Efficiency with the maturity of an organisation on the Quality Ladder.
      In more formal terms it is stated as:

      Conditioned on the right variable,
                   E{PSE} is an increasing function of L.
      We partially demonstrated this with 21 case studies.
Kenett, R., De Frenne, A., Tort-Martorell, X. and McCollin, C., “The Statistical Efficiency
Conjecture”, Chapter 4 in Applying Statistical Methods in Business and Industry:
The State of the Art, Coleman, S., Greenfield, T. and Montgomery, D. (editors), John
Wiley and Sons, 2008.
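
One way to examine the conjecture on project data is to estimate E{PSE} by the sample mean of PSE at each maturity level and check whether the means increase with L. The sketch below does this for made-up (L, PSE) pairs; the data, the helper names and the use of a simple sample mean are assumptions, and the sketch does not address the conditioning variable mentioned in the conjecture.

```python
from statistics import mean

# Hypothetical (maturity level L, PSE) pairs for completed projects.
projects = [
    (1, 12.0), (1, 20.0),
    (2, 45.0), (2, 55.0),
    (3, 120.0),
    (4, 300.0), (4, 340.0),
]


def expected_pse_by_level(projects):
    """Sample mean of PSE per maturity level, as an estimate of E{PSE}."""
    by_level = {}
    for level, pse in projects:
        by_level.setdefault(level, []).append(pse)
    return {level: mean(values) for level, values in sorted(by_level.items())}


def is_increasing(means_by_level):
    """True if the estimated E{PSE} strictly increases with the level L."""
    means = [means_by_level[level] for level in sorted(means_by_level)]
    return all(a < b for a, b in zip(means, means[1:]))


means_by_level = expected_pse_by_level(projects)
print(means_by_level, is_increasing(means_by_level))
```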

MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLSeo
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyEthan lee
 
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...Aggregage
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756dollysharma2066
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
John Halpern sued for sexual assault.pdf
John Halpern sued for sexual assault.pdfJohn Halpern sued for sexual assault.pdf
John Halpern sued for sexual assault.pdfAmzadHosen3
 
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...amitlee9823
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdfRenandantas16
 
RSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataRSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataExhibitors Data
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communicationskarancommunications
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear RegressionRavindra Nath Shukla
 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityEric T. Tung
 
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...Any kyc Account
 

Dernier (20)

Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
 
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
John Halpern sued for sexual assault.pdf
John Halpern sued for sexual assault.pdfJohn Halpern sued for sexual assault.pdf
John Halpern sued for sexual assault.pdf
 
Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabiunwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
unwanted pregnancy Kit [+918133066128] Abortion Pills IN Dubai UAE Abudhabi
 
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
 
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pillsMifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
Mifty kit IN Salmiya (+918133066128) Abortion pills IN Salmiyah Cytotec pills
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 
RSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataRSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors Data
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League City
 
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
 

Kenett on info q and pse

  • 1. On InfoQ and PSE: A brief introduction Ron S. Kenett KPA Ltd., Raanana, Israel and University of Torino, Torino, Italy ron@kpa.co.il 1
  • 2. Introduction This presentation is about doing the right research with statistical methods, the right way; we call it Quality Research. Research is a critical activity leading to knowledge acquisition and the formulation of policies and management decisions. By effective research we mean research that produces an impact, as intended by decision makers. One measure of effective research is Information Quality (InfoQ), an approach developed by Kenett and Shmueli (2009). Practical Statistical Efficiency (PSE) assesses the level of implementation of the research recommendations (Kenett, Coleman and Stewardson, 2003). 2
  • 3. Information Quality (InfoQ) Are we doing the right research? 3
  • 4. Information Quality (InfoQ) [Diagram: Goals, Data Quality, Analysis Quality, Information Quality, Knowledge. Primary Data (experimental, observational); Secondary Data (experimental, observational).] Kenett, R. and Shmueli, G., “On Information Quality”, http://ssrn.com/abstract=1464444, 2009. 4
  • 5. Practical Statistical Efficiency (PSE) Are our research recommendations having an impact? 5
  • 6. Practical Statistical Efficiency (PSE) PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} • V{D} = value of the data actually collected • V{M} = value of the statistical method employed • V{P} = value of the problem to be solved • P{S} = probability that the problem actually gets solved • V{PS} = value of the problem being solved • P{I} = probability the solution is actually implemented • T{I} = time the solution stays implemented • E{R} = expected number of replications Kenett, R.S., Coleman, S.Y. and Stewardson, D. (2003), “Statistical Efficiency: The Practical Perspective”, Quality and Reliability Engineering International, 19: 265-272. 6
  • 7. Information Quality (InfoQ) Knowledge Goals Information Quality Data Analysis Quality Quality 7
  • 8. Information Quality (InfoQ) 1. Data resolution 2. Data structure 3. Data integration 4. Temporal relevance 5. Sampling bias 6. Chronology of data and goal 7. Concept operationalization 8. Communication and data visualization 8
  • 9. The InfoQ Swiss Cheese Model [Figure: slices labeled Data resolution, Data structure, Data integration, Temporal relevance, Chronology of data and goal, Sampling bias, Concept operationalization, Communication and data visualization] 9
  • 10. InfoQ1: Data Resolution • Two aspects of data resolution are measurement scale and data aggregation. • The measurement scale of the data must be adequate for the purpose of the study. • The level of aggregation of the data must likewise be appropriate for the task at hand. For example, consider data on daily purchases of over-the-counter medications at a large pharmacy. If the goal of the analysis is to forecast future inventory levels of different medications, and re-stocking is done on a weekly basis, then we would prefer weekly aggregate data to daily aggregate data. 10
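The pharmacy example on the slide above can be illustrated with a minimal sketch that rolls daily purchase counts up to weekly totals, so the resolution of the data matches a weekly re-stocking decision. The input format (pairs of date and units sold) and the sales figures are assumptions made up for illustration; they do not come from the presentation.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate_weekly(daily_sales):
    """Sum daily unit sales into ISO-week totals.

    daily_sales: iterable of (date, units) pairs (hypothetical input format).
    Returns a dict mapping (iso_year, iso_week) -> total units sold.
    """
    weekly = defaultdict(int)
    for day, units in daily_sales:
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += units
    return dict(weekly)


# Made-up data: 14 consecutive days starting on Monday 2024-01-01.
start = date(2024, 1, 1)
daily = [(start + timedelta(days=i), 10 + i) for i in range(14)]

weekly_totals = aggregate_weekly(daily)
for (year, week), units in sorted(weekly_totals.items()):
    print(f"{year}-W{week:02d}: {units} units")
```

Grouping by ISO week is one possible choice; a pharmacy whose re-stocking cycle starts on a different weekday would need a different grouping key.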
  • 11. InfoQ2: Data Structure • The data can combine structured quantitative data with unstructured, semantic based data. • For example, in assessing the reputation of an organization one might combine data derived from balance sheets with data mined from text such as newspaper archives or press reports. 11
  • 12. InfoQ3: Data Integration • Knowledge is often spread out across multiple data sources. • Hence, identifying the different relevant sources, collecting the relevant data, and integrating the data, directly affect information quality. 12
  • 13. InfoQ4: Temporal Relevance • A data set contains information collected during a certain period of time. The degree of relevance of the data to the current goal at hand must be assessed. • For instance, in order to learn about current online shopping behaviors, a dataset that records online purchase behavior (such as Comscore data (www.comscore.com)) can be irrelevant if it is even several years old, because of the fast changing online shopping environment. 13
  • 14. InfoQ5: Chronology of Data and Goal • A data set contains daily weather information for a particular city for a certain period as well as information on the Air Quality Index (AQI) on those days. • For the United States such data are publicly available from the National Oceanic and Atmospheric Administration website (http://www.noaa.gov). To assess the quality of the information contained in this data set, we must consider the purpose of the analysis. • Although AQI is widely used (for instance, for issuing a “code red” day), how it is computed is not easy to figure out. One analysis goal might therefore be to find out how AQI is computed from weather data (by reverse-engineering). For such a purpose, this data is likely to contain high-quality information. In contrast, if the goal is to predict future AQI levels, then the data on past temperatures contains low-quality information. 14
  • 15. InfoQ6: Sampling Bias • A clear definition of the population of interest and how the sample relates to that population is necessary in both primary and secondary analyses. • Dealing with sampling bias can be proactive or retroactive. In studies where there is control over the design (e.g., surveys), sampling schemes are selected to reduce bias. Such methods do not apply to retrospective studies. However, retroactive measures such as post-stratification weighting, which are often used in survey analysis, can be useful in secondary studies as well. 15
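As a rough illustration of the retroactive measure mentioned on the slide above, the sketch below computes post-stratification weights and a weighted mean. The strata names, population shares, and response values are invented for the example, and the function names are hypothetical; real survey work would normally rely on a dedicated survey package.

```python
def post_stratification_weights(sample_counts, population_shares):
    """Weight per respondent in each stratum: population share / sample share.

    sample_counts: dict stratum -> number of respondents in the sample.
    population_shares: dict stratum -> known share of the population (sums to 1).
    """
    n = sum(sample_counts.values())
    return {
        stratum: population_shares[stratum] / (count / n)
        for stratum, count in sample_counts.items()
    }


def weighted_mean(values_by_stratum, weights):
    """Mean of all responses, each weighted by its stratum weight."""
    numerator = 0.0
    denominator = 0.0
    for stratum, values in values_by_stratum.items():
        w = weights[stratum]
        numerator += w * sum(values)
        denominator += w * len(values)
    return numerator / denominator


# Made-up sample that over-represents the "young" stratum (8 of 10 respondents)
# although each stratum is assumed to be half of the population.
responses = {"young": [1.0] * 8, "old": [3.0] * 2}
counts = {stratum: len(values) for stratum, values in responses.items()}
shares = {"young": 0.5, "old": 0.5}

weights = post_stratification_weights(counts, shares)
unweighted = sum(sum(v) for v in responses.values()) / sum(counts.values())
adjusted = weighted_mean(responses, weights)
print(f"unweighted mean = {unweighted:.2f}, post-stratified mean = {adjusted:.2f}")
```

The adjustment only corrects for imbalance on the stratifying variable; it assumes respondents within a stratum are representative of that stratum.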
  • 16. InfoQ7: Concept Operationalization • Observable data are an operationalization of underlying concepts. “Anger” can be measured via a questionnaire or by measuring blood pressure; “economic prosperity” can be measured via income or by unemployment rate; and “length” can be measured in centimeters or in inches. • The role of concept operationalization is different for explanatory, predictive, and descriptive goals. 16
  • 17. InfoQ8: Communication and Data Visualization • If crucial information does not reach the right person at the right time, then the quality of information becomes poor. • Data visualization is also directly related to the quality of information. Poor visualization can lead to degradation of the information contained in the data. 17
  • 18. The InfoQ Score For each measure Yi(x), a univariate desirability function di(Yi) is defined that assigns numbers between 0 and 1 to the possible values of Yi, with di(Yi) = 0 representing a completely undesirable value of Yi and di(Yi) = 1 representing a completely desirable or ideal response value. The individual desirabilities are then combined into an overall desirability index using the geometric mean of the individual desirabilities: $\text{Desirability Function} = \left[ d_1(Y_1) \times d_2(Y_2) \times \cdots \times d_k(Y_k) \right]^{1/k}$, with k denoting the number of measures. Notice that if any response Yi is completely undesirable (di(Yi) = 0), then the overall desirability is zero. We use the Desirability Function to compute an InfoQ Score based on an assessment of indicators reflecting the 8 InfoQ dimensions. Derringer, G. and Suich, R. (1980), "Simultaneous Optimization of Several Response Variables", Journal of Quality Technology, 12(4), 214-219. Harrington, E. C. (1965), "The desirability function", Industrial Quality Control, 21, 494-498. 18
  • 19. The InfoQ Score $\text{InfoQ Score} = \left[ d_1(Y_1) \times d_2(Y_2) \times \cdots \times d_8(Y_8) \right]^{1/8}$ 1. Data resolution 2. Data structure 3. Data integration 4. Temporal relevance 5. Sampling bias 6. Chronology of data and goal 7. Concept operationalization 8. Communication and data visualization [Figure: desirability function shapes labeled "The lower the better", "The higher the better", and "On target"; the dimension numbers 5, 1, 2, 4, 6, 3, 7, 8 appear on the plots] 19
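The geometric-mean formula on the two InfoQ Score slides can be sketched in a few lines. The function name and the example ratings below are hypothetical; the presentation does not say how the individual desirabilities are elicited, so the sketch simply takes them as given values in [0, 1].

```python
import math


def infoq_score(desirabilities):
    """Geometric mean of desirability values, each expected to lie in [0, 1].

    A single zero makes the whole score zero, as the slide notes.
    """
    values = list(desirabilities)
    if not values:
        raise ValueError("at least one desirability value is required")
    if any(d < 0 or d > 1 for d in values):
        raise ValueError("desirability values must lie between 0 and 1")
    return math.prod(values) ** (1 / len(values))


# Made-up ratings for the eight InfoQ dimensions (illustration only).
ratings = {
    "Data resolution": 0.8,
    "Data structure": 0.6,
    "Data integration": 0.7,
    "Temporal relevance": 0.9,
    "Sampling bias": 0.5,
    "Chronology of data and goal": 0.8,
    "Concept operationalization": 0.7,
    "Communication and data visualization": 0.6,
}

print(f"InfoQ score: {infoq_score(ratings.values()):.3f}")
```

Because the score is a geometric rather than an arithmetic mean, one very weak dimension pulls the total down sharply and cannot be compensated by strong ratings elsewhere.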
  • 20. Practical Statistical Efficiency (PSE) PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} • V{D} = value of the data actually collected • V{M} = value of the statistical method employed • V{P} = value of the problem to be solved • P{S} = probability that the problem actually gets solved • V{PS} = value of the problem being solved • P{I} = probability the solution is actually implemented • T{I} = time the solution stays implemented • E{R} = expected number of replications 20
  • 21. V{D} = value of the data actually collected PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} Readily accessible data are like observations made below the lamppost, where there is light: not necessarily where you lost your key or where the answer to your problem lies. 21
  • 22. V{M} = value of the statistical method employed PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} A mathematical definition of statistical efficiency is given by: Relative Efficiency of Test A versus Test B = Ratio of sample size for test A to sample size for test B, where sample sizes are determined so that both tests reach a certain power against the same alternative. 22
  • 23. V{P} = value of the problem to be solved PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} Statisticians too often forget this part of the equation. We frequently choose problems to be solved on the basis of their statistical interest rather than the value of solving them. 23
  • 24. P{S} = probability that the problem actually gets solved PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} Usually no single method or attempt solves the entire problem, only part of it, so this part of the equation could be expressed as a fraction. 24
  • 25. V{PS} = value of the problem being solved PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} This is both a statistical question and a management question. Did the method work and lead to a solution that worked and were the data, information and resources available to solve the problem? 25
  • 26. P{I} = probability the solution is actually implemented PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} Here is the non-statistical part of the equation that is often the most difficult to evaluate. Implementing the solution may be far harder than just coming up with the solution. 26
  • 27. T{I} = time the solution stays implemented PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} Problems have the tendency not to stay solved. This is why we need to put much emphasis on holding the gains in any process improvement. 27
  • 28. E{R} = expected number of replications PSE = E{R} x T {I} x P {I} x V {PS} x P {S} x V {P} x V {M} x V {D} This is the part most often missed in companies. If the basic idea of the solution could be replicated in other areas of the company, the savings could be enormous. 28
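Since PSE is defined above as a plain product of eight components, a small sketch can make the arithmetic concrete. The class name, field names, and all numeric values are assumptions for illustration; the slides do not fix the units or scales of the individual components.

```python
import math
from dataclasses import dataclass, astuple


@dataclass
class PSEComponents:
    """The eight PSE factors; scales are hypothetical, chosen by the assessor."""
    v_d: float   # V{D}: value of the data actually collected
    v_m: float   # V{M}: value of the statistical method employed
    v_p: float   # V{P}: value of the problem to be solved
    p_s: float   # P{S}: probability that the problem actually gets solved
    v_ps: float  # V{PS}: value of the problem being solved
    p_i: float   # P{I}: probability the solution is actually implemented
    t_i: float   # T{I}: time the solution stays implemented
    e_r: float   # E{R}: expected number of replications

    def pse(self):
        """PSE = E{R} x T{I} x P{I} x V{PS} x P{S} x V{P} x V{M} x V{D}."""
        return math.prod(astuple(self))


# Made-up assessment of a single project.
project = PSEComponents(
    v_d=1.0, v_m=1.0, v_p=100.0, p_s=0.5,
    v_ps=1.0, p_i=0.5, t_i=2.0, e_r=3.0,
)
print(f"PSE = {project.pse():.1f}")
```

Because the components multiply, a solution that is never implemented (P{I} = 0) yields a PSE of zero regardless of how good the data and the method were.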
  • 29. The Quality Ladder: Matching Management Approach with Statistical Methods Management approach → statistical method: Quality by Design → Design of Experiments; Process Improvement → Statistical Process Control; Inspection → Sampling; Fire Fighting → Data Accumulation. Kenett, R. and Zacks, S., Modern Industrial Statistics: Design and Control of Quality and Reliability, Duxbury Press, San Francisco, 1998, Spanish edition 2002, 2nd paperback edition 2002, Chinese edition 2004. 29
  • 30. The Statistical Efficiency Conjecture Let PSE = PSE of a specific project and L = the maturity level of an organization on the Quality Ladder (L = 1, …, 4). PSE is a random variable with specific realisations for individual projects. E{PSE} = the expected value of PSE in a given organisation over all projects. The Statistical Efficiency Conjecture links Expected Practical Statistical Efficiency with the maturity of an organisation on the Quality Ladder. In more formal terms it is stated as: Conditioned on the right variable, E{PSE} is an increasing function of L. We partially demonstrated this with 21 case studies. Kenett, R., De Frenne, A., Tort-Martorell, X. and McCollin, C., The Statistical Efficiency Conjecture, Chapter 4 in Applying Statistical Methods in Business and Industry – the state of the art, Coleman S., Greenfield, T. and Montgomery, D. (editors), John Wiley and Sons, 2008. 30