DATA
Data: Input for analysis and interpretation. Data are generally collected as a basis for action. You must always use some method of analysis to extract and interpret the information that lies in the data. The type of data that has been collected will determine the type of statistics or analysis that can be performed. Making sense of the data is a process in itself. Always provide a "context" for data: data have no meaning apart from their context. Data should always be presented in a way that preserves the evidence in the data for all the predictions that might be made from them.
Data - 2: Data should be completely and fully described. Who collected the data? How were the data collected? When were the data collected? Where were the data collected? What do these values represent? If the data are computed values, how were the values computed from the raw inputs?
Data - 3: Variation exists in all data and consists of both noise (random or common cause variation) and signal (nonrandom or special cause variation). Without formal and standardized approaches for analyzing data, you may have difficulty interpreting and using your measurement results. When you interpret and act on measurement results, you are presuming that the measurements represent reality.
Data - 4: To use data safely, you must have simple and effective methods not only for detecting signals that are surrounded by noise, but also for recognizing and dealing with normal process variations when there are no signals present. Drawing conclusions and predictions from data depends not only on using appropriate analytical methods and tools, but also on understanding the underlying nature of the data and the appropriateness of assumptions about the conditions and environments in which the data were obtained.
Data Definitions: Categorical vs. quantitative variables. Variables can be classified as categorical (also called qualitative) or quantitative (also called numerical). Categorical variables take on values that are names or labels. The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, terrier) would be examples of categorical variables. Quantitative variables are numerical. They represent a measurable quantity. For example, when we speak of the population of a city, we are talking about the number of people in the city, a measurable attribute of the city. Therefore, population would be a quantitative variable.
Data Definitions - 2: Discrete vs. continuous variables. Quantitative variables can be further classified as discrete or continuous. If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable. Two examples clarify the difference. Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable, since a fire fighter's weight could take on any value between 150 and 250 pounds. Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be just any number between 0 and plus infinity: we could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.
Attributes Data vs. Variables Data
Variables Data: Variables data is measured and plotted on a continuous scale. With variables data, an actual numeric estimate is derived for one or more characteristics of the population being sampled, such as time, temperature, length, weight, height, volume, voltage, horsepower, torque, speed, and cost.
Variables Data - 2: In software, examples of variables data include: effort expended (number of hours, days, weeks, years, etc., that have been expended by a workforce member on an identified topic); years of experience (total number of years of experience per category); memory utilization (% of total memory available); CPU utilization (% of CPU used at any given moment in time); cost of rework (dollars-and-cents calculation of the rework based on the effort put forth by anyone involved in the finding and fixing of reported problems).
“Counts” Could Be Treated as Variables Data: There are many situations where “counts” get used as measures of size: total number of requirements, total lines of code, total bubbles in a data-flow diagram, customer sites, change requests received, total people assigned to a project. When we count these things, we are counting all the entities in a population, not just the occurrences of entities with specific attributes. These should always be treated as “variables” data even though they are instances of discrete counts.
Attributes Data: When working with attributes data, the focus is on learning about one or more specific non-numerical characteristics of the population being sampled. When attributes data are used for direct comparisons, they must be based on consistent “areas of opportunity” if the comparisons are to be meaningful. If the number of defects that are likely to be observed depends on the size (lines of code) of a module or component, all sizes must be nearly equal. If the probabilities associated with defect discovery depend on the time spent on inspecting or testing, the elapsed time spent must be nearly equal.
Attributes Data - 2: In general, when the areas of opportunity for observing a specific event are not equal or nearly so, the chances of observing the event will differ across the observations. We must then normalize (convert to rates) by dividing each count by its area of opportunity before valid comparisons can be made. Conditions that make us willing to assume constant areas of opportunity seem to be less common in software environments. Normalization is almost always needed for software!
Attributes Data - 3: Example: If defects are being counted and the size of an item inspected influences the number of defects found, some measure of item size will also be needed to convert defect counts to relative rates that can be compared in meaningful ways (defects per lines of code). If variations in the amount of time spent inspecting or testing can influence the number of defects found, these times should be clearly defined and measured as well.
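The normalization described above can be sketched in a few lines of Python. This is only an illustration: the function name `defects_per_kloc`, the module names, and the counts are hypothetical and do not come from the slides. It divides each defect count by its area of opportunity (size in thousands of lines of code) so that modules of different sizes can be compared.

```python
def defects_per_kloc(defects, lines_of_code):
    """Convert a raw defect count to a rate per thousand lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000.0)


# Hypothetical inspection results: (module, defects found, lines of code).
inspections = [
    ("parser", 12, 4000),
    ("scheduler", 12, 1000),
    ("logger", 3, 500),
]

# Normalize every count by its own area of opportunity.
rates = {name: defects_per_kloc(d, loc) for name, d, loc in inspections}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} defects per KLOC")
```

In this made-up data the parser and the scheduler have the same raw count (12), yet the normalized rates differ by a factor of four, which is the kind of difference a raw-count comparison would hide.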
Attributes Data - 4: One of the keys to making effective use of attributes data lies in preserving the ordering of each count in space and time. Sequence information (the order in time or space in which the data is collected) is almost always needed to correctly interpret counts of attributes. Make the counts specific: make sure there is an operational definition (a clear set of rules and procedures) for recognizing an attribute or entity if what gets counted is to be what the user of the data expects the data to be.
Attributes Data - 5: Attributes data is counted and plotted as discrete events: shipping errors; percentage waste; number of defects found; number of defective items; number of source statements of a given type; number of lines of comments in a module of n lines; number of people with certain skills on a project; percentage of projects using formal inspections; team size; elapsed time between milestones; staff hours logged per task; backlog; number of priority-one customer complaints; percentage of non-conforming products in the output of an activity or a process.
The Key to Classifying Data: The key to classifying data as attributes data or variables data depends not so much on whether the data are discrete or continuous, but on how they are collected and used. The total number of defects found is often used as a measure of the amount of rework or retesting to be performed. Used this way, it is viewed as a measure of size and treated as variables data, although it is normally used as a count based on attributes. The method of analysis you choose for any data will depend on the questions you are asking, the data distribution model you have in mind, and the assumptions you are willing to make with respect to the nature of the data (Page 79).
Data Type Classifications: Discrete, Continuous
Distributional Models Relationship to Chart Types: Each type of chart is related to a set of assumptions (a distributional model) that must hold for that type of chart to be valid. There are six types of charts for “attributes data”: np, p, c, u, XmR for counts, and XmR for rates.
Distributional Models Relationship to Chart Types - 2: XmR charts have an advantage over np, p, c, and u charts in that they require fewer and less stringent assumptions. They are easier to plot and use, they have wide applicability, and they are recommended by many quality-control professionals. When the assumptions of the distributional model are met, however, the more specialized np, p, c, and u charts can give better bounds for control limits and can offer advantages.
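As a minimal sketch of why XmR charts are easy to compute, the Python below derives the limits from nothing more than the individual values and their moving ranges. The slides do not give the formulas; the constants 2.66 and 3.268 are the standard XmR scaling factors for two-point moving ranges, and the function name and sample data are hypothetical.

```python
def xmr_limits(values):
    """Return the center lines and limits for an XmR (individuals) chart."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    # Moving range: absolute difference between successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "center": x_bar,
        "ucl": x_bar + 2.66 * mr_bar,   # upper natural process limit
        "lcl": x_bar - 2.66 * mr_bar,   # lower natural process limit
        "mr_center": mr_bar,
        "mr_ucl": 3.268 * mr_bar,       # upper limit for the moving ranges
    }


# Hypothetical weekly defect counts, kept in time order.
weekly_defects = [10, 12, 9, 11, 13, 10, 12, 11]
limits = xmr_limits(weekly_defects)
print(limits)
```

The calculation depends on the time order of the values, which is one reason sequence information has to be preserved when the data are collected.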
Distributional Models Relationship to Chart Types - 3: NP chart: an np chart is used when the count data are binomially distributed and all samples have equal areas of opportunity. These conditions occur in manufacturing settings, for example when there is 100% inspection of lots of size n (n is constant) and the number of defective units in each lot is recorded. P chart: a p chart is used when the data are binomially distributed but the areas of opportunity vary from sample to sample. A p chart could be appropriate if the lot size n were to change from lot to lot.
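The difference between the two binomial charts can be illustrated with a short Python sketch. The slides do not state the limit formulas; the code assumes the conventional three-sigma binomial limits, and the function names and lot data are hypothetical. The np chart has one set of limits because n is constant, while the p chart needs a separate pair of limits for each lot size.

```python
import math


def np_chart_limits(defective_counts, n):
    """Three-sigma limits for an np chart (constant lot size n)."""
    p_bar = sum(defective_counts) / (n * len(defective_counts))
    center = n * p_bar
    sigma = math.sqrt(n * p_bar * (1 - p_bar))
    # A count of defectives cannot be below 0 or above n.
    return max(0.0, center - 3 * sigma), center, min(float(n), center + 3 * sigma)


def p_chart_limits(defective_counts, lot_sizes):
    """Three-sigma limits for a p chart, one (lcl, center, ucl) per lot."""
    p_bar = sum(defective_counts) / sum(lot_sizes)
    limits = []
    for n in lot_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)))
    return limits


# Hypothetical data: four lots of 100 units each for the np chart.
np_lcl, np_center, np_ucl = np_chart_limits([4, 6, 5, 5], n=100)

# Hypothetical data: three lots of different sizes for the p chart.
p_limits = p_chart_limits([4, 9, 2], lot_sizes=[100, 200, 50])

print(np_lcl, np_center, np_ucl)
for row in p_limits:
    print(row)
```

Smaller lots get wider p chart limits, because a proportion estimated from fewer units is noisier.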
Distributional Models Relationship to Chart Types - 4: C chart: a c chart is used when the count data are samples from a Poisson distribution and the samples all have equal-sized areas of opportunity. U chart: a u chart is used in place of a c chart when the count data are samples from a Poisson distribution and the areas of opportunity are not constant. Defects per thousand lines of code is an example for software. NP, P, C, and U charts are the traditional control charts used with attributes data. XmR chart: useful when little is known about the underlying distribution or when the justification for assuming a binomial or Poisson process is questionable. It is almost always a reasonable choice.
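A rough Python sketch of the two Poisson-based charts follows. The limit formulas are the conventional three-sigma Poisson limits, which the slides do not spell out, and the function names and defect data are hypothetical. The u chart divides each count by its area of opportunity (here, module size in KLOC) and widens the limits for small areas.

```python
import math


def c_chart_limits(counts):
    """Three-sigma limits for a c chart (equal areas of opportunity)."""
    c_bar = sum(counts) / len(counts)
    half_width = 3 * math.sqrt(c_bar)
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width


def u_chart_limits(counts, areas):
    """Per-sample rates and three-sigma limits for a u chart."""
    u_bar = sum(counts) / sum(areas)
    rows = []
    for count, area in zip(counts, areas):
        half_width = 3 * math.sqrt(u_bar / area)
        rows.append({
            "u": count / area,                    # normalized rate for this sample
            "lcl": max(0.0, u_bar - half_width),  # a rate cannot be negative
            "center": u_bar,
            "ucl": u_bar + half_width,
        })
    return rows


# Hypothetical defect counts from equally sized inspections (c chart).
c_lcl, c_center, c_ucl = c_chart_limits([4, 6, 5, 5])

# Hypothetical defect counts with module sizes in KLOC (u chart).
u_rows = u_chart_limits(counts=[9, 4, 16, 7], areas=[3.0, 1.0, 4.0, 2.0])

print(c_lcl, c_center, c_ucl)
for row in u_rows:
    print(row)
```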
Distributional Models Relationship to Chart Types - 5: More about u charts: u charts seem to have the greatest prospects for use in software settings. U charts require normalization (conversion to rates) when the areas of opportunity are not constant. Poisson might be appropriate when counting the number of defects in modules during inspection or testing. Defects per thousand lines of source code is an example of attributes data that is a candidate for u charts. Although u charts may be appropriate for studying software defect densities in an operational environment, we are not aware of any empirical studies that have generally validated the use of Poisson models for nonoperational environments such as inspections.
Distributional Models Relationship to Chart Types - 6: Defects per module or defects per test are unlikely candidates for u charts, c charts, or any other charts for that matter. The ratios are not based on equal areas of opportunity and cannot be normalized. There is no reason to expect them to be constant across all modules or tests when the process is in statistical control.
Distributional Models Relationship to Chart Types - 7: If you are uncertain as to the model that applies, it can make sense to use more than one set of charts. If you think you may have a Poisson situation but are not sure that all conditions for a Poisson process are present, then plotting both a u chart and the corresponding XmR charts should bracket the situation. If both charts point to the same conclusions, you are unlikely to be led astray. If the conclusions differ, then you should investigate your assumptions or the events.
Presenting Data: While it is simple and easy to compare one number with another, such comparisons are limited and weak. They are limited because of the small amount of data used, and weak because both of the numbers are subject to variation. This makes it difficult to determine just how much of the difference between the values is due to variation in the numbers and how much is due to real changes in the process.
Presenting Data - 2: Graphs: there are two basic graphs that are the most helpful in providing the context for interpreting the current value. Time series graph (run chart): has months or years marked off on the horizontal axis and possible values marked off on the vertical axis. As you move from left to right, there is a passage of time. By visually comparing the current value with the plotted values for the preceding months, you can quickly see whether the current value is unusual. Histogram (tally plot): an accumulation of the different values as they occur, without trying to display the time-order sequence.
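To make the tally plot concrete, here is a small text-only Python sketch. The function name and the staff-hour values are hypothetical, and the values are assumed to be integers. It accumulates how often each value occurs and deliberately discards the time order, which is exactly what distinguishes a histogram from a run chart.

```python
from collections import Counter


def tally_plot(values):
    """Return one text line per integer value, with an 'x' per occurrence."""
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        # Values that never occurred still get a row, so gaps stay visible.
        lines.append(f"{value:>3} | {'x' * counts[value]}".rstrip())
    return lines


# Hypothetical daily staff-hour totals, listed in the order they were recorded.
staff_hours = [40, 42, 40, 44, 42, 40, 46, 42, 40]

plot = tally_plot(staff_hours)
print("\n".join(plot))
```

Shuffling `staff_hours` would leave the tally plot unchanged, whereas a run chart of the same data would look different.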
Run Charts: [Figure: run chart of the number of required changes to a module as the project approaches systems test. Phases shown: syntax check, desk check, code review, unit test, integration and test, systems test.]
Histograms: [Figure: histogram. Axis labels: Number of Days (0 to 20) and Product-Service Staff Hours (32 to 56).]
Process Control Chart: [Figure: annotated control chart with fields for chart type and metric, showing an Upper Control Limit (UCL), a Center Line (CL, the mean of the data used to set up the chart), and a Lower Control Limit (LCL).] The chart is built from data relating to the process: numerical data taken in time sequence. Plotted points are either individual measurements or the means of small groups of measurements. The upper and lower control limits represent the natural variation in the process. A point above or below the control lines suggests that the measurement has a special preventable or removable cause. The chart is analyzed using standard rules to define the control status of the process. The chart is used for continuous and time control of the process and prevention of causes. Source: Adrian Burr and Mal Owen, Statistical Methods for Software Quality, 1996.
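The simplest of the rules mentioned above, a point falling outside the control limits, can be sketched in Python as follows. The function name, the measurements, and the limit values are hypothetical, and only this one rule is shown; the other standard rules for judging control status are not implemented here.

```python
def points_outside_limits(values, lcl, ucl):
    """Return (index, value) pairs for points above the UCL or below the LCL."""
    return [(i, v) for i, v in enumerate(values) if v > ucl or v < lcl]


# Hypothetical measurements in time order, with limits computed beforehand.
measurements = [11, 10, 12, 19, 11, 4]
signals = points_outside_limits(measurements, lcl=5.3, ucl=16.7)

print(signals)  # [(3, 19), (5, 4)]
```

Each flagged point is a candidate signal worth investigating for a special cause; points inside the limits are treated as routine variation.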
Impacts of Poor Data Quality: Inability to conduct hypothesis testing and predictive modeling. Inability to manage the quality and performance of software or application development. Ineffective process change instead of process improvement. Ineffective and inefficient testing, causing issues with time to market, field quality, and development costs. Products that are costly to use within real-life usage profiles.
References:
Brassard, Michael & Ritter, Diane. The Memory Jogger II: A Pocket Guide of Tools for Continuous Improvement & Effective Planning. GOAL/QPC, Salem, New Hampshire, 1994.
Florac, W.A. & Carleton, A.D. Measuring the Software Process. Addison-Wesley, 1999.
Six Sigma Academy. The Black Belt Memory Jogger: A Pocket Guide for Six Sigma Success. GOAL/QPC, Salem, New Hampshire, 2002.
Wheeler, Donald J. Understanding Variation: The Key to Managing Chaos. SPC Press, Knoxville, Tennessee, 2000.

Data What Type Of Data Do You Have V2.1

  • 8. Attributes Data vs. Variables Data
  • 9. Variables Data Variables data is measured and plotted on a continuous scale With variables data, an actual numeric estimate is derived for one or more characteristics of the population being sampled such as: Time Temperature Length Weight Height Volume Voltage Horsepower Torque Speed Cost
  • 10. Variables Data - 2 In software, examples of variables data include: Effort expended - (Number of hours, days, weeks, years, etc., that have been expended by a workforce member on an identified topic) Years of experience - (Total number of years of experience per category) Memory utilization - (% of total memory available) CPU utilization - (% of CPU used at any given moment in time) Cost of rework - (Dollars and cents calculation of the rework based on the effort put forth by anyone involved in the finding and fixing of reported problems)
  • 11. “Counts” Could Be Treated as Variables Data There are many situations where “counts” get used as measures of size: Total number of requirements Total lines of code Total bubbles in a data-flow diagram Customer sites Change requests received Total people assigned to a project When we count these things, we are counting all the entities in a population, not just the occurrence of entities with specific attributes These should always be treated as “variables” data even though they are instances of discrete counts
  • 12. Attributes Data When working with attributes data, the focus is on learning about one or more specific non-numerical characteristics of the population being sampled When attributes data are used for direct comparisons, they must be based on consistent “areas of opportunity” if the comparisons are to be meaningful If the number of defects that are likely to be observed depends on the size (lines of code) of a module or component, all sizes must be nearly equal If the probabilities associated with defect discovery depend on the time spent on inspecting or testing, the elapsed time spent must be nearly equal
  • 13. Attributes Data - 2 In general, when the areas of opportunity for observing a specific event are not equal or nearly so, the chances of observing the event will differ across the observations Then we must normalize (convert to rates) by dividing each count by its area of opportunity before valid comparisons are made Conditions that make us willing to assume constant areas of opportunity seem to be less common in software environments Normalization is almost always needed for software!
  • 14. Attributes Data - 3 Example: If the defects are being counted and the size of an item inspected influences the number of defects found, some measure of item size will also be needed to convert defect counts to relative rates that can be compared in meaningful ways (defects per lines of code) If the variations in the amount of time spent inspecting or testing can influence the number of defects found, these times should be clearly defined and measured as well
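The normalization described in slides 13 and 14 can be sketched as a small calculation. This is only an illustration: the `Inspection` record, the module names, and the numbers are hypothetical, and lines of code is assumed to be the area of opportunity.

```python
from dataclasses import dataclass


@dataclass
class Inspection:
    module: str   # hypothetical module name
    defects: int  # defects counted during the inspection
    sloc: int     # area of opportunity: source lines of code inspected


def defects_per_ksloc(item):
    """Convert a raw defect count to a rate per 1,000 lines of code."""
    if item.sloc <= 0:
        raise ValueError("area of opportunity must be positive")
    return 1000.0 * item.defects / item.sloc


# Made-up data: the larger module has more defects but a lower defect rate.
inspections = [
    Inspection("parser", 12, 4000),
    Inspection("scheduler", 6, 1000),
]
rates = {item.module: defects_per_ksloc(item) for item in inspections}
print(rates)
```

Comparing the raw counts (12 vs. 6) would suggest the opposite conclusion from comparing the rates, which is why the counts are divided by their areas of opportunity first.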
  • 15. Attributes Data - 4 One of the keys to making effective use of attributes data lies in preserving the ordering of each count in space and time Sequence information (the order in time or space in which the data is collected) is almost always needed to correctly interpret counts of attributes Make the counts specific – Make sure there is an operational definition (clear set of rules and procedures) for recognizing an attribute or entity if what gets counted is to be what the user of the data expects the data to be
  • 16. Attributes Data - 5 Attributes data is counted and plotted as discrete events: Shipping errors Percentage waste Number of defects found Number of defective items Number of source statements of a given type Number of lines of comments in a module of n lines Number of people with certain skills on a project Percentage of projects using formal inspections Team size Elapsed time between milestones Staff hours logged per task Backlog Number of priority-one customer complaints Percentage of non-conforming products in the output of an activity or a process
  • 17. The Key to Classifying Data The key to classifying data as attributes data or variables data depends not so much on whether the data are discrete or continuous, but on how they are collected and used The total number of defects found is often used as a measure of the amount of rework or retesting to be performed It is viewed as a measure of size and treated as variables data It is normally used as a count based on attributes The method of analysis you choose for any data will depend on: The questions you are asking The data distribution model you have in mind The assumptions you are willing to make with respect to the nature of the data (Page 79)
  • 18. Data Type Classifications Discrete Continuous
  • 19. Distributional Models Relationship to Chart Types Each type of chart is related to a set of assumptions (a distributional model) that must hold for that type of chart to be valid. There are six types of charts for “attributes data”: NP, P, C, U, XmR for counts, and XmR for rates
  • 20. Distributional Models Relationship to Chart Types - 2 XmR charts have an advantage over np, p, c, and u charts in that they require fewer and less stringent assumptions They are easier to plot and use They have wide applicability Recommended by many quality-control professionals When assumptions of the distributional model are met, however, the more specialized np, p, c, and u charts can give better bounds for control limits and can offer advantages
  • 21. Distributional Models Relationship to Chart Types - 3 NP Chart – An np chart is used when the count data are binomially distributed and all samples have equal areas of opportunity These conditions occur in manufacturing settings – when there is 100% inspection of lots of size n (n is constant) and the number of defective units in each lot is recorded P Chart – a p chart is used when the data are binomially distributed but the areas of opportunity vary from sample to sample A p chart could be appropriate if the lot size n were to change from lot to lot
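As a rough sketch of how the two binomial charts differ, the snippet below computes the conventional three-sigma limits for each. The formulas are the standard textbook ones rather than anything stated on the slides, and the function names and sample data are illustrative assumptions.

```python
import math


def np_chart_limits(defectives, n):
    """Three-sigma limits for an np chart (constant lot size n)."""
    p_bar = sum(defectives) / (n * len(defectives))
    center = n * p_bar
    sigma = math.sqrt(n * p_bar * (1 - p_bar))
    return max(0.0, center - 3 * sigma), center, min(float(n), center + 3 * sigma)


def p_chart_limits(defectives, sizes):
    """Three-sigma limits for a p chart, one (lcl, center, ucl) per lot."""
    p_bar = sum(defectives) / sum(sizes)
    limits = []
    for n in sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)))
    return limits


# Made-up data: four lots of 100 units each for the np chart.
np_lcl, np_center, np_ucl = np_chart_limits([4, 6, 5, 5], 100)

# Made-up data: two lots of different sizes for the p chart.
p_limits = p_chart_limits([5, 10], [100, 200])
print(np_lcl, np_center, np_ucl)
print(p_limits)
```

With a constant lot size the np chart has a single pair of limits, while the p chart recomputes its limits for every lot, so smaller lots get wider limits.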
  • 22. Distributional Models Relationship to Chart Types - 4 C Chart – a c chart is used when the count data are samples from a Poisson distribution and the samples all have equal-sized areas of opportunity U Chart – a u chart is used in place of a c chart when the count data are samples from a Poisson distribution and the areas of opportunity are not constant Defects per thousand lines of code is an example for software NP, P, C and U charts are the traditional control charts used with attributes data XmR Chart – Useful when little is known about the underlying distribution or when the justification for assuming a binomial or Poisson process is questionable Almost always a reasonable choice
  • 23. Distributional Models Relationship to Chart Types - 5 More About U Charts – U charts seem to have the greatest prospects for use in software settings U charts require normalization (conversion to rates) when the areas of opportunity are not constant Poisson might be appropriate when counting the number of defects in modules during inspection or testing Defects per thousand lines of source code is an example of attributes data that is a candidate for u charts Although u charts may be appropriate for studying software defect densities in an operational environment, we are not aware of any empirical studies that have generally validated the use of Poisson models for nonoperational environments such as inspections
  • 24. Distributional Models Relationship to Chart Types - 6 Defects per module or defects per test are unlikely candidates for u charts, c charts, or any other charts for that matter The ratios are not based on equal areas of opportunity – Can’t be normalized There is no reason to expect them to be constant across all modules or tests when the process is in statistical control
  • 25. Distributional Models Relationship to Chart Types - 7 If you are uncertain as to the model that applies, it can make sense to use more than one set of charts If you think you may have a Poisson situation but are not sure that all conditions for a Poisson process are present, then plotting both a u chart and the corresponding XmR charts should bracket the situation If both charts point to the same conclusions, you are unlikely to be led astray If the conclusions differ, then you should investigate your assumptions or the events
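The "plot both charts" advice in slide 25 can be sketched numerically. The example below assumes the usual three-sigma u chart limits and the commonly used XmR scaling constant 2.66 for the individuals chart; neither formula is spelled out on the slides, and the defect counts and module sizes are invented for illustration.

```python
import math


def u_chart_points(counts, areas):
    """Return (u_bar, [(rate, lcl, ucl), ...]) for a u chart."""
    u_bar = sum(counts) / sum(areas)
    points = []
    for count, area in zip(counts, areas):
        half_width = 3 * math.sqrt(u_bar / area)
        points.append((count / area, max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, points


def xmr_limits(values):
    """Natural process limits for the individuals (X) part of an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return x_bar - 2.66 * mr_bar, x_bar, x_bar + 2.66 * mr_bar


# Hypothetical inspection results: defects found and size in KSLOC.
defects = [4, 6, 3, 5, 20]
ksloc = [2.0, 3.0, 1.5, 2.5, 2.0]

_, points = u_chart_points(defects, ksloc)
u_signals = [i for i, (rate, lcl, ucl) in enumerate(points) if rate < lcl or rate > ucl]

rates = [c / a for c, a in zip(defects, ksloc)]
lower, center, upper = xmr_limits(rates)
xmr_signals = [i for i, r in enumerate(rates) if r < lower or r > upper]

print("u chart signals:", u_signals)
print("XmR signals:", xmr_signals)
```

When both lists flag the same observations, the conclusion does not depend on the Poisson assumption; when they differ, that is the cue to investigate the assumptions or the events.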
  • 26. Presenting Data While it is simple and easy to compare one number with another, such comparisons are limited and weak Limited because of the small amount of data used Weak because both of the numbers are subject to variation This makes it difficult to determine just how much of the difference between the values is due to variation in the numbers and how much is due to real changes in the process
  • 27. Presenting Data - 2 Graphs – there are two basic graphs that are the most helpful in providing the context for interpreting the current value Time series graph (Run Chart) Have months or years marked off on the horizontal axis and possible values marked off on the vertical axis As you move from left to right, there is a passage of time By visually comparing the current value with the plotted values for the preceding months you can quickly see if the current value is unusual or not Histogram (Tally Plot) An accumulation of the different values as they occur without trying to display the time order sequence
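A tally plot of the kind described in slide 27 can be produced with a few lines of code. This is a minimal sketch; the staff-hours values are made up, and the text layout is just one possible rendering.

```python
from collections import Counter


def tally_plot(values):
    """Accumulate values without regard to time order and draw one mark per occurrence."""
    counts = Counter(values)
    lines = []
    for value in sorted(counts):
        lines.append(f"{value:>4} | {'x' * counts[value]}")
    return "\n".join(lines)


# Hypothetical staff hours logged on six successive days.
staff_hours = [40, 42, 40, 44, 42, 40]
print(tally_plot(staff_hours))
```

Note that the plot discards the order in which the values arrived, which is exactly the information a run chart keeps.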
  • 28. Run Charts Number of Required Changes to a Module as the Project Approaches Systems Test Syntax Check Desk Check Code Review Unit Test Integration and Test Systems Test
  • 29. Histograms [Figure: histogram titled “Product – Service Staff Hours”; axis label: Number of Days; axis tick ranges 0 to 20 and 32 to 56]
  • 30. Process Control Chart [Figure: annotated control chart with fields for chart type and metric] Upper Control Limit (UCL) Center Line (CL) (Mean of data used to set up the chart) Lower Control Limit (LCL) Upper and Lower Control Limits represent the natural variation in the process A point above or below the control lines suggests that the measurement has a special preventable or removable cause Plotted points are either individual measurements or the means of small groups of measurements Numerical data taken in time sequence Data relating to the process The chart is used for continuous and time control of the process and prevention of causes The chart is analyzed using standard rules to define the control status of the process Source: Statistical Methods for Software Quality, Adrian Burr and Mal Owen, 1996
  • 31. Impacts of Poor Data Quality Inability to conduct hypothesis testing and predictive modeling Inability to manage the quality and performance of software or application development Ineffective process change instead of process improvement Ineffective and inefficient testing causing issues with time to market, field quality, and development costs Products that are costly to use within real-life usage profiles
  • 32. References Brassard, Michael & Ritter, Diane, The Memory Jogger II – A Pocket Guide of Tools for Continuous Improvement & Effective Planning, GOAL/QPC, Salem, New Hampshire, 1994 Florac, W.A. & Carleton, A.D. Measuring the Software Process Addison-Wesley, 1999 Six Sigma Academy, The Black Belt Memory Jogger – A Pocket Guide for Six Sigma Success, GOAL/QPC, Salem, New Hampshire, 2002 Wheeler, Donald J. Understanding Variation: The Key to Managing Chaos, Knoxville, Tennessee: SPC Press, 2000