Chapter 1: Exploring Data
1.1 Displaying Data with Graphs
Categorical Variables: Bar Graphs. Recall that the horizontal axis shows the category names and the vertical axis shows the count or percentage. Create a bar graph of "mobile phone carrier" for the students in this class period. Start with a survey!
Categorical Variables: Pie Charts. The area of each slice reflects the relative frequency of the category the slice represents. For example, if "ATT" is used by 25% of the class, the ATT slice must cover 25% of the entire pie. Remember: all categories must be represented in the pie. Typically, these are not fun to create.
Quantitative Data: Stemplot (a.k.a. "Stem-and-Leaf Plot"). A stemplot shows the shape of the distribution while keeping every individual value visible. Preview the example on pg 43!
Quantitative Data: Stemplot steps
1. Arrange the observations in numerical order.
2. Separate each observation into a stem and a leaf.
3. Write the stems in a vertical column.
4. Write the leaf of each observation next to its stem. Leaves closest to the stem are lowest in numerical value.
Quantitative Data 	The following measurements are the number of points scored by THS football in each game of the 2009 season. 42, 27, 19, 14, 20, 47, 53, 28, 32, 30, 44, 20
Quantitative Data: Stemplot steps
1. Arrange the observations in numerical order:
   14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53
2. Separate each observation into a stem and a leaf:
   1/4, 1/9, 2/0, 2/0, 2/7, 2/8, 3/0, 3/2, 4/2, 4/4, 4/7, 5/3
3. Write the stems in a vertical column:
   1
   2
   3
   4
   5
4. Write the leaf of each observation next to its stem. Leaves closest to the stem are lowest in numerical value:
   1 | 4, 9
   2 | 0, 0, 7, 8
   3 | 0, 2
   4 | 2, 4, 7
   5 | 3
YAY!
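The four stemplot steps above can be sketched in a few lines of Python. This is only an illustration, assuming two-digit non-negative scores so that the tens digit is the stem and the ones digit is the leaf; the function name `stemplot` is made up for this example.

```python
from collections import defaultdict

# THS football points per game, 2009 season (from the slide)
scores = [42, 27, 19, 14, 20, 47, 53, 28, 32, 30, 44, 20]

def stemplot(values):
    """Return the rows of a stemplot, one string per stem (hypothetical helper)."""
    leaves = defaultdict(list)
    # Steps 1 and 2: sort, then split each value into stem (tens) and leaf (ones)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)
    rows = []
    # Steps 3 and 4: list every stem in order and write its leaves beside it
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows

plot = stemplot(scores)
print("\n".join(plot))
```

Because the values are sorted before they are split, the leaves on each row come out in increasing order automatically.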
Quantitative Data: Histogram. A histogram is similar to a bar graph but is used for quantitative data only. Observations are separated into classes (number ranges). All classes must have equal width. Like a bar graph, the height of each bar represents the count for each class. See Example 1.6 on pg 49.
Quantitative Data Histogram Let’s use the same data from our previous example 14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53
Quantitative Data: Histogram. 1. Separate the range into classes of equal width. Let's try the following: 0 ≤ score < 15, 15 ≤ score < 30, 30 ≤ score < 45, 45 ≤ score < 60.
Quantitative Data: Histogram. 2. Count the number of individuals in each class.
Quantitative Data: Histogram. 3. Draw and label each axis. [Figure: empty axes; vertical axis "Count", 0 to 6; horizontal axis "Number of points scored", 0 to 60]
Quantitative Data: Histogram. 4. Draw each bar to the correct height. [Figure: histogram on the same axes]
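The counting step can be checked with a short script. This sketch assumes the four classes are read as half-open intervals of width 15 (0 to under 15, 15 to under 30, and so on); the slide's own boundary symbols may have differed, but for whole-number scores the counts come out the same.

```python
scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]

# Assumed class boundaries: [0, 15), [15, 30), [30, 45), [45, 60)
edges = [0, 15, 30, 45, 60]

counts = [0] * (len(edges) - 1)
for score in scores:
    for i in range(len(counts)):
        if edges[i] <= score < edges[i + 1]:
            counts[i] += 1
            break

# Each count is the height of the corresponding histogram bar
for low, high, count in zip(edges, edges[1:], counts):
    print(f"{low:2d} <= score < {high:2d}: {count}")
```

Every score lands in exactly one class, so the counts should add up to the 12 games in the season.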
Assignment 1A 1.1-1.12 all Starts on pg 46
Examining Distributions. Look for the overall pattern and any deviations from it. In written work, you must describe C.U.S.S.: Center, Unusual features (outliers), Shape, Spread. Note: CUSS is just a mnemonic device. It is customary to discuss "unusual features" last.
Examining Distributions. Center: we will discuss this at greater length later. For now, you can use the median as a measure of center. Spread: also discussed later. For now, give the minimum and maximum values to describe spread.
Examining Distributions. Shape: we generally want to know two things. First, how many peaks? Is the distribution unimodal (one distinct peak) or uniform (no distinct peaks)? Second, is the distribution symmetric (both tails are approximately equal) or skewed (one of the tails is longer)? Left skewed: the left tail is longer. Right skewed: the right tail is longer.
Examining Distributions. Outliers: like many things in statistics, outliers can be a judgment call. Although we will learn a customary formula to determine outliers, the formula is arbitrary. In a histogram, outliers will be clearly separated from the rest of the observations. Because class widths can be arbitrary, be sure to examine the data thoroughly before classifying an observation as an outlier. Do not ignore or delete outlier observations!
Relative Freq. and Cumulative Freq. Let’s return to THS Football ‘09
Relative Freq. and Cumulative Freq. We will add a column to show relative frequency: the class count divided by the total number of observations. Yes, "relative frequency" is the same thing as "percentage." At this point, you could make a histogram using relative frequencies, if desired.
Relative Freq. and Cumulative Freq. Now add a column to show cumulative frequency: for each class, add its relative frequency to the running total of the classes before it. The last cell in the column should be 100, unless there is roundoff error (not a big deal).
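The two added columns can be reproduced as follows. The class counts used here (1, 5, 4, 2) are an assumption: they come from sorting the twelve scores into classes of width 15 starting at 0, since the slide's own table is not available.

```python
# Assumed class counts for 0-15, 15-30, 30-45, 45-60 (see note above)
counts = [1, 5, 4, 2]
total = sum(counts)

# Relative frequency: each class count as a percentage of all observations
relative = [100 * c / total for c in counts]

# Cumulative frequency: running total of counts, expressed as a percentage.
# Accumulating counts (not rounded percentages) avoids roundoff in the last cell.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(100 * running / total)

for c, rel, cum in zip(counts, relative, cumulative):
    print(f"count={c}  rel. freq.={rel:6.2f}%  cum. freq.={cum:6.2f}%")
```

Summing rounded percentages by hand is what produces the small roundoff error the slide mentions; summing the raw counts first sidesteps it.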
Relative Freq. and Cumulative Freq. To create a "cumulative frequency plot" or "ogive," start by creating axes similar to those of a histogram. The vertical axis is percentage and should be labeled 0 to 100%. [Figure: empty axes; vertical axis "Cumulative freq. (%)", 0 to 100; horizontal axis "Number of points scored", 0 to 60]
Relative Freq. and Cumulative Freq. Plot a point for each cumulative frequency. The left boundary of the first class should be plotted at zero. The last point plotted will be the right boundary of the last class, at 100%. [Figure: cumulative frequency points on the same axes]
Relative Freq. and Cumulative Freq. CONNECT THE DOTS! [Figure: completed ogive on the same axes]
Relative Freq. and Cumulative Freq. Some notes about ogives. It's pronounced "Oh-Jive." Ogives can be used to find approximate percentile ranks, because the vertical axis is percentile! In particular, we are interested in the median (50th percentile), the first quartile (25th percentile), and the third quartile (75th percentile). This vocabulary will come up again. Memorize it!
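Reading a percentile off an ogive amounts to linear interpolation between the plotted points. The sketch below assumes class boundaries at 0, 15, 30, 45, and 60 and class counts of 1, 5, 4, and 2 (neither is confirmed by the slides), and the helper name `ogive_value` is invented for this example.

```python
# Assumed ogive points: (class boundary, cumulative percentage)
boundaries = [0, 15, 30, 45, 60]
cum_pct = [0.0, 100 * 1 / 12, 100 * 6 / 12, 100 * 10 / 12, 100.0]

def ogive_value(pct):
    """Approximate the score at a given percentile by reading along the ogive."""
    for i in range(1, len(boundaries)):
        if cum_pct[i - 1] <= pct <= cum_pct[i]:
            # Fraction of the way along this straight segment of the ogive
            frac = (pct - cum_pct[i - 1]) / (cum_pct[i] - cum_pct[i - 1])
            return boundaries[i - 1] + frac * (boundaries[i] - boundaries[i - 1])
    raise ValueError("percentile must be between 0 and 100")

for name, pct in [("Q1", 25), ("Median", 50), ("Q3", 75)]:
    print(f"{name} (about): {ogive_value(pct):.2f}")
```

These are only approximations: an ogive is built from grouped data, so its readings need not match quartiles computed from the individual scores.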
Assignment 1.1B P 64 #13-15, 21-25
1.2 Describing Data with NUmb3rs
Measuring Center. Mean: calculated the same way you always calculate a mean (average). Add the observations and divide by the number of observations: x̄ = (x₁ + x₂ + ... + xₙ) / n. The symbol x̄ is read as "x-bar." The mean is not a resistant measure of center: it is sensitive to a few extreme observations.
Measuring Center. Median: the "middle" number in an ordered set of observations is known as the median. If the data set has an even number of observations, then the median is the average of the two middle numbers. Unlike the mean, the median is a resistant measure of center.
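A quick way to see what "resistant" means is to change one value to something extreme and recompute both measures. In the sketch below, the 153-point game is a made-up value used purely for illustration; it is not part of the THS data.

```python
from statistics import mean, median

scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]

# Hypothetical data set: the top score of 53 is replaced by an extreme 153
with_extreme = scores[:-1] + [153]

print(f"original:     mean = {mean(scores):.2f}, median = {median(scores)}")
print(f"with extreme: mean = {mean(with_extreme):.2f}, median = {median(with_extreme)}")
```

The mean is pulled upward by the single extreme value, while the median does not move, because the two middle observations are unchanged.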
Measuring Spread: The Quartiles. The median of the observations below the overall median is the First Quartile (Q1). The median of the observations above the overall median is the Third Quartile (Q3). Notice that the overall median is not included in either of these calculations. Q1 is the 25th percentile. Q3 is the 75th percentile.
Measuring Spread. Recall the data from THS Football 2009: 14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53. We can number the ordered observations to help:
Position: 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12
Score:    14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53
Measuring Spread. Notice that the median is the average of the 6th and 7th observations, 28 and 30. Med. = 29
Measuring Spread. Q1 is the median of the lower half (14, 19, 20, 20, 27, 28), which is the average of 20 and 20. Q3 is the median of the upper half (30, 32, 42, 44, 47, 53), which is the average of 42 and 44.
Q1 = 20, Med. = 29, Q3 = 43
Measuring Spread: Interquartile Range (IQR). IQR is the preferred measurement of spread when the median is used to describe center. IQR = Q3 - Q1. For the THS data, IQR = 43 - 20 = 23.
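The quartile method described above (the median of each half, leaving out the overall median when the count is odd) can be sketched like this. The function name `quartiles` is chosen for this example. Library routines may use other quartile conventions and can return slightly different values.

```python
from statistics import median

scores = [42, 27, 19, 14, 20, 47, 53, 28, 32, 30, 44, 20]

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle observation belongs to neither half
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles(scores)
iqr = q3 - q1
print(f"Q1 = {q1}, Med. = {med}, Q3 = {q3}, IQR = {iqr}")
```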
Measuring Spread: Interquartile Range and Outliers. The previously mentioned formula for determining outlier observations depends on the IQR. High outliers (outliers to the right) are measurements greater than Q3 + 1.5 x IQR. Low outliers (outliers to the left) are measurements less than Q1 - 1.5 x IQR.
Measuring Spread: Interquartile Range and Outliers. High outliers: greater than Q3 + 1.5 x IQR = 43 + 1.5 x 23, so any observation greater than 77.5. Low outliers: less than Q1 - 1.5 x IQR = 20 - 1.5 x 23, so any observation less than -14.5. Clearly, THS had no outlier football scores in 2009!
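The same outlier check can be written out as a short script. It takes Q1 = 20 and Q3 = 43 from the slides as given; the names `low_fence` and `high_fence` are labels chosen for this example.

```python
scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]

q1, q3 = 20, 43          # quartiles from the slides
iqr = q3 - q1            # 23

# The customary 1.5 x IQR rule
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [x for x in scores if x < low_fence or x > high_fence]
print(f"low fence = {low_fence}, high fence = {high_fence}, outliers = {outliers}")
```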
Five Number Summary 	A snapshot of a data distribution can be given with the 5 number summary: 	Minimum, Q1, Median, Q3, Maximum 	For our THS Football 2009, the five number summary is: 		14, 20, 29, 43, 53
Five Number Summary. The 5 number summary is used to create a box plot ("box and whiskers" plot). [Figure: box plot with Min, Q1, Med, Q3, and Max marked above a number line from 0 to 60]
Five Number Summary: Box Plot. A number line must be included with a box plot. Outliers appear as unconnected dots. [Figure: box plot above a number line from 0 to 60]
Assignment 1C P74 #27-30, 32, 34, 37
The Standard Deviation. When the mean is used as the measure of center, the preferred measures of spread are the related measurements "variance" and "standard deviation": variance = s², standard deviation = s. Yes, the standard deviation is the square root of the variance.
The Standard Deviation. Formula for the variance: s² = [(x₁ - x̄)² + (x₂ - x̄)² + ... + (xₙ - x̄)²] / (n - 1), where n is the number of observations. Yes, take the square root to find the std. dev.
The Standard Deviation. For the THS 2009 data, Mean = 31.33.
s² = [(14-31.33)² + (19-31.33)² + (20-31.33)² + (20-31.33)² + (27-31.33)² + (28-31.33)² + (30-31.33)² + (32-31.33)² + (42-31.33)² + (44-31.33)² + (47-31.33)² + (53-31.33)²] / (12-1)
s² = 1730.66 / 11
s² = 157.33
The Standard Deviation. Notice that the number s² = 157.33 is hard to relate to the data set, because it is measured in squared units. However, s = 12.54 is in the same units as the data and has some meaning. In many data sets, especially roughly symmetric ones, the majority of observations are within one standard deviation of the mean. Here, most of the data is between 31.33 - 12.54 and 31.33 + 12.54, that is, between 18.79 and 43.87.
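The arithmetic above can be verified step by step. This sketch uses the sample formula that divides by n - 1, matching the (12-1) in the worked calculation, and then counts how many scores fall within one standard deviation of the mean.

```python
import math

scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]

n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean, then divide by n - 1
sum_sq = sum((x - mean) ** 2 for x in scores)
variance = sum_sq / (n - 1)
std_dev = math.sqrt(variance)

# Observations within one standard deviation of the mean
within = [x for x in scores if mean - std_dev <= x <= mean + std_dev]

print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")
print(f"{len(within)} of {n} scores are within one s of the mean")
```

For this season, 8 of the 12 scores fall in that interval, which is what "the majority" means here.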
Which measurements do I choose? Use "mean and standard deviation" when the data is reasonably symmetric with no outliers. Use "median and IQR" or the 5 number summary in cases where "mean and std. dev." is not appropriate. Remember: the median and IQR are resistant to outliers, while the mean and std. dev. are not.
Linear Transformation of Data. If every member of a data set is multiplied by a positive number b, then the measures of center (mean, median) and spread (IQR, standard deviation) are also multiplied by b; the variance is multiplied by b². If a constant a is added to every member of a data set, then a is added to the measures of center, but the measures of spread remain unchanged.
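Both rules can be checked numerically on the THS scores. The constants a = 10 and b = 2 below are arbitrary values picked for illustration; they do not come from the slides.

```python
import math
from statistics import mean, median, stdev

scores = [14, 19, 20, 20, 27, 28, 30, 32, 42, 44, 47, 53]

a, b = 10, 2  # hypothetical shift and positive scale factor

# Linear transformation: new value = a + b * old value
transformed = [a + b * x for x in scores]

# Center is scaled by b and shifted by a
print(mean(transformed), a + b * mean(scores))
print(median(transformed), a + b * median(scores))

# Spread is scaled by b only; the shift a has no effect
print(stdev(transformed), b * stdev(scores))
```

`statistics.stdev` computes the sample standard deviation (dividing by n - 1), the same convention used for s earlier in the chapter.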
Comparing Data Sets. The AP Exam always asks students to compare data. Clearly identify the populations that are being compared. Make sure to compare each element of CUSS. Refer to the specific measurement you are comparing, e.g. use "mean" and not "center." Give the values of the measurements you are comparing. Make use of comparison phrases such as "is greater than" and "is less than."
Assignment 1D P89 #39-41, 45-47
Stats chapter 1


Editor's notes

  1. Pg 42 (!)
  2. P 48