# Introductory Statistics Explained

28 May 2023


Introductory Statistics Explained, Edition 1.10 © 2015 Jeremy Balka. This work is licensed under https://creativecommons.org/licenses/by-nc-nd/4.0/. See http://www.jbstatistics.com/ for the complete set of supporting videos and other supporting material. The complete set of videos is also available at https://www.youtube.com/user/jbstatistics/videos.
### Contents

- **1 Introduction**
  - 1.1 Introduction
  - 1.2 Descriptive Statistics
  - 1.3 Inferential Statistics
- **2 Gathering Data**
  - 2.1 Introduction
  - 2.2 Populations and Samples, Parameters and Statistics
  - 2.3 Types of Sampling
    - 2.3.1 Simple Random Sampling
    - 2.3.2 Other Types of Random Sampling
  - 2.4 Experiments and Observational Studies
  - 2.5 Chapter Summary
- **3 Descriptive Statistics**
  - 3.1 Introduction
  - 3.2 Plots for Categorical and Quantitative Variables
    - 3.2.1 Plots for Categorical Variables
    - 3.2.2 Graphs for Quantitative Variables
  - 3.3 Numerical Measures
    - 3.3.1 Summation Notation
    - 3.3.2 Measures of Central Tendency
    - 3.3.3 Measures of Variability
    - 3.3.4 Measures of Relative Standing
  - 3.4 Boxplots
  - 3.5 Linear Transformations
  - 3.6 Chapter Summary
- **4 Probability**
  - 4.1 Introduction
  - 4.2 Basics of Probability
    - 4.2.1 Interpreting Probability
    - 4.2.2 Sample Spaces and Sample Points
    - 4.2.3 Events
  - 4.3 Rules of Probability
    - 4.3.1 The Intersection of Events
    - 4.3.2 Mutually Exclusive Events
    - 4.3.3 The Union of Events and the Addition Rule
    - 4.3.4 Complementary Events
    - 4.3.5 An Example
    - 4.3.6 Conditional Probability
    - 4.3.7 Independent Events
    - 4.3.8 The Multiplication Rule
  - 4.4 Examples
  - 4.5 Bayes' Theorem
    - 4.5.1 Introduction
    - 4.5.2 The Law of Total Probability and Bayes' Theorem
  - 4.6 Counting Rules: Permutations and Combinations
    - 4.6.1 Permutations
    - 4.6.2 Combinations
  - 4.7 Probability and the Long Run
  - 4.8 Chapter Summary
- **5 Discrete Random Variables and Discrete Probability Distributions**
  - 5.1 Introduction
  - 5.2 Discrete and Continuous Random Variables
  - 5.3 Discrete Probability Distributions
    - 5.3.1 The Expectation and Variance of Discrete Random Variables
  - 5.4 The Bernoulli Distribution
  - 5.5 The Binomial Distribution
    - 5.5.1 Binomial or Not?
    - 5.5.2 A Binomial Example with Probability Calculations
  - 5.6 The Hypergeometric Distribution
  - 5.7 The Poisson Distribution
    - 5.7.1 Introduction
    - 5.7.2 The Relationship Between the Poisson and Binomial Distributions
    - 5.7.3 Poisson or Not? More Discussion on When a Random Variable has a Poisson Distribution
  - 5.8 The Geometric Distribution
  - 5.9 The Negative Binomial Distribution
  - 5.10 The Multinomial Distribution
  - 5.11 Chapter Summary
- **6 Continuous Random Variables and Continuous Probability Distributions**
  - 6.1 Introduction
  - 6.2 Properties of Continuous Probability Distributions
    - 6.2.1 An Example Using Integration
  - 6.3 The Continuous Uniform Distribution
  - 6.4 The Normal Distribution
    - 6.4.1 Finding Areas Under the Standard Normal Curve
    - 6.4.2 Standardizing Normally Distributed Random Variables
  - 6.5 Normal Quantile-Quantile Plots
    - 6.5.1 Examples of Normal QQ Plots
  - 6.6 Other Important Continuous Probability Distributions
    - 6.6.1 The χ² Distribution
    - 6.6.2 The t Distribution
    - 6.6.3 The F Distribution
  - 6.7 Chapter Summary
- **7 Sampling Distributions**
  - 7.1 Introduction
  - 7.2 The Sampling Distribution of the Sample Mean
  - 7.3 The Central Limit Theorem
    - 7.3.1 Illustration of the Central Limit Theorem
  - 7.4 Some Terminology Regarding Sampling Distributions
    - 7.4.1 Standard Errors
    - 7.4.2 Unbiased Estimators
  - 7.5 Chapter Summary
- **8 Confidence Intervals**
  - 8.1 Introduction
  - 8.2 Interval Estimation of µ When σ is Known
    - 8.2.1 Interpretation of the Interval
    - 8.2.2 What Factors Affect the Margin of Error?
    - 8.2.3 Examples
  - 8.3 Confidence Intervals for µ When σ is Unknown
    - 8.3.1 Introduction
    - 8.3.2 Examples
    - 8.3.3 Assumptions of the One-Sample t Procedures
  - 8.4 Determining the Minimum Sample Size n
  - 8.5 Chapter Summary
- **9 Hypothesis Tests (Tests of Significance)**
  - 9.1 Introduction
  - 9.2 The Logic of Hypothesis Testing
  - 9.3 Hypothesis Tests for µ When σ is Known
    - 9.3.1 Constructing Appropriate Hypotheses
    - 9.3.2 The Test Statistic
    - 9.3.3 The Rejection Region Approach to Hypothesis Testing
    - 9.3.4 P-values
  - 9.4 Examples
  - 9.5 Interpreting the p-value
    - 9.5.1 The Distribution of the p-value When H0 is True
    - 9.5.2 The Distribution of the p-value When H0 is False
  - 9.6 Type I Errors, Type II Errors, and the Power of a Test
    - 9.6.1 Calculating Power and the Probability of a Type II Error
    - 9.6.2 What Factors Affect the Power of the Test?
  - 9.7 One-sided Test or Two-sided Test?
    - 9.7.1 Choosing Between a One-sided Alternative and a Two-sided Alternative
    - 9.7.2 Reaching a Directional Conclusion from a Two-sided Alternative
  - 9.8 Statistical Significance and Practical Significance
  - 9.9 The Relationship Between Hypothesis Tests and Confidence Intervals
  - 9.10 Hypothesis Tests for µ When σ is Unknown
    - 9.10.1 Examples of Hypothesis Tests Using the t Statistic
  - 9.11 More on Assumptions
  - 9.12 Criticisms of Hypothesis Testing
  - 9.13 Chapter Summary
- **10 Inference for Two Means**
  - 10.1 Introduction
  - 10.2 The Sampling Distribution of the Difference in Sample Means
  - 10.3 Inference for µ1 − µ2 When σ1 and σ2 are Known
  - 10.4 Inference for µ1 − µ2 When σ1 and σ2 are Unknown
    - 10.4.1 Pooled-Variance Two-Sample t Procedures
    - 10.4.2 The Welch Approximate t Procedure
    - 10.4.3 Guidelines for Choosing the Appropriate Two-Sample t Procedure
    - 10.4.4 More Examples of Inferences for the Difference in Means
  - 10.5 Paired-Difference Procedures
    - 10.5.1 The Paired-Difference t Procedure
  - 10.6 Pooled-Variance t Procedures: Investigating the Normality Assumption
  - 10.7 Chapter Summary
- **11 Inference for Proportions**
  - 11.1 Introduction
  - 11.2 The Sampling Distribution of the Sample Proportion
    - 11.2.1 The Mean and Variance of the Sampling Distribution of p̂
    - 11.2.2 The Normal Approximation
  - 11.3 Confidence Intervals and Hypothesis Tests for the Population Proportion p
    - 11.3.1 Examples
  - 11.4 Determining the Minimum Sample Size n
  - 11.5 Inference Procedures for Two Population Proportions
    - 11.5.1 The Sampling Distribution of p̂1 − p̂2
    - 11.5.2 Confidence Intervals and Hypothesis Tests for p1 − p2
  - 11.6 More on Assumptions
  - 11.7 Chapter Summary
- **12 Inference for Variances**
  - 12.1 Introduction
  - 12.2 The Sampling Distribution of the Sample Variance
    - 12.2.1 The Sampling Distribution of the Sample Variance When Sampling from a Normal Population
    - 12.2.2 The Sampling Distribution of the Sample Variance When Sampling from Non-Normal Populations
  - 12.3 Inference Procedures for a Single Variance
  - 12.4 Comparing Two Variances
    - 12.4.1 The Sampling Distribution of the Ratio of Sample Variances
    - 12.4.2 Inference Procedures for the Ratio of Population Variances
  - 12.5 Investigating the Effect of Violations of the Normality Assumption
    - 12.5.1 Inference Procedures for One Variance: How Robust are these Procedures?
    - 12.5.2 Inference Procedures for the Ratio of Variances: How Robust are these Procedures?
  - 12.6 Chapter Summary
- **13 χ² Tests for Count Data**
  - 13.1 Introduction
  - 13.2 χ² Tests for One-Way Tables
    - 13.2.1 The χ² Test Statistic
    - 13.2.2 Testing Goodness-of-Fit for Specific Parametric Distributions
  - 13.3 χ² Tests for Two-Way Tables
    - 13.3.1 The χ² Test Statistic for Two-Way Tables
    - 13.3.2 Examples
  - 13.4 A Few More Points
    - 13.4.1 Relationship Between the Z Test and χ² Test for 2×2 Tables
    - 13.4.2 Assumptions
  - 13.5 Chapter Summary
- **14 One-Way Analysis of Variance (ANOVA)**
  - 14.1 Introduction
  - 14.2 One-Way ANOVA
  - 14.3 Carrying Out the One-Way Analysis of Variance
    - 14.3.1 The Formulas
    - 14.3.2 An Example with Full Calculations
  - 14.4 What Should be Done After One-Way ANOVA?
    - 14.4.1 Introduction
    - 14.4.2 Fisher's LSD Method
    - 14.4.3 The Bonferroni Correction
    - 14.4.4 Tukey's Honest Significant Difference Method
  - 14.5 Examples
  - 14.6 A Few More Points
    - 14.6.1 Different Types of Experimental Design
    - 14.6.2 One-Way ANOVA and the Pooled-Variance t Test
    - 14.6.3 ANOVA Assumptions
  - 14.7 Chapter Summary
- **15 Introduction to Simple Linear Regression**
  - 15.1 Introduction
  - 15.2 The Linear Regression Model
  - 15.3 The Least Squares Regression Line
  - 15.4 Statistical Inference in Simple Linear Regression
    - 15.4.1 Model Assumptions
    - 15.4.2 Statistical Inference for the Parameter β1
  - 15.5 Checking Model Assumptions with Residual Plots
  - 15.6 Measures of the Strength of the Linear Relationship
    - 15.6.1 The Pearson Correlation Coefficient
    - 15.6.2 The Coefficient of Determination
  - 15.7 Estimation and Prediction Using the Fitted Line
  - 15.8 Transformations
  - 15.9 A Complete Example
  - 15.10 Outliers, Leverage, and Influential Points
  - 15.11 Some Cautions about Regression and Correlation
    - 15.11.1 Always Plot Your Data
    - 15.11.2 Avoid Extrapolating
    - 15.11.3 Correlation Does Not Imply Causation
  - 15.12 A Brief Multiple Regression Example
  - 15.13 Chapter Summary
### Chapter 1: Introduction

> "I have been impressed with the urgency of doing. Knowing is not enough; we must apply. Being willing is not enough; we must do." - Leonardo da Vinci