1. INTRODUCTION TO STATISTICS &
PROBABILITY
Chapter 2:
Looking at Data–Relationships (Part 1)
Dr. Nahid Sultana
2. Chapter 2:
Looking at Data–Relationships
2.1: Scatterplots
2.2: Correlation
2.3: Least-Squares Regression
2.5: Data Analysis for Two-Way Tables
3. Objectives
2.1: Scatterplots
Bivariate data
Explanatory and response variables
Scatterplots
Interpreting scatterplots
Outliers
Categorical variables in scatterplots
4. Bivariate data
For each individual studied, we record
data on two variables.
We then examine whether there is a
relationship between these two
variables: Do changes in one variable
tend to be associated with specific
changes in the other variable?
Student ID   Number of Beers   Blood Alcohol Content
 1           5                 0.1
 2           2                 0.03
 3           9                 0.19
 6           7                 0.095
 7           3                 0.07
 9           3                 0.02
11           4                 0.07
13           5                 0.085
 4           8                 0.12
 5           3                 0.04
 8           5                 0.06
10           5                 0.05
12           6                 0.1
14           7                 0.09
15           1                 0.01
16           4                 0.05
Here we have two quantitative variables
recorded for each of 16 students:
1. how many beers they drank
2. their resulting blood alcohol content
(BAC)
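One informal way to look for a relationship in this table, before drawing any graph, is to group the students by number of beers and compare the average BAC in each group. The sketch below uses only the Python standard library. The variable names are illustrative, and the grouping approach is an assumption rather than a method taken from the slides.

```python
from collections import defaultdict
from statistics import mean

# (student_id, number_of_beers, bac) rows copied from the table above.
records = [
    (1, 5, 0.1), (2, 2, 0.03), (3, 9, 0.19), (6, 7, 0.095),
    (7, 3, 0.07), (9, 3, 0.02), (11, 4, 0.07), (13, 5, 0.085),
    (4, 8, 0.12), (5, 3, 0.04), (8, 5, 0.06), (10, 5, 0.05),
    (12, 6, 0.1), (14, 7, 0.09), (15, 1, 0.01), (16, 4, 0.05),
]

# Collect the BAC values observed for each number of beers.
bac_by_beers = defaultdict(list)
for _student_id, beers, bac in records:
    bac_by_beers[beers].append(bac)

# Print one summary line per number of beers, in increasing order.
for beers in sorted(bac_by_beers):
    values = bac_by_beers[beers]
    print(f"{beers} beers: n={len(values)}, mean BAC={mean(values):.3f}")
```

If the group averages tend to rise as the number of beers rises, that suggests the two variables are associated. A scatterplot shows the same thing without any grouping.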
5. Associations Between Variables
Many interesting examples of the use of statistics involve
relationships between pairs of variables.
Two variables measured on the same cases are associated if
knowing the value of one of the variables tells you something about
the values of the other variable that you would not know without this
information.
A response (dependent) variable measures an outcome of a study.
An explanatory (independent) variable explains changes in the
response variable.
6. Scatterplot
The most useful graph for displaying the relationship between two
quantitative variables on the same individuals is a scatterplot.
How to Make a Scatterplot
1. Decide which variable should go on which axis.
2. Typically, the explanatory or independent variable is plotted
on the x-axis, and the response or dependent variable is plotted
on the y-axis.
3. Label and scale your axes.
4. Plot individual data values.
7. Scatterplot (Cont…)
Example: Make a scatterplot of the relationship between body
weight and backpack weight for a group of hikers.
Body weight (lb)      120  187  109  103  131  165  158  116
Backpack weight (lb)   26   30   26   24   29   35   31   28
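The four steps for making a scatterplot can be sketched in Python with matplotlib for the hiker data. This is a minimal illustration, not the tool used to produce the slides. It assumes body weight is the explanatory variable (x-axis) and backpack weight is the response (y-axis). The function name `make_hiker_scatterplot` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Data copied from the example above.
body_weight_lb = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight_lb = [26, 30, 26, 24, 29, 35, 31, 28]


def make_hiker_scatterplot(x, y):
    """Plot the explanatory variable on x and the response on y."""
    fig, ax = plt.subplots(figsize=(5, 5))  # roughly square plot
    ax.scatter(x, y)                        # step 4: plot individual values
    ax.set_xlabel("Body weight (lb)")       # step 3: label the axes
    ax.set_ylabel("Backpack weight (lb)")
    ax.set_title("Backpack weight vs. body weight for eight hikers")
    return fig, ax


fig, ax = make_hiker_scatterplot(body_weight_lb, backpack_weight_lb)
# fig.savefig("hikers_scatterplot.png")  # uncomment to write the image to disk
```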
8. Interpreting Scatterplots
How to Examine a Scatterplot
After plotting two variables on a scatterplot, we describe the
overall pattern of the relationship. Specifically, we look for form,
direction, and strength, and for clear deviations from that pattern.
Form: linear, curved, clusters, no pattern
Direction: positive, negative, no direction
Strength: how closely the points fit the “form”
Outlier: an individual value that falls outside the overall pattern
of the relationship
10. Interpreting Scatterplots (Cont…)
(Direction)
Positive association: High values of one variable tend to occur
together with high values of the other variable.
Negative association: High values of one variable tend to occur
together with low values of the other variable.
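The two definitions can be turned into a rough check: count how many points lie on the same side of both means (high with high, or low with low) and how many lie on opposite sides. The sketch below is an informal illustration of the definitions, not a formal statistical measure. The function name `association_direction` is hypothetical.

```python
from statistics import mean


def association_direction(x, y):
    """Classify direction by comparing each point with the two means."""
    x_bar, y_bar = mean(x), mean(y)
    # Positive product: the point is above both means or below both means.
    same_side = sum(1 for xi, yi in zip(x, y) if (xi - x_bar) * (yi - y_bar) > 0)
    # Negative product: the point is above one mean and below the other.
    opposite_side = sum(1 for xi, yi in zip(x, y) if (xi - x_bar) * (yi - y_bar) < 0)
    if same_side > opposite_side:
        return "positive"
    if opposite_side > same_side:
        return "negative"
    return "no clear direction"


# Hiker data from the earlier example.
body_weight_lb = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight_lb = [26, 30, 26, 24, 29, 35, 31, 28]
print(association_direction(body_weight_lb, backpack_weight_lb))
```

Points that sit exactly on one of the means contribute to neither count, so they do not affect the result.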
14. Interpreting Scatterplots (Cont…)
There is a moderately strong (strength), positive (direction), linear (form)
relationship between body weight and backpack weight.
It appears that lighter hikers are carrying lighter backpacks.
There is one possible outlier: the hiker with a body weight of 187
pounds seems to be carrying relatively less weight than the
other group members.
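One way to read "relatively less weight" is as backpack weight divided by body weight. The sketch below computes that ratio for each hiker and finds the smallest one. Treating the ratio as the measure of "relative" load is an assumption made for illustration. The slide's judgment is based on the visual pattern in the scatterplot.

```python
# Hiker data from the earlier example.
body_weight_lb = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight_lb = [26, 30, 26, 24, 29, 35, 31, 28]

# (body weight, backpack weight, backpack weight as a fraction of body weight)
ratios = [
    (body, pack, pack / body)
    for body, pack in zip(body_weight_lb, backpack_weight_lb)
]

# List hikers from lightest to heaviest body weight.
for body, pack, ratio in sorted(ratios):
    print(f"body {body} lb, pack {pack} lb, pack/body = {ratio:.1%}")

# The hiker carrying the smallest share of their own body weight.
lightest_relative_load = min(ratios, key=lambda row: row[2])
print("Smallest relative load:", lightest_relative_load[0], "lb hiker")
```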
15. How to scale a scatterplot
Using an inappropriate
scale for a scatterplot can
give an incorrect
impression.
Both variables should be
given a similar amount of
space:
• Plot roughly square
• Points should occupy all
the plot space (no blank
space)
Same data in all four plots
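The advice that points should occupy all of the plot space can be sketched as a small helper that derives axis limits from the data range, with a little padding so that no point sits on the border. The helper name `axis_limits` and the 5% padding are assumptions chosen for illustration.

```python
def axis_limits(values, pad_fraction=0.05):
    """Return (low, high) limits that hug the data with a small margin."""
    low, high = min(values), max(values)
    pad = (high - low) * pad_fraction  # margin proportional to the data range
    return low - pad, high + pad


# Hiker data from the earlier example.
body_weight_lb = [120, 187, 109, 103, 131, 165, 158, 116]
backpack_weight_lb = [26, 30, 26, 24, 29, 35, 31, 28]

x_limits = axis_limits(body_weight_lb)
y_limits = axis_limits(backpack_weight_lb)
print("x-axis limits:", x_limits)
print("y-axis limits:", y_limits)
```

With matplotlib, these limits could be passed to `ax.set_xlim(*x_limits)` and `ax.set_ylim(*y_limits)` on a square figure, which gives both variables a similar amount of space.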
16. Categorical variables in scatterplots
What may look like a positive
linear relationship is in fact a
series of negative linear
associations.
Plotting different habitats in
different colors allows us to
make that important distinction.
To add a categorical variable, use a different plot color or symbol for
each category.
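Adding a categorical variable can be sketched with matplotlib by drawing each category as its own series, so that every category gets its own color and marker. The data values and the category names "A", "B", and "C" below are made up purely for illustration and are not the habitat data from the slide. They are arranged so that each category trends downward while the combined cloud trends upward.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# HYPOTHETICAL (category, x, y) rows, invented for this sketch only.
rows = [
    ("A", 1, 5), ("A", 2, 4), ("A", 3, 3),
    ("B", 4, 8), ("B", 5, 7), ("B", 6, 6),
    ("C", 7, 11), ("C", 8, 10), ("C", 9, 9),
]


def group_by_category(rows):
    """Return {category: (x_values, y_values)}."""
    groups = {}
    for category, x, y in rows:
        xs, ys = groups.setdefault(category, ([], []))
        xs.append(x)
        ys.append(y)
    return groups


groups = group_by_category(rows)
markers = ["o", "s", "^"]  # one symbol per category

fig, ax = plt.subplots(figsize=(5, 5))
for marker, category in zip(markers, sorted(groups)):
    xs, ys = groups[category]
    # Each scatter call takes the next default color, so categories differ in color too.
    ax.scatter(xs, ys, marker=marker, label=category)
ax.set_xlabel("x (hypothetical)")
ax.set_ylabel("y (hypothetical)")
ax.legend(title="Category")
```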
17. Categorical variables in scatterplots (Cont…)
Comparison of men's and women's
racing records over time.
Each group shows a very strong
negative linear relationship that
would not be apparent without the
gender categorization.
Relationship between lean body
mass and metabolic rate in men
and women.
Both men and women follow the
same positive linear trend, but
women show a stronger association.
18. Categorical explanatory variables
When the explanatory variable is categorical, you cannot make a
scatterplot, but you can compare the different categories side by side on
the same graph (boxplots, or mean +/− standard deviation).
Comparison of income (quantitative
response variable) for different
education levels (five categories).
But be careful in your
interpretation: This is NOT a
positive association, because
education is not quantitative.
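A side-by-side comparison of categories can be sketched by summarizing the response variable within each category, here as a mean and standard deviation. The income figures and the three placeholder category names below are invented for illustration. The slide's figure uses five education levels whose values are not reproduced here.

```python
from statistics import mean, stdev

# HYPOTHETICAL incomes (in thousands) per category, invented for this sketch only.
income_by_category = {
    "level A": [22, 25, 28],
    "level B": [30, 36, 42],
    "level C": [45, 55, 65],
}


def summarize(groups):
    """Return {category: (mean, sample standard deviation)}."""
    return {name: (mean(values), stdev(values)) for name, values in groups.items()}


summary = summarize(income_by_category)
for name, (avg, sd) in summary.items():
    print(f"{name}: mean = {avg:.1f}, sd = {sd:.1f}")
```

The categories are printed in the order they were listed. That order is a labeling choice and not a numeric scale, which is why a trend across the categories should not be described as a positive association.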