3. We are making a big assumption here: that the relationship is a straight line. Wouldn’t life be so much easier if all relationships were straight lines?
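To see why the straight-line assumption matters, here is a minimal Python sketch. The data and the helper name `pearson_r` are illustrative assumptions, not part of the slides. Y is completely determined by X, yet because the pattern is a curve rather than a line, r comes out as zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfect U-shaped (quadratic) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curve = pearson_r(x, y)
print(round(r_curve, 3))
```

A scatterplot of these points would show an obvious pattern, which is exactly what r misses when the relationship is not a straight line.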
4. The Pearson correlation r is a numeric index of the relationship between two continuous (interval/ratio) variables.
Caution: if a variable is categorical (e.g., gender: male vs. female; ethnicity: white, black, Asian), you cannot correlate it with another variable. Pearson r can only be calculated between two numeric variables (e.g., age, salary, height, weight).
r tells us how closely the relationship follows a straight line.
These graphs show possible ways two variables can relate to one another. The more the graph looks like a straight line, the stronger the r value is. The graphs that resemble a circle indicate very low or even no correlation between the two variables.
The direction of the line indicates whether the correlation is positive or negative. If the line goes up to the right, it’s a positive relationship (meaning, when X goes up, Y goes up too). If the line goes down to the right, it’s a negative relationship (meaning, when X goes up, Y goes down, and vice versa).
For example, “when we get older, we also get wiser”. If this is true, there should be a strong positive Pearson correlation r between the age variable and the wisdom variable. If we are less happy when we have more money, there should be a negative Pearson correlation r between the happiness variable and the money variable.
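As a rough illustration of how the number is computed, the sketch below applies the usual textbook formula, r = sum of (x - mean x)(y - mean y), divided by the square root of [sum of (x - mean x) squared times sum of (y - mean y) squared]. All values and variable names (`ages`, `wisdom`, `money`, `happiness`) are invented for illustration and are deliberately perfect straight lines.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: wisdom rises in lockstep with age.
ages = [20, 30, 40, 50, 60]
wisdom = [2, 4, 6, 8, 10]

# Hypothetical scores: happiness falls in lockstep with money.
money = [1, 2, 3, 4, 5]
happiness = [10, 8, 6, 4, 2]

r_age_wisdom = pearson_r(ages, wisdom)          # line goes up to the right
r_money_happiness = pearson_r(money, happiness)  # line goes down to the right
print(r_age_wisdom, r_money_happiness)
```

Real data will almost never fall on a perfect line, so real r values land somewhere between these two extremes.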
5. As you can see from these charts, Pearson correlation r becomes stronger as the data points cluster more tightly around a straight line. When the data points are distributed like a round circle, that means the X and Y variables have little relationship to each other. Note that most of these (except for the first graph) have positive correlations, although some of them are weaker (more rounded) than others (more straight lines).
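The effect of tighter clustering can be sketched with simulated data. Everything below is an assumption for illustration: the sample size, the noise levels, and the seed are arbitrary choices, and `pearson_r` is a helper written for this example. Y is X plus random scatter, and the scatter grows from one run to the next.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is repeatable
x = [rng.uniform(0, 10) for _ in range(500)]

results = {}
for noise_sd in (0.5, 2.0, 8.0):
    # Larger noise_sd means points scatter further from the line y = x.
    y = [xi + rng.gauss(0, noise_sd) for xi in x]
    results[noise_sd] = pearson_r(x, y)
    print(f"noise sd = {noise_sd}: r = {results[noise_sd]:.2f}")
```

The underlying line is the same in all three runs; only the amount of scatter changes, and r drops as the cloud of points gets rounder.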
6. The same principle applies to the negative correlations. The trend goes down to the right when the correlation is negative.
7. Again, to summarize, there are two components to the correlation value:
1. Its direction
2. Its strength
What kind of correlation are you predicting for your group project?
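One way to keep the two components apart is to read the sign of r as the direction and the size of r (ignoring the sign) as the strength. The helper below is a small hypothetical sketch of that idea; the function name `describe` and the example values are assumptions for illustration.

```python
def describe(r):
    """Split r into its direction (the sign) and its strength (the size)."""
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, abs(r)

print(describe(0.85))
print(describe(-0.85))
```

The two example values point in opposite directions but are equally strong.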
8. Caution: Correlation measures the linear relationship between two variables. When the assumptions behind it, such as linearity and normality, are violated, weird things happen. This slide illustrates 4 different datasets, all with the same correlation. The moral of the story is that we should always inspect the scatterplot when running correlations. Numbers should be interpreted sensibly.
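The slide’s own four datasets are not reproduced in this transcript. As an assumption, the sketch below uses Anscombe’s quartet, a well-known set of four small datasets that makes the same point: one is roughly linear, one is a curve, one is a line with a single outlier, and one has a single extreme point driving everything. The helper `pearson_r` is written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet (datasets I to III share the same x values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet_r = [
    pearson_r(x123, y1),
    pearson_r(x123, y2),
    pearson_r(x123, y3),
    pearson_r(x4, y4),
]
for label, r in zip(("I", "II", "III", "IV"), quartet_r):
    print(f"dataset {label}: r = {r:.3f}")
```

The four r values agree to about two decimal places, yet the scatterplots look nothing alike, which is why the plot has to be checked alongside the number.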
9. We can never stress enough that correlation is NOT the same as causation. One of my favorite examples, from a student, is about shoe size and intelligence. A positive correlation was found between shoe size and intelligence levels, leading people to think that bigger feet = smarter people. Then they realized that a bigger shoe size also generally means an older person. In fact, it wasn’t the size of people’s feet that was causing increased intelligence; it was simply the fact that they were older and therefore scored higher on tests!
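The shoe-size story can be imitated with simulated data. This is only a sketch under invented assumptions: the age range (5 to 15), the formulas linking age to shoe size and test score, and the noise levels are all made up, and `pearson_r` is a helper written for this example. Age drives both variables; shoe size has no effect on the score at all.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
ages, shoe_sizes, test_scores = [], [], []
for _ in range(3000):
    age = rng.randint(5, 15)
    ages.append(age)
    # Both variables depend on age only, never on each other.
    shoe_sizes.append(0.5 * age + rng.gauss(0, 1))
    test_scores.append(5 * age + rng.gauss(0, 5))

# Everyone pooled together: shoe size and score look related.
r_all = pearson_r(shoe_sizes, test_scores)

# Only the 10-year-olds: age is held constant, so the link disappears.
same_age = [(s, t) for a, s, t in zip(ages, shoe_sizes, test_scores) if a == 10]
r_same_age = pearson_r([s for s, _ in same_age], [t for _, t in same_age])

print(f"all ages: r = {r_all:.2f}")
print(f"age 10 only: r = {r_same_age:.2f}")
```

Comparing people of the same age is one simple way to check whether a third variable is producing the correlation.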
10. We all want to have a positive relationship with our family, friends, coworkers, etc. Who wants a negative relationship, right? In that spirit, why would anyone want a negative correlation? And we should celebrate every time we have a positive correlation, right?
How about a positive correlation between GDP and obesity level? How about a positive correlation between smoking and cancer? How about a positive correlation between the CEO’s compensation and corruption level?
Now let’s look at some negative correlations that are supposed to be “depressing”: more exercise associated with lower levels of obesity, more education associated with lower crime rates, fewer meetings associated with increased productivity, and, how about more relaxing weekends associated with lower stress levels?
What’s the moral of the story? Correlation is what it is: a number that indicates the strength and direction of a relationship between two numerical (continuous) variables. Whether the relationship is good for mankind or not is beyond the scope of the humble little number’s responsibility!
11. Assigning numbers to categorical variables does not make them interval/ratio variables. We can only do math with interval/ratio variables. Basic math principles don’t apply to categorical variables, even if they have numbers associated with them. The numbers assigned to categorical variables are just for identification, just like SSNs or zip codes.
For example, suppose gender is coded as female = 1 and male = 2. In math, 1 + 1 = 2. In the gender case, this would mean that if you add a female and another female together, that’s equal to a male. Another math principle is that 2 is twice as big as 1. In the gender case, that would mean that a male is twice as big as a female. All this madness would happen if we tried to treat categorical variables in numeric ways.
Keep in mind that the Pearson correlation r value is calculated based on a math formula. If you feed the gender variable into SPSS as numbers, SPSS CAN and WILL calculate a Pearson correlation value for you, but using that number requires you to make the kinds of crazy assumptions illustrated above.
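A quick way to see that category codes are only labels is to relabel them and watch r change. The sketch below is hypothetical throughout: the three categories "A", "B", "C" stand in for any categorical variable, the outcome scores are invented, and `pearson_r` is a helper written for this example (it is not how SPSS is invoked).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Six made-up cases: a category label and a numeric outcome for each.
categories = ["A", "A", "B", "B", "C", "C"]
outcome = [30, 34, 40, 44, 50, 54]

# Two equally arbitrary ways of turning the labels into numbers.
coding_1 = {"A": 1, "B": 2, "C": 3}
coding_2 = {"A": 2, "B": 3, "C": 1}

r_1 = pearson_r([coding_1[c] for c in categories], outcome)
r_2 = pearson_r([coding_2[c] for c in categories], outcome)
print(f"coding 1: r = {r_1:.2f}")
print(f"coding 2: r = {r_2:.2f}")
```

The data are identical in both runs, yet the correlation flips from positive to negative just because the labels were renumbered, so neither value means anything.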