Fa2013 mba724-session 5 week 2 correlation-za edit

We are making a big assumption here: that the relationship is a straight line.

Wouldn’t life be so much easier if all relationships were straight lines?

The Pearson correlation r is a numeric index of the relationship between two continuous (interval/ratio) variables.

Caution: if a variable is categorical (e.g., gender: male vs. female; ethnicity: white, black, Asian), you cannot compute a Pearson correlation between it and another variable. Pearson r can only be calculated between two numeric variables (e.g., age, salary, height, weight).

r tells us how closely the relationship follows a straight line.
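
To see where the number comes from, here is a minimal sketch of the Pearson formula in plain Python. The course itself uses SPSS; the helper `pearson_r` and the sample numbers are hypothetical, for illustration only. r is the sum of the products of each variable's deviations from its mean, divided by the square root of the product of the summed squared deviations. Points lying exactly on a rising straight line give r = 1, and points lying exactly on a falling line give r = -1.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: sum of (x - mean_x)(y - mean_y), divided by
    sqrt(sum of (x - mean_x)^2 * sum of (y - mean_y)^2)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return numerator / denominator

hours = [1, 2, 3, 4, 5]
on_line_up = [2 * h + 3 for h in hours]      # exactly on a rising line
on_line_down = [10 - 2 * h for h in hours]   # exactly on a falling line

print(pearson_r(hours, on_line_up))    # prints 1.0
print(pearson_r(hours, on_line_down))  # prints -1.0
```

Real data never sits exactly on a line, so in practice r lands somewhere between -1 and 1.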

These graphs show possible ways two variables can relate to one another.

The more the graph looks like a straight line, the stronger the r value.

The graphs that resemble a circle indicate very low or even no correlation between the two variables.

The direction of the line indicates whether the correlation is positive or negative.

If the line goes up to the right, it’s a positive relationship (meaning when X goes up, Y goes up too).

If the line goes down to the right, it’s a negative relationship (meaning when X goes up, Y goes down, and vice versa).

For example, “when we get older, we also get wiser.” If this is true, there should be a strong positive Pearson correlation r between the age variable and the wisdom variable.

If we are less happy when we have more money, there should be a negative Pearson correlation r between the happiness variable and the money variable.

As you can see from these charts, Pearson correlation r becomes stronger as the data points cluster more tightly around a straight line.

When the data points are distributed like a round circle, the X and Y variables have little relationship to each other.

Note that most of these (except for the first graph) have positive correlations, although some of them are weaker (more rounded) than others (closer to straight lines).

The same principle applies to negative correlations. The trend goes down to the right when the correlation is negative.

Again, to summarize, there are two components to the correlation value:

1.  Its direction

2.  Its strength
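
The two components can be read straight off the number: the sign gives the direction and the absolute value gives the strength. A minimal sketch follows; the strength cut-offs (0.3 and 0.7) are an assumed rule of thumb that varies between textbooks and fields, not something these notes specify, and `describe_r` is a hypothetical helper.

```python
def describe_r(r):
    """Split r into direction (its sign) and strength (its size).
    ASSUMPTION: the 0.3 / 0.7 cut-offs are one common rule of thumb,
    not a universal standard."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return direction, strength

print(describe_r(0.82))   # ('positive', 'strong')
print(describe_r(-0.45))  # ('negative', 'moderate')
print(describe_r(0.05))   # ('positive', 'weak')
```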

What kind of correlation are you predicting for your group project?

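Caution:

Correlation measures the linear relationship between two variables.

When the assumption of normality (or of a straight-line pattern) is violated, weird things happen: datasets that look completely different can produce the same r.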

This slide illustrates 4 different datasets, all with the same correlation.

The moral of the story is that we should always inspect the scatterplot when running correlations. Numbers should be interpreted sensibly.

We can never stress enough that correlation is NOT the same as causation.

One of my favorite examples from a student is about shoe size and intelligence.

A positive correlation was found between shoe size and intelligence levels, leading people to think that bigger feet = smarter people. Then they realized that bigger shoe size also generally means older people. In fact, it wasn’t the size of people’s feet that was causing increased intelligence; it was simply that they were older and therefore scored higher on tests!

We all want to have a positive relationship with our family, friends, coworkers, etc. Who wants a negative relationship, right?

In that spirit, why would anyone want a negative correlation? And we should celebrate every time we have a positive correlation, right?

How about a positive correlation between GDP and obesity level? How about a positive correlation between smoking and cancer? How about a positive correlation between the CEO’s compensation and corruption level?

Now let’s look at some negative correlations that are supposed to be “depressing”: more exercise associated with lower levels of obesity, more education associated with a lower crime rate, fewer meetings associated with increased productivity, and how about more relaxing weekends associated with lower stress levels?

What’s the moral of the story? Correlation is what it is: a number that indicates the strength and direction of a relationship between two numerical (continuous) variables.

Whether the relationship is good for mankind or not is beyond the scope of the humble little number’s responsibility!

Assigning numbers to categorical variables does not make them interval/ratio variables.

This matters because we can only do math with interval/ratio variables. Basic math principles don’t apply to categorical variables, even if they have numbers associated with them. The numbers assigned to categorical variables are just for identification, like SSNs or zip codes.

For example, 1+1=2.

In the gender case (say, female = 1 and male = 2), this means that if you add a female and another female together, that’s equal to a male.

Another math principle is that 2 is twice as big as 1.

In the gender case, that would mean that a male is twice as big as a female.

All this madness would happen if we tried to treat categorical variables in numeric ways.

Keep in mind that the Pearson correlation r value is calculated from a math formula. If you feed the gender variable into SPSS as numbers, SPSS CAN and WILL calculate a Pearson correlation value for you, but using that number requires you to make the kinds of crazy assumptions illustrated above.
