correlation and regression

INTRODUCTION TO
STATISTICAL THEORY FOR
SCIENTIST
CORRELATION AND REGRESSION

• This topic answers questions such as “Are two or more variables
linearly related? If so, how strong is the relationship?”

• The CORRELATION COEFFICIENT is a numerical measure used to
determine whether two or more variables are linearly related and
to determine the strength of the relationship.

• There are two types of relationship: SIMPLE RELATIONSHIP
and MULTIPLE RELATIONSHIP.

CORRELATION • Statistical method used to determine whether a linear
relationship between variables exists

REGRESSION • Statistical method used to describe the nature of the
relationship between variables: positive/negative or linear/nonlinear

SIMPLE REGRESSION • Has two variables: an independent variable
(explanatory) and a dependent variable (response)

MULTIPLE REGRESSION • Two or more independent variables are used
to predict one dependent variable

POSITIVE RELATIONSHIP • Both variables increase or decrease at the
same time

NEGATIVE RELATIONSHIP • As one variable increases, the other variable
decreases, and vice versa

Scatter plots and Correlation
• To find the relationship between two different variables, data
need to be collected. Example: the relationship between the number
of hours of study and exam grades

• An independent variable is a variable that can be controlled or
manipulated, while a dependent variable cannot be controlled or
manipulated

• The dependent and independent variables can be plotted on a graph
called a scatter plot

• The independent variable x is plotted on the horizontal axis and
the dependent variable y on the vertical axis

• A scatter plot is a visual way to show the relationship between
two variables

A SCATTER PLOT is a graph of the ordered pairs (x, y) of
numbers consisting of the independent variable x and the
dependent variable y

Company   Cars (in ten thousands)   Revenue (in billions)
A         63                        7
B         29                        3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F         8.5                       1.5

Correlation
• The correlation coefficient explained here is the Pearson Product
Moment Correlation Coefficient (PPMC), named after Karl Pearson

The correlation coefficient computed from the sample data
measures the strength and direction of a linear relationship
between two quantitative variables. The symbol for the sample
correlation coefficient is r, and ρ (rho) for the population
correlation coefficient

• The value of the correlation coefficient ranges from -1 to +1.
• A value close to +1 indicates a strong positive linear
relationship, while a value close to -1 indicates a strong
negative linear relationship
• A value of r close to zero means that there is no linear
relationship between the variables, or only a weak one.
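
As a minimal sketch, r can be computed for the car rental data in the table above using the usual computational formula for the PPMC. The function name pearson_r and the variable names are illustrative choices, not part of the original slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the table: cars (in ten thousands), revenue (in billions)
cars = [63, 29, 20.8, 19.1, 13.4, 8.5]
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]

r = pearson_r(cars, revenue)
print(round(r, 3))  # prints 0.982
```

A value this close to +1 suggests a strong positive linear relationship between the number of cars and revenue.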

Regression
• Previously, we tested the significance of the correlation
coefficient. If the correlation is significant, the next step is to
determine the equation of the regression line

• LINE OF BEST FIT: best fit means that the sum of the squares of
the vertical distances from each point to the line is at a minimum

• A line of best fit is needed because the values of y will be
predicted from the values of x; hence the closer the points are to
the line, the better the prediction will be
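
The least-squares idea can be sketched as follows for the car rental data, assuming the regression line is written as y' = a + bx. The helper names least_squares_line and sum_squared_residuals are hypothetical, chosen only for this illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y' = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

cars = [63, 29, 20.8, 19.1, 13.4, 8.5]      # in ten thousands
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]      # in billions

a, b = least_squares_line(cars, revenue)
print(round(a, 3), round(b, 3))  # prints 0.396 0.106

# Any other line gives a larger sum of squared vertical distances.
best = sum_squared_residuals(cars, revenue, a, b)
other = sum_squared_residuals(cars, revenue, a, b + 0.01)
print(best < other)  # prints True
```

The comparison at the end only checks one alternative line; it illustrates the "minimum" property rather than proving it.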

• MARGINAL CHANGE: the magnitude of the change in one variable
when the other variable changes by exactly 1 unit.

• See example 10-9: the slope of the regression line is 0.106, which
means that for each increase of 10,000 cars, the value of y
increases by 0.106 unit ($106 million) on average.

• EXTRAPOLATION: making predictions beyond the bounds of the data.

• When predictions are made, they are based on present conditions or
on the premise that present trends will continue.
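
A small sketch of prediction with a guard against extrapolation, again using the car rental data. The predict function and its returned flag are assumptions made for illustration; the slides do not prescribe any particular implementation.

```python
cars = [63, 29, 20.8, 19.1, 13.4, 8.5]      # in ten thousands
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]      # in billions

# Fit the least-squares line y' = a + b*x
n = len(cars)
mean_x = sum(cars) / n
mean_y = sum(revenue) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(cars, revenue))
     / sum((x - mean_x) ** 2 for x in cars))
a = mean_y - b * mean_x

def predict(x):
    """Return (predicted y, True if x lies outside the observed x range)."""
    is_extrapolation = x < min(cars) or x > max(cars)
    return a + b * x, is_extrapolation

# Marginal change: moving x by exactly 1 unit changes y' by the slope b.
marginal_change = predict(31)[0] - predict(30)[0]
print(round(marginal_change, 3))  # prints 0.106

inside_value, inside_flag = predict(30)     # within the data (8.5 to 63)
outside_value, outside_flag = predict(100)  # beyond the data
print(inside_flag, outside_flag)  # prints False True
```

A prediction flagged as extrapolation is still computed, but it rests on the premise that the observed trend continues outside the range of the data.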

• OUTLIER: a point that seems out of place when compared with the
other points

• Some of these points can affect the equation of the regression
line; such points are called influential points or influential
observations
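
One informal way to see how much a single point affects the line is to refit without it and compare the slopes. The sketch below does this for company A in the car rental data, chosen only because its x value is far from the others. This is an illustration, not a formal test for influence.

```python
def slope_of_best_fit(xs, ys):
    """Slope b of the least-squares line y' = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

cars = [63, 29, 20.8, 19.1, 13.4, 8.5]      # in ten thousands
revenue = [7, 3.9, 2.1, 2.8, 1.4, 1.5]      # in billions

slope_all = slope_of_best_fit(cars, revenue)

# Drop the first point (company A) and refit.
slope_without_a = slope_of_best_fit(cars[1:], revenue[1:])

print(round(slope_all, 3), round(slope_without_a, 3))  # prints 0.106 0.121
```

The larger the change in the fitted equation when a point is removed, the more that point influences the regression line.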

Coefficient of determination

x 1 2 3 4 5
y 10 8 12 16 20
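
As a hedged sketch, the coefficient of determination r² for the data above can be computed as the ratio of explained variation to total variation, which is the standard definition. The variable names are illustrative and not taken from the slides.

```python
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 12, 16, 20]

# Fit the least-squares line y' = a + b*x
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

predicted = [a + b * x for x in xs]

# Total variation: squared distances of observed y from the mean of y
total_variation = sum((y - mean_y) ** 2 for y in ys)
# Explained variation: squared distances of predicted y from the mean of y
explained_variation = sum((p - mean_y) ** 2 for p in predicted)
# Unexplained variation: squared distances of observed y from predicted y
unexplained_variation = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = explained_variation / total_variation
print(round(r_squared, 3))  # prints 0.845
```

Here about 84.5% of the variation in y is explained by the regression line, and the explained and unexplained variation add up to the total variation.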
