2. • A typical question: “Are two or more variables linearly
related? If so, how strong is the relationship?”
• The numerical measure used to determine whether two or more
variables are linearly related, and how strong the relationship
is, is called the CORRELATION COEFFICIENT.
• There are two types of relationship: SIMPLE RELATIONSHIP
and MULTIPLE RELATIONSHIP.
3. • CORRELATION: a statistical method used to determine whether a
linear relationship between variables exists.
• REGRESSION: a statistical method used to describe the nature of
the relationship between variables: positive or negative, linear
or nonlinear.
• SIMPLE REGRESSION: has two variables, an independent variable
(explanatory) and a dependent variable (response).
• MULTIPLE REGRESSION: two or more independent variables are
used to predict one dependent variable.
• POSITIVE RELATIONSHIP: both variables increase or decrease at
the same time.
• NEGATIVE RELATIONSHIP: as one variable increases, the other
variable decreases, and vice versa.
4. Scatter plots and Correlation
• To find the relationship between two different variables, data
must be collected. Example: the relationship between the number of
hours studied and exam grades.
• The independent variable is the variable that can be controlled or
manipulated; the dependent variable cannot be controlled or manipulated.
• The dependent and independent variables can be plotted on a graph
called a scatter plot.
• The independent variable x is plotted on the horizontal axis and the
dependent variable y on the vertical axis.
• A scatter plot is a visual way to show the relationship between two
variables.
5. A SCATTER PLOT is a graph of the ordered pairs (x, y) of
numbers consisting of the independent variable x and the
dependent variable y.
Company   Cars (in ten thousands)   Revenue (in billions)
A         63                        7
B         29                        3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F         8.5                       1.5
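As a rough illustration of how the ordered pairs above become a scatter plot, the sketch below draws a crude text plot with x (cars) on the horizontal axis and y (revenue) on the vertical axis. The function name text_scatter and the grid size are assumptions made for this example only; a plotting library would normally be used instead.

```python
# Ordered pairs (x, y) from the table above:
# x = cars (in ten thousands), y = revenue (in billions)
data = {
    "A": (63.0, 7.0),
    "B": (29.0, 3.9),
    "C": (20.8, 2.1),
    "D": (19.1, 2.8),
    "E": (13.4, 1.4),
    "F": (8.5, 1.5),
}


def text_scatter(points, width=40, height=12):
    """Return a crude text scatter plot (hypothetical helper for illustration)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each coordinate onto the character grid
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


plot = text_scatter(list(data.values()))
print(plot)
```

The points rise from lower left to upper right, which is the visual pattern of a positive relationship.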
6. Correlation
• The correlation explained here is the Pearson Product Moment
Correlation Coefficient (PPMC), developed by Karl Pearson.
• The correlation coefficient computed from the sample data
measures the strength and direction of a linear relationship
between two quantitative variables. The symbol for the sample
correlation coefficient is r; for the population it is ρ (rho).
• The value of the correlation coefficient ranges from -1 to +1.
• A value close to +1 indicates a strong positive linear
correlation; a value close to -1 indicates a strong negative
linear correlation.
• A value of r close to zero means that there is no linear
relationship between the variables, or only a weak one.
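The slide does not show the formula for r, so the sketch below assumes the standard computational form, r = [nΣxy - ΣxΣy] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]), and applies it to the cars and revenue data from the table. The function name pearson_r is chosen for this example only.

```python
import math

# (cars in ten thousands, revenue in billions) for companies A-F, from the table
x = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
y = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]


def pearson_r(xs, ys):
    """Sample correlation coefficient via the standard computational formula
    (assumed here; the formula is not given on the slide)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

For this data r comes out at about 0.982, close to +1, which indicates a strong positive linear relationship.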
9. Regression
• We previously tested the significance of the correlation
coefficient. If the correlation is significant, the next step is to
determine the equation of the regression line.
• LINE OF BEST FIT: best fit means that the sum of the squares of
the vertical distances from each point to the line is at a minimum.
• A best-fit line is needed because values of y will be predicted
from values of x; hence the closer the points are to the line,
the better the prediction will be.
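The "minimum sum of squares" idea can be checked numerically. The sketch below assumes the usual least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x), fits a line to the cars and revenue data, and confirms that moving the slope away from the fitted value increases the sum of squared vertical distances. The names least_squares_line and sum_squared_errors are illustrative only.

```python
# (cars in ten thousands, revenue in billions) for companies A-F, from the table
x = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
y = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances from each point to the line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))


a, b = least_squares_line(x, y)
best = sum_squared_errors(x, y, a, b)
# Any other slope (here, the fitted slope plus 0.01) gives a larger sum of squares
worse = sum_squared_errors(x, y, a, b + 0.01)

print(f"y = {a:.3f} + {b:.3f}x")
print(best < worse)
```

For this data the fitted line is approximately y = 0.396 + 0.106x.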
11. • MARGINAL CHANGE: the magnitude of the change in one variable
when the other variable changes by exactly 1 unit.
• See Example 10-9: the slope of the regression line is 0.106, which
means that for each increase of 10,000 cars, the value of y changes
by 0.106 unit ($106 million) on average.
• EXTRAPOLATION: making predictions beyond the bounds of the data.
• When predictions are made, they are based on present conditions or
on the premise that present trends will continue.
• OUTLIER: a point that seems out of place when compared with the
other points.
• Some of these points can affect the equation of the regression line;
such points are called influential points or influential
observations.
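The three ideas on this slide can be tried out on the cars and revenue data. The sketch below is a minimal illustration under stated assumptions: it refits the least-squares line from the table (the intercept is computed here, not quoted from the slide), shows that a 1-unit change in x changes the prediction by the slope, flags a prediction outside the observed x range as extrapolation, and refits with each company left out to see how much a single point moves the slope. The names fit_line and predict are hypothetical.

```python
# (cars in ten thousands, revenue in billions) for companies A-F, from the table
names = ["A", "B", "C", "D", "E", "F"]
x = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
y = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(x, y)


def predict(x_value):
    """Return (predicted y, True if x_value lies within the observed x range)."""
    within_data = min(x) <= x_value <= max(x)
    return intercept + slope * x_value, within_data


# Marginal change: a 1-unit increase in x changes the prediction by the slope
y_at_20, _ = predict(20.0)
y_at_21, _ = predict(21.0)
marginal_change = y_at_21 - y_at_20
print(f"marginal change = {marginal_change:.3f}")

# Extrapolation: x = 100 lies beyond the observed range of 8.5 to 63
y_at_100, within_data = predict(100.0)
print(f"x = 100 within data range: {within_data}")

# Influence check: refit with each point removed and compare slopes
slopes_without = {}
for i, name in enumerate(names):
    _, s = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    slopes_without[name] = s
    print(f"slope without {name}: {s:.4f} (all points: {slope:.4f})")
```

A point whose removal shifts the slope noticeably more than the others is a candidate influential observation.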