3. Correlation deals with the association between two or more variables
Correlation analysis deals with the covariation between two or more variables
Types
1. Positive or negative
2. Simple or multiple
3. Linear or non-linear
4. Methods of Measuring correlation
1. Graphic Method
2. Diagrammatic Method: Scatter Diagram
3. Algebraic method
a. Karl Pearson’s Coefficient of correlation
b. Spearman’s Rank Correlation Coefficient
c. Coefficient of Concurrent deviations
d. Least Squares Method
5. Karl Pearson’s Coefficient of Correlation
$$\gamma\ (\text{Gamma}) = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y}$$
dx = x − x̄
dy = y − ȳ
Σ dx dy = sum of products of deviations from the respective arithmetic means of both series
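As a brief check that the two forms of the formula agree, here is a short derivation. It assumes σx and σy are the standard deviations computed with divisor N, which the slide does not state explicitly.

```latex
\sigma_x = \sqrt{\frac{\sum dx^2}{N}}, \qquad
\sigma_y = \sqrt{\frac{\sum dy^2}{N}}
\quad\Longrightarrow\quad
N\,\sigma_x\,\sigma_y
  = N \sqrt{\frac{\sum dx^2}{N} \cdot \frac{\sum dy^2}{N}}
  = \sqrt{\sum dx^2 \cdot \sum dy^2}
```

So the denominators of the two forms are equal, and both expressions give the same value of γ.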
6. Karl Pearson’s Coefficient of Correlation
Using deviations from assumed (working) means Ax and Ay, i.e. dx = x − Ax and dy = y − Ay
$$\gamma\ (\text{Gamma}) = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{\left[N\sum dx^2 - (\sum dx)^2\right]\left[N\sum dy^2 - (\sum dy)^2\right]}}$$
Σ dx dy = total of products of deviations from the assumed means of the x and y series
Σ dx = total of deviations of x series
Σ dy = total of deviations of y series
Σ dx² = total of squared deviations of x series
Σ dy² = total of squared deviations of y series
N = number of items (number of paired items)
7. Karl Pearson’s Coefficient of Correlation
Using deviations from assumed (working) means Ax and Ay, the formula can equivalently be written as
$$\gamma\ (\text{Gamma}) = \frac{\sum dx\,dy - \dfrac{\sum dx \cdot \sum dy}{N}}{\sqrt{\left[\sum dx^2 - \dfrac{(\sum dx)^2}{N}\right]\left[\sum dy^2 - \dfrac{(\sum dy)^2}{N}\right]}}$$
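The assumed-mean shortcut should give the same coefficient whichever working means are chosen. The sketch below compares it with the direct deviation-from-mean formula. The data and the function names are illustrative assumptions, not taken from the slides.

```python
import math

def pearson_from_means(x, y):
    """Pearson's coefficient using deviations from the actual means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

def pearson_from_assumed_means(x, y, ax, ay):
    """Pearson's coefficient using deviations from assumed means ax, ay."""
    n = len(x)
    dx = [xi - ax for xi in x]
    dy = [yi - ay for yi in y]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = math.sqrt(
        (sum_dx2 - sum_dx ** 2 / n) * (sum_dy2 - sum_dy ** 2 / n)
    )
    return numerator / denominator

# Illustrative data (hypothetical, not from the slides)
x = [10, 12, 15, 19, 24]
y = [40, 41, 48, 60, 70]

r_direct = pearson_from_means(x, y)
r_shortcut = pearson_from_assumed_means(x, y, ax=15, ay=50)
print(round(r_direct, 4), round(r_shortcut, 4))
```

Both printed values should match, since the correction terms remove the effect of the chosen working means.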
8. Assumptions of Karl Pearson’s Coefficient of Correlation
1. A linear relationship exists between the variables
Properties of Karl Pearson’s Coefficient of Correlation
1. Its value lies between +1 and −1
2. Zero means no linear correlation
3. $\gamma\ (\text{Gamma}) = \pm\sqrt{b_{xy} \times b_{yx}}$
where bxy and byx are the regression coefficients and γ takes their common sign
Merit
Convenient for accurate interpretation, as it gives both the degree and the direction of the relationship between two variables
9. Limitations
1. Assumes a linear relationship, even though the actual relationship may not be linear
2. The method and process of calculation are difficult and time consuming
3. Affected by extreme values in the distribution
10. Probable Error of Karl Pearson’s Coefficient of Correlation
$$\text{Probable Error of } \gamma\ (\text{Gamma}) = 0.6745 \times \frac{1 - \gamma^2}{\sqrt{N}}$$
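The probable error formula can be sketched as a one-line function. The function name and the input values (γ = 0.8, N = 25) are illustrative assumptions, not figures from the slides.

```python
import math

def probable_error(r, n):
    """Probable error of a correlation coefficient r based on n pairs."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Illustrative values (hypothetical): r = 0.8 from 25 paired observations
pe = probable_error(r=0.8, n=25)
print(round(pe, 4))  # about 0.0486
```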
11. Q7. Calculate the coefficient of correlation for the following data
X  65 63 67 64 68 62 70 66 68 67 69 71
Y  68 66 68 65 69 66 68 65 71 67 68 70
Ans
$$\gamma\ (\text{Gamma}) = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y}$$
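The slide gives only the formula for Q7, so the arithmetic can be carried out as follows. This is a minimal sketch using the X and Y values from the question. The variable names are my own, and σx, σy are assumed to use divisor N.

```python
import math

x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Deviations from the arithmetic means
dx = [xi - mean_x for xi in x]
dy = [yi - mean_y for yi in y]

sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

# First form: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
r = sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Second form: sum(dx*dy) / (N * sigma_x * sigma_y), with divisor-N sigmas
sigma_x = math.sqrt(sum_dx2 / n)
sigma_y = math.sqrt(sum_dy2 / n)
r_alt = sum_dxdy / (n * sigma_x * sigma_y)

print(round(r, 4), round(r_alt, 4))
```

Both forms should print the same value, roughly 0.70, indicating a fairly strong positive correlation.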
15. Rank Correlation: sometimes variables are not quantitative in nature but can be arranged in serial order, especially when dealing with attributes like honesty, beauty, character, morality, etc.
To deal with such situations, Charles Edward Spearman developed in 1904 a formula for obtaining the correlation coefficient between the ranks of n individuals in two attributes under study, or between the ranks given by two or three judges
16. Rank coefficient of correlation
$$\rho\ (\text{rho}) = 1 - \frac{6\sum d^2}{N^3 - N} = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$
Σ d² = total of squared differences between paired ranks
N = number of items
17. Q9. Ten competitors in a cooking competition are ranked by three judges in the following way. Using the rank correlation method, find out which pair of judges has the nearest approach
Competitor   P   Q   R
1            1   3   6
2            6   5   4
3            5   8   9
4           10   4   8
5            3   7   1
6            2  10   2
7            4   2   3
8            9   1  10
9            7   6   5
10           8   9   7
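Q9 can be worked by applying the rank formula to each pair of judges. The sketch below uses the ranks from the table. The function and variable names are my own, and the formula as written assumes there are no tied ranks.

```python
from itertools import combinations

def spearman_rho(rank_a, rank_b):
    """Spearman's rank coefficient: 1 - 6*sum(d^2) / (N*(N^2 - 1))."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks given to competitors 1..10 by judges P, Q and R
ranks = {
    "P": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Q": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "R": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# Coefficient for every pair of judges
rho = {
    pair: spearman_rho(ranks[pair[0]], ranks[pair[1]])
    for pair in combinations(ranks, 2)
}
for pair, value in rho.items():
    print(pair, round(value, 4))

# The pair with the highest coefficient agrees most closely
closest_pair = max(rho, key=rho.get)
print("Nearest approach:", closest_pair)
```

With these ranks, judges P and R have the highest coefficient (about 0.64), while the other two pairs come out negative.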
19. Regression Analysis is the process of developing a statistical model which is used to predict the value of a dependent variable from an independent variable
Application
Advertising v/s sales revenue
First used by Sir Francis Galton in 1877 for the study of the height of sons with respect to the height of fathers
20. Regression Analysis: "regression" means going back, reverting to the former condition, or returning
Refers to the functional relationship between x and y, and estimates the value of the dependent variable y for given values of the independent variable x
Example: relationship between income of employees and savings
Regression coefficients can be used to calculate the correlation coefficient: $\gamma\ (\text{Gamma}) = \pm\sqrt{b_{xy} \times b_{yx}}$
21. Types of Regression
1. Simple & Multiple Regression
2. Total or Partial
3. Linear / Non-linear
Methods of Regression Analysis
1. Scatter Diagram
2. Regression Equations
3. Regression Lines
Regression of y on x: y = a + bx
Regression of x on y: x = a + by
22. Regression coefficients
Coefficient of regression of x on y:
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2}$$
Coefficient of regression of y on x:
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2}$$
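Multiplying the two coefficients shows why their product gives the square of the correlation coefficient. This short derivation relies on the deviation form of Karl Pearson's formula given in the earlier slides.

```latex
b_{xy} \times b_{yx}
  = \frac{\sum dx\,dy}{\sum dy^2} \cdot \frac{\sum dx\,dy}{\sum dx^2}
  = \frac{\left(\sum dx\,dy\right)^2}{\sum dx^2 \cdot \sum dy^2}
  = \gamma^2
\quad\Longrightarrow\quad
\gamma = \pm\sqrt{b_{xy} \times b_{yx}}
```

Both coefficients share the numerator Σ dx dy, so they always have the same sign, and γ takes that sign.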
23. Q2. From the data given below find
1. the two regression coefficients
2. the two regression equations
3. the coefficient of correlation between marks in Economics and Statistics
4. the most likely marks in Statistics when marks in Economics are 30
Let marks in Economics be x and marks in Statistics be y
Marks in Eco   25 28 35 32 31 36 29 38 34 32
Marks in Stat  43 46 49 41 36 32 31 30 33 39
24.
Marks in Eco (x)   25 28 35 32 31 36 29 38 34 32   Σx = 320   x̄ = 32
Marks in Stat (y)  43 46 49 41 36 32 31 30 33 39   Σy = 380   ȳ = 38
27. Regression coefficients
Coefficient of regression of x on y:
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{-93}{398} = -0.2337$$
Coefficient of regression of y on x:
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{-93}{140} = -0.6643$$
28. Regression of x on y
x − x̄ = bxy (y − ȳ)
x − 32 = −0.2337 (y − 38)
x − 32 = −0.2337y + 0.2337 × 38
x − 32 = −0.2337y + 8.8806
x = −0.2337y + 32 + 8.8806
x = −0.2337y + 40.8806
29. Correlation Coefficient = ±√(bxy × byx)
= −√((−0.2337) × (−0.6643)) = −√0.1552 = −0.394
The negative root is taken since byx and bxy are both negative
30. Regression of y on x
y − ȳ = byx (x − x̄)
y − 38 = −0.6643 (x − 32)
y − 38 = −0.6643x + 0.6643 × 32
y = −0.6643x + 38 + 0.6643 × 32
y = −0.6643x + 38 + 21.2576
y = −0.6643x + 59.2576
31. In order to estimate the most likely marks in Statistics (y) when marks in Economics (x) are 30, we use the line of regression of y on x
The required estimate is given by
y = −0.6643 × 30 + 59.2576 = −19.929 + 59.2576 = 39.3286
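The whole of Q2 can be reproduced in a few lines as a cross-check on the hand calculation. This is a minimal sketch using the marks from the question. The variable and function names are my own.

```python
import math

eco = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]   # x: marks in Economics
stat = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]  # y: marks in Statistics

n = len(eco)
mean_x, mean_y = sum(eco) / n, sum(stat) / n

dx = [xi - mean_x for xi in eco]
dy = [yi - mean_y for yi in stat]
sum_dxdy = sum(a * b for a, b in zip(dx, dy))
sum_dx2 = sum(a * a for a in dx)
sum_dy2 = sum(b * b for b in dy)

# Two regression coefficients
b_xy = sum_dxdy / sum_dy2  # regression of x on y
b_yx = sum_dxdy / sum_dx2  # regression of y on x

# Correlation coefficient takes the common sign of the two coefficients
r = math.copysign(math.sqrt(b_xy * b_yx), b_yx)

def predict_x(y_value):
    """Regression of x on y: x - mean_x = b_xy * (y - mean_y)."""
    return mean_x + b_xy * (y_value - mean_y)

def predict_y(x_value):
    """Regression of y on x: y - mean_y = b_yx * (x - mean_x)."""
    return mean_y + b_yx * (x_value - mean_x)

estimate = predict_y(30)
print(round(b_xy, 4), round(b_yx, 4), round(r, 3), round(estimate, 2))
```

The results should agree with the slides to rounding: bxy ≈ −0.2337, byx ≈ −0.6643, correlation ≈ −0.394, and an estimate of about 39.33 marks in Statistics.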
32. Sum of Squares: x and y
$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
Sum of Squares: xx
$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
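To illustrate that the definitional and computational forms agree, the sketch below evaluates both. As an assumption for illustration, it reuses the Economics and Statistics marks from Q2. The variable names are my own.

```python
x = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
y = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Definitional forms: sums of (products of) deviations from the means
ss_xy_def = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_xx_def = sum((xi - mean_x) ** 2 for xi in x)

# Computational (shortcut) forms: raw sums only
ss_xy_short = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
ss_xx_short = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

print(ss_xy_def, ss_xy_short)
print(ss_xx_def, ss_xx_short)
```

For this data both forms give SSxy = −93 and SSxx = 140, matching Σ dx dy and Σ dx² in the worked example.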
38.
$$b = \frac{SS_{xy}}{SS_{xx}} = \frac{6565}{344.25} = 19.0704$$
y = a + bx
Σy = Σa + bΣx
Σy = n × a + bΣx
n × a = Σy − bΣx
$$a = \frac{\sum y - b\sum x}{n} = \frac{\sum y}{n} - \frac{b\sum x}{n} = \frac{13060}{12} - \frac{19.0704 \times 1221}{12} = -852.08$$
39. Equation of the simple regression line for regression of y on x
y = a + bx
y = −852.08 + 19.0704x
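The slope and intercept can be recomputed from the summary totals quoted in these slides (n = 12, Σx = 1221, Σy = 13060, SSxy = 6565, SSxx = 344.25). This is a minimal sketch. The underlying data table is not part of this section, so only the totals are used, and the `predict` helper is a hypothetical name.

```python
# Summary totals as quoted in the slides
n = 12
sum_x = 1221
sum_y = 13060
ss_xy = 6565
ss_xx = 344.25

# Slope and intercept of the regression of y on x
b = ss_xy / ss_xx
a = (sum_y - b * sum_x) / n

def predict(x_value):
    """Predicted y from the fitted line y = a + b*x (hypothetical helper)."""
    return a + b * x_value

print(round(b, 4), round(a, 2))
```

The printed values should match the slide: b ≈ 19.0704 and a ≈ −852.08.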
40. For testing the Fit
yi = recorded (actual) value of y in the given data
ȳ = mean (average) of y
ŷ = predicted value from the regression line
Deviation = (yi − ȳ) = difference between the actual value of y and the mean
Residual = (yi − ŷ) = gap (error, difference) between the actual value of y and the predicted value calculated from the regression line
Deviation of predicted value from mean = (ŷ − ȳ)
a = intercept on the y-axis
b = slope of the regression line
41. Total sum of squares = SST = Σ (yi − ȳ)²
Regression sum of squares = SSR = Σ (ŷ − ȳ)²
Error sum of squares = SSE = Σ (yi − ŷ)²
$$\text{Coefficient of determination} = \gamma^2 = \frac{SSR}{SST}$$
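The three sums of squares and the coefficient of determination can be sketched as below. As an assumption for illustration, the example fits the regression of y on x to the Economics and Statistics marks from Q2, since the data behind the later worked example is not available in this section.

```python
x = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
y = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Fit the regression of y on x: y_hat = a + b*x
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_xx = sum((xi - mean_x) ** 2 for xi in x)
b = ss_xy / ss_xx
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

# Sums of squares
sst = sum((yi - mean_y) ** 2 for yi in y)                # total
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)            # regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # error

r_squared = ssr / sst
print(round(sst, 3), round(ssr, 3), round(sse, 3), round(r_squared, 4))
```

For a least-squares line SST = SSR + SSE. Here the coefficient of determination is about 0.155, the square of the correlation of −0.394 found in Q2, so the fit is weak.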
42.
$$\text{Standard Error of Estimate} = S_{yx} = \sqrt{\frac{SSE}{n - 2}}$$
In order to determine whether a significant linear relationship exists between the independent variable x and the dependent variable y, we test whether the population slope β is zero
$$t = \frac{b - \beta}{S_b}$$
$$S_b = \text{Standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
43. H0: Slope of the regression line is zero (β = 0)
H1: Slope of the regression line is not zero (β ≠ 0)
44.
$$S_{yx} = \text{Standard Error of Estimate} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum (y_i - \hat{y})^2}{n - 2}} = \sqrt{\frac{13769.21}{12 - 2}} = \sqrt{1376.92} = 37.1068$$
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 124581 - \frac{(1221)^2}{12} = 344.25$$
$$S_b = \text{Standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
45.
$$S_b = \text{Standard error of } b = \frac{S_{yx}}{\sqrt{SS_{xx}}}$$
$$t = \frac{b - \beta}{S_b} = \frac{19.07 - 0}{37.1068 / \sqrt{344.25}} = 9.53$$
As the calculated value of t is more than the table value of t for 12 − 2 = 10 degrees of freedom (for example, 2.228 at the 5% level of significance, two-tailed), the null hypothesis is rejected
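The slope test can be reproduced from the totals quoted in the slides (SSE = 13769.21, SSxx = 344.25, SSxy = 6565, n = 12). This is a minimal sketch. The 5% two-tailed significance level and its table value of 2.228 for 10 degrees of freedom are assumptions, since the slides do not state the level used.

```python
import math

# Totals as quoted in the slides
n = 12
ss_xy = 6565
ss_xx = 344.25
sse = 13769.21
beta_0 = 0  # hypothesised population slope under H0

b = ss_xy / ss_xx                      # sample slope
s_yx = math.sqrt(sse / (n - 2))        # standard error of estimate
s_b = s_yx / math.sqrt(ss_xx)          # standard error of the slope
t = (b - beta_0) / s_b                 # test statistic with n - 2 d.f.

# Assumed: two-tailed table value of t at the 5% level for 10 d.f.
t_critical = 2.228
reject_h0 = abs(t) > t_critical

print(round(s_yx, 4), round(s_b, 4), round(t, 2), reject_h0)
```

The statistic comes out near 9.5, well above the assumed table value, so the hypothesis of a zero slope is rejected.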
46. Coefficient of Determination: Definition
The Coefficient of Determination, also known as R Squared, is interpreted as the goodness of fit of a regression.
The higher the coefficient of determination, the larger the share of the variance of the dependent variable that is explained by the independent variable.
The coefficient of determination is the overall measure of the usefulness of a regression.
For example, suppose r² is given as 0.95. This means that 95% of the variation in the dependent variable is explained by the independent variable. That is a good regression.
47. The Coefficient of Determination can be calculated as the regression sum of squares, SSR, divided by the total sum of squares, SST
$$\text{Coefficient of Determination} = \gamma^2 = \frac{SSR}{SST}$$