3. Correlation and Regression Analysis
Definition & Background
In statistics, regression analysis refers to techniques for modeling and
analyzing several variables, when the focus is on the relationship
between a dependent variable and one or more independent variables.
Regression analysis helps us understand how the typical value of the
dependent variable changes when any one of the independent variables
is varied, while the other independent variables are held fixed.
In statistics, correlation indicates the strength and direction of a linear
relationship between two random variables.
In general statistical usage, correlation or co-relation refers to the
departure of two random variables from independence.
4. Correlation and Regression Analysis (cont.)
• The earliest form of regression was the method of least squares
published by Legendre in 1805, and by Gauss in 1809. Legendre
and Gauss both applied the method to the problem of
determining, from astronomical observations, the orbits of
bodies about the Sun.
• Sir Francis Galton was the first to use the term "regression". Galton
fit a least-squares line and used it to predict a son's height from his
father's height.
5. Importance and Applications
Regression is useful when several independent variables affect a
dependent variable, e.g., modeling the demand for a product as a
function of other parameters such as interest rates, growth in GNP,
and housing starts.
Regression methods continue to be an area of active research. In
recent decades, new methods have been developed for:
• Robust regression
• Time series and growth curves
• Bayesian methods for regression
Regression is widely used and frequently misused, e.g., relating the
shear strength of spot welds to the number of parking spaces, two
variables with no plausible connection.
6. Importance and Applications (cont.)
• Design of experiments
It helps determine the level of each factor in the model.
• Forecasting in time series
Linear regression can fit a trend to past observations and
extrapolate it to forecast future values.
• Epidemiology
Early evidence relating tobacco smoking to mortality and
morbidity came from studies employing regression.
• Finance
The capital asset pricing model (CAPM) uses linear regression,
together with the concept of beta, to analyze and quantify the
systematic risk of an investment.
• Environmental science
Linear regression is used in a wide range of environmental
science applications.
7. Glossary
$x$: independent variable, predictor, or regressor
$\hat{y}$: dependent variable, response (the hat marks a fitted or predicted value)
MSR: mean square regression
MSE: mean square error
$\rho$: correlation coefficient
$\hat{\beta}_0$: intercept
$\hat{\beta}_1$: slope
8. Linear Regression Assumptions
Model: $y = \beta_0 + \beta_1 x + \epsilon$
The errors $\epsilon$ are uncorrelated random variables with mean
zero and constant variance $\sigma^2$.
The errors are normally distributed.
9. Simple Linear Regression Equations
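Regression equation
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Intercept and slope
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
$\hat{\beta}_1 = \dfrac{S_{xy}}{S_{xx}}$
where
$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$
$S_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}$
Errors (residuals)
$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n$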
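The least-squares formulas for the intercept and slope can be sketched in plain Python as follows. This is a minimal illustration, not a library routine: the function names and the five-point data set are hypothetical, and $S_{xx}$ and $S_{xy}$ are computed in their deviation (definitional) forms, which equal the computational forms up to floating-point rounding.

```python
def fit_simple_linear_regression(x, y):
    """Return (beta0_hat, beta1_hat) for the fitted line y_hat = beta0_hat + beta1_hat * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length and at least two points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xx = sum (x_i - x_bar)^2 and S_xy = sum (y_i - y_bar)(x_i - x_bar)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


def residuals(x, y, beta0_hat, beta1_hat):
    """Errors (residuals) e_i = y_i - y_hat_i."""
    return [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear_regression(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

For these points $S_{xx} = 10$ and $S_{xy} = 6$, so the fitted line is $\hat{y} = 2.2 + 0.6x$, and the residuals sum to zero, as they always do for a least-squares line with an intercept.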
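10. Simple Linear Regression Equations (cont.)
Analysis of variance for testing significance of regression

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | $F_0$
Regression | $SS_R = \hat{\beta}_1 S_{xy}$ | 1 | $MS_R = SS_R/1$ | $MS_R/MS_E$
Error | $SS_E = SS_T - \hat{\beta}_1 S_{xy}$ | $n-2$ | $MS_E = SS_E/(n-2)$ |
Total | $SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2$ | $n-1$ | |

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Under $H_0$, $F_0$ follows an F distribution with 1 and $n-2$ degrees of freedom.
If the p-value < 0.05, reject $H_0$ (the regression is significant at α = 0.05).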
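The ANOVA table can be filled in mechanically. The sketch below is a minimal, hypothetical helper (`regression_anova` is not from any library) that computes each entry from raw data using the formulas in the table.

```python
def regression_anova(x, y):
    """ANOVA quantities for testing significance of a simple linear regression."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need matching x and y with n >= 3")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    beta1_hat = s_xy / s_xx
    ss_t = sum((yi - y_bar) ** 2 for yi in y)  # total, n - 1 df
    ss_r = beta1_hat * s_xy                    # regression, 1 df
    ss_e = ss_t - ss_r                         # error, n - 2 df
    ms_r = ss_r / 1
    ms_e = ss_e / (n - 2)
    if ms_e <= 0:
        raise ValueError("perfect fit: MSE is zero, so F0 is undefined")
    return {"SSR": ss_r, "SSE": ss_e, "SST": ss_t,
            "MSR": ms_r, "MSE": ms_e, "F0": ms_r / ms_e, "df": (1, n - 2)}


table = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(table["F0"], table["df"])
```

For this hypothetical data set, $SS_T = 6$, $SS_R = 3.6$, $MS_E = 0.8$ and $F_0 = 4.5$ with (1, 3) degrees of freedom. If SciPy is installed, `scipy.stats.f.sf(F0, 1, n - 2)` returns the corresponding p-value.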
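11. Simple Linear Regression Equations (cont.)
$R^2 = \dfrac{SS_R}{SS_T}, \quad 0 \le R^2 \le 1$
The coefficient of determination $R^2$ is often used to judge the
adequacy of a regression model. In simple linear regression, $R^2$
equals the square of the sample correlation coefficient between X and Y.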
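A quick numerical check of the statement that $R^2$ equals the squared correlation in simple linear regression. This is a minimal sketch on hypothetical data; the helper name is illustrative only, and $r$ is computed as $S_{xy}/(S_{xx}\,SS_T)^{1/2}$.

```python
import math


def r_squared_and_r(x, y):
    """Return (R^2, r) for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_r = (s_xy / s_xx) * s_xy        # SS_R = beta1_hat * S_xy
    r_squared = ss_r / ss_t            # coefficient of determination
    r = s_xy / math.sqrt(s_xx * ss_t)  # sample correlation coefficient
    return r_squared, r


r2, r = r_squared_and_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 4), round(r * r, 4))  # 0.6 0.6
```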
12. Correlation
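$-1 \le \rho \le 1$
$\rho = -1$: perfect inverse (negative) linear relationship
$\rho = 0$: no linear relationship (uncorrelated; this implies independence only when the variables are jointly normal)
$\rho = +1$: perfect direct (positive) linear relationship
General Rules
1. As a rule of thumb, a correlation coefficient r > 0.87 or r < -0.87 indicates a strong linear relationship
between x and y.
2. The reliability of the study's conclusions depends on the sample size.
Hypothesis test
$H_0$: ρ = 0 (there is no linear relationship)
$H_a$: ρ ≠ 0 (the variables are linearly related)
If the p-value < 0.05, reject $H_0$.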
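14. Simple Linear Regression Equations (cont.)
Test for zero correlation
$H_0: \rho = 0$
$H_1: \rho \neq 0$
$T_0 = \dfrac{R\sqrt{n-2}}{\sqrt{1-R^2}}$, compared with the t distribution with $n-2$ degrees of freedom.
Test for $H_0: \rho = \rho_0$ (approximate, for $n \ge 25$)
$Z_0 = (\operatorname{arctanh} R - \operatorname{arctanh} \rho_0)(n-3)^{1/2}$, compared with the standard normal distribution.
Sample correlation coefficient
$r = \dfrac{S_{xy}}{(S_{xx}\, SS_T)^{1/2}}$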
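The two test statistics can be computed as below. This is a sketch with hypothetical helper names; it computes only the statistics, leaving the comparison with a t or normal critical value to the reader. `math.atanh` is the standard-library inverse hyperbolic tangent.

```python
import math


def sample_correlation(x, y):
    """r = S_xy / (S_xx * SS_T)^(1/2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * ss_t)


def t0_statistic(r, n):
    """T0 for H0: rho = 0; compare with t(alpha/2, n - 2). Undefined for |r| = 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def z0_statistic(r, n, rho0):
    """Z0 for H0: rho = rho0 (approximate, n >= 25); compare with z(alpha/2)."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4), round(t0_statistic(r, len(x)), 4))  # 0.7746 2.1213
```

In simple linear regression $T_0^2$ equals the ANOVA statistic $F_0$ (here $2.1213^2 \approx 4.5$). With only five points, the $Z_0$ approximation does not apply; it is meant for samples of roughly 25 or more.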
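15. Multiple Linear Regression
With many independent variables, we apply ordinary least squares
(OLS), a method for estimating the unknown parameters of the model
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$
The least-squares estimates are
$\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T Y$
where, for n observations and k regressors,
$X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$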
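The matrix formula can be sketched in dependency-free Python by forming $X^T X$ and $X^T Y$ and solving the normal equations $(X^T X)\hat{\boldsymbol{\beta}} = X^T Y$ with Gauss-Jordan elimination, which gives the same result as applying the inverse without forming it explicitly. This is an illustrative sketch with hypothetical names; in practice a numerically stabler routine such as NumPy's `numpy.linalg.lstsq` is the usual choice.

```python
def solve_linear_system(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(a)
    aug = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the regressors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, size + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def ols_fit(rows, y):
    """beta_hat = (X^T X)^{-1} X^T Y; rows[i] holds x_i1 .. x_ik for observation i."""
    x = [[1.0] + [float(v) for v in row] for row in rows]  # leading column of ones
    p = len(x[0])
    xtx = [[sum(xr[i] * xr[j] for xr in x) for j in range(p)] for i in range(p)]
    xty = [sum(xr[i] * yi for xr, yi in zip(x, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


rows = [(0, 1), (1, 0), (2, 2), (3, 1), (4, 3)]
y = [4, 3, 11, 10, 18]  # generated exactly from y = 1 + 2*x1 + 3*x2
print([round(b, 6) for b in ols_fit(rows, y)])  # [1.0, 2.0, 3.0]
```

With a single regressor, `ols_fit` reproduces the simple-regression estimates $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ and $\hat{\beta}_1 = S_{xy}/S_{xx}$.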
17. Time Series Models
Definition
• Predict a future value of a parameter as a function of past
values of that parameter.
• Time series models (TSM) try to capture past trends and
extrapolate them into the future.
• E.g., the demand for a product can be described from its
reported historical demand, so past demand is often a good
predictor of future demand.
18. Applications/Importance
• Whenever we want to follow the development of some
random quantity over time, we are dealing with a Time
Series.
• Time series are very common, and are familiar from the
general media: charts of stock prices, popularity ratings of
politicians, and temperature curves are all examples.
• Whenever somebody uses the word “trend”, you know you are
dealing with a time series (Janert, P.K. 2006).
19. Equations and Calculations
• Although there are many different time series models,
the basic procedure is the same for all.
• We work in discrete time periods (e.g., months), labeled
i = 1, 2, …, t, where period t is the most recent
observation.
• The actual observations are denoted A(i), and the
forecast for period t+τ, τ = 1, 2, …, is denoted
f(t+τ).
• A time series model takes the past observations A(i)
as input and generates predictions f(t+τ) of future
values.
20. Moving Average
The best-known and most commonly applied smoothing technique is
the moving average.
The idea is very simple: average only the last m observations and
use this average as the forecast for all future periods.
$F(t) = \dfrac{1}{m} \sum_{i=t-m+1}^{t} A(i)$
$f(t+\tau) = F(t), \quad \tau = 1, 2, \ldots$
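A minimal Python sketch of the moving-average forecast, assuming 1-based month numbers as in the slides (the function name is hypothetical). The forecast for period t is the mean of the m observations that precede it, and the last entry is the forecast for the first future period, which is reused for all later periods.

```python
def moving_average_forecasts(demand, m):
    """Map period t (1-based) to its forecast: the mean of the m observations before t.

    Periods m+1 .. len(demand)+1 are returned; the last entry is the forecast for
    the first future period, which is also used for all later periods.
    """
    if not 1 <= m <= len(demand):
        raise ValueError("m must be between 1 and the number of observations")
    return {t: sum(demand[t - 1 - m:t - 1]) / m
            for t in range(m + 1, len(demand) + 2)}


demand = [10, 12, 12, 11, 15, 14, 18, 22, 18, 28]
for m in (3, 5):
    forecasts = moving_average_forecasts(demand, m)
    print(m, {t: round(v, 2) for t, v in forecasts.items()})
```

Run on the demand series used in the example, this reproduces the m = 3 and m = 5 forecast columns (e.g., 11.33 for month 4 with m = 3 and 17.4 for month 10 with m = 5).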
21. Example of Moving Average Model:
Month t | Demand A(t) | Forecast f(t), m = 3 | Forecast f(t), m = 5
1 | 10 | ---- | ----
2 | 12 | ---- | ----
3 | 12 | ---- | ----
4 | 11 | 11.33 | ----
5 | 15 | 11.67 | ----
6 | 14 | 12.67 | 12.0
7 | 18 | 13.33 | 12.8
8 | 22 | 15.67 | 14.0
9 | 18 | 18.00 | 16.0
10 | 28 | 19.33 | 17.4

Using m = 3:
$F(3) = \dfrac{10 + 12 + 12}{3} = 11.33 = f(4)$
$F(4) = \dfrac{12 + 12 + 11}{3} = 11.67 = f(5)$

Observation: The moving average approach gives equal weight to each of the m most recent observations and no weight to observations older than these.
22. Example: Moving Average with m=3 and m=5
[Figure: Demand, moving average (m=3), and moving average (m=5) plotted against Month (t)]
23. Exponential Smoothing
Exponential smoothing computes a smoothed estimate as a weighted
average of the most recent observation and the previous smoothed
estimate. The smoothed estimate and forecast at time t are
$F(t) = \alpha A(t) + (1-\alpha) F(t-1)$
$f(t+\tau) = F(t), \quad \tau = 1, 2, \ldots$
where α is a smoothing constant between 0 and 1 chosen by the
user. The best value depends on the particular data.
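A minimal sketch of simple exponential smoothing in Python, assuming the initialization F(1) = A(1) used in the worked example; the function name is hypothetical.

```python
def exponential_smoothing_forecasts(demand, alpha):
    """Map period t (1-based, t >= 2) to its forecast f(t) = F(t-1).

    F(t) = alpha*A(t) + (1 - alpha)*F(t-1), initialized with F(1) = A(1).
    The last entry, t = len(demand) + 1, is the forecast for the next period.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    smoothed = float(demand[0])  # F(1) = A(1)
    forecasts = {}
    for t in range(2, len(demand) + 2):
        forecasts[t] = smoothed  # f(t) = F(t - 1)
        if t <= len(demand):
            smoothed = alpha * demand[t - 1] + (1 - alpha) * smoothed  # F(t)
    return forecasts


demand = [10, 12, 12, 11, 15, 14, 18, 22, 18, 28]
for alpha in (0.2, 0.6):
    forecasts = exponential_smoothing_forecasts(demand, alpha)
    print(alpha, [round(forecasts[t], 2) for t in range(2, 11)])
```

On the example demand series this reproduces the α = 0.2 and α = 0.6 forecast columns; comparing the two rows also shows the larger α tracking recent changes more closely.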
24. Example of Exponential Smoothing with α = 0.2 and α = 0.6
Month t | Demand A(t) | Forecast f(t), α = 0.2 | Forecast f(t), α = 0.6
1 | 10 | ---- | ----
2 | 12 | 10.00 | 10.00
3 | 12 | 10.40 | 11.20
4 | 11 | 10.72 | 11.68
5 | 15 | 10.78 | 11.27
6 | 14 | 11.62 | 13.51
7 | 18 | 12.10 | 13.80
8 | 22 | 13.28 | 16.32
9 | 18 | 15.02 | 19.73
10 | 28 | 15.62 | 18.69

The simplest possible initialization method is to set F(1) = A(1) = 10 and start the process. For α = 0.2:
$F(2) = \alpha A(2) + (1-\alpha) F(1) = (0.2)(12) + (1-0.2)(10) = 10.40$
which becomes the forecast f(3).

Observation: Lower values of α make the model more stable but less responsive. The model tends to underestimate a parameter with an increasing trend and, conversely, to overestimate one with a decreasing trend.
25. Example: Exponential Smoothing with α=0.2 and α=0.6
[Figure: Demand and exponential smoothing with α=0.2 and α=0.6 plotted against Month (t)]
26. Exponential Smoothing with a Linear Trend (Double)
• Computes a smoothed estimate in a manner similar to exponential
smoothing, but also computes a smoothed trend, or slope, in the
data.
• Specifically designed to track data with upward or downward
trends (the model assumes the trend is linear).
• The basic method updates a smoothed estimate F(t) and a
smoothed trend T(t) each time a new observation becomes
available:
$F(t) = \alpha A(t) + (1-\alpha)[F(t-1) + T(t-1)]$
$T(t) = \beta[F(t) - F(t-1)] + (1-\beta) T(t-1)$
$f(t+\tau) = F(t) + \tau T(t), \quad \tau = 1, 2, \ldots$
• where α and β are smoothing constants between 0 and 1 to be
chosen by the user.
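A minimal Python sketch of the double (trend-corrected) exponential smoothing updates, assuming the initialization F(1) = A(1), T(1) = 0 used in the example; the function names are hypothetical. Each row holds (t, A(t), F(t), T(t), f(t)), where f(t) = F(t-1) + T(t-1) is the one-step-ahead forecast.

```python
def double_exponential_smoothing(demand, alpha, beta):
    """Return rows (t, A(t), F(t), T(t), f(t)) with F(1) = A(1) and T(1) = 0."""
    level, trend = float(demand[0]), 0.0
    rows = [(1, demand[0], level, trend, None)]  # no forecast for the first period
    for t in range(2, len(demand) + 1):
        forecast = level + trend  # f(t) = F(t-1) + T(t-1)
        new_level = alpha * demand[t - 1] + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
        rows.append((t, demand[t - 1], level, trend, forecast))
    return rows


def forecast_ahead(level, trend, tau):
    """f(t + tau) = F(t) + tau * T(t)."""
    return level + tau * trend


demand = [10, 12, 12, 11, 15, 14, 18, 22, 18, 28]
rows = double_exponential_smoothing(demand, alpha=0.2, beta=0.2)
for t, a, level, trend, f in rows:
    print(t, a, round(level, 2), round(trend, 2), "----" if f is None else round(f, 2))
_, _, last_level, last_trend, _ = rows[-1]
print(round(forecast_ahead(last_level, last_trend, 1), 2))  # forecast for month 11
```

With α = β = 0.2 the printed rows match the example table on the next slide (e.g., F(10) = 20.02, T(10) = 1.32, f(10) = 18.03).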
27. Example of Exponential Smoothing with Linear Trend, α = 0.2 and β = 0.2
Month t | Demand A(t) | Smoothed Estimate F(t) | Smoothed Trend T(t) | Forecast f(t)
1 | 10 | 10.00 | 0.00 | ----
2 | 12 | 10.40 | 0.08 | 10.00
3 | 12 | 10.78 | 0.14 | 10.48
4 | 11 | 10.94 | 0.14 | 10.92
5 | 15 | 11.87 | 0.30 | 11.08
6 | 14 | 12.53 | 0.37 | 12.17
7 | 18 | 13.93 | 0.58 | 12.91
8 | 22 | 16.00 | 0.88 | 14.50
9 | 18 | 17.10 | 0.92 | 16.88
10 | 28 | 20.02 | 1.32 | 18.03

The simplest initialization method is to set F(1) = A(1) and T(1) = 0.
$F(2) = \alpha A(2) + (1-\alpha)[F(1) + T(1)] = 0.2(12) + (1-0.2)(10 + 0) = 10.4$
$T(2) = \beta[F(2) - F(1)] + (1-\beta) T(1) = 0.2(10.4 - 10) + (1-0.2)(0) = 0.08$
29. Quantitative Measures for Evaluating Models
The three most common quantitative measures are the mean absolute deviation (MAD), the mean square deviation (MSD), and the bias (BIAS). Each takes the differences between the forecast and the actual values, f(t) - A(t), and computes a numerical score:
$MAD = \dfrac{\sum_{t=1}^{n} |f(t) - A(t)|}{n}$
$MSD = \dfrac{\sum_{t=1}^{n} [f(t) - A(t)]^2}{n}$
$BIAS = \dfrac{\sum_{t=1}^{n} [f(t) - A(t)]}{n}$
Objective: Find model coefficients that make MAD and/or MSD as small as possible and make BIAS close to zero. Zero BIAS does not mean that the forecast is accurate, only that the errors tend to be balanced high and low.
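The three measures translate directly into Python. This is a minimal sketch with a hypothetical function name; the usage lines apply it to the rounded m = 3 moving-average forecasts for months 4 to 10 from the moving-average example.

```python
def forecast_error_measures(forecasts, actuals):
    """Return (MAD, MSD, BIAS) computed from the errors f(t) - A(t)."""
    if len(forecasts) != len(actuals) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [f - a for f, a in zip(forecasts, actuals)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    msd = sum(e * e for e in errors) / n
    bias = sum(errors) / n
    return mad, msd, bias


# m = 3 moving-average forecasts for months 4-10 (rounded values from the example table)
forecasts = [11.33, 11.67, 12.67, 13.33, 15.67, 18.00, 19.33]
actuals = [11, 15, 14, 18, 22, 18, 28]
mad, msd, bias = forecast_error_measures(forecasts, actuals)
print(round(mad, 2), round(msd, 2), round(bias, 2))  # 3.52 21.43 -3.43
```

The negative BIAS says these forecasts run low on average, which is what the rising demand in the example would lead one to expect from a moving average.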
30. When to use?
Moving Average
• Commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or
cycles.
• For example, it is often used in technical analysis of financial
data, such as stock prices, returns, or trading volumes. It is also
used in economics to examine gross domestic product,
employment, or other macroeconomic time series. Many
accounting and chemical processes also fit this
category.
31. When to use?
Exponential Smoothing
• Stationary data with no trend or seasonality. It can be
applied either to produce smoothed data for presentation
or to make forecasts.
• Commonly applied to financial market and economic
data, but it can be used with any discrete set of repeated
measurements. Very common for small samples of data.
Double Exponential Smoothing
• Data with a trend but no seasonality.
• Examples: tourist arrivals, drug demand.
32. Interactive Example
Suppose the monthly sales for a particular product over the past 20 months have been as follows:
Month | Sales
1 | 22
2 | 21
3 | 24
4 | 30
5 | 25
6 | 25
7 | 33
8 | 40
9 | 36
10 | 39
11 | 50
12 | 55
13 | 44
14 | 48
15 | 55
16 | 47
17 | 61
18 | 58
19 | 55
20 | 60
Using Minitab, run a five-period (m=5) moving average model, an exponential smoothing model with smoothing constant α=0.2, and a double exponential smoothing model with smoothing constants α=0.4 and β=0.2. Determine which model fits these data best.
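For readers without Minitab, the comparison can be sketched in plain Python. This is a hypothetical script, not a reproduction of Minitab's output: it uses the slides' initializations (F(1) = A(1), T(1) = 0), whereas Minitab may initialize the smoothed values differently, so its numbers need not match. As a stated assumption, all three models are scored over the same window, months 6 to 20, the first months for which the m = 5 moving average has a forecast.

```python
SALES = [22, 21, 24, 30, 25, 25, 33, 40, 36, 39,
         50, 55, 44, 48, 55, 47, 61, 58, 55, 60]


def ma_forecasts(a, m):
    """f(t) = mean of the m observations before period t, for t = m+1..len(a)."""
    return {t: sum(a[t - 1 - m:t - 1]) / m for t in range(m + 1, len(a) + 1)}


def ses_forecasts(a, alpha):
    """Simple exponential smoothing with F(1) = A(1); f(t) = F(t-1)."""
    forecasts, level = {}, float(a[0])
    for t in range(2, len(a) + 1):
        forecasts[t] = level
        level = alpha * a[t - 1] + (1 - alpha) * level
    return forecasts


def des_forecasts(a, alpha, beta):
    """Double exponential smoothing, F(1) = A(1), T(1) = 0; f(t) = F(t-1) + T(t-1)."""
    forecasts, level, trend = {}, float(a[0]), 0.0
    for t in range(2, len(a) + 1):
        forecasts[t] = level + trend
        new_level = alpha * a[t - 1] + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return forecasts


def measures(forecasts, a, periods):
    """Return (MAD, MSD, BIAS) of f(t) - A(t) over the given periods."""
    errors = [forecasts[t] - a[t - 1] for t in periods]
    n = len(errors)
    return (sum(abs(e) for e in errors) / n,
            sum(e * e for e in errors) / n,
            sum(errors) / n)


# Assumption: score every model on months 6..20 so the comparison is like for like.
periods = range(6, len(SALES) + 1)
models = {
    "Moving average (m=5)": ma_forecasts(SALES, 5),
    "Exponential (alpha=0.2)": ses_forecasts(SALES, 0.2),
    "Double exp. (alpha=0.4, beta=0.2)": des_forecasts(SALES, 0.4, 0.2),
}
for name, f in models.items():
    mad, msd, bias = measures(f, SALES, periods)
    print(f"{name:34s} MAD={mad:6.2f}  MSD={msd:7.2f}  BIAS={bias:6.2f}")
```

The script does not hard-code a winner: the better-fitting model is the one with the smaller MAD and MSD and a BIAS closer to zero.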
34. Cost Indexes
Definition: Cost indexes are numerical values that reflect historical
changes in engineering costs. They compare cost or price changes
between two points in time for a fixed quantity of goods or services. In
short, a cost index is a dimensionless number for a given year showing
the cost at that time relative to a certain base year.
History: The Italian G. R. Carli devised index numbers in 1750 to
investigate the effects of the discovery of America on the purchasing
power of money in Europe.
Relevance: Because prices vary over time with economic conditions,
indexes give engineers a reference for evaluating alternatives on a
project, since they convert applicable past costs into equivalent costs
now or in the future. They are mostly used to calculate material and
labor costs. Their most popular uses are in the construction industry,
to estimate the present cost of a building from previous designs, and in
government agencies, to forecast the state of the economy. Cost indexes
are published in the Engineering News-Record.
35. Cost Indexes (cont.)
Equation:
Cc = Cr(Ic/Ir)
Where:
Cc = cost at the time of interest (past, present, or future), dollars
Cr = original reference cost, dollars
Ic = index number at the time of interest (past, present, or future)
Ir = index number at the time the reference cost was obtained
Example:
Construction of a 70,000-square-foot warehouse is planned for a future
period. Several years ago, a similar warehouse was constructed at a unit
estimate of $162.50 per square foot when the index was 118. The index for
the construction period is forecast to be 143. What will the cost per
square foot be at construction time?
Cc = ?
Cr = $162.50
Ic = 143
Ir = 118
Cc = 162.50(143/118) = $196.93/ft²
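The index adjustment is a single ratio; a minimal Python sketch (hypothetical function name) reproduces the warehouse example. The total-cost line is an illustrative extension, not part of the original example, that simply scales the unit estimate by the planned floor area.

```python
def index_adjusted_cost(reference_cost, index_now, index_reference):
    """Cc = Cr * (Ic / Ir): convert a cost between two points in time."""
    if index_reference <= 0:
        raise ValueError("index values must be positive")
    return reference_cost * (index_now / index_reference)


unit_cost = index_adjusted_cost(162.50, 143, 118)
print(f"${unit_cost:.2f}/ft^2")  # $196.93/ft^2

# Hypothetical extension: scale by the 70,000 ft^2 floor area for a total estimate.
total_cost = unit_cost * 70_000
print(f"${total_cost:,.0f}")
```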
36. References
Hopp, W. J., & Spearman, M. L. (2008). Factory Physics (3rd ed., pp. 415-430). New York: McGraw-Hill.
Janert, P. K. (2006, Feb.). "Exponential Smoothing." toyproblems.org. <http://www.toyproblems.org>.
Marshall, G. (1998). "Time-Series Data." A Dictionary of Sociology. Encyclopedia.com, accessed 9 Sep. 2009. <http://www.encyclopedia.com>.
Montgomery, D. C., & Runger, G. (2003). Applied Statistics and Probability for Engineers (3rd ed., pp. 391-426).
Newnan, D. G., Lavelle, J. P., & Eschenbach, T. G. (2000). "Engineering Cost and Cost Estimating." In Engineering Economic Analysis (8th ed., pp. 50-51). Texas: Engineering Press.
Ostwald, P. F. (1992). "Forecasting." In Engineering Cost Estimating (3rd ed., pp. 170-176). New Jersey: Prentice Hall.