This presentation gives a short introduction to time series and to the overall procedure for time series modelling, including general terminology and algorithms. The detailed mathematics is left out of the slides; the aim is to give a starting point for understanding time series modelling before going into the detailed statistics.
2. Time Series
• Continuous Time Series
o E.g. the adjustment of a price P in response to non-zero excess demand for a product can be modeled in continuous time as: $\frac{dP}{dt} = \lambda$
• Discrete Time Series
o A discrete signal, or discrete-time signal, is a time series consisting of a sequence of quantities, e.g. weather data.
4. General Approach
Plot the series and examine the main features of the graph, checking in particular whether there is:
o A trend,
o A seasonal component,
o Any apparent sharp changes in behavior,
o Any outlying observations.
Remove the trend and seasonal components to get stationary residuals.
Choose a model to fit the residuals, making use of various sample statistics including the sample autocorrelation function.
Forecasting will be achieved by forecasting the residuals and then inverting the transformations described above to arrive at forecasts of the original series $\{X_t\}$.
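The steps above can be sketched in a few lines. This is only an illustration under simple assumptions: the trend is removed by lag-1 differencing, and the "model" for the differenced series is just its sample mean (a placeholder, not a fitted ARMA model). The function name `forecast_via_differencing` is hypothetical.

```python
import numpy as np

def forecast_via_differencing(x, steps):
    """Forecast `steps` values ahead: difference, forecast, then invert."""
    x = np.asarray(x, dtype=float)
    y = np.diff(x)                          # lag-1 differencing removes a linear trend
    y_forecast = np.full(steps, y.mean())   # placeholder model: sample mean of the differences
    return x[-1] + np.cumsum(y_forecast)    # invert the differencing transformation

series = 3.0 + 2.0 * np.arange(10)          # toy data: a pure linear trend
print(forecast_via_differencing(series, 3))
```

In a real analysis the placeholder mean forecast would be replaced by forecasts from the model chosen for the stationary residuals.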
5. Stationarity
• Weak Stationarity
o The mean function of $\{X_t\}$ is $\mu_X(t) = E(X_t)$.
o The covariance function of $\{X_t\}$ is $\gamma_X(r, s) = \mathrm{Cov}(X_r, X_s) = E[(X_r - \mu_X(r))(X_s - \mu_X(s))]$ for all integers r and s.
o $\{X_t\}$ is (weakly) stationary if $\mu_X(t)$ is independent of t and $\gamma_X(t + h, t)$ is independent of t for each h. In that case we write $\gamma_X(h) := \gamma_X(h, 0) = \gamma_X(t + h, t)$.
• Strict Stationarity
o $\{X_t\}$ is a strictly stationary time series if $(X_1, \dots, X_n)' \overset{d}{=} (X_{1+h}, \dots, X_{n+h})'$ for all integers h and n ≥ 1, i.e. the two vectors have the same joint distribution.
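In practice the mean and covariance functions are estimated from data. The sketch below computes the usual sample autocovariance and sample autocorrelation with divisor n; the function names are illustrative and not taken from the slides.

```python
import numpy as np

def sample_acvf(x, h):
    """Sample autocovariance at lag h (divisor n, mean-corrected)."""
    x = np.asarray(x, dtype=float)
    n, h = len(x), abs(h)
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n

def sample_acf(x, h):
    """Sample autocorrelation at lag h."""
    return sample_acvf(x, h) / sample_acvf(x, 0)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_acvf(data, 0), sample_acvf(data, 1), sample_acf(data, 1))
```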
6. Removal of Trend
$X_t = m_t + s_t + Y_t$, t = 1, ..., n, where $E Y_t = 0$, $s_{t+d} = s_t$, and $\sum_{j=1}^{d} s_j = 0$.
• By estimation of $m_t$ and $s_t$
o Smoothing with a finite moving average filter
o Exponential smoothing
o Smoothing by elimination of high-frequency components
o Polynomial fitting
• By differencing the series $\{X_t\}$
o Lag-1 difference operator ∇
7. Removal of Trend
• By estimation of $m_t$ and $s_t$
• Smoothing with a finite moving average filter
• Let q be a nonnegative integer and consider the two-sided moving average $W_t = \frac{1}{2q+1} \sum_{j=-q}^{q} X_{t-j}$.
• Assume that $m_t$ is approximately linear over the interval [t − q, t + q] and that the average of the error terms over this interval is close to zero.
• The moving average then provides us with the estimates $\hat{m}_t = \frac{1}{2q+1} \sum_{j=-q}^{q} X_{t-j}$, q + 1 ≤ t ≤ n − q.
• For large q the filter attenuates the noise and still passes a linear trend without distortion, but if $m_t$ is not linear over the window the smoothed series is no longer a good estimate of the trend. To overcome this we can use the Spencer 15-point moving average, a filter that passes polynomials of degree 3 without distortion.
• Linear Filter: $\{x_t\} \rightarrow \{\hat{m}_t = \sum_{j} a_j x_{t-j}\}$
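A minimal sketch of both filters, assuming the series is held in a NumPy array. Only interior points are returned, so the output is shorter than the input by 2q values (14 for the Spencer filter). The function names are illustrative; the Spencer weights are the standard ones, [−3, −6, −5, 3, 21, 46, 67, 74, 67, 46, 21, 3, −5, −6, −3]/320.

```python
import numpy as np

def moving_average_trend(x, q):
    """Two-sided moving average with equal weights 1/(2q+1), interior points only."""
    x = np.asarray(x, dtype=float)
    weights = np.full(2 * q + 1, 1.0 / (2 * q + 1))
    return np.convolve(x, weights, mode="valid")

# Spencer 15-point weights (symmetric, sum to 1).
SPENCER_15 = np.array([-3, -6, -5, 3, 21, 46, 67, 74,
                       67, 46, 21, 3, -5, -6, -3]) / 320.0

def spencer_trend(x):
    """Spencer 15-point moving average, interior points only."""
    return np.convolve(np.asarray(x, dtype=float), SPENCER_15, mode="valid")

t = np.arange(40, dtype=float)
linear = 2.0 + 0.5 * t
cubic = 1.0 + t - 0.1 * t**2 + 0.01 * t**3
print(np.allclose(moving_average_trend(linear, 3), linear[3:-3]))  # linear trend passes
print(np.allclose(spencer_trend(cubic), cubic[7:-7]))              # cubic trend passes
```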
8. Removal of Trend
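• By estimation of $m_t$ and $s_t$
• Exponential smoothing
• For any fixed α ∈ [0, 1], the one-sided moving averages $\hat{m}_t$, t = 1, ..., n, are defined by the recursions $\hat{m}_t = \alpha X_t + (1 - \alpha)\hat{m}_{t-1}$, t = 2, ..., n, and $\hat{m}_1 = X_1$.
• This is a weighted moving average of $X_t, X_{t-1}, \dots$ with weights decreasing exponentially (except for the last one): $\hat{m}_t = \sum_{j=0}^{t-2} \alpha(1-\alpha)^j X_{t-j} + (1-\alpha)^{t-1} X_1$ for t ≥ 2.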
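The recursion translates directly into a loop. A minimal sketch, with an illustrative function name:

```python
import numpy as np

def exponential_smoothing(x, alpha):
    """One-sided exponentially weighted trend estimate, started at the first observation."""
    x = np.asarray(x, dtype=float)
    m = np.empty_like(x)
    m[0] = x[0]
    for t in range(1, len(x)):
        m[t] = alpha * x[t] + (1 - alpha) * m[t - 1]
    return m

print(exponential_smoothing([1.0, 2.0, 3.0], alpha=0.5))
```

Values of α close to 1 follow the data closely, while values close to 0 give a smoother but slower-reacting estimate.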
9. Removal of Trend
• By differencing the series $\{X_t\}$
• Lag-1 difference operator ∇
• We define the lag-1 difference operator ∇ by $\nabla X_t = X_t - X_{t-1} = (1 - B) X_t$, where B is the backward shift operator, $B X_t = X_{t-1}$.
• Powers of the operators B and ∇ are defined as $B^j(X_t) = X_{t-j}$ and $\nabla^j(X_t) = \nabla(\nabla^{j-1}(X_t))$, j ≥ 1, with $\nabla^0(X_t) = X_t$.
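As a small illustration (function name is hypothetical), applying ∇ twice to a quadratic trend leaves a constant, which is why differencing is an effective way to remove polynomial trends:

```python
import numpy as np

def difference(x, order=1):
    """Apply the lag-1 difference operator `order` times."""
    x = np.asarray(x, dtype=float)
    for _ in range(order):
        x = x[1:] - x[:-1]   # X_t - X_{t-1}
    return x

t = np.arange(6, dtype=float)
quadratic = 1.0 + 2.0 * t + 3.0 * t**2
print(difference(quadratic, order=2))   # constant: twice the quadratic coefficient
```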
10. Removal of Seasonality
$X_t = m_t + s_t + Y_t$, t = 1, ..., n, where $E Y_t = 0$, $s_{t+d} = s_t$, and $\sum_{j=1}^{d} s_j = 0$.
• Estimation of Trend and then Seasonal Components
• Lag-d differencing operator $\nabla_d$
11. Removal of Seasonality
Estimation of Trend and then Seasonal Components
• The trend is first estimated by applying a moving average filter specially chosen to eliminate the seasonal component and to dampen the noise.
• If the period d is even, say d = 2q, then we use $\hat{m}_t = (0.5 x_{t-q} + x_{t-q+1} + \dots + x_{t+q-1} + 0.5 x_{t+q})/d$, q < t ≤ n − q.
• If the period is odd, say d = 2q + 1, then we use the simple moving average.
• The second step is to estimate the seasonal component. For each k = 1, ..., d, we compute the average $\omega_k$ of the deviations $\{(x_{k+jd} - \hat{m}_{k+jd}),\ q < k + jd \le n - q\}$.
• Since these average deviations do not necessarily sum to zero, we estimate the seasonal component $s_k$ as $\hat{s}_k = \omega_k - \frac{1}{d} \sum_{i=1}^{d} \omega_i$, k = 1, ..., d, and $\hat{s}_k = \hat{s}_{k-d}$, k > d.
• The deseasonalized data is then defined to be the original series with the estimated seasonal component removed, i.e., $d_t = x_t - \hat{s}_t$, t = 1, ..., n. Finally, we re-estimate the trend from the deseasonalized data $\{d_t\}$ using one of the methods already described.
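The two estimation steps can be sketched as follows. This is a minimal illustration, assuming the series is long enough that every season has at least one interior observation; the function name and the NaN padding at the ends of the trend are choices made here, not part of the slides. Indices are 0-based, so season k in the code corresponds to k + 1 in the text.

```python
import numpy as np

def classical_decomposition(x, d):
    """Estimate trend, seasonal component, and deseasonalized series for period d."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if d % 2 == 0:
        q = d // 2
        weights = np.r_[0.5, np.ones(d - 1), 0.5] / d   # half weights at both ends
    else:
        q = (d - 1) // 2
        weights = np.ones(d) / d                        # simple moving average
    trend = np.full(n, np.nan)
    trend[q:n - q] = np.convolve(x, weights, mode="valid")
    deviations = x - trend                              # NaN where the trend is undefined
    w = np.array([np.nanmean(deviations[k::d]) for k in range(d)])
    s = w - w.mean()                                    # force the seasonal effects to sum to zero
    seasonal = s[np.arange(n) % d]
    return trend, seasonal, x - seasonal

t = np.arange(48, dtype=float)
pattern = np.array([3.0, -1.0, -4.0, 2.0])              # sums to zero, period 4
x = 10.0 + 0.5 * t + pattern[np.arange(48) % 4]
trend, seasonal, deseasonalized = classical_decomposition(x, d=4)
print(seasonal[:4])
```

With noise-free toy data like this the seasonal pattern is recovered exactly; with real data the estimates only approximate it.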
12. Removal of Seasonality
Lag-d differencing operator $\nabla_d$
• The lag-d differencing operator $\nabla_d$ is defined as $\nabla_d X_t = X_t - X_{t-d} = (1 - B^d) X_t$.
• Applying the operator $\nabla_d$ to the model $X_t = m_t + s_t + Y_t$, where $\{s_t\}$ has period d, we obtain $\nabla_d X_t = m_t - m_{t-d} + Y_t - Y_{t-d}$, which gives a decomposition of the difference $\nabla_d X_t$ into a trend component $(m_t - m_{t-d})$ and a noise term $(Y_t - Y_{t-d})$.
• The trend, $m_t - m_{t-d}$, can then be eliminated using the methods already described, in particular by applying a power of the operator ∇.
• The resulting doubly differenced series (for example $\nabla \nabla_d X_t$) can often be well represented by a stationary time series model.
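A short sketch of seasonal differencing on toy data (the function name is illustrative). With a linear trend plus a period-4 seasonal pattern, the lag-4 difference leaves only a constant, and one further lag-1 difference removes that as well:

```python
import numpy as np

def lag_difference(x, d):
    """Lag-d difference: X_t - X_{t-d}."""
    x = np.asarray(x, dtype=float)
    return x[d:] - x[:-d]

t = np.arange(24, dtype=float)
pattern = np.array([3.0, -1.0, -4.0, 2.0])
x = 10.0 + 0.5 * t + pattern[np.arange(24) % 4]

seasonally_differenced = lag_difference(x, 4)     # seasonal part cancels, leaves m_t - m_{t-4}
doubly_differenced = lag_difference(seasonally_differenced, 1)
print(seasonally_differenced[:3], doubly_differenced[:3])
```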
13. Test of Randomness
• The Portmanteau test
• The Turning point test
• The Difference-sign test
• The rank test
• Fitting an Auto-regressive model
• Ljung and Box test
• McLeod and Li Test
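Two of the tests listed above are easy to sketch with NumPy. Under the hypothesis of an iid sequence of length n, the number of turning points T is approximately normal with mean 2(n − 2)/3 and variance (16n − 29)/90, and the number S of positive differences is approximately normal with mean (n − 1)/2 and variance (n + 1)/12. The function names are illustrative; each returns the count and its standardized value, to be compared with ±1.96 for a 5% test.

```python
import numpy as np

def turning_point_test(x):
    """Return (number of turning points, standardized statistic)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    left, mid, right = x[:-2], x[1:-1], x[2:]
    peaks = (mid > left) & (mid > right)
    troughs = (mid < left) & (mid < right)
    T = int(np.sum(peaks | troughs))
    mu = 2.0 * (n - 2) / 3.0
    var = (16.0 * n - 29.0) / 90.0
    return T, (T - mu) / np.sqrt(var)

def difference_sign_test(x):
    """Return (number of positive differences, standardized statistic)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    S = int(np.sum(x[1:] > x[:-1]))
    mu = (n - 1) / 2.0
    var = (n + 1) / 12.0
    return S, (S - mu) / np.sqrt(var)

print(turning_point_test([1, 3, 2, 4, 3, 5]))
print(difference_sign_test([1, 3, 2, 4, 3, 5]))
```

The normal approximations are meant for reasonably long series; the six-point example only shows how the counts are formed.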
14. Stationary Processes
• Linear Processes
o The time series $\{X_t\}$ is a linear process if it has the representation $X_t = \sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}$ for all t, or equivalently $X_t = \psi(B) Z_t$ with $\psi(B) = \sum_{j=-\infty}^{\infty} \psi_j B^j$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and $\{\psi_j\}$ is a sequence of constants with $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$.
o The class of linear time series models includes the class of Auto-Regressive Moving-Average (ARMA) models.
o Every second-order stationary process is either a linear process or can be transformed to a linear process by subtracting a deterministic component.
15. AR(p), MA(q) and ARMA(p, q) Processes
• MA(q) Process
o $\{X_t\}$ is a moving-average process of order q if $X_t = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and $\theta_1, \dots, \theta_q$ are constants.
o Every stationary q-correlated process with mean zero is an MA(q) process.
• AR(p) Process
o $\{X_t\}$ is an Auto-Regressive process of order p if $X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + Z_t$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and $Z_t$ is uncorrelated with $X_s$ for each s < t.
• ARMA(p, q) Process
o $\{X_t\}$ is an ARMA(p, q) process if $X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and the polynomials $(1 - \phi_1 z - \dots - \phi_p z^p)$ and $(1 + \theta_1 z + \dots + \theta_q z^q)$ have no common factors.
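To make the defining equations concrete, here is a minimal simulation sketch. It assumes Gaussian white noise (the definitions only require uncorrelated noise) and discards an initial burn-in so that the arbitrary zero start has little influence. The function name and the `burn_in` and `seed` parameters are illustrative.

```python
import numpy as np

def simulate_arma(phi, theta, n, sigma=1.0, burn_in=200, seed=0):
    """Simulate n values of an ARMA(p, q) process; returns (x, z) after burn-in."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    total = n + burn_in
    z = rng.normal(0.0, sigma, size=total)   # assumed Gaussian white noise
    x = np.zeros(total)
    for t in range(total):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * z[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = ar + z[t] + ma
    return x[burn_in:], z[burn_in:]

x, z = simulate_arma(phi=[0.5], theta=[0.4], n=300)   # ARMA(1, 1)
print(x[:5])
```

Setting `theta=[]` gives an AR(p) process and `phi=[]` gives an MA(q) process.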
16. ARMA(p, q) Processes
• ARMA(p, q) Process
o $\{X_t\}$ is an ARMA(p, q) process if $X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}$, where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and the polynomials $(1 - \phi_1 z - \dots - \phi_p z^p)$ and $(1 + \theta_1 z + \dots + \theta_q z^q)$ have no common factors.
o $\{X_t\}$ in the above definition must be stationary.
o A stationary solution $\{X_t\}$ of the above equation exists (and is also the unique stationary solution) if and only if $\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p \neq 0$ for all |z| = 1.
o An ARMA(p, q) process $\{X_t\}$ is causal, or a causal function of $\{Z_t\}$, if there exist constants $\{\psi_j\}$ such that $\sum_{j=0}^{\infty} |\psi_j| < \infty$ and $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ for all t. Causality is equivalent to the condition $\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p \neq 0$ for all |z| ≤ 1.
o An ARMA(p, q) process $\{X_t\}$ is invertible if there exist constants $\{\pi_j\}$ such that $\sum_{j=0}^{\infty} |\pi_j| < \infty$ and $Z_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}$ for all t. Invertibility is equivalent to the condition $\theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q \neq 0$ for all |z| ≤ 1.
o We will focus our attention principally on causal and invertible ARMA processes.
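Both conditions say that the relevant polynomial has no zeros on or inside the unit circle, which can be checked numerically from its roots. A minimal sketch using `numpy.roots` (which expects coefficients from the highest power down); the function names are illustrative.

```python
import numpy as np

def is_causal(phi):
    """True if phi(z) = 1 - phi_1 z - ... - phi_p z^p has all zeros outside the unit circle."""
    coeffs = np.r_[-np.asarray(phi, dtype=float)[::-1], 1.0]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

def is_invertible(theta):
    """True if theta(z) = 1 + theta_1 z + ... + theta_q z^q has all zeros outside the unit circle."""
    coeffs = np.r_[np.asarray(theta, dtype=float)[::-1], 1.0]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

print(is_causal([0.5]), is_causal([1.5]))
print(is_invertible([0.4]), is_invertible([2.0]))
```

Roots that lie numerically very close to the unit circle should be treated with care, since the comparison with 1 is then sensitive to rounding.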
17. ACF and PACF of ARMA(p, q) process
• PACF
o The partial autocorrelation function (PACF) of an ARMA process $\{X_t\}$ is the function α(·) defined by the equations $\alpha(0) = 1$ and $\alpha(h) = \phi_{hh}$, h ≥ 1, where $\phi_{hh}$ is the last component of $\phi_h = \Gamma_h^{-1} \gamma_h$, with $\Gamma_h = [\gamma(i - j)]_{i,j=1}^{h}$ and $\gamma_h = [\gamma(1), \gamma(2), \dots, \gamma(h)]'$.
o The PACF of a causal AR(p) process is zero for lags greater than p.
• ACF
o If the sample ACF $\hat{\rho}(h)$ is significantly different from zero for 0 ≤ h ≤ q and negligible for h > q, this suggests that an MA(q) model may represent the data well.
o In order to apply this criterion we need to take into account the random variation expected in the sample autocorrelation function before we can classify ACF values as “negligible.” To resolve this problem we can use Bartlett’s formula (Section 2.4), which implies that for a large sample of size n from an MA(q) process, the sample ACF values at lags greater than q are approximately normally distributed with means 0 and variances $\omega_{hh}/n = (1 + 2\rho^2(1) + \dots + 2\rho^2(q))/n$.
o This means that if the sample is from an MA(q) process and if h > q, then $\hat{\rho}(h)$ should fall between the bounds $\pm 1.96\sqrt{\omega_{hh}/n}$ with probability approximately 0.95. In practice we frequently use the more stringent values $\pm 1.96/\sqrt{n}$.
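The PACF definition and the Bartlett bound can be sketched directly from the formulas. The code below solves the linear system for each h rather than using a faster recursion, and takes an autocovariance sequence as input; the function names are illustrative. The example uses the theoretical autocovariances of an AR(1) process with coefficient 0.6, for which the PACF should vanish beyond lag 1.

```python
import numpy as np

def pacf_from_acvf(gamma, max_lag):
    """alpha(0..max_lag) from autocovariances gamma[0..max_lag]."""
    gamma = np.asarray(gamma, dtype=float)
    alpha = [1.0]
    for h in range(1, max_lag + 1):
        Gamma_h = np.array([[gamma[abs(i - j)] for j in range(h)] for i in range(h)])
        phi_h = np.linalg.solve(Gamma_h, gamma[1:h + 1])
        alpha.append(phi_h[-1])          # phi_hh is the last component
    return np.array(alpha)

def bartlett_bound(rho, q, n):
    """1.96 * sqrt((1 + 2*rho(1)^2 + ... + 2*rho(q)^2) / n); rho[0] is lag 0."""
    rho = np.asarray(rho, dtype=float)
    w = 1.0 + 2.0 * np.sum(rho[1:q + 1] ** 2)
    return 1.96 * np.sqrt(w / n)

phi = 0.6
gamma_ar1 = phi ** np.arange(4) / (1 - phi**2)   # theoretical ACVF of AR(1), sigma^2 = 1
print(pacf_from_acvf(gamma_ar1, 3))
print(bartlett_bound([1.0, 0.4], q=1, n=200))
```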
18. Forecasting ARMA Processes
• Innovations Algorithm
o It provides us with a recursive method for forecasting second-order zero-mean processes that are not
necessarily stationary.
o For the causal ARMA process $\phi(B) X_t = \theta(B) Z_t$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, it is possible to simplify the application of the algorithm drastically.
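A minimal sketch of the general recursion, assuming the covariance function κ(i, j) = E(X_i X_j) is available as a Python callable with 1-based arguments. It computes the coefficients θ_{n,j} and mean squared errors v_n, and then the one-step predictors. The function names and array layout are choices made here; the ARMA-specific simplification mentioned above is not implemented.

```python
import numpy as np

def innovations(kappa, N):
    """Return theta (theta[n, j] = theta_{n,j}) and v (v[n] = v_n) for n = 0..N."""
    v = np.zeros(N + 1)
    theta = np.zeros((N + 1, N + 1))
    v[0] = kappa(1, 1)
    for n in range(1, N + 1):
        for k in range(n):
            s = sum(theta[k, k - j] * theta[n, n - j] * v[j] for j in range(k))
            theta[n, n - k] = (kappa(n + 1, k + 1) - s) / v[k]
        v[n] = kappa(n + 1, n + 1) - sum(theta[n, n - j] ** 2 * v[j] for j in range(n))
    return theta, v

def one_step_predictions(x, theta):
    """x_hat[n] predicts X_{n+1} from x[0..n-1]; x_hat[0] = 0."""
    n_obs = len(x)
    x_hat = np.zeros(n_obs + 1)
    for n in range(1, n_obs + 1):
        x_hat[n] = sum(theta[n, j] * (x[n - j] - x_hat[n - j]) for j in range(1, n + 1))
    return x_hat

# Example: MA(1) with theta = 0.5 and sigma^2 = 1.
def kappa_ma1(i, j):
    if i == j:
        return 1.25
    return 0.5 if abs(i - j) == 1 else 0.0

theta, v = innovations(kappa_ma1, 20)
print(theta[1, 1], v[1], v[20])
print(one_step_predictions([1.0, -0.5, 0.3], theta))
```

For this MA(1) example v_n approaches σ² = 1 and θ_{n,1} approaches 0.5 as n grows.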
19. Modelling and Forecasting with ARMA Processes
• General
o Estimation of the parameters $\phi = (\phi_1, \dots, \phi_p)'$, $\theta = (\theta_1, \dots, \theta_q)'$, and $\sigma^2$ when p and q are assumed to be known.
o The data are assumed to have been “mean-corrected” by subtraction of the sample mean, so that it is appropriate to fit a zero-mean ARMA model to the adjusted data $x_1, \dots, x_n$. If the model fitted to the mean-corrected data is $\phi(B) X_t = \theta(B) Z_t$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$, then the corresponding model for the original series is found by replacing $X_t$ with the original value minus the sample mean.
o When p and q are known, good estimators of φ and θ can be found by imagining the data to be observations of a stationary Gaussian time series and maximizing the likelihood with respect to the p + q + 1 parameters $\phi_1, \dots, \phi_p$, $\theta_1, \dots, \theta_q$ and $\sigma^2$. The estimators obtained by this procedure are known as maximum likelihood (or maximum Gaussian likelihood) estimators.
• Preliminary Estimation of parameters
o Yule-Walker Estimation: The Yule–Walker and Burg procedures apply to the fitting of pure autoregressive models. (Although the former can be adapted to models with q > 0, it is less efficient than when q = 0.) The assumption is that the ACF of $\{X_t\}$ coincides with the sample ACF at lags 1, ..., p.
o Burg’s Algorithm: estimates the PACF at lags 1, ..., p by successively minimizing sums of squares of forward and backward one-step prediction errors.
o The Innovations Algorithm
o The Hannan-Rissanen Algorithm
• After getting preliminary estimates we apply Maximum Likelihood Estimation (MLE) (or maximum Gaussian likelihood) to estimate the parameters.
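As an illustration of preliminary estimation, the Yule-Walker estimates for an AR(p) model solve $\hat{\Gamma}_p \hat{\phi} = \hat{\gamma}_p$ with sample autocovariances, and $\hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\phi}' \hat{\gamma}_p$. The sketch below is a plain NumPy version with an illustrative function name; the simulated AR(1) data, the seed, and the sample size are assumptions made only for the demonstration.

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker estimates (phi_hat, sigma2_hat) for a mean-corrected AR(p) fit."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()                                   # mean-correct the data
    gamma = np.array([np.dot(xc[h:], xc[:n - h]) / n for h in range(p + 1)])
    Gamma_p = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi_hat = np.linalg.solve(Gamma_p, gamma[1:])
    sigma2_hat = gamma[0] - phi_hat @ gamma[1:]
    return phi_hat, sigma2_hat

# Demonstration on simulated AR(1) data with phi = 0.6 and unit noise variance.
rng = np.random.default_rng(1)
noise = rng.normal(size=5000)
sim = np.zeros(5000)
for t in range(1, 5000):
    sim[t] = 0.6 * sim[t - 1] + noise[t]

phi_hat, sigma2_hat = yule_walker(sim, p=1)
print(phi_hat, sigma2_hat)
```

These values would then serve as starting points for the likelihood maximization.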
20. Diagnostic Checking
• Residuals are defined by $\hat{W}_t = (X_t - \hat{X}_t(\hat{\phi}, \hat{\theta})) / (r_{t-1}(\hat{\phi}, \hat{\theta}))^{1/2}$, t = 1, ..., n.
o $E(X_{n+1} - \hat{X}_{n+1})^2 = \sigma^2 E(W_{n+1} - \hat{W}_{n+1})^2 = \sigma^2 r_n$
• Rescaled residuals $\hat{R}_t$, t = 1, ..., n, are obtained by dividing the residuals $\hat{W}_t$, t = 1, ..., n, by the estimate $\hat{\sigma} = \sqrt{(\sum_{t=1}^{n} \hat{W}_t^2)/n}$ of the white noise standard deviation. Thus, $\hat{R}_t = \hat{W}_t / \hat{\sigma}$.
• If the fitted model is appropriate, the rescaled residuals should have properties similar to those of a WN(0,1) sequence or of an iid(0,1) sequence if we make the stronger assumption that the white noise $\{Z_t\}$ driving the ARMA process is independent white noise.
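A minimal sketch of the rescaling step and of one simple whiteness check, assuming the residuals from a fitted model are already available as an array. The function names are illustrative, and the toy residuals are simulated noise rather than output from an actual fit.

```python
import numpy as np

def rescale_residuals(w):
    """Divide residuals by sigma_hat = sqrt(mean(w^2)); returns (rescaled, sigma_hat)."""
    w = np.asarray(w, dtype=float)
    sigma_hat = np.sqrt(np.mean(w ** 2))
    return w / sigma_hat, sigma_hat

def fraction_acf_within_bounds(r, max_lag):
    """Share of sample ACF values at lags 1..max_lag inside +/- 1.96/sqrt(n)."""
    r = np.asarray(r, dtype=float)
    n = len(r)
    rc = r - r.mean()
    acf = np.array([np.dot(rc[h:], rc[:n - h]) for h in range(1, max_lag + 1)]) / np.dot(rc, rc)
    return float(np.mean(np.abs(acf) <= 1.96 / np.sqrt(n)))

rng = np.random.default_rng(2)
toy_residuals = rng.normal(scale=3.0, size=400)   # stand-in for residuals of a good fit
rescaled, sigma_hat = rescale_residuals(toy_residuals)
print(sigma_hat, np.mean(rescaled ** 2))
print(fraction_acf_within_bounds(rescaled, 20))
```

For an adequate model roughly 95% of the sample autocorrelations are expected to fall within the bounds.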
21. Diagnostic Checking
The Graph of $\{\hat{R}_t, t = 1, \dots, n\}$
• If the fitted model is appropriate, then the graph of the rescaled residuals $\{\hat{R}_t, t = 1, \dots, n\}$ should resemble that of a white noise sequence with variance one.
Figure: Rescaled residuals after fitting the ARMA(1,1) model to some data.
25. Reference(s)
1) Introduction to Time Series and Forecasting, Brockwell, Peter J., Davis, Richard A.,
https://www.springer.com/us/book/9781475777505
2) Discrete time Series, https://www.wikiwand.com/en/Discrete-time_signal