2. Contents
• Introduction
• Components of time series
• Stationarity
• What is a stationary signal
• Tests for stationarity
• How to make a signal stationary
• ACF
• Forecasting Models
• AR Model
• MA Model
• ARMA Model
• ARIMA model
• Exponential Smoothing
• Time Series decomposition
• Validation of a forecast model
3. Introduction to Time Series
• A time series is a time-oriented or chronological sequence of
observations on a variable of interest.
• Time series analysis attempts to model the underlying structure of
observations taken over time.
• A time series is an ordered sequence of equally spaced values over
time. Figure 1 provides a plot of the monthly number of international airline
passengers over a 12-year period.
4. Applications
The usage of time series models is twofold:
• Obtain an understanding of the underlying forces and structure that produced the
observed data
• Fit a model and proceed to forecasting, monitoring or even feedback and
feedforward control.
Time Series Analysis is used for many applications such as:
• Economic Forecasting
• Sales Forecasting
• Budgetary Analysis
• Stock Market Analysis
• Yield Projections
• and many, many more...
5. Some failed forecasts
The varying fortunes of forecasters arise
because good forecasts can seem almost
magical, while bad forecasts may be
dangerous. Consider the following famous
predictions about computing.
• I think there is a world market for maybe
five computers. (Chairman of IBM, 1943)
• Computers in the future may weigh no more
than 1.5 tons. (Popular Mechanics, 1949)
• There is no reason anyone would want a
computer in their home. (President, DEC,
1977)
6. Difference between Prediction and Forecasting
• Prediction: you predict something on the basis of dependent and
independent variables; no time interval or ordering is involved.
Example: linear regression.
• Forecasting: you have an ordered sequence of values Y1, Y2,
Y3, ..., Yn and must forecast Yn+1. In forecasting, the order of the
observations and the time intervals between them matter.
Example: time series models.
7. Univariate time series
• The term "univariate time series" refers to a time series that consists
of single (scalar) observations recorded sequentially over equal time
increments. Some examples are monthly CO2 concentrations and the
monthly returns of a stock.
• Although a univariate time series data set is usually given as a single
column of numbers, time is in fact an implicit variable in the time
series. If the data are equi-spaced, the time variable, or index, does
not need to be explicitly given. The time variable may sometimes be
explicitly used for plotting the series. However, it is not used in the
time series model itself.
8. Multivariate Time Series (MTS)
A Multivariate time series has more than one time-dependent
variable. Each variable depends not only on its past values but
also has some dependency on other variables. This dependency
is used for forecasting future values.
9. Time series patterns
In describing these time series, we have used words such as “trend” and “seasonal” which need to
be defined more carefully.
Trend
• A trend exists when there is a long-term increase or decrease in the data. It does not have to be
linear. Sometimes we will refer to a trend as “changing direction”, when it might go from an
increasing trend to a decreasing trend.
Seasonal
• A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of
the year or the day of the week. Seasonality is always of a fixed and known frequency.
Cyclic
• A cycle occurs when the data exhibit rises and falls that are not of a fixed frequency. These
fluctuations are usually due to economic conditions, and are often related to the “business cycle”.
The duration of these fluctuations is usually at least 2 years.
13. Stationary
• A common assumption in many time series techniques is that the data are stationary.
• A stationary process has the property that the mean, variance and autocorrelation structure do not
change over time.
• Stationarity can be defined in precise mathematical terms, but for our purposes we mean a flat-
looking series: no trend, constant variance over time, and a constant autocorrelation structure
over time.
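A common way to check this in practice is the Augmented Dickey-Fuller (ADF) unit-root test. Below is a minimal sketch using statsmodels; the file name is a placeholder for whatever single-column series you are analysing.

from pandas import read_csv
from statsmodels.tsa.stattools import adfuller

# Load a univariate series (placeholder file; any single-column series works)
series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze()

# adfuller returns (test statistic, p-value, lags used, nobs, critical values, ...)
stat, pvalue = adfuller(series)[0:2]

# A small p-value (e.g. < 0.05) rejects the unit-root null: the series looks stationary
print(f'ADF statistic: {stat:.3f}, p-value: {pvalue:.4f}')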
15. How to remove non-stationarity
• A series which is non-stationary can often be made stationary by differencing.
• A series which is stationary after being differenced once is said to be integrated
of order 1, denoted I(1).
• In general, a series which is stationary after being differenced d times is said to
be integrated of order d, denoted I(d).
• Therefore a series which is stationary without differencing is said to be I(0).
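In pandas, differencing is a one-liner; a minimal sketch (the file name is again a placeholder):

from pandas import read_csv

series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze()

# First difference: y'_t = y_t - y_{t-1}; dropna() removes the undefined first value
diff1 = series.diff().dropna()

# Difference again if the ADF test still indicates non-stationarity (an I(2) series)
diff2 = diff1.diff().dropna()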
18. Autocorrelation Function (ACF)
• Just as correlation measures the extent of a linear relationship between two variables,
autocorrelation measures the linear relationship between lagged values of a time series.
• There are several autocorrelation coefficients, corresponding to each panel in the lag plot.
For example, $r_1$ measures the relationship between $y_t$ and $y_{t-1}$, $r_2$ measures the
relationship between $y_t$ and $y_{t-2}$, and so on.
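statsmodels provides ready-made ACF (and PACF) plots; a minimal sketch:

from pandas import read_csv
from matplotlib import pyplot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze()

plot_acf(series, lags=40)   # autocorrelation r_k for lags k = 0..40
plot_pacf(series, lags=40)  # partial autocorrelation, used below to pick the AR order p
pyplot.show()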
20. Autoregressive (AR) process
• A time series is said to be AR when the present value of the time
series can be obtained from previous values of the same time
series, i.e. the present value is a weighted average of its past
values. Stock prices and global temperature rise can be thought
of as AR processes.
The AR process of order p can be written as
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t$$
where $\epsilon_t$ is white noise and $y_{t-1}$, $y_{t-2}$, ... are the lags. The order p is the lag
value after which the PACF plot crosses the upper confidence interval for the
first time. These p lags act as our features when forecasting the AR process.
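A minimal AR fit with statsmodels; the lag order 2 here is only an illustration, in practice it comes from the PACF plot:

from pandas import read_csv
from statsmodels.tsa.ar_model import AutoReg

series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze()

# Fit an AR(2) model: y_t regressed on y_{t-1} and y_{t-2}
res = AutoReg(series, lags=2).fit()
print(res.params)

# Forecast the next 12 values (positional start/end indices)
forecast = res.predict(start=len(series), end=len(series) + 11)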
21. Moving average (MA) process
• A process where the present value of the series is defined as a
linear combination of past errors. We assume the errors to be
independently distributed with the normal distribution. The MA
process of order q is defined as
$$y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q}$$
• Here $\epsilon_t$ is white noise. To get an intuition for the MA process,
consider the order 1 MA process, which looks like
$$y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1}$$
22. ARMA model
• An ARMA, or Autoregressive Moving Average, model is used to
describe weakly stationary stochastic time series in terms of
two polynomials: the first for the autoregression, the second
for the moving average.
• Often this model is referred to as the ARMA(p,q) model,
where:
• p is the order of the autoregressive polynomial,
• q is the order of the moving average polynomial.
• The equation is given by:
$$y_t = c + \sum_{i=1}^{p} \phi_i y_{t-i} + \epsilon_t + \sum_{j=1}^{q} \theta_j \epsilon_{t-j}$$
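In statsmodels an ARMA(p,q) is fitted as an ARIMA with d = 0; a minimal sketch, where the order (2, 0, 1) is illustrative only:

from pandas import read_csv
from statsmodels.tsa.arima.model import ARIMA

series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze()

# ARMA(2,1) == ARIMA(p=2, d=0, q=1); assumes the series is already stationary
res = ARIMA(series, order=(2, 0, 1)).fit()
print(res.summary())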
23. ARIMA Model
• ARIMA models provide another approach to time series
forecasting. Exponential smoothing and ARIMA models are the
two most widely used approaches to time series forecasting,
and provide complementary approaches to the problem.
• While exponential smoothing models are based on a description
of the trend and seasonality in the data, ARIMA models aim to
describe the autocorrelations in the data.
24. ARIMA
• An ARIMA model can be understood by outlining each of its
components as follows:
• Autoregression (AR) refers to a model that shows a changing
variable that regresses on its own lagged, or prior, values.
• Integrated (I) represents the differencing of raw observations to
allow for the time series to become stationary, i.e., data values
are replaced by the difference between the data values and the
previous values.
• Moving average (MA) incorporates the dependency between an
observation and a residual error from a moving average model
applied to lagged observations.
25. ARIMA Model
• Because the need to make a time series stationary is common,
the differencing can be included (integrated) into the ARMA
model definition by defining the Autoregressive Integrated
Moving Average model, denoted ARIMA(p,d,q).
• The structure of the ARIMA model is identical to the expression
of ARMA, but the ARMA(p,q) model is applied to the time
series, Y, after applying differencing d times.
26. • If we combine differencing with autoregression and a moving
average model, we obtain a non-seasonal ARIMA model. ARIMA is
an acronym for AutoRegressive Integrated Moving Average (in this
context, “integration” is the reverse of differencing). The full model
can be written as
$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t$$
• where $y'_t$ is the differenced series (it may have been differenced
more than once). The “predictors” on the right hand side include both
lagged values of $y_t$ and lagged errors. We call this an ARIMA(p,d,q)
model.
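A minimal ARIMA fit and forecast; the order (2, 1, 1) is illustrative, with p, d and q coming from the differencing tests and ACF/PACF analysis above:

from pandas import read_csv
from statsmodels.tsa.arima.model import ARIMA

series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze()

# d=1: the model differences the series once internally before fitting the ARMA part
res = ARIMA(series, order=(2, 1, 1)).fit()

# Forecasts are automatically integrated back to the original scale
print(res.forecast(steps=12))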
27. Exponential smoothing
• Forecasts produced using exponential smoothing methods are
weighted averages of past observations, with the weights
decaying exponentially as the observations get older.
• In other words, the more recent the observation the higher the
associated weight.
• This framework generates reliable forecasts quickly and for a
wide range of time series, which is a great advantage and of
major importance to applications in industry.
28. Simple exponential smoothing
• The simplest of the exponentially smoothing methods is naturally
called simple exponential smoothing (SES). This method is suitable for
forecasting data with no clear trend or seasonal pattern. For example, the
data in the figure do not display any clear trending behavior or any
seasonality.
29. • Using the naïve method, all forecasts for the future are equal to
the last observed value of the series,
$$\hat{y}_{T+h|T} = y_T$$
• for h=1,2,… Hence, the naïve method assumes that the most
recent observation is the only important one, and all previous
observations provide no information for the future.
• This can be thought of as a weighted average where all of the
weight is given to the last observation.
30. • Using the average method, all future forecasts are equal to a simple
average of the observed data,
$$\hat{y}_{T+h|T} = \frac{1}{T}\sum_{t=1}^{T} y_t$$
• for h=1,2,… Hence, the average method assumes that all
observations are of equal importance, and gives them equal weights
when generating forecasts.
• We often want something between these two extremes. For
example, it may be sensible to attach larger weights to more recent
observations than to observations from the distant past. This is
exactly the concept behind simple exponential smoothing.
31. Simple exponential smoothing
• Forecasts are calculated using weighted averages, where the
weights decrease exponentially as observations come from
further in the past; the smallest weights are associated with
the oldest observations:
$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \dots$$
• where 0≤α≤1 is the smoothing parameter. The one-step-ahead
forecast for time T+1 is a weighted average of all of the
observations in the series y1,…,yT. The rate at which the
weights decrease is controlled by the parameter α.
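A minimal SES fit with statsmodels; by default fit() estimates the smoothing parameter α from the data:

from pandas import read_csv
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze()

# Estimate alpha by maximising the likelihood
fit = SimpleExpSmoothing(series).fit()
print('alpha =', fit.params['smoothing_level'])

# SES produces a flat forecast: every horizon gets the same smoothed level
print(fit.forecast(12))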
32. Moving Average
• A simple moving average of span N assigns weight 1/N to each of the most recent N observations
$y_T, y_{T-1}, \dots, y_{T-N+1}$, and weight zero to all other observations. If we let $M_T$ be the
moving average, then the N-span moving average at time period T is
$$M_T = \frac{y_T + y_{T-1} + \dots + y_{T-N+1}}{N} = \frac{1}{N}\sum_{t=T-N+1}^{T} y_t$$
• Clearly, as each new observation becomes available it is added into the sum from which the moving average
is computed and the oldest observation is discarded.
33. Moving Average
• The simple moving average is a linear data smoother, or a linear filter, because it replaces each
observation $y_t$ with a linear combination of the other data points that are near to it in time. The
weights in the linear combination are equal, so the linear combination here is an average. Of
course, unequal weights could be used. For example, the Hanning filter is a weighted, centered
moving average:
$$\tilde{y}_t = \frac{1}{4} y_{t-1} + \frac{1}{2} y_t + \frac{1}{4} y_{t+1}$$
An obvious disadvantage of a linear filter such as a moving average is that an unusual or erroneous data point or
an outlier will dominate the averages that contain that observation, contaminating the moving averages for a
length of time equal to the span of the filter. For example, consider the sequence of observations
15, 18, 13, 12, 16, 14, 16, 17, 18, 15, 18, 200, 19, 14, 21, 24, 19, 25
34. • Odd-span moving medians (also called running medians) are an alternative to moving averages that are
effective data smoothers when the time series may be contaminated with unusual values or outliers. The
moving median of span N is defined as
$$m_t^{[N]} = \operatorname{med}(y_{t-u}, \dots, y_t, \dots, y_{t+u})$$
• where N = 2u + 1. The median is the middle observation in rank order (or order of value). The moving median
of span 3 is a very popular and effective data smoother.
35. • This smoother would process the data three values at a time, and replace the three original observations by
their median. If we apply this smoother to the data above we obtain
------- 15, 13, 13, 14, 16, 16, 17, 17, 18, 18, 19, 19, 19, 21, 21, 24 -------
• This smoothed data is a reasonable representation of the original data, but it conveniently ignores the value
200. The end values are lost when using the moving median.
• If there are a lot of observations, the information loss from the missing end values is not serious. However, if
it is necessary or desirable to keep the lengths of the original and smoothed data sets the same, a simple way
to do this is to "copy on" or add back the end values from the original data.
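A minimal sketch of span-3 moving-median smoothing with pandas, applied to the sequence above:

import pandas as pd

data = pd.Series([15, 18, 13, 12, 16, 14, 16, 17, 18, 15,
                  18, 200, 19, 14, 21, 24, 19, 25])

# Centered rolling median of span 3; the outlier 200 is simply ignored.
# The first and last values are NaN (the "lost" end values).
smoothed = data.rolling(window=3, center=True).median()
print(smoothed.tolist())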
36. Autocovariance and Autocorrelation Functions
• If a time series is stationary, this means that the joint probability distribution of any
two observations, say $y_t$ and $y_{t+k}$, is the same for any two time periods t and t + k
that are separated by the same interval k. Useful information about this joint
distribution, and hence about the nature of the time series, can be obtained by
plotting a scatter diagram of all of the data pairs $y_t, y_{t+k}$ that are separated by the
same interval k. The interval k is called the lag.
41. Time series decomposition
• Time series data can exhibit a variety of patterns, and it is often
helpful to split a time series into several components, each
representing an underlying pattern category.
• When we decompose a time series into components, we
usually combine the trend and cycle into a single trend-
cycle component (sometimes called the trend for simplicity).
Thus we think of a time series as comprising three components:
a trend-cycle component, a seasonal component, and a
remainder component (containing anything else in the time
series).
42. • There is a "classical" approach to decomposition of a time series into trend and seasonal components
(actually, there are a lot of different decomposition algorithms; here we explain a very simple but useful
approach). The general mathematical model for this decomposition is
$$y_t = f(S_t, T_t, \epsilon_t)$$
• where $S_t$ is the seasonal component, $T_t$ is the trend effect (sometimes called the trend-cycle effect), and $\epsilon_t$ is the
random error component.
43. Time series components
• If we assume an additive decomposition, then we can write
$$y_t = S_t + T_t + R_t$$
where $y_t$ is the data, $S_t$ is the seasonal component, $T_t$ is the trend-cycle
component, and $R_t$ is the remainder component, all at period t.
Alternatively, a multiplicative decomposition would be written as
$$y_t = S_t \times T_t \times R_t$$
44. • The additive decomposition is the most appropriate if the
magnitude of the seasonal fluctuations, or the variation around
the trend-cycle, does not vary with the level of the time series.
• When the variation in the seasonal pattern, or the variation
around the trend-cycle, appears to be proportional to the level
of the time series, then a multiplicative decomposition is more
appropriate.
• Multiplicative decompositions are common with economic time
series.
45. • An alternative to using a multiplicative decomposition is to first
transform the data until the variation in the series appears to be
stable over time, then use an additive decomposition. When a
log transformation has been used, this is equivalent to using a
multiplicative decomposition because
$$y_t = S_t \times T_t \times R_t \quad\Longleftrightarrow\quad \log y_t = \log S_t + \log T_t + \log R_t$$
46. Example
• We will look at several methods for obtaining the components St, Tt and Rt later in this
chapter, but first, it is helpful to see an example. We will decompose the new orders index
for electrical equipment shown in Figure. The data show the number of new orders for
electrical equipment (computer, electronic and optical products) in the Euro area (16
countries). The data have been adjusted by working days and normalized so that a value
of 100 corresponds to 2005.
47. • This Figure shows an additive decomposition of these data. The method used for
estimating components in this example is STL, which is discussed in a later section.
48. • The three components are shown separately in the bottom
three panels of Figure.
• These components can be added together to reconstruct the
data shown in the top panel.
• Notice that the seasonal component changes slowly over time,
so that any two consecutive years have similar patterns, but
years far apart may have different seasonal patterns.
• The remainder component shown in the bottom panel is what is
left over when the seasonal and trend-cycle components have
been subtracted from the data.
49. Seasonally adjusted data
• If the seasonal component is removed from the original data, the resulting
values are the “seasonally adjusted” data. For an additive decomposition,
the seasonally adjusted data are given by yt−St, and for multiplicative
data, the seasonally adjusted values are obtained using yt/St.
50. Implementation example
# Decompose the airline passengers series into trend, seasonal and residual parts
from pandas import read_csv
from matplotlib import pyplot
from statsmodels.tsa.seasonal import seasonal_decompose

# parse_dates gives the series the DatetimeIndex that seasonal_decompose expects
series = read_csv('airline-passengers.csv', header=0, index_col=0, parse_dates=True)
result = seasonal_decompose(series, model='multiplicative', period=12)  # 12 = monthly seasonality
result.plot()
pyplot.show()
52. Moving average Model
• The classical method of time series decomposition originated in
the 1920s and was widely used until the 1950s. It still forms the
basis of many time series decomposition methods, so it is
important to understand how it works.
• The first step in a classical decomposition is to use a moving
average method to estimate the trend-cycle, so we begin by
discussing moving averages.
53. Moving average smoothing
• A moving average of order m can be written as:
$$\hat{T}_t = \frac{1}{m}\sum_{j=-k}^{k} y_{t+j}$$
where m=2k+1.
That is, the estimate of the trend-cycle at time t is obtained by
averaging values of the time series within k periods of t. Observations
that are nearby in time are also likely to be close in value. Therefore,
the average eliminates some of the randomness in the data, leaving a
smooth trend-cycle component. We call this an m-MA, meaning a
moving average of order m.
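A minimal sketch of m-MA smoothing with pandas (m = 5 here, matching the example on the next slide):

from pandas import read_csv

series = read_csv('airline-passengers.csv', header=0, index_col=0).squeeze()

# Centered 5-MA: each value averages the observation, the two before it
# and the two after it; the first and last two values are NaN
trend_5ma = series.rolling(window=5, center=True).mean()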
54. Example (5-MA Smoothing)
Notice that the trend-cycle (in red) is smoother than the original data and captures the main
movement of the time series without all of the minor fluctuations.
The order of the moving average determines the smoothness of the trend-cycle estimate. In general,
a larger order means a smoother curve.
55. Comparison of different MA Models
Simple moving averages such as these are usually of an odd order (e.g., 3, 5, 7, etc.). This is so
they are symmetric: in a moving average of order m=2k+1, the middle observation,
and k observations on either side, are averaged. But if m were even, it would no longer be
symmetric.
56. Moving averages of moving averages
• It is possible to apply a moving average to a moving average.
One reason for doing this is to make an even-order moving
average symmetric.
• For example, we might take a moving average of order 4, and
then apply another moving average of order 2 to the results. In
the following table, this has been done for the first few years of
the Australian quarterly beer production data.
57. A moving average of order 4 applied to the quarterly data, followed by a moving average of
order 2.
The notation “2×4-MA” in the last column means a 4-MA followed by a 2-MA. The values in the last
column are obtained by taking a moving average of order 2 of the values in the previous column. For
example, the first two values in the 4-MA column are 451.25=(443+410+420+532)/4 and
448.75=(410+420+532+433)/4. The first value in the 2×4-MA column is the average of these two:
450.00=(451.25+448.75)/2.
58. • When a 2-MA follows a moving average of an even order (such
as 4), it is called a “centered moving average of order 4”. This is
because the results are now symmetric. To see that this is the
case, we can write the 2×4-MA as follows:
$$\hat{T}_t = \frac{1}{2}\left[\frac{1}{4}(y_{t-2}+y_{t-1}+y_t+y_{t+1}) + \frac{1}{4}(y_{t-1}+y_t+y_{t+1}+y_{t+2})\right]
= \frac{1}{8}y_{t-2} + \frac{1}{4}y_{t-1} + \frac{1}{4}y_t + \frac{1}{4}y_{t+1} + \frac{1}{8}y_{t+2}$$
59. Estimating the trend-cycle with seasonal data
• The most common use of centered moving averages is for
estimating the trend-cycle from seasonal data. Consider the 2×4-MA
written above.
• When applied to quarterly data, each quarter of the year is given
equal weight, as the first and last terms apply to the same quarter in
consecutive years.
• Consequently, the seasonal variation will be averaged out and the
resulting values of $\hat{T}_t$ will have little or no seasonal variation
remaining. A similar effect would be obtained by applying a 2×8-MA or
a 2×12-MA to quarterly data.
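A minimal sketch of a 2×4-MA computed directly from the weights (1/8, 1/4, 1/4, 1/4, 1/8) derived above; the first five values echo the quarterly example on slide 57, the rest are illustrative:

import numpy as np

quarterly = np.array([443, 410, 420, 532, 433, 421, 410, 512], dtype=float)

# Centered 2x4-MA weights
weights = np.array([1/8, 1/4, 1/4, 1/4, 1/8])

# 'valid' keeps only positions where the full 5-term window fits
trend = np.convolve(quarterly, weights, mode='valid')
print(trend)  # first value: 443/8 + 410/4 + 420/4 + 532/4 + 433/8 = 450.0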
60. • If the seasonal period is even and of order m, we use a 2×m-
MA to estimate the trend-cycle. If the seasonal period is odd
and of order m, we use an m-MA to estimate the trend-cycle.
• For example, a 2×12-MA can be used to estimate the trend-
cycle of monthly data and a 7-MA can be used to estimate the
trend-cycle of daily data with a weekly seasonality.
• Other choices for the order of the MA will usually result in trend-
cycle estimates being contaminated by the seasonality in the
data.
61. Classical decomposition
• The classical decomposition method originated in the 1920s. It is a
relatively simple procedure, and forms the starting point for most
other methods of time series decomposition. There are two forms of
classical decomposition: an additive decomposition and a
multiplicative decomposition.
• These are described below for a time series with seasonal
period m (e.g. m=4 for quarterly data, m=12 for monthly data, m=7 for
daily data with a weekly pattern).
• In classical decomposition, we assume that the seasonal component
is constant from year to year. For multiplicative seasonality,
the m values that form the seasonal component are sometimes
called the “seasonal indices”.
62. Additive decomposition
Step 1
If m is an even number, compute the trend-cycle component $\hat{T}_t$ using a 2×m-MA.
If m is an odd number, compute the trend-cycle component $\hat{T}_t$ using an m-MA.
Step 2
Calculate the detrended series: $y_t - \hat{T}_t$.
Step 3
To estimate the seasonal component for each season, simply average the detrended values for that
season. For example, with monthly data, the seasonal component for March is the average of all the
detrended March values in the data. These seasonal component values are then adjusted to ensure
that they add to zero. The seasonal component is obtained by stringing together these monthly values,
and then replicating the sequence for each year of data. This gives $\hat{S}_t$.
Step 4
The remainder component is calculated by subtracting the estimated seasonal and trend-cycle
components: $\hat{R}_t = y_t - \hat{T}_t - \hat{S}_t$.
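A minimal sketch of these four steps in pandas, assuming monthly data (m = 12) with a DatetimeIndex; the airline series is arguably better treated multiplicatively, but the mechanics are the same:

import pandas as pd
from pandas import read_csv

y = read_csv('airline-passengers.csv', header=0, index_col=0,
             parse_dates=True).squeeze()

# Step 1: m even, so use a 2x12-MA for the trend-cycle
# (a 12-MA followed by a 2-MA, shifted back to center the 13-term window)
trend = y.rolling(12).mean().rolling(2).mean().shift(-6)

# Step 2: detrend
detrended = y - trend

# Step 3: average the detrended values month by month, then center on zero
seasonal_means = detrended.groupby(detrended.index.month).mean()
seasonal_means -= seasonal_means.mean()          # make them add to zero
seasonal = pd.Series(y.index.month.map(seasonal_means), index=y.index)

# Step 4: remainder
remainder = y - trend - seasonal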
63. Multiplicative decomposition
A classical multiplicative decomposition is similar, except that the subtractions are replaced by divisions.
Step 1
If m is an even number, compute the trend-cycle component $\hat{T}_t$ using a 2×m-MA.
If m is an odd number, compute the trend-cycle component $\hat{T}_t$ using an m-MA.
Step 2
Calculate the detrended series: $y_t / \hat{T}_t$.
Step 3
To estimate the seasonal component for each season, simply average the detrended values for that
season. For example, with monthly data, the seasonal index for March is the average of all the detrended
March values in the data. These seasonal indexes are then adjusted to ensure that they add to m. The
seasonal component is obtained by stringing together these monthly indexes, and then replicating the
sequence for each year of data. This gives $\hat{S}_t$.
Step 4
The remainder component is calculated by dividing out the estimated seasonal and trend-cycle
components: $\hat{R}_t = y_t / (\hat{T}_t \hat{S}_t)$.
65. X11 decomposition
• Another popular method for decomposing quarterly and monthly data is
the X11 method which originated in the US Census Bureau and Statistics
Canada.
• This method is based on classical decomposition, but includes many extra
steps and features in order to overcome the drawbacks of classical
decomposition that were discussed in the previous section.
• In particular, trend-cycle estimates are available for all observations
including the end points, and the seasonal component is allowed to vary
slowly over time.
• X11 also has some sophisticated methods for handling trading day
variation, holiday effects and the effects of known predictors.
• It handles both additive and multiplicative decomposition. The process is
entirely automatic and tends to be highly robust to outliers and level shifts
in the time series.
67. SEATS decomposition
• “SEATS” stands for “Seasonal Extraction in ARIMA Time Series”.
• This procedure was developed at the Bank of Spain, and is now
widely used by government agencies around the world.
• The procedure works only with quarterly and monthly data. So
seasonality of other kinds, such as daily, hourly, or weekly
data, requires an alternative approach.
69. STL decomposition
STL (“Seasonal and Trend decomposition using Loess”) has several
advantages over the classical, SEATS and X11 decomposition methods:
• Unlike SEATS and X11, STL will handle any type of seasonality, not only
monthly and quarterly data.
• The seasonal component is allowed to change over time, and the rate of
change can be controlled by the user.
• The smoothness of the trend-cycle can also be controlled by the user.
• It can be robust to outliers (i.e., the user can specify a robust
decomposition), so that occasional unusual observations will not affect the
estimates of the trend-cycle and seasonal components. They will,
however, affect the remainder component.
On the other hand, STL has some disadvantages. In particular, it does not
handle trading day or calendar variation automatically, and it only provides
facilities for additive decompositions.
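A minimal STL sketch with statsmodels; period=12 assumes monthly data, and robust=True requests the outlier-resistant variant described above:

from pandas import read_csv
from statsmodels.tsa.seasonal import STL

series = read_csv('airline-passengers.csv', header=0, index_col=0,
                  parse_dates=True).squeeze()

res = STL(series, period=12, robust=True).fit()
res.plot()  # panels: observed, trend, seasonal, remainder

# The components are available directly for further analysis
trend, seasonal, remainder = res.trend, res.seasonal, res.resid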
71. Measuring strength of trend and seasonality
• A time series decomposition can be used to measure the strength of trend
and seasonality in a time series. Recall that the decomposition is written as
$$y_t = T_t + S_t + R_t$$
• where $T_t$ is the smoothed trend component, $S_t$ is the seasonal component
and $R_t$ is a remainder component.
• For strongly trended data, the seasonally adjusted data should have much
more variation than the remainder component.
• Therefore Var($R_t$)/Var($T_t$+$R_t$) should be relatively small. But for data with
little or no trend, the two variances should be approximately the same.
72. • So we define the strength of trend as:
$$F_T = \max\left(0,\; 1 - \frac{\mathrm{Var}(R_t)}{\mathrm{Var}(T_t + R_t)}\right)$$
• This will give a measure of the strength of the trend between 0 and 1.
Because the variance of the remainder might occasionally be even larger
than the variance of the seasonally adjusted data, we set the minimal
possible value of $F_T$ equal to zero.
73. • The strength of seasonality is defined similarly, but with respect to the
detrended data rather than the seasonally adjusted data:
$$F_S = \max\left(0,\; 1 - \frac{\mathrm{Var}(R_t)}{\mathrm{Var}(S_t + R_t)}\right)$$
• A series with seasonal strength $F_S$ close to 0 exhibits almost no
seasonality, while a series with strong seasonality will have $F_S$ close to 1
because Var($R_t$) will be much smaller than Var($S_t$+$R_t$).
• These measures can be useful, for example, when you have a large
collection of time series and need to find the series with the most
trend or the most seasonality.
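A minimal sketch computing both strengths from an STL fit (continuing the STL example above):

import numpy as np
from pandas import read_csv
from statsmodels.tsa.seasonal import STL

series = read_csv('airline-passengers.csv', header=0, index_col=0,
                  parse_dates=True).squeeze()
res = STL(series, period=12).fit()

T, S, R = res.trend, res.seasonal, res.resid

# Strength of trend and of seasonality, each clipped to [0, 1]
F_trend = max(0.0, 1 - np.var(R) / np.var(T + R))
F_seasonal = max(0.0, 1 - np.var(R) / np.var(S + R))
print(f'trend strength: {F_trend:.2f}, seasonal strength: {F_seasonal:.2f}')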
74. EVALUATING AND MONITORING FORECASTING MODEL PERFORMANCE
• It is customary to evaluate forecasting model performance using the one-step-ahead forecast errors
$$e_t(1) = y_t - \hat{y}_t(t-1)$$
• where $\hat{y}_t(t-1)$ is the forecast of $y_t$ that was made one period prior. Forecast errors at other lags, or at
several different lags, could be used if interest focused on those particular forecasts.
• Suppose that there are n observations for which forecasts have been made and n one-step-ahead forecast
errors, $e_t(1)$, t = 1, 2, ..., n.
75. ME, MAD, MSE
• Standard measures of forecast accuracy are the average error or mean error
$$\mathrm{ME} = \frac{1}{n}\sum_{t=1}^{n} e_t(1)$$
• the mean absolute deviation (MAD) (or mean absolute error)
$$\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n} \left| e_t(1) \right|$$
• and the mean squared error
$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} \left[ e_t(1) \right]^2$$
76. • The one-step-ahead forecast error and its summary measures, the ME, MAD, and
MSE, are all scale-dependent measures of forecast accuracy: that is, their values
are expressed in terms of the original units of measurement (or, in the case of
MSE, the square of the original units).
• So, for example, if we were forecasting demand for electricity during the summer,
the units would be megawatts (MW). If the MAD for the forecast error during
summer months was 5 MW, we might not know whether this was a large forecast
error or a relatively small one.
• Furthermore, accuracy measures that are scale dependent do not facilitate
comparisons of a single forecasting technique across different time series, or
comparisons across different time periods.
• To accomplish this, we need a measure of relative forecast error.
77. MAPE & MPE
• Define the relative forecast error (in percent) as
$$re_t = \frac{e_t(1)}{y_t} \times 100 = \frac{y_t - \hat{y}_t(t-1)}{y_t} \times 100$$
• This is customarily called the percent forecast error. The mean percent forecast error (MPE) is
$$\mathrm{MPE} = \frac{1}{n}\sum_{t=1}^{n} re_t$$
• and the mean absolute percent forecast error (MAPE) is
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n} \left| re_t \right|$$
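A minimal sketch computing all five measures from paired actuals and one-step-ahead forecasts (the arrays are illustrative):

import numpy as np

y = np.array([47.0, 46.0, 51.0, 44.0, 54.0])        # actuals (illustrative)
y_hat = np.array([51.1, 52.9, 48.8, 48.6, 46.4])    # one-step-ahead forecasts

e = y - y_hat                 # one-step-ahead forecast errors e_t(1)
re = 100 * e / y              # relative (percent) errors

print('ME   =', e.mean())
print('MAD  =', np.abs(e).mean())
print('MSE  =', (e ** 2).mean())
print('MPE  =', re.mean())
print('MAPE =', np.abs(re).mean())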
79. • The table illustrates the calculation of the one-step-ahead forecast errors, the absolute errors, the squared errors, the
relative (percent) errors, and the absolute percent errors from a forecasting model for 20 time periods. The last
row of columns (3) through (7) displays the sums required to calculate the ME, MAD, MSE, MPE, and MAPE.
• The mean (or average) forecast error is then obtained by dividing the sum of the errors by 20.
80. AIC & SIC
• Two other important criteria are the Akaike Information Criterion
(AIC) and the Schwarz Information Criterion (SIC):
$$\mathrm{AIC} = \ln\left(\frac{\sum_{t=1}^{T} e_t^2}{T}\right) + \frac{2p}{T}, \qquad
\mathrm{SIC} = \ln\left(\frac{\sum_{t=1}^{T} e_t^2}{T}\right) + \frac{p \ln T}{T}$$
• where T periods of data have been used to fit a model with p
parameters and $e_t$ is the residual from the model-fitting process in
period t.
• These two criteria penalize the sum of squared residuals for including
additional parameters in the model. Models that have small values of
the AIC or SIC are considered good models.
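A minimal sketch of these criteria computed from model residuals, using the formulas above (the residual array and p are illustrative):

import numpy as np

e = np.array([0.8, -1.2, 0.3, 1.5, -0.6, -0.9, 0.4, 1.1])  # residuals (illustrative)
T = len(e)   # number of fitted periods
p = 3        # number of model parameters (illustrative)

mean_sq_resid = np.sum(e ** 2) / T
aic = np.log(mean_sq_resid) + 2 * p / T
sic = np.log(mean_sq_resid) + p * np.log(T) / T

# Smaller is better; SIC penalizes extra parameters more heavily once T exceeds e^2
print(f'AIC = {aic:.3f}, SIC = {sic:.3f}')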