Visualizing, Modeling and Forecasting of Functional Time Series


Visualizing and forecasting functional time series

Han Lin Shang

Department of Econometrics and Business Statistics

HanLin.Shang@monash.edu


Outline

1 Visualizing functional time series.
2 Modeling and forecasting functional time series.
3 Modeling and forecasting seasonal univariate time series via
functional approach.
4 Present empirical analysis on estimation, modeling,
forecasting techniques, with no theoretical proof.


Aim of the ﬁrst paper

Introduce three visualization methods
1 rainbow plot
2 functional bagplot
3 functional highest density region (HDR) boxplot
Functional bagplot and functional HDR boxplot can detect outliers.


Overview of functional data

1 A collection of functions, represented by curves, surfaces,
shapes or images.
2 Some applications include
Age-speciﬁc mortality and fertility rates (Hyndman and Ullah,
2007)
Term-structured yield curve (Kargin and Onatski, 2008)
Spectrometry data (Reiss and Ogden, 2007)
El Niño data (Ferraty and Vieu, 2006)


Visualizing functional data

Helps discover characteristics that might not be apparent from
mathematical models and summary statistics.
Visualization has so far played a minor role in functional data analysis.


Some visualization methods

1 Phase-plane plot
2 Rug-plot
3 Singular value decomposition plot


Rainbow plot

1 A simple plot of all the data, with added feature being a
rainbow color palette based on an ordering of functional data.
2 Functional data can be ordered by depth and density.


Example of rainbow plot

Annual age-speciﬁc mortality curves for French males between
1899 and 2005
[Figure: France: male log mortality rate (1899-2005); x-axis: Age, y-axis: Log mortality rate]


Multivariate principal component analysis

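1 PC1 is calculated by maximizing the variance of $X\phi_1$, that is

\[ \operatorname*{argmax}_{\|\phi_1\| = 1} \operatorname{var}(X\phi_1) = \operatorname*{argmax}_{\|\phi_1\| = 1} \phi_1^{\top} X^{\top} X \phi_1 . \]

2 Successive PCs are obtained iteratively by subtracting the first
k PCs from X,

\[ X_k = X_{k-1} - X_{k-1} \phi_k \phi_k^{\top} , \]

3 and treating $X_k$ as the new data matrix to find $\phi_{k+1}$ by
maximizing the variance of $X_k \phi_{k+1}$, subject to
$\|\phi_{k+1}\| = \big( \sum_{j=1}^{p} \phi_{k+1,j}^2 \big)^{1/2} = 1$ and $\phi_{k+1} \perp \phi_j$, $j = 1, \dots, k$.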
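The iteration above can be sketched in a few lines of NumPy. This is only an illustration on simulated data: the function name `pca_by_deflation` and the test matrix are assumptions, and the columns of X are centred first so that $X^{\top}X$ is proportional to the sample covariance.

```python
import numpy as np

def pca_by_deflation(X, n_components):
    """Return loading vectors phi_1..phi_K found one at a time by deflation."""
    Xk = X - X.mean(axis=0)          # centre columns so X'X is proportional to the covariance
    loadings = []
    for _ in range(n_components):
        # The unit vector maximising phi' Xk' Xk phi is the leading eigenvector of Xk' Xk.
        eigvals, eigvecs = np.linalg.eigh(Xk.T @ Xk)
        phi = eigvecs[:, -1]         # eigh returns eigenvalues in ascending order
        loadings.append(phi)
        Xk = Xk - Xk @ np.outer(phi, phi)   # X_k = X_{k-1} - X_{k-1} phi_k phi_k'
    return np.column_stack(loadings)

# Simulated data (hypothetical): 50 observations of 4 variables with unequal variances.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])
Phi = pca_by_deflation(X, 3)
```

Up to sign, the columns of `Phi` should agree with the leading right singular vectors of the centred data matrix, which is how PCA is usually computed in practice.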


Properties of functional principal component analysis

Variables
  PCA: $X = [x_1, \dots, x_p]$, $x_i = [x_{1i}, \dots, x_{ni}]^{\top}$, $i = 1, \dots, p$
  FPCA: $f(x) = [f_1(x), \dots, f_n(x)]$, $x \in [x_1, x_p]$
Data
  PCA: Vectors $\in R^p$
  FPCA: Curves $\in L_2[x_1, x_p]$
Covariance
  PCA: Matrix $V = \operatorname{Cov}(X) \in R^{p \times p}$
  FPCA: Operator $T$ bounded between $x_1$ and $x_p$, $T : L_2[x_1, x_p] \to L_2[x_1, x_p]$
Eigen structure
  PCA: Vector $\xi_k \in R^p$, $V \xi_k = \lambda_k \xi_k$, for $1 \le k < \min(n, p)$
  FPCA: Function $\xi_k(x) \in L_2[x_1, x_p]$, $\int_{x_1}^{x_p} T \xi_k(x)\,dx = \lambda_k \xi_k(x)$, for $1 \le k < n$
Components
  PCA: Random variables in $R^p$
  FPCA: Random variables in $L_2[x_1, x_p]$


Bivariate and functional bagplots

1 Apply robust functional principal component analysis (FPCA)
to $\{y_t(x)\}$ and obtain the first two PC scores.
2 The bivariate PC scores are then ordered by Tukey's halfspace
location depth and plotted in a bivariate bagplot.
3 Map the features of the bivariate bagplot into the functional
space.


Bivariate and functional HDR boxplots

1 Compute a bivariate kernel density estimate on the first two
robust PC scores.
2 Apply the bivariate HDR boxplot.
3 Map the features of the HDR boxplot into the functional
space.
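A rough sketch of steps 1 and 2, assuming the two PC scores are already available as an (n, 2) array. The function name `hdr_outliers`, the Gaussian product kernel and the Scott-type bandwidth are illustrative assumptions, not the choices made in the rainbow package. Points whose estimated density falls below the density quantile that bounds the chosen highest density region are flagged.

```python
import numpy as np

def hdr_outliers(scores, coverage=0.99, bandwidth=None):
    """Flag points lying outside the `coverage` highest density region."""
    scores = np.asarray(scores, dtype=float)
    n, d = scores.shape
    if bandwidth is None:
        # Scott-type rule of thumb per dimension (an assumption made for this sketch).
        bandwidth = scores.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
    z = (scores[:, None, :] - scores[None, :, :]) / bandwidth   # (n, n, d) scaled differences
    kernel = np.exp(-0.5 * (z ** 2).sum(axis=2))
    density = kernel.mean(axis=1) / (np.prod(bandwidth) * (2 * np.pi) ** (d / 2))
    cutoff = np.quantile(density, 1.0 - coverage)   # density level bounding the HDR
    return density < cutoff, density

# Simulated scores (hypothetical) with one clearly atypical pair.
rng = np.random.default_rng(1)
scores = rng.normal(size=(100, 2))
scores[0] = [8.0, 8.0]
is_outlier, density = hdr_outliers(scores, coverage=0.99)
```

The coverage probability has to be chosen in advance, which is the same limitation the slides note for the functional HDR boxplot.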


Example of El Niño data

Average monthly sea surface temperatures (Celsius) from January
1951 to December 2007
[Figure: x-axis: Month, y-axis: Sea surface temperature]


Rainbow plots ordered by depth and density
[Figure: two rainbow plots of the sea surface temperature curves, one ordered by depth and one by density; x-axis: Month]


Outlier detection by bagplots

[Figure: bivariate bagplots (PC score 1 against PC score 2) and the corresponding functional bagplots. Mortality data (x-axis: Age, y-axis: Log mortality rate): labeled years 1914-1919, 1940, 1943, 1944. Sea surface temperature data (x-axis: Month): labeled years 1982, 1983, 1997, 1998.]


Outlier detection by HDR boxplots

[Figure: bivariate HDR boxplots (PC score 1 against PC score 2) and the corresponding functional HDR boxplots. Mortality data (x-axis: Age, y-axis: Log mortality rate): labeled years 1914-1919, 1940, 1943, 1944. Sea surface temperature data (x-axis: Month): labeled years 1982, 1983, 1997, 1998.]


Other outlier detection methods

1 Uses the notion of functional depth and calculates a likelihood
ratio test statistic for each curve.
2 A curve is an outlier if the maximum of the test statistics
exceeds a given critical value.
3 The outlier is removed and the remaining data are tested again.


Integrated squared error

1 Utilizes robust FPCA. The integrated squared error for each curve
is

\[ \int_{x_1}^{x_p} \hat{e}_t^2(x)\,dx = \int_{x_1}^{x_p} \Big[ y_t(x) - \hat{\mu}(x) - \sum_{k=1}^{K} \hat{\beta}_{t,k} \hat{\phi}_k(x) \Big]^2 dx \]

2 High integrated squared errors indicate a high likelihood of
curves being detected as outliers.
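As a minimal sketch, the integrated squared error can be approximated on a grid with the trapezoidal rule. Ordinary (non-robust) PCA via the SVD stands in for robust FPCA here, which is a simplification; the function name and the simulated curves are assumptions made for illustration.

```python
import numpy as np

def integrated_squared_error(Y, x, K):
    """Y: (n, p) curves observed on grid x. Returns one integrated squared error per curve."""
    mu = Y.mean(axis=0)
    Yc = Y - mu
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    Phi = Vt[:K].T                      # (p, K) discretised principal components
    scores = Yc @ Phi                   # (n, K) principal component scores
    resid = Yc - scores @ Phi.T         # e_t(x) evaluated on the grid
    sq = resid ** 2
    # Trapezoidal rule for the integral of e_t(x)^2 over [x_1, x_p].
    return ((sq[:, 1:] + sq[:, :-1]) / 2 * np.diff(x)).sum(axis=1)

# Simulated curves (hypothetical): smooth signals plus noise, with a bump added to curve 5.
x = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(4)
a = rng.normal(size=(30, 2))
Y = (np.outer(a[:, 0], np.sin(2 * np.pi * x))
     + np.outer(a[:, 1], np.cos(2 * np.pi * x))
     + 0.05 * rng.normal(size=(30, 101)))
Y[5] += 2.0 * np.exp(-((x - 0.5) / 0.03) ** 2)
ise = integrated_squared_error(Y, x, K=2)
```

In this simulated example the curve with the added bump has by far the largest integrated squared error.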


Robust Mahalanobis distance method

1 Discretize functional data on an equally spaced dense grid.
2 The squared robust Mahalanobis distance is defined by

\[ r_t = [y_t(x_i) - \hat{\mu}(x_i)]^{\top} \hat{\Sigma}^{-1} [y_t(x_i) - \hat{\mu}(x_i)], \quad i = 1, \dots, p, \quad t = 1, \dots, n \]

3 Outliers have squared robust Mahalanobis distances greater
than $\chi^2_{.99,p}$.


Outlier detection comparison of mortality data

Method Outliers detected
Functional depth None
Integrated squared error 1914–1918, 1940, 1943–1945
Functional bagplot 1914–1919, 1940, 1943–1944
Functional HDR boxplot 1914–1919, 1940, 1943–1944
Robust Mahalanobis distance 1914–1918, 1940, 1944
Table: The outliers are 1914-1919, 1940, 1943-1944.


Outlier detection comparison of El Niño data

Method Outliers detected
Functional depth 1983, 1997
Integrated squared error 1973, 1982–1983, 1997–1998
Functional bagplot 1982–1983, 1997–1998
Functional HDR boxplot 1982–1983, 1997–1998
Robust Mahalanobis distance 1982–1983, 1997–1998
Table: The outliers are 1982-1983, 1997-1998.


Conclusion of the ﬁrst paper

1 Three graphical methods to visualize functional data.
2 Functional bagplots and HDR boxplots can detect outliers.
3 One limitation is that only the first two principal component
scores are considered.
4 The probability of outliers needs to be chosen in advance.


Possible extension

1 FPCA can be replaced by other dimension reduction
techniques.
2 Other ways of ordering functional data or determining
functional median or mode.
3 Tukey’s location depth can be replaced by other depth
measures.
4 Extend from two-dimensional curves to three-dimensional
images.


Aim of the second paper

1 New functional data analytic tool for forecasting age-speciﬁc
mortality and fertility rates.
2 Mortality rate forecasting is vital for planning insurance and
pension policies.
3 Fertility rate forecasting is important for planning child care
policy.


Australian fertility data set
Annual Australian fertility rates (1921-2006) for age groups
from 15 to 49.
These are deﬁned as the number of live births during the
calendar year, according to the age of the mother, per 1000 of
the female resident population of the same age at 30 June.
[Figure: Australia fertility rate (1921-2006); x-axis: Age, y-axis: Fertility rate]


French female mortality data set
Annual French female mortality rates (1899-2005) for single year of
age. These are simply the ratio of death counts to population
exposure in the relevant interval of age and time.

[Figure: France: female log mortality rate (1899-2005); x-axis: Age, y-axis: Log mortality rate]


Modeling step
1 Smooth the data for each year using a nonparametric
smoothing method to obtain the estimate $\hat{f}_t(x)$ for $x \in [x_1, x_p]$ from
$\{x_i, y_t(x_i)\}$, $i = 1, 2, \dots, p$.
2 Decompose the realized curves via FPCA

\[ y_t(x) = \hat{\mu}(x) + \sum_{k=1}^{K} \hat{\beta}_{t,k} \hat{\phi}_k(x) + \hat{e}_t(x) + \sigma_t(x)\eta_t , \qquad (1) \]

$\hat{\mu}(x)$ is the mean function.
$\{\hat{\phi}_1(x), \dots, \hat{\phi}_K(x)\}$ are the functional principal components,
which are assumed to be fixed.
$\{\hat{\beta}_{t,1}, \dots, \hat{\beta}_{t,K}\}$ are the uncorrelated principal component scores
satisfying $\sum_{k=1}^{K} \hat{\beta}_{t,k}^2 < \infty$.
$\hat{e}_t(x)$ is the estimated model residual function.
$\sigma_t(x)\eta_t$ takes into account heterogeneity, and $\eta_t \sim N(0, 1)$.
$K$ is the number of functional principal components.


Forecasting step

1 Model and forecast the coefficients
$\{\hat{\beta}_{1,k}, \dots, \hat{\beta}_{n,k}\}$, $k = 1, \dots, K$, via univariate time series models.
2 Use the forecast coefficients with (1) to obtain forecasts of
$f_{n+h}(x)$, where $h$ is the forecast horizon.
3 Estimated variances of the error terms in (1) are used to
compute prediction intervals.


Weighted mean function

1 The mean function $\mu(x)$ is estimated by a weighted average

\[ \hat{\mu}^*(x) = \sum_{t=1}^{n} w_t \hat{f}_t(x), \]

where $\hat{f}_t(x)$ is the smoothed curve estimated from $y_t(x)$, and
$w_t = \kappa(1 - \kappa)^{n-t}$ is a geometrically decreasing weight with
$0 < \kappa < 1$.
2 $\hat{f}_t^*(x) = \hat{f}_t(x) - \hat{\mu}^*(x)$ are the de-centralized functional curves;
let $G = W \hat{f}^*(x)$, where $W = \operatorname{diag}(w_1, \dots, w_n)$ is a diagonal
weight matrix.
3 Apply singular value decomposition to $G = U D V^{\top}$, where
$\hat{\phi}_k(x_i^*)$ is the $(i, k)$th element of $V$.
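The three steps can be sketched as follows for curves already smoothed and evaluated on a common grid. The function name `weighted_fpca`, the simulated input and the value of κ are assumptions made for illustration; the ftsa package should be consulted for the actual implementation.

```python
import numpy as np

def weighted_fpca(F, kappa, K):
    """F: (n, p) smoothed curves, oldest first. Returns weights, weighted mean, components, scores."""
    n = F.shape[0]
    t = np.arange(1, n + 1)
    w = kappa * (1.0 - kappa) ** (n - t)     # w_t = kappa (1 - kappa)^(n - t)
    mu_star = w @ F                          # weighted mean function on the grid
    F_star = F - mu_star                     # de-centralized curves
    G = np.diag(w) @ F_star                  # G = W f*(x)
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    Phi = Vt[:K].T                           # column k holds the k-th component on the grid
    scores = F_star @ Phi                    # scores of every curve on the weighted components
    return w, mu_star, Phi, scores

# Simulated input (hypothetical): 20 curves on a 15-point grid.
rng = np.random.default_rng(5)
F = rng.normal(size=(20, 15))
w, mu_star, Phi, scores = weighted_fpca(F, kappa=0.2, K=3)
```

The most recent curve receives weight κ and earlier curves receive geometrically smaller weights, so recent years dominate both the mean and the components. The weights sum to $1 - (1 - \kappa)^n$, which is close to but not exactly one.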


Weighted functional principal components
1 The weighted functional principal component decomposition is

\[ y_t(x) = \hat{\mu}^*(x) + \sum_{k=1}^{K} \hat{\beta}_{t,k} \hat{\phi}_k^*(x) + \hat{e}_t(x) + \sigma_t(x)\eta_t \]

2 Since the scores $\{\hat{\beta}_{t,1}, \dots, \hat{\beta}_{t,K}\}$ are uncorrelated, they can be
forecast using a univariate time series model.
3 Conditioning on the observations $\mathcal{I}$ and the set of fixed
weighted functional principal components
$\Phi^* = \{\hat{\phi}_1^*(x), \dots, \hat{\phi}_K^*(x)\}$, the h-step-ahead forecast of $y_{n+h}(x)$
is

\[ \hat{y}_{n+h|n}(x) = E[y_{n+h}(x) \mid \mathcal{I}, \Phi^*] = \hat{\mu}^*(x) + \sum_{k=1}^{K} \hat{\beta}_{n+h|n,k} \hat{\phi}_k^*(x), \]

where $\hat{\beta}_{n+h|n,k}$ denotes the h-step-ahead forecast of $\beta_{n+h,k}$.


Selection of weight parameter

κ can be determined by minimizing the mean integrated squared
forecast error (MISFE):

\[ \mathrm{MISFE}(h) = \int_{x_1}^{x_p} \big[ y_{n+h}(x) - \hat{y}_{n+h|n}(x) \big]^2 dx, \]

over a grid of candidate values of κ.


Selection of number of components

Optimal number of components is determined by minimizing the
MISFE.


Australian fertility rates

K FPCA FPCAw RW
1 99.0611 16.7304
2 56.3095 3.3019
3 24.9330 3.2580
4 15.6845 3.1995
5 4.4495 3.2132
6 3.4310 3.2123 4.9800
Table: MSE: Australian fertility rates.


French female mortality rates

K FPCA FPCAw RW
1 0.5956 0.0293
2 0.0537 0.0310
3 0.0316 0.0310
4 0.0296 0.0311
5 0.0287 0.0311
6 0.0425 0.0311 0.0437
Table: MSE (×1000): French female log mortality rates.


Conclusion of the second paper

1 Proposed a weighted FPCA to forecast age-speciﬁc fertility
and mortality rates.
2 Compared point forecast accuracy between the unweighted
and weighted FPCA.
3 Extend weighting idea to other dimension reduction
techniques, such as functional partial least squares regression.


Aim of the third paper

1 Sea surface temperature (SST) is rising.
2 Rising sea surface temperatures increase the intensity of natural
disasters, such as hurricanes and storms.
3 Provide a multivariate, nonparametric approach to modeling
and predicting sea surface temperature.


El Niño data set

1 Average monthly sea surface temperature from January 1950
to December 2008, available online at
www.cpc.noaa.gov/data/indices/sstoi.indices.
2 Sea surface temperatures are measured by moored buoys in
the “Nino region” defined by the coordinates 0-10° South
and 90-80° West.


Univariate graphical display

[Figure: sea surface temperature plotted as a single univariate time series, 1950-2010; x-axis: Month]


Functional graphical display

[Figure: sea surface temperature plotted as one curve per year; x-axis: Month (1 to 12)]


Functional time series analysis

Let $\{Z_w, w \in [1, N]\}$ be a seasonal time series observed at $N$
equispaced times.
For unequally spaced data, smoothing methods may
be applied.
The observed time series $\{Z_1, \dots, Z_{708}\}$ is divided into 59 successive
paths of length 12,

\[ y_t(x) = \{Z_w, \; w \in (p(t-1), pt]\}, \quad t = 1, \dots, 59, \quad p = 12, \quad x = 1, \dots, 12. \]

The aim is to forecast future curves $y_{n+h}(x)$, $h > 0$, from the observed
data.
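A small sketch of the slicing step: with 12 observations per year, reshaping the series row by row gives one curve per year. The placeholder series 1, ..., 708 stands in for the sea surface temperature data and is only an assumption for illustration.

```python
import numpy as np

p = 12                                   # observations per year
Z = np.arange(1, 709, dtype=float)       # placeholder for Z_1, ..., Z_708
Y = Z.reshape(-1, p)                     # row t holds Z_w for w in (p(t-1), pt]
n_years = Y.shape[0]
```

Row t of `Y` then plays the role of the curve $y_t(x)$ evaluated at the 12 months.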


FPCA

1 Decompose a complete (12 × 59) data matrix,
$y(x) = [y_1(x), \dots, y_n(x)]$, into a number of functional
principal components and their uncorrelated scores.
2 The FPCA decomposition can be written as

\[ y_t(x) = \hat{\mu}(x) + \sum_{k=1}^{K} \hat{\beta}_{t,k} \hat{\phi}_k(x) + \hat{\varepsilon}_t(x), \qquad (2) \]


Functional principal component regression

Conditioning on the historical curves $\mathcal{I}$ and the fixed functional
principal components $\Phi = \{\hat{\phi}_1(x), \dots, \hat{\phi}_K(x)\}$, the forecast
curves are

\[ \hat{y}^{TS}_{n+h|n}(x) = E[y_{n+h}(x) \mid \mathcal{I}, \Phi] = \hat{\mu}(x) + \sum_{k=1}^{K} \hat{\beta}_{n+h|n,k} \hat{\phi}_k(x), \qquad (3) \]

where $\hat{\beta}_{n+h|n,k}$ denotes the h-step-ahead forecast of $\beta_{n+h,k}$.
Hereafter, we refer to this method as the time series (TS)
method.


Problem statement

1 As we observe the most recent data points, consisting of the first $m_0$ time
periods of $y_{n+1}(x)$, denoted by
$y_{n+1}(x_e) = [y_{n+1}(x_1), \dots, y_{n+1}(x_{m_0})]^{\top}$, we want to update the
forecasts for the remaining time periods of year $n + 1$, denoted
by $y_{n+1}(x_l) = [y_{n+1}(x_{m_0+1}), \dots, y_{n+1}(x_{12})]^{\top}$.
2 Using (3), the TS forecast of $y_{n+1}(x_l)$ is given as

\[ \hat{y}^{TS}_{n+1|n}(x_l) = E[y_{n+1}(x_l) \mid \mathcal{I}_l, \Phi_l] = \hat{\mu}(x_l) + \sum_{k=1}^{K} \hat{\beta}^{TS}_{n+1|n,k} \hat{\phi}_k(x_l). \]

3 The TS method does not use any new observations.
4 Introduce four dynamic updating methods and compare their
point forecast performance.


Block moving (BM)
1 The BM method treats the most recent data as the last observation in
a complete data matrix.
2 Because time is a continuous variable, we can observe a complete
data matrix at any given time interval.
3 The TS method can then be applied, at the cost of sacrificing a number
of data points in the first year.


Ordinary least squares (OLS) regression
1 Denote $F_e$ as an $m_0 \times K$ matrix whose $(j, k)$th entry is $\hat{\phi}_{j,k}$ for
$1 \le j \le m_0$, $1 \le k \le K$.
2 Let $\hat{\beta}_{n+1} = [\hat{\beta}_{n+1,1}, \dots, \hat{\beta}_{n+1,K}]^{\top}$ be a $K \times 1$ vector, and
$\hat{\varepsilon}_{n+1}(x_e) = [\hat{\varepsilon}_{n+1}(x_1), \dots, \hat{\varepsilon}_{n+1}(x_{m_0})]^{\top}$ be an $m_0 \times 1$ vector.
3 As the mean-adjusted $\hat{y}^*_{n+1}(x_e) = y_{n+1}(x_e) - \hat{\mu}(x_e)$ becomes
available, we have the OLS regression

\[ \hat{y}^*_{n+1}(x_e) = F_e \hat{\beta}_{n+1} + \hat{\varepsilon}_{n+1}(x_e). \]

4 Via OLS, $\hat{\beta}^{OLS}_{n+1} = (F_e^{\top} F_e)^{-1} F_e^{\top} \hat{y}^*_{n+1}(x_e)$.
5 The OLS forecast of $y_{n+1}(x_l)$ is given by

\[ \hat{y}^{OLS}_{n+1}(x_l) = E[y_{n+1}(x_l) \mid \mathcal{I}_l, \Phi_l] = \hat{\mu}(x_l) + \sum_{k=1}^{K} \hat{\beta}^{OLS}_{n+1,k} \hat{\phi}_k(x_l). \]
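A minimal sketch of the OLS update, assuming the mean function and the principal components are available on the 12-month grid. The function name `ols_update` and the synthetic components are hypothetical; `np.linalg.lstsq` is used in place of forming $(F_e^{\top}F_e)^{-1}$ explicitly, which gives the same solution when $F_e$ has full column rank.

```python
import numpy as np

def ols_update(y_e, mu, Phi):
    """Estimate the scores of year n+1 from its first m0 observations by OLS."""
    m0 = len(y_e)
    F_e = Phi[:m0]                    # m0 x K matrix of components at the observed periods
    y_star = y_e - mu[:m0]            # mean-adjusted observations
    # Least squares solution of y* = F_e beta + error.
    beta, *_ = np.linalg.lstsq(F_e, y_star, rcond=None)
    forecast = mu[m0:] + Phi[m0:] @ beta   # forecast for the remaining periods x_l
    return beta, forecast

# Synthetic check: 12 monthly periods, K = 2 hypothetical components, no noise.
rng = np.random.default_rng(2)
Phi, _ = np.linalg.qr(rng.normal(size=(12, 2)))   # orthonormal columns
mu = np.linspace(20.0, 26.0, 12)
beta_true = np.array([1.5, -0.7])
y_full = mu + Phi @ beta_true
beta_hat, y_l_hat = ols_update(y_full[:6], mu, Phi)
```

In this noise-free example the first six months are enough to recover the scores exactly, so the updated forecast matches the remaining six months.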


Ridge regression (RR)

1 RR penalizes the OLS coefficients that deviate from 0. The RR
coefficients minimize a penalized residual sum of squares

\[ \operatorname*{argmin}_{\hat{\beta}_{n+1}} \big\{ (\hat{y}^*_{n+1}(x_e) - F_e \hat{\beta}_{n+1})^{\top} (\hat{y}^*_{n+1}(x_e) - F_e \hat{\beta}_{n+1}) + \lambda \hat{\beta}_{n+1}^{\top} \hat{\beta}_{n+1} \big\} \]

2 Taking the derivative with respect to $\hat{\beta}_{n+1}$,

\[ \hat{\beta}^{RR}_{n+1} = (F_e^{\top} F_e + \lambda I)^{-1} F_e^{\top} \hat{y}^*_{n+1}(x_e). \]

3 The RR forecast of $y_{n+1}(x_l)$ is

\[ \hat{y}^{RR}_{n+1}(x_l) = E[y_{n+1}(x_l) \mid \mathcal{I}_l, \Phi_l] = \hat{\mu}(x_l) + \sum_{k=1}^{K} \hat{\beta}^{RR}_{n+1,k} \hat{\phi}_k(x_l). \]
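The closed form in step 2 can be sketched directly. The function name `ridge_update`, the synthetic components and the values of λ are assumptions for illustration only.

```python
import numpy as np

def ridge_update(y_e, mu, Phi, lam):
    """Ridge estimate of the scores of year n+1 from its first m0 observations."""
    m0 = len(y_e)
    F_e = Phi[:m0]
    y_star = y_e - mu[:m0]
    K = Phi.shape[1]
    # beta_RR = (F_e'F_e + lam I)^{-1} F_e' y*
    beta = np.linalg.solve(F_e.T @ F_e + lam * np.eye(K), F_e.T @ y_star)
    return beta, mu[m0:] + Phi[m0:] @ beta

# Synthetic data (hypothetical): K = 3 components on a 12-month grid, with noise.
rng = np.random.default_rng(3)
Phi, _ = np.linalg.qr(rng.normal(size=(12, 3)))
mu = np.linspace(20.0, 26.0, 12)
y_full = mu + Phi @ np.array([1.5, -0.7, 0.4]) + 0.1 * rng.normal(size=12)
beta_small, _ = ridge_update(y_full[:6], mu, Phi, lam=1e-8)
beta_large, _ = ridge_update(y_full[:6], mu, Phi, lam=10.0)
beta_few, _ = ridge_update(y_full[:2], mu, Phi, lam=0.1)   # m0 < K is still solvable
```

A larger λ shrinks the scores towards zero, and the penalty keeps the linear system solvable even when fewer than K new observations are available.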


Penalized least squares (PLS) regression
1 The OLS method needs a sufficient number of observations ($\ge K$)
in order for $\hat{\beta}^{OLS}_{n+1}$ to be numerically stable.
2 $\hat{\beta}^{PLS}_{n+1}$ obtained from the PLS method minimizes

\[ (\hat{y}^*_{n+1}(x_e) - F_e \hat{\beta}_{n+1})^{\top} (\hat{y}^*_{n+1}(x_e) - F_e \hat{\beta}_{n+1}) + \lambda (\hat{\beta}_{n+1} - \hat{\beta}^{TS}_{n+1|n})^{\top} (\hat{\beta}_{n+1} - \hat{\beta}^{TS}_{n+1|n}) \]

3 Taking the first derivative with respect to $\hat{\beta}_{n+1}$,

\[ \hat{\beta}^{PLS}_{n+1} = (F_e^{\top} F_e + \lambda I)^{-1} (F_e^{\top} \hat{y}^*_{n+1}(x_e) + \lambda \hat{\beta}^{TS}_{n+1|n}). \qquad (4) \]

4 The PLS forecast is a weighted average of the TS and OLS
forecasts, controlled by the penalty parameter λ.
5 The PLS forecast of $y_{n+1}(x_l)$ is given as

\[ \hat{y}^{PLS}_{n+1}(x_l) = E[y_{n+1}(x_l) \mid \mathcal{I}_l, \Phi_l] = \hat{\mu}(x_l) + \sum_{k=1}^{K} \hat{\beta}^{PLS}_{n+1,k} \hat{\phi}_k(x_l). \]
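Equation (4) can be sketched in the same way. The function name `pls_update` and the synthetic inputs, including the stand-in TS score forecasts `beta_ts`, are hypothetical.

```python
import numpy as np

def pls_update(y_e, mu, Phi, lam, beta_ts):
    """Penalized least squares update that shrinks the scores towards the TS forecast."""
    m0 = len(y_e)
    F_e = Phi[:m0]
    y_star = y_e - mu[:m0]
    K = Phi.shape[1]
    # beta_PLS = (F_e'F_e + lam I)^{-1} (F_e' y* + lam beta_TS)
    beta = np.linalg.solve(F_e.T @ F_e + lam * np.eye(K),
                           F_e.T @ y_star + lam * beta_ts)
    return beta, mu[m0:] + Phi[m0:] @ beta

# Synthetic data (hypothetical): K = 2 components, six observed months.
rng = np.random.default_rng(6)
Phi, _ = np.linalg.qr(rng.normal(size=(12, 2)))
mu = np.linspace(20.0, 26.0, 12)
y_full = mu + Phi @ np.array([1.5, -0.7]) + 0.1 * rng.normal(size=12)
beta_ts = np.array([1.0, 0.0])                       # stand-in for the TS score forecasts
beta_lam0, _ = pls_update(y_full[:6], mu, Phi, 0.0, beta_ts)
beta_huge, _ = pls_update(y_full[:6], mu, Phi, 1e9, beta_ts)
```

With λ = 0 the estimate coincides with OLS, and as λ grows it approaches the TS forecast, which is the sense in which the PLS forecast interpolates between the two.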


Penalty parameter selection

Split the data into a training set, consisting of
1 a training sample (SST from 1950 to 1970), and
2 a validation sample (SST from 1971 to 1992),
and a testing set (SST from 1993 to 2007).
Optimal penalty parameters λ for different updating periods
are determined by minimizing the mean absolute error (MAE),

\[ \mathrm{MAE} = \frac{1}{hp} \sum_{j=1}^{h} \sum_{i=1}^{p} \big| y_{n+j}(x_i) - \hat{y}_{n+j}(x_i) \big|, \]

over a grid of candidates (from $10^{-6}$ to $10^{6}$ in steps of
0.0001).


Component selection

With data in training set, select number of components by
minimizing MAE within the validation set.
Optimal number of components is K = 5.


Some benchmark forecasting methods

1 The mean predictor (MP) method predicts values at year n + 1 by
the empirical mean from the first year to the nth year.
2 The random walk (RW) method predicts new values at year n + 1
by the observations at year n.
3 Seasonal autoregressive integrated moving average (SARIMA) is a
benchmark method for forecasting seasonal univariate time
series. It requires specifying the orders of the seasonal and
non-seasonal components of an ARIMA model. The automatic
algorithm of Hyndman and Khandakar (2008) is used to
select the optimal orders.
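The first two benchmarks are simple enough to state in code. This is a toy sketch with made-up numbers, where each row of `Y` is one year and each column one month; the function names are assumptions.

```python
import numpy as np

def mean_predictor(Y):
    """Forecast year n+1 by the month-wise mean of years 1..n."""
    return Y.mean(axis=0)

def random_walk(Y):
    """Forecast year n+1 by repeating the observations of year n."""
    return Y[-1].copy()

# Toy data (hypothetical): 3 years x 3 months.
Y = np.array([[20.0, 22.0, 24.0],
              [21.0, 23.0, 25.0],
              [25.0, 24.0, 26.0]])
mp_forecast = mean_predictor(Y)
rw_forecast = random_walk(Y)
```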


Point forecast comparison

Non-dynamic updating method Dynamic updating methods
Update MP RW SARIMA TS OLS Block PLS RR
Mar-Dec 0.72 0.86 0.96 0.73 0.72 0.70 0.67 0.76
Apr-Dec 0.73 0.87 0.98 0.74 0.69 0.73 0.68 0.65
May-Dec 0.71 0.86 0.88 0.71 0.94 0.71 0.68 0.62
Jun-Dec 0.71 0.84 0.86 0.71 1.07 0.70 0.66 0.58
Jul-Dec 0.72 0.87 0.86 0.73 0.94 0.68 0.60 0.57
Aug-Dec 0.71 0.91 0.84 0.74 0.94 0.69 0.63 0.62
Sep-Dec 0.71 0.93 0.84 0.74 1.03 0.70 0.65 0.64
Oct-Dec 0.72 0.96 0.57 0.78 0.69 0.74 0.71 0.64
Nov-Dec 0.72 0.92 0.52 0.79 0.25 0.75 0.58 0.24
Dec 0.64 0.83 0.21 0.71 0.29 0.59 0.23 0.29
Mean 0.71 0.88 0.75 0.74 0.76 0.70 0.61 0.56

Table: MAE of the point forecasts using diﬀerent methods.


Parametric prediction intervals

1 Based on orthogonality and linear additivity, the total forecast
variance is approximated by the sum of individual variances

\[ \hat{\xi}_{n+h|n} = \operatorname{Var}[y_{n+h} \mid \mathcal{I}, \Phi] \approx \sum_{k=1}^{K} \hat{\eta}_{n+h|n,k} \hat{\phi}_k^2(x) + \hat{v}_{n+h} , \]

$\hat{\eta}_{n+h|n,k} = \operatorname{Var}(\beta_{n+h,k} \mid \hat{\beta}_{1,k}, \dots, \hat{\beta}_{n,k})$ is obtained from a time
series model.
$\hat{v}_{n+h}$ is estimated by averaging $\hat{\varepsilon}^2_{n+h}(x)$ in (2) for each x
variable.
2 Under normality, the $(1 - \alpha)$ prediction intervals for
$y_{n+h}(x)$ are

\[ \hat{y}_{n+h|n}(x) \pm z_{\alpha} (\hat{\xi}_{n+h|n})^{1/2} , \]

where $z_{\alpha}$ is the $(1 - \alpha/2)$ standard normal quantile.


Nonparametric prediction intervals
1 The h-step-ahead forecast errors of the principal component scores are
$\hat{\pi}_{t,h,k} = \hat{\beta}_{t,k} - \hat{\beta}_{t|t-h,k}$, for $t = h + 1, \dots, n$, where $h < n - 1$.
2 By sampling with replacement, obtain bootstrap samples of
$\beta_{n+h,k}$,

\[ \hat{\beta}^{b,TS}_{n+h|n,k} = \hat{\beta}^{TS}_{n+h|n,k} + \hat{\pi}^{b}_{*,h,k}, \quad \text{for } b = 1, \dots, B. \]

3 Since the residuals $\{\hat{\varepsilon}_1(x), \dots, \hat{\varepsilon}_n(x)\}$ are uncorrelated with the
principal components, bootstrap the model residual term
$\hat{\varepsilon}^{b}_{n+h|n}(x)$ by iid sampling.
4 Based on orthogonality and linear additivity, obtain B forecast
variants of $y_{n+h|n}(x)$,

\[ \hat{y}^{b}_{n+h|n}(x) = \hat{\mu}(x) + \sum_{k=1}^{K} \hat{\beta}^{b,TS}_{n+h|n,k} \hat{\phi}_k(x) + \hat{\varepsilon}^{b}_{n+h|n}(x). \]

5 The $(1 - \alpha)$ prediction intervals are quantiles of $\hat{y}^{b}_{n+h|n}(x)$.
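The bootstrap steps can be sketched as below, assuming the point forecasts of the scores, their historical h-step-ahead forecast errors and the residual curves have already been computed. The function name `bootstrap_intervals`, its arguments and the simulated inputs are all hypothetical.

```python
import numpy as np

def bootstrap_intervals(mu, Phi, beta_fc, score_errors, residuals,
                        B=1000, alpha=0.05, seed=0):
    """mu: (p,), Phi: (p, K), beta_fc: (K,), score_errors: (m, K), residuals: (n, p)."""
    rng = np.random.default_rng(seed)
    m, K = score_errors.shape
    # Resample past forecast errors with replacement, separately for each score.
    idx = rng.integers(0, m, size=(B, K))
    beta_b = beta_fc + score_errors[idx, np.arange(K)]                # (B, K)
    # Resample whole residual curves by iid sampling.
    eps_b = residuals[rng.integers(0, residuals.shape[0], size=B)]    # (B, p)
    y_b = mu + beta_b @ Phi.T + eps_b                                 # B forecast variants
    lower = np.quantile(y_b, alpha / 2, axis=0)
    upper = np.quantile(y_b, 1 - alpha / 2, axis=0)
    return lower, upper

# Simulated inputs (hypothetical): 12 months, K = 2 components.
rng = np.random.default_rng(7)
Phi, _ = np.linalg.qr(rng.normal(size=(12, 2)))
mu = np.linspace(20.0, 26.0, 12)
beta_fc = np.array([0.5, -0.2])
lower, upper = bootstrap_intervals(mu, Phi, beta_fc,
                                   score_errors=rng.normal(size=(40, 2)),
                                   residuals=0.1 * rng.normal(size=(40, 12)))
```

The interval at each month is read off as the α/2 and 1 − α/2 quantiles of the B simulated forecasts, so no normality assumption is needed.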


Distributional forecast updating

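1 Obtain bootstrap samples of $\beta_{n+1,k}$ for year $n + 1$,

\[ \hat{\beta}^{b,TS}_{n+1|n,k} = \hat{\beta}^{TS}_{n+1|n,k} + \hat{\pi}^{b}_{*,1,k}, \quad \text{for } b = 1, \dots, B. \]

2 The bootstrapped samples $\hat{\beta}^{b,TS}_{n+1|n,k}$ lead to
bootstrapped samples $\hat{\beta}^{b,PLS}_{n+1}$ by (4).
3 From $\hat{\beta}^{b,PLS}_{n+1}$, obtain B replications of

\[ \hat{y}^{b,PLS}_{n+1}(x_l) = \hat{\mu}(x_l) + \sum_{k=1}^{K} \hat{\beta}^{b,PLS}_{n+1,k} \hat{\phi}_k(x_l) + \hat{\varepsilon}^{b}_{n+1}(x_l). \]

4 The $(1 - \alpha)$ prediction intervals are quantiles of $\hat{y}^{b,PLS}_{n+1}(x_l)$.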


Distributional forecast measure
1 Empirical conditional coverage probability is calculated as
the ratio between the number of ‘future’ samples falling into the
calculated prediction intervals and the number of testing samples,

\[ \text{coverage} = \frac{1}{hp} \sum_{i=1}^{p} \sum_{j=1}^{h} I\big( \hat{y}^{lb}_{n+j|n}(x_i) < y_{n+j}(x_i) < \hat{y}^{ub}_{n+j|n}(x_i) \big), \]

Mean coverage probability deviance = average(|empirical
coverage - nominal coverage|).
2 To assess which approach gives narrower prediction intervals,
calculate the width of the prediction intervals

\[ \text{Width} = \frac{1}{hp} \sum_{i=1}^{p} \sum_{j=1}^{h} \big| \hat{y}^{ub}_{n+j|n}(x_i) - \hat{y}^{lb}_{n+j|n}(x_i) \big| . \]
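Both measures reduce to averages over the h × p grid of test points. A toy sketch with made-up numbers, where the function name is an assumption:

```python
import numpy as np

def coverage_and_width(y, lower, upper):
    """y, lower, upper: (h, p) arrays of holdout values and interval bounds."""
    inside = (lower < y) & (y < upper)          # indicator of each value being covered
    return inside.mean(), np.abs(upper - lower).mean()

# Toy numbers (hypothetical): h = 2 years, p = 2 months.
y = np.array([[1.0, 2.0], [3.0, 4.0]])
lower = np.array([[0.0, 1.0], [3.5, 3.0]])
upper = np.array([[2.0, 3.0], [4.5, 5.0]])
coverage, width = coverage_and_width(y, lower, upper)
```

Here three of the four holdout values fall inside their intervals, so the empirical coverage is 0.75, and the mean interval width is 1.75.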


Distributional forecast comparison

Parametric Nonparametric
Period TS BM TS BM PLS
Mar-Dec 97% 98% 97% 97% 95%
Apr-Dec 97% 98% 97% 97% 95%
May-Dec 96% 96% 96% 96% 96%
Jun-Dec 96% 96% 96% 95% 95%
Jul-Dec 95% 96% 95% 94% 94%
Aug-Dec 94% 94% 94% 94% 93%
Sep-Dec 93% 95% 93% 95% 93%
Oct-Dec 93% 93% 93% 93% 90%
Nov-Dec 93% 96% 93% 93% 93%
Dec 93% 100% 93% 93% 93%
MCD 1.58% 1.88% 1.58% 1.40% 1.49%
Table: Nominal coverage = 95%. The smaller the mean coverage probability
deviance (MCD), the better the method.


Distributional forecast comparison

Parametric Nonparametric
Period TS BM TS BM PLS
Mar-Dec 3.65 3.64 3.55 3.51 3.15
Apr-Dec 3.73 3.73 3.62 3.66 3.21
May-Dec 3.69 3.69 3.57 3.61 3.21
Jun-Dec 3.58 3.58 3.47 3.50 3.05
Jul-Dec 3.47 3.46 3.38 3.41 2.90
Aug-Dec 3.34 3.33 3.26 3.37 2.61
Sep-Dec 3.26 3.26 3.19 3.25 2.82
Oct-Dec 3.27 3.28 3.20 3.23 2.78
Nov-Dec 3.23 3.24 3.16 3.26 2.69
Dec 3.19 3.18 3.12 3.30 2.48
Mean width 3.44 3.44 3.35 3.41 2.89
Table: Width comparison at nominal = 95%.


Conclusion of the third paper

1 Presented a nonparametric method to forecast univariate
seasonal time series.
2 Showed importance of dynamic updating for improving point
forecast accuracy.
3 Among all dynamic updating methods, RR turns out to be
best.
4 Possible to examine other penalty functions used in both the
PLS and RR methods.


Summary of the paper

1 Proposed three graphical tools for visualizing functional data
and identifying functional outliers.



2 Proposed a weighted functional principal component analysis
to model and forecast mortality and fertility.



3 Applied the functional data analytic approach to model and
forecast seasonal univariate time series.


References of three papers

Hyndman, R. J. and Shang, H. L. (2010) Rainbow plot, bagplot
and boxplot for functional data, Journal of Computational and
Graphical Statistics, 19(1), 29-45.

Hyndman, R. J. and Shang, H. L. (2009) Forecasting functional
time series (with discussion), Journal of Korean Statistical Society,
38(3), 199-221.

Shang, H. L. and Hyndman, R. J. (2011) Nonparametric time
series forecasting with dynamic updating, Mathematics and
Computers in Simulation, 81(7), 1310-1324.


References of three R packages

Shang, H. L. and Hyndman, R. J. (2011) rainbow: Rainbow plots,
bagplots and boxplots for functional data, R package version 2.3.4,
http://CRAN.R-project.org/package=rainbow.

Shang, H. L. and Hyndman, R. J. (2011) fds: Functional data sets,
R package version 1.6,
http://CRAN.R-project.org/package=fds.

Hyndman, R. J. and Shang, H. L. (2011) ftsa: Functional time
series analysis, R package version 2.6,
http://CRAN.R-project.org/package=ftsa.


Contact detail

Thank you for your attention.

Contact: HanLin.Shang@monash.edu
