Why is it that we can accurately forecast a solar eclipse in 1000 years' time, but we have no idea whether Yahoo's stock price will rise or fall tomorrow? Or why can we forecast electricity consumption next week with remarkable precision, but we cannot forecast exchange rate fluctuations in the next hour?
In this talk, I will discuss the conditions we need for predictability, how to measure the uncertainty of predictions, and the consequences of thinking we can predict something more accurately than we can.
I will draw on my experiences in forecasting Australia's health budget for the next few years, in developing forecasting models for peak electricity demand in 20 years' time, and in identifying unpredictable activity on Yahoo's mail servers.
2. Outline
1 Forecasting is difficult
2 Stock market data
3 Drug sales
4 Extreme electricity demand
5 What can we forecast?
6 M3 competition data
7 Yahoo web traffic
8 What next?
Exploring the boundaries of predictability Forecasting is difficult 2
4. Reputations can be made and lost
“I think there is a world market for maybe
five computers.” (Chairman of IBM, 1943)
“Computers in the future may weigh no
more than 1.5 tons.” (Popular Mechanics, 1949)
“There is no reason anyone would want a
computer in their home.” (President, DEC, 1977)
7. Forecasting in a changing environment
A good forecasting model captures the way things
move, not just where things are.
a highly volatile environment will continue to
be highly volatile;
a business with fluctuating sales will continue
to have fluctuating sales;
an economy that has gone through booms and
busts will continue to go through booms and
busts.
“If we could first know where we are and whither we
are tending, we could better judge what to do and
how to do it.” Abraham Lincoln
13. Stock market data
[Figure: Daily closing prices: Yahoo. x-axis: Date, 2012 to 2015; y-axis: $, 20 to 50.]
15. Stock market data
[Figure: Forecasts from ETS(M,N,N). x-axis: Date, 2012 to 2015; y-axis: $, 20 to 50.]
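library(forecast)           # provides ets() and forecast()
fit <- ets(yahoo)           # yahoo: the daily closing price series
plot(forecast(fit, h=100))  # point forecasts and prediction intervals, 100 steps ahead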
16. Stock market data
[Figure: Daily Log Returns: Yahoo. x-axis: Date, 2012 to 2015; y-axis: %, −10 to 10.]
18. Stock market data
[Figure: Forecasts from ARIMA(0,0,3) with non-zero mean. x-axis: Date, 2012 to 2015; y-axis: %, −10 to 10.]
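library(forecast)                           # provides auto.arima() and forecast()
fit <- auto.arima(logreturns)               # automatic ARIMA order selection
plot(forecast(fit, h=100, bootstrap=TRUE))  # intervals from bootstrapped residuals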
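The slides do not show how the `logreturns` series was built. A minimal base R sketch, assuming the returns are daily log differences of the closing price expressed in percent (as the % axis suggests). The simulated `yahoo` series here is a hypothetical stand-in, not the real data:

```r
# Hypothetical stand-in for the Yahoo closing prices (simulated, not real data)
set.seed(1)
yahoo <- ts(20 * exp(cumsum(rnorm(500, mean = 0.0005, sd = 0.02))))

# Daily log returns in percent: 100 * (log(P_t) - log(P_{t-1}))
logreturns <- 100 * diff(log(yahoo))

# Differencing drops one observation
length(logreturns) == length(yahoo) - 1
```

Log returns add up over time, so their sum equals 100 times the log of the ratio of the last price to the first.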
19. Efficient Market Hypothesis
Market efficiency causes existing values to
always incorporate and reflect all relevant
information. Thus, current values are their
own forecasts.
Consequences
No such thing as an undervalued stock or
inflated stock.
Insider information or waiting a long time
are the only ways to win.
In reality, slight inefficiencies exist but are
usually insufficient to beat transaction costs.
25. Australian drug sales
The Pharmaceutical Benefits Scheme (PBS) is
the Australian government drugs subsidy scheme.
Many drugs bought from pharmacies (“drug
stores” in the US) are subsidised to allow more
equitable access to modern drugs.
The cost to government is determined by the
number and types of drugs purchased.
Currently nearly 1% of GDP.
The total cost is budgeted based on forecasts
of drug usage.
29. Forecasting the PBS
In 2001: $4.5 billion budget, under-forecasted
by $800 million.
Subject to covert marketing, volatile products,
uncontrollable expenditure.
Monthly data on thousands of drug groups and
4 concession types available from 1991.
Data were aggregated to annual values, and
only the first three years were being used in
estimating the forecasts.
All forecasts being done with the FORECAST
function in MS-Excel!
34. ATC drug classification
A        Alimentary tract and metabolism   (14 classes)
A10      Drugs used in diabetes            (84 classes)
A10B     Blood glucose lowering drugs
A10BA    Biguanides
A10BA02  Metformin
35. ETS forecasts of PBS data
[Figures: ETS forecasts of total cost ($ thousands), 1995 to 2010, one plot per PBS group:
Total cost: A03 concession safety net group
Total cost: A05 general copayments group
Total cost: D01 general copayments group
Total cost: S01 general copayments group
Total cost: R03 general copayments group]
40. Forecasting the PBS
As part of this project, we developed an
automatic forecasting algorithm for exponential
smoothing state space models based on the
AIC.
Exponential smoothing models allowed for
time-changing trend and seasonal patterns.
Forecast MAPE reduced from 15–20% to 0.6%.
State space models provide prediction intervals
which give a sense of uncertainty.
Algorithm now implemented as ets function
in forecast package in R.
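For readers unfamiliar with the accuracy measure quoted above, here is a minimal sketch of the conventional MAPE calculation. The function name `mape` and the numbers are illustrative assumptions, not PBS data or code from the project:

```r
# Mean absolute percentage error: 100 * mean(|actual - predicted| / |actual|)
mape <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted), all(actual != 0))
  100 * mean(abs((actual - predicted) / actual))
}

# Illustrative numbers only (not PBS data)
actual    <- c(100, 200, 400)
predicted <- c(110, 190, 400)
mape(actual, predicted)  # (10% + 5% + 0%) / 3, i.e. about 5
```

MAPE is undefined when an actual value is zero, which is why the sketch refuses such input.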
46. The problem
We want to forecast the peak electricity
demand in a half-hour period in twenty years'
time.
We have fifteen years of half-hourly electricity
data, temperature data and some economic
and demographic data.
The location is South Australia: home to the
most volatile electricity demand in the world.
Sounds impossible?
48. South Australian demand data
[Figure: South Australian demand data, with Black Saturday marked.]
[Figure: SA State wide demand (summer 2015). x-axis: Oct to Mar; y-axis: SA State wide demand (GW), 1.0 to 3.0.]
53. Temperature data (Sth Aust)
[Figure: Demand (GW), 1.0 to 3.5, against temperature (deg C), 10 to 40. Time: 12 midnight. Workdays and non-workdays shown separately.]
55. Predictors
calendar effects
prevailing and recent weather conditions
climate changes
economic and demographic changes
changing technology
Modelling framework
Semi-parametric additive models
with correlated errors.
Each half-hour period modelled separately for:
each season (x2)
work-days/non-work-days (x2)
morning/afternoon/evening (x3)
Total: 48 × 2 × 2 × 3 = 576 models.
Variables selected to provide best out-of-sample
predictions using cross-validation on last two years.
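The idea of fitting one model per half-hour period and day type can be sketched as below. This is only an illustration under strong assumptions: the data are simulated, a natural cubic spline in temperature stands in for the semi-parametric additive terms, and the season split, time-of-day split, correlated errors and cross-validated variable selection are all omitted. It is not the model used in the talk.

```r
library(splines)  # ships with R; ns() builds a natural cubic spline basis

set.seed(42)
n_days <- 200
# Hypothetical simulated data: one row per day and half-hour period
d <- expand.grid(day = seq_len(n_days), period = 1:48)
d$workday <- (d$day %% 7) < 5  # crude workday flag, for illustration only
d$temp <- 20 + 8 * sin(2 * pi * d$period / 48) + rnorm(nrow(d), sd = 4)
d$demand <- 1.5 + 0.002 * (d$temp - 18)^2 + 0.2 * d$workday +
  rnorm(nrow(d), sd = 0.1)

# One model per (half-hour period, workday/non-workday) cell
cells <- split(d, list(d$period, d$workday))
fits <- lapply(cells, function(cell) lm(demand ~ ns(temp, df = 3), data = cell))

length(fits)  # 48 periods x 2 day types = 96 models in this reduced sketch
```

Fitting each cell separately lets the temperature effect differ freely between, say, midnight on a weekend and mid-afternoon on a workday.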
57. Half-hourly models
[Figure: Demand (January 2015). Actual and predicted SA demand (GW), 0 to 5, by date in January.]
[Figure: Temperatures (January 2015) for the series temp_23090 and temp_23083.]
60. What can we forecast?
68. Predictability factors
1 We understand and can measure the causal factors.
2 There is a lot of historical data available.
3 The forecasts do not affect the thing we are trying to
forecast.
4 The future is somewhat similar to the past.
70. Predictability factors
             Measure      Big    No         Future
             causal       data   feedback   ∼ past
Finance      N            Y      N          short-term
Economics    N            N      N          short-term
Drugs        partly       Y      Y          short-term
Electricity  short-term   Y      Y          short-term
Weather      short-term   Y      Y          short-term
Astronomy    Y            Y      Y          Y
74. M3 forecasting competition
“The M3-Competition is a final attempt by the authors to
settle the accuracy issue of various time series methods. . .
The extension involves the inclusion of more methods/
researchers (in particular in the areas of neural networks
and expert systems) and more series.”
Makridakis & Hibon, IJF 2000
3003 series
All data from business, demography, finance and
economics.
Series length between 14 and 126.
Either non-seasonal, monthly or quarterly.
All time series positive.
76. Key idea
Examples for time series
lag correlation
size and direction of trend
strength of seasonality
timing of peak seasonality
spectral entropy
Called “features” or “characteristics” in the
machine learning literature.
[Photo: John W Tukey]
Cognostics: computer-produced diagnostics
(Tukey and Tukey, 1985).
79. An STL decomposition
[Figure: example time series, 1984 to 1992.]
$Y_t = S_t + T_t + R_t$
[Figure: STL decomposition of the series, with panels for data, seasonal, trend and remainder against time, 1984 to 1992.]
81. Candidate features
STL decomposition: $Y_t = S_t + T_t + R_t$
Seasonal period
Strength of seasonality: $1 - \dfrac{\mathrm{Var}(R_t)}{\mathrm{Var}(Y_t - T_t)}$
Strength of trend: $1 - \dfrac{\mathrm{Var}(R_t)}{\mathrm{Var}(Y_t - S_t)}$
Spectral entropy: $H = -\int_{-\pi}^{\pi} f_y(\lambda) \log f_y(\lambda)\, d\lambda$,
where $f_y(\lambda)$ is the spectral density of $Y_t$.
Low values of $H$ suggest a time series that is
easier to forecast (more signal).
Autocorrelations: $r_1, r_2, r_3, \dots$
Optimal Box-Cox transformation parameter $\lambda$
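Several of these features can be sketched in base R. The example below is a rough illustration, not the code behind the talk: it uses the built-in `AirPassengers` series rather than an M3 series, fixes the seasonal pattern with `s.window = "periodic"`, and the helper `spectral_entropy` is a hypothetical discrete approximation (normalised periodogram, scaled to lie between 0 and 1) rather than the estimator the author used.

```r
# Built-in monthly series, used purely as an example (not an M3 series)
y <- log(AirPassengers)

# STL decomposition: y = seasonal + trend + remainder
fit <- stl(y, s.window = "periodic")
S  <- fit$time.series[, "seasonal"]
Tr <- fit$time.series[, "trend"]
R  <- fit$time.series[, "remainder"]

# Strength of seasonality: 1 - Var(R_t) / Var(Y_t - T_t)
season_strength <- 1 - var(R) / var(y - Tr)
# Strength of trend: 1 - Var(R_t) / Var(Y_t - S_t)
trend_strength <- 1 - var(R) / var(y - S)

# First three autocorrelations r_1, r_2, r_3 (element 1 is lag 0)
r <- acf(y, lag.max = 3, plot = FALSE)$acf[2:4]

# Hypothetical helper: discrete approximation to spectral entropy
spectral_entropy <- function(x) {
  p <- spec.pgram(x, taper = 0, detrend = TRUE, plot = FALSE)$spec
  p <- p / sum(p)                    # normalise ordinates to sum to 1
  p <- p[p > 0]                      # avoid log(0)
  -sum(p * log(p)) / log(length(p))  # scale so white noise is near 1
}

set.seed(1)
spectral_entropy(sin(2 * pi * (1:240) / 12))  # pure cycle: low entropy
spectral_entropy(rnorm(240))                  # white noise: high entropy
```

Both strength measures are close to 1 when the remainder is small relative to the detrended or deseasonalised series, and move toward 0 as the remainder dominates.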
88. Candidate features
[Figures: example M3 series for each feature, three series per panel:
Seasonality: N0001, N2602, N1906
Trend: N0125, N1978, N0546
ACF1: N0001, N2658, N2409
Spectral entropy: N2487, N0794, N0121
Box Cox: N0002, N0468, N0354]
93. Candidate features
[Figure: scatterplot matrix of the candidate features SpecEntr, Trend, Season, Freq, ACF and Lambda.]
94. Dimension reduction for time series
[Figure: diagram with a "Feature calculation" step and the scatterplot matrix of SpecEntr, Trend, Season, Freq, ACF and Lambda.]