  1. Social Forecasting, Lecture 3: Smoothing and ARIMA (Thomas Chadefaux)
  2. Time series components
  3. Time series patterns
     • Trend pattern: exists when there is a long-term increase or decrease in the data.
     • Cyclic pattern: exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).
     • Seasonal pattern: exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).
  4. Time series decomposition
     $y_t = f(S_t, T_t, R_t)$, where
     $y_t$ = data at period $t$
     $T_t$ = trend-cycle component at period $t$
     $S_t$ = seasonal component at period $t$
     $R_t$ = remainder component at period $t$
     Additive decomposition: $y_t = S_t + T_t + R_t$.
     Multiplicative decomposition: $y_t = S_t \times T_t \times R_t$.
  5. Time series decomposition
     • An additive model is appropriate if the magnitude of the seasonal fluctuations does not vary with the level of the series.
     • If the seasonal fluctuations are proportional to the level of the series, then a multiplicative model is appropriate.
     • Multiplicative decomposition is more prevalent with economic series.
     • Technical (can ignore): alternatively, use a Box-Cox transformation and then an additive decomposition, since logs turn a multiplicative relationship into an additive one: $y_t = S_t \times T_t \times R_t \Rightarrow \log y_t = \log S_t + \log T_t + \log R_t$.
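     A minimal sketch of that last point: an additive STL decomposition of the logged series corresponds to a multiplicative decomposition of the original. The use of elecequip (from fpp2, introduced on the next slide) is purely illustrative here.

       library(fpp2)
       # log() turns S_t x T_t x R_t into log S_t + log T_t + log R_t,
       # so stl() applied to the logged series gives an additive
       # decomposition of the multiplicative structure.
       elecequip %>% log() %>% stl(s.window = "periodic") %>% autoplot()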
  6. Euro electrical equipment
     Monthly manufacture of electrical equipment: computer, electronic and optical products. January 1996 - March 2012.
     [Figure: STL decomposition panels showing data, trend, seasonal, and remainder components.]
  7. Helper functions
     • seasonal() extracts the seasonal component.
     • trendcycle() extracts the trend-cycle component.
     • remainder() extracts the remainder component.
     • seasadj() returns the seasonally adjusted series.
  8. Your turn
     Repeat the decomposition using
       library(fpp2)
       elecequip %>% stl(s.window=15, t.window=2) %>% autoplot()
     What happens as you change s.window and t.window?
  9. Seasonal adjustment
  10. Seasonal adjustment
     • A useful by-product of decomposition: an easy way to calculate seasonally adjusted data.
     • Additive decomposition: the seasonally adjusted data are given by $y_t - S_t = T_t + R_t$.
  11. Euro electrical equipment
       fit <- stl(elecequip, s.window=7)
       autoplot(elecequip, series="Data") +
         autolayer(seasadj(fit), series="Seasonally Adjusted")
     [Figure: "Electrical equipment manufacturing (Euro area)", new orders index, showing the Data and Seasonally Adjusted series.]
  12. Seasonal adjustment
     • We use estimates of $S$ based on past values to seasonally adjust a current value.
     • Seasonally adjusted series reflect remainders as well as trend. Therefore they are not "smooth", and "downturns" or "upturns" can be misleading.
     • It is better to use the trend-cycle component to look for turning points.
  13. How to decompose?
     1. Compute the trend $\hat{T}_t$ using a moving average over $m$ values.
     2. Calculate the detrended series: $y_t - \hat{T}_t$.
     3. Calculate the seasonal component: average the detrended values for that season. E.g., the seasonal component for March is the average of all the detrended March values in the data. Then normalize these to sum to 0. This gives $\hat{S}_t$.
     4. The remainder component is calculated by subtracting the estimated seasonal and trend-cycle components: $\hat{R}_t = y_t - \hat{T}_t - \hat{S}_t$.
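     A minimal sketch of these four steps in R, mirroring what the classical decompose() function does for a monthly series; elecequip and the variable names are illustrative.

       library(fpp2)
       y <- elecequip; m <- 12
       trend <- ma(y, order = m)        # 1. trend-cycle via a centred moving average
       detrended <- y - trend           # 2. detrended series
       seas <- tapply(detrended, cycle(y), mean, na.rm = TRUE)  # 3. average per season...
       seas <- seas - mean(seas)        #    ...normalized to sum to zero
       S_hat <- ts(seas[cycle(y)], start = start(y), frequency = m)
       remainder <- y - trend - S_hat   # 4. remainder component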
  14. Smoothing
  15. Simple methods
     Time series $y_1, y_2, \ldots, y_T$.
     Random walk forecasts: $\hat{y}_{T+h|T} = y_T$
     Average forecasts: $\hat{y}_{T+h|T} = \frac{1}{T}\sum_{t=1}^{T} y_t$
     • Want something in between that weights the most recent data more highly.
     • Simple exponential smoothing uses a weighted moving average with weights that decrease exponentially.
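     A minimal sketch of the two extremes and the in-between method, using forecast-package functions loaded by fpp2 (oildata anticipates the oil example later in the lecture):

       library(fpp2)
       oildata <- window(oil, start = 1996)
       naive(oildata, h = 5)   # random walk forecast: repeats the last observation
       meanf(oildata, h = 5)   # average forecast: mean of all observations
       ses(oildata, h = 5)     # simple exponential smoothing sits in between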
  16. Simple Exponential Smoothing
     Forecast equation:
     $\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots$, where $0 \le \alpha \le 1$.
     Weights assigned to observations:

     Observation   α = 0.2       α = 0.4       α = 0.6       α = 0.8
     y_T           0.2           0.4           0.6           0.8
     y_{T-1}       0.16          0.24          0.24          0.16
     y_{T-2}       0.128         0.144         0.096         0.032
     y_{T-3}       0.1024        0.0864        0.0384        0.0064
     y_{T-4}       (0.2)(0.8)^4  (0.4)(0.6)^4  (0.6)(0.4)^4  (0.8)(0.2)^4
     y_{T-5}       (0.2)(0.8)^5  (0.4)(0.6)^5  (0.6)(0.4)^5  (0.8)(0.2)^5
  17. Optimisation
     • Need to choose a value for $\alpha$.
     • Similarly to regression, we choose $\alpha$ by minimising the SSE:
       $\text{SSE} = \sum_{t=1}^{T} (y_t - \hat{y}_{t|t-1})^2$.
     • Unlike regression, there is no closed-form solution: use numerical optimization.
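     A minimal sketch of that numerical optimization; sse_alpha is an illustrative helper that initializes the level at the first observation, whereas ses() also estimates the initial level, so the result differs slightly.

       library(fpp2)
       # In-sample one-step SSE of simple exponential smoothing for a given alpha.
       sse_alpha <- function(alpha, y) {
         level <- y[1]
         sse <- 0
         for (t in 2:length(y)) {
           e <- y[t] - level                             # one-step forecast error
           sse <- sse + e^2
           level <- alpha * y[t] + (1 - alpha) * level   # update the level
         }
         sse
       }
       oildata <- window(oil, start = 1996)
       optimize(sse_alpha, interval = c(0, 1), y = oildata)$minimum
       # close to the alpha = 0.8339 that ses() reports on the next slide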
  18. Example: Oil production
       oildata <- window(oil, start=1996)
       # Estimate parameters
       fc <- ses(oildata, h=5)
       summary(fc[["model"]])

     ## Simple exponential smoothing
     ##
     ## Call:
     ##  ses(y = oildata, h = 5)
     ##
     ## Smoothing parameters:
     ##   alpha = 0.8339
     ##
     ## Initial states:
     ##   l = 446.5868
     ##
     ## sigma: 29.8282
     ##
     ##  AIC  AICc  BIC
  19. Example: Oil production

     Year   Time   Observation   Level     Forecast
     1995   0                    446.59
     1996   1      445.36        445.57    446.59
     1997   2      453.20        451.93    445.57
     1998   3      454.41        454.00    451.93
     1999   4      422.38        427.63    454.00
     2000   5      456.04        451.32    427.63
     2001   6      440.39        442.20    451.32
     2002   7      425.19        428.02    442.20
     2003   8      486.21        476.54    428.02
     2004   9      500.43        496.46    476.54
     2005   10     521.28        517.15    496.46
     2006   11     508.95        510.31    517.15
     2007   12     488.89        492.45    510.31
     2008   13     509.87        506.98    492.45
     2009   14     456.72        465.07    506.98
     2010   15     473.82        472.36    465.07
     2011   16     525.95        517.05    472.36
     2012   17     549.83        544.39    517.05
     2013   18     542.34        542.68    544.39

     Forecasts:
     Year   h      Forecast
     2014   1      542.68
     2015   2      542.68
     2016   3      542.68
  20. Example: Oil production
       autoplot(fc) +
         autolayer(fitted(fc), series="Fitted") +
         ylab("Oil (millions of tonnes)") + xlab("Year")
     [Figure: "Forecasts from Simple exponential smoothing", oil (millions of tonnes) with fitted values.]
  21. Autocorrelation
  22. Autocorrelation
     Autocorrelation measures the linear relationship between lagged values of a time series:
     $r_k = \frac{\sum_{t=k+1}^{T} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$
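     A minimal sketch computing $r_k$ directly from this formula and checking it against R's acf(); the simulated series y and the lag k are illustrative.

       set.seed(1)
       y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))  # illustrative series
       k <- 1
       ybar <- mean(y); T <- length(y)
       r_k <- sum((y[(k+1):T] - ybar) * (y[1:(T-k)] - ybar)) / sum((y - ybar)^2)
       r_k
       acf(y, plot = FALSE)$acf[k + 1]   # lag-k value from acf(); lag 0 sits at index 1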
  23. Autocorrelation
     [Figure: "Beer production" (beer2), 1995-2010.]
     ## [1] "Cor(beer, beer.lag) = -0.103"
  24. Trend and seasonality in ACF
       aelec <- window(elec, start=1980)
       autoplot(aelec) + xlab("Year") + ylab("GWh")
     [Figure: monthly electricity demand (GWh) from 1980 onwards.]
  25. Trend and seasonality in ACF
       ggAcf(aelec, lag=48)
     [Figure: ACF of aelec, lags 0-48.]
  26. Stationarity and differencing
  27. Stationarity
     Definition: If $\{y_t\}$ is a stationary time series, then for all $s$, the distribution of $(y_t, \ldots, y_{t+s})$ does not depend on $t$.
     A stationary series:
     • is roughly horizontal
     • has constant variance
     • has no patterns predictable in the long term
  28. Stationary? [Figure: Dow Jones Index, by day (days 0-300).]
  29. Stationary? [Figure: Change in Dow Jones Index, by day.]
  30. Stationary? [Figure: Number of strikes per year, 1950-1980.]
  31. Stationary? [Figure: "Sales of new one-family houses, USA", total sales, 1975-1995.]
  32. Stationary? [Figure: "Price of a dozen eggs in 1993 dollars", 1900-1980.]
  33. Stationary? [Figure: "Number of pigs slaughtered in Victoria" (thousands), 1990-1995.]
  34. Stationary? [Figure: "Annual Canadian Lynx Trappings", number trapped, 1820-1920.]
  35. Stationary? [Figure: "Australian quarterly beer production" (megalitres), 1995-2010.]
  36. Stationarity
     Definition: If $\{y_t\}$ is a stationary time series, then for all $s$, the distribution of $(y_t, \ldots, y_{t+s})$ does not depend on $t$.
     Transformations help to stabilize the variance. For ARIMA modelling, we also need to stabilize the mean.
  37. Non-stationarity in the mean
     Identifying non-stationary series:
     • Look at the time plot.
     • The ACF of stationary data drops to zero relatively quickly.
     • The ACF of non-stationary data decreases slowly.
     • For non-stationary data, the value of $r_1$ is often large and positive.
  38. Example: Dow-Jones index
     [Figure: Dow Jones Index, by day (days 0-300).]
  39. Example: Dow-Jones index
     [Figure: ACF of dj, lags 0-25; autocorrelations decay slowly from near 1.]
  40. Example: Dow-Jones index
     [Figure: Change in Dow Jones Index, by day.]
  41. Example: Dow-Jones index
     [Figure: ACF of diff(dj), lags 0-25; values within about ±0.10.]
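     A minimal sketch: the ndiffs() helper in the forecast package applies a unit-root test (KPSS by default) to automate the judgement illustrated above; dj is the Dow Jones series just plotted.

       library(fpp2)
       ndiffs(dj)   # estimated number of first differences needed for stationarity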
  42. Differencing
     • Differencing helps to stabilize the mean.
     • The differenced series is the change between each observation in the original series: $y'_t = y_t - y_{t-1}$.
     • The differenced series will have only $T - 1$ values, since it is not possible to calculate a difference $y'_1$ for the first observation.
  43. Second-order differencing
     Occasionally the differenced data will not appear stationary and it may be necessary to difference the data a second time:
     $y''_t = y'_t - y'_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}$.
     • $y''_t$ will have $T - 2$ values.
     • In practice, it is almost never necessary to go beyond second-order differences.
  44. Seasonal differencing
     A seasonal difference is the difference between an observation and the corresponding observation from the previous year:
     $y'_t = y_t - y_{t-m}$, where $m$ = number of seasons.
     • For monthly data, $m = 12$.
     • For quarterly data, $m = 4$.
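     A minimal sketch (with a made-up quarterly series) of all three differencing operations using base R's diff():

       y <- ts(c(10, 12, 15, 14, 18, 20, 23, 22, 26, 28, 31, 30), frequency = 4)  # illustrative
       diff(y)                    # first differences: y_t - y_{t-1} (length T - 1)
       diff(y, differences = 2)   # second-order differences (length T - 2)
       diff(y, lag = 4)           # seasonal differences for quarterly data (m = 4)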
  45. Electricity production
       usmelec %>% autoplot()
     [Figure: monthly US electricity generation, 1980-2010.]
  46. Electricity production
       usmelec %>% log() %>% autoplot()
     [Figure: the logged series.]
  47. Electricity production
       usmelec %>% log() %>% diff(lag=12) %>% autoplot()
     [Figure: the seasonally differenced logged series.]
  48. Electricity production
       usmelec %>% log() %>% diff(lag=12) %>% diff(lag=1) %>% autoplot()
     [Figure: the doubly differenced logged series.]
  49. Electricity production
     • The seasonally differenced series is closer to being stationary.
     • Remaining non-stationarity can be removed with a further first difference. If $y'_t = y_t - y_{t-12}$ denotes the seasonally differenced series, then the twice-differenced series is
       $y^*_t = y'_t - y'_{t-1} = (y_t - y_{t-12}) - (y_{t-1} - y_{t-13}) = y_t - y_{t-1} - y_{t-12} + y_{t-13}$.
  50. Seasonal differencing
     When both seasonal and first differences are applied:
     • it makes no difference which is done first; the result will be the same.
     • if seasonality is strong, we recommend that seasonal differencing be done first, because sometimes the resulting series will be stationary and there will be no need for a further first difference.
     It is important that, if differencing is used, the differences are interpretable.
  51. Interpretation of differencing
     • First differences are the change between one observation and the next.
     • Seasonal differences are the change from one year to the next.
     Taking lag-3 differences for yearly data, for example, results in a model that cannot be sensibly interpreted.
  52. ARIMA models
  53. Autoregressive models
     Autoregressive (AR) models:
     $y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$,
     where $\varepsilon_t$ is white noise. This is a multiple regression with lagged values of $y_t$ as predictors.
     [Figure: simulated AR(1) and AR(2) series.]
  54. AR(1) model
     $y_t = 18 - 0.8 y_{t-1} + \varepsilon_t$, with $\varepsilon_t \sim N(0, 1)$ and $T = 100$.
     [Figure: the simulated AR(1) series.]
  55. AR(1) model
     $y_t = c + \phi_1 y_{t-1} + \varepsilon_t$
     • When $\phi_1 = 0$, $y_t$ is equivalent to white noise.
     • When $\phi_1 = 1$ and $c = 0$, $y_t$ is equivalent to a random walk.
     • When $\phi_1 = 1$ and $c \ne 0$, $y_t$ is equivalent to a random walk with drift.
     • When $\phi_1 < 0$, $y_t$ tends to oscillate between positive and negative values.
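     A minimal sketch simulating each of these cases; the variable names are illustrative, and since arima.sim() only handles stationary models, the random walks are built directly with cumsum().

       set.seed(42)
       n <- 100; e <- rnorm(n)
       white_noise <- e                                          # phi1 = 0
       ar1 <- arima.sim(model = list(ar = 0.8), n = n)           # 0 < phi1 < 1
       random_walk <- cumsum(e)                                  # phi1 = 1, c = 0
       rw_drift <- cumsum(0.5 + e)                               # phi1 = 1, c = 0.5
       oscillating <- arima.sim(model = list(ar = -0.8), n = n)  # phi1 < 0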
  56. AR(2) model
     $y_t = 8 + 1.3 y_{t-1} - 0.7 y_{t-2} + \varepsilon_t$, with $\varepsilon_t \sim N(0, 1)$ and $T = 100$.
     [Figure: the simulated AR(2) series.]
  57. Stationarity conditions
     We normally restrict autoregressive models to stationary data, which requires some constraints on the parameter values. For example, an AR(1) model is stationary when $-1 < \phi_1 < 1$.
  58. Moving Average (MA) models
     Moving Average (MA) models:
     $y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$,
     where $\varepsilon_t$ is white noise. This is a multiple regression with past errors as predictors. Don't confuse this with moving average smoothing!
     [Figure: simulated MA(1) and MA(2) series.]
     MA(1) model: $y_t = 20 + \varepsilon_t + 0.8\varepsilon_{t-1}$, with $\varepsilon_t \sim N(0, 1)$ and $T = 100$.
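     A minimal sketch simulating that MA(1) model; the mean 20 is added afterwards since arima.sim() produces a zero-mean series.

       library(fpp2)
       set.seed(7)
       ma1 <- 20 + arima.sim(model = list(ma = 0.8), n = 100)  # y_t = 20 + e_t + 0.8 e_{t-1}
       autoplot(ma1)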
  59. ARIMA modelling in R
  60. How does auto.arima() work?
     Need to select appropriate orders: $p$, $q$, $d$.
     Hyndman and Khandakar (JSS, 2008) algorithm:
     • Select the number of differences $d$ via the KPSS test (a test for a unit root).
     • Select $p$, $q$ by minimising the AICc.
     • Use a stepwise search to traverse the model space.
  61. How does auto.arima() work?
     $\text{AICc} = -2\log(L) + 2(p + q + k + 1)\left[1 + \frac{p + q + k + 2}{T - p - q - k - 2}\right]$,
     where $L$ is the maximised likelihood fitted to the differenced data, $k = 1$ if $c \ne 0$ and $k = 0$ otherwise.
     Step 1: Select the current model (with smallest AICc) from:
       ARIMA(2, d, 2), ARIMA(0, d, 0), ARIMA(1, d, 0), ARIMA(0, d, 1).
     Step 2: Consider variations of the current model:
     • vary one of $p$, $q$ from the current model by ±1;
     • vary both $p$ and $q$ from the current model by ±1;
     • include/exclude $c$ from the current model.
     The model with the lowest AICc becomes the current model. Repeat Step 2 until no model with a lower AICc can be found.
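     A minimal sketch for watching this search run: trace = TRUE prints each candidate model with its AICc (internet is the series used on the next slides).

       library(fpp2)
       auto.arima(internet, trace = TRUE)  # prints the stepwise search, then the chosen model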
  62. Choosing your own model
     Number of users logged on to an internet server each minute over a 100-minute period.
       ggtsdisplay(internet)
     [Figure: time plot, ACF, and PACF of internet.]
  63. Choosing your own model
       ggtsdisplay(diff(internet))
     [Figure: time plot, ACF, and PACF of the differenced series.]
  64. Choosing your own model
       (fit <- Arima(internet, order=c(1,1,0)))

     ## Series: internet
     ## ARIMA(1,1,0)
     ##
     ## Coefficients:
     ##          ar1
     ##       0.8026
     ## s.e.  0.0580
     ##
     ## sigma^2 = 11.79:  log likelihood = -262.62
     ## AIC=529.24   AICc=529.36   BIC=534.43
  65. Choosing your own model
       ggtsdisplay(resid(fit))
     [Figure: residual time plot, ACF, and PACF for the ARIMA(1,1,0) fit.]
  66. Choosing your own model
       (fit <- Arima(internet, order=c(2,1,0)))

     ## Series: internet
     ## ARIMA(2,1,0)
     ##
     ## Coefficients:
     ##          ar1      ar2
     ##       1.0449  -0.2966
     ## s.e.  0.0961   0.0961
     ##
     ## sigma^2 = 10.85:  log likelihood = -258.09
     ## AIC=522.18   AICc=522.43   BIC=529.96
  67. Choosing your own model
       ggtsdisplay(resid(fit))
     [Figure: residual time plot, ACF, and PACF for the ARIMA(2,1,0) fit.]
  68. Choosing your own model
       (fit <- Arima(internet, order=c(3,1,0)))

     ## Series: internet
     ## ARIMA(3,1,0)
     ##
     ## Coefficients:
     ##          ar1      ar2     ar3
     ##       1.1513  -0.6612  0.3407
     ## s.e.  0.0950   0.1353  0.0941
     ##
     ## sigma^2 = 9.656:  log likelihood = -252
     ## AIC=511.99   AICc=512.42   BIC=522.37
  69. Choosing your own model
       ggtsdisplay(resid(fit))
     [Figure: residual time plot, ACF, and PACF for the ARIMA(3,1,0) fit.]
  70. Choosing your own model
       auto.arima(internet)

     ## Series: internet
     ## ARIMA(1,1,1)
     ##
     ## Coefficients:
     ##          ar1     ma1
     ##       0.6504  0.5256
     ## s.e.  0.0842  0.0896
     ##
     ## sigma^2 = 9.995:  log likelihood = -254.15
     ## AIC=514.3   AICc=514.55   BIC=522.08
  71. Choosing your own model
       checkresiduals(fit)
     [Figure: "Residuals from ARIMA(3,1,0)": residual time plot, ACF, and histogram.]
  72. Choosing your own model
       fit %>% forecast %>% autoplot
     [Figure: "Forecasts from ARIMA(3,1,0)" for the internet series.]
  73. Modelling procedure with Arima
     1. Plot the data. Identify any unusual observations.
     2. If necessary, transform the data (e.g. by logging it) to stabilize the variance.
     3. If the data are non-stationary: take first differences of the data until the data are stationary.
     4. Examine the ACF/PACF: is an AR(p) or MA(q) model appropriate?
     5. Try your chosen model(s), and use the AICc to search for a better model.
     6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
     7. Once the residuals look like white noise, calculate forecasts. (A compact sketch of the whole procedure follows below.)
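     A minimal sketch of steps 1-7 in code; the series and the ARIMA orders are illustrative placeholders, and every call is from the forecast package loaded by fpp2.

       library(fpp2)
       y <- elecequip                       # substitute your own ts object
       autoplot(y)                          # 1. plot; look for unusual observations
       lambda <- BoxCox.lambda(y)           # 2. pick a Box-Cox transformation if needed
       ndiffs(y)                            # 3. how many first differences are suggested?
       ggtsdisplay(diff(y))                 # 4. inspect ACF/PACF of the differenced data
       fit <- Arima(y, order = c(3, 1, 1))  # 5. a candidate model (orders illustrative)
       checkresiduals(fit)                  # 6. residual ACF plot + Ljung-Box test
       forecast(fit, h = 12)                # 7. forecasts once residuals look like white noise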
  74. Modelling procedure with auto.arima
     1. Plot the data. Identify any unusual observations.
     2. If necessary, transform the data (using a Box-Cox transformation) to stabilize the variance.
     3. Use auto.arima to select a model.
     6. Check the residuals from your chosen model by plotting the ACF of the residuals, and doing a portmanteau test of the residuals. If they do not look like white noise, try a modified model.
     7. Once the residuals look like white noise, calculate forecasts.
     (Steps 4-5 of the manual procedure are handled by auto.arima, so the step numbers 6 and 7 are kept for comparison.)
  75. Seasonally adjusted electrical equipment
       eeadj <- seasadj(stl(elecequip, s.window="periodic"))
       autoplot(eeadj) + xlab("Year") +
         ylab("Seasonally adjusted new orders index")
     [Figure: seasonally adjusted new orders index by year.]
  76. Another example: Seasonally adjusted electrical equipment
     1. The time plot shows sudden changes, particularly a big drop in 2008/2009 due to the global economic environment. Otherwise nothing unusual, and no need for data adjustments.
     2. No evidence of changing variance, so no Box-Cox transformation.
     3. The data are clearly non-stationary, so we take first differences.
  77. Seasonally adjusted electrical equipment
       ggtsdisplay(diff(eeadj))
     [Figure: time plot, ACF, and PACF of the differenced series.]
  78. Seasonally adjusted electrical equipment
     4. The PACF is suggestive of an AR(3), so an initial candidate model is ARIMA(3,1,0).
     5. Fit the ARIMA(3,1,0) model along with variations: ARIMA(4,1,0), ARIMA(2,1,0), ARIMA(3,1,1), etc. ARIMA(3,1,1) has the smallest AICc value.
  79. Seasonally adjusted electrical equipment
       (fit <- Arima(eeadj, order=c(3,1,1)))

     ## Series: eeadj
     ## ARIMA(3,1,1)
     ##
     ## Coefficients:
     ##          ar1     ar2     ar3      ma1
     ##       0.0044  0.0916  0.3698  -0.3921
     ## s.e.  0.2201  0.0984  0.0669   0.2426
     ##
     ## sigma^2 = 9.577:  log likelihood = -492.69
     ## AIC=995.38   AICc=995.7   BIC=1011.72
  80. Seasonally adjusted electrical equipment
     6. The ACF plot of the residuals from the ARIMA(3,1,1) model looks like white noise.
       checkresiduals(fit, test=F)
     [Figure: "Residuals from ARIMA(3,1,1)": residual time plot, ACF, and histogram.]

     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(3,1,1)
     ## Q* = 24.034, df = 20, p-value = 0.2409
  81. Seasonally adjusted electrical equipment
     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(3,1,1)
     ## Q* = 24.034, df = 20, p-value = 0.2409
     ##
     ## Model df: 4.  Total lags used: 24
  82. Seasonally adjusted electrical equipment
       fit %>% forecast %>% autoplot
     [Figure: "Forecasts from ARIMA(3,1,1)" for eeadj.]
  83. Forecasting
  84. Point forecasts
     1. Rearrange the ARIMA equation so that $y_t$ is on the LHS.
     2. Rewrite the equation by replacing $t$ with $T + h$.
     3. On the RHS, replace future observations by their forecasts, future errors by zero, and past errors by the corresponding residuals.
     Start with $h = 1$. Repeat for $h = 2, 3, \ldots$.
  85. Point forecasts
     ARIMA(3,1,1) forecasts: Step 1
     $(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - B) y_t = (1 + \theta_1 B)\varepsilon_t$
     $\left[1 - (1 + \phi_1)B + (\phi_1 - \phi_2)B^2 + (\phi_2 - \phi_3)B^3 + \phi_3 B^4\right] y_t = (1 + \theta_1 B)\varepsilon_t$
     $y_t - (1 + \phi_1)y_{t-1} + (\phi_1 - \phi_2)y_{t-2} + (\phi_2 - \phi_3)y_{t-3} + \phi_3 y_{t-4} = \varepsilon_t + \theta_1 \varepsilon_{t-1}$
     $y_t = (1 + \phi_1)y_{t-1} - (\phi_1 - \phi_2)y_{t-2} - (\phi_2 - \phi_3)y_{t-3} - \phi_3 y_{t-4} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$
  86. Point forecasts (h=1)
     $y_t = (1 + \phi_1)y_{t-1} - (\phi_1 - \phi_2)y_{t-2} - (\phi_2 - \phi_3)y_{t-3} - \phi_3 y_{t-4} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$
     ARIMA(3,1,1) forecasts: Step 2
     $y_{T+1} = (1 + \phi_1)y_T - (\phi_1 - \phi_2)y_{T-1} - (\phi_2 - \phi_3)y_{T-2} - \phi_3 y_{T-3} + \varepsilon_{T+1} + \theta_1 \varepsilon_T$
     ARIMA(3,1,1) forecasts: Step 3
     $\hat{y}_{T+1|T} = (1 + \phi_1)y_T - (\phi_1 - \phi_2)y_{T-1} - (\phi_2 - \phi_3)y_{T-2} - \phi_3 y_{T-3} + \theta_1 e_T$
  87. Point forecasts (h=2)
     $y_t = (1 + \phi_1)y_{t-1} - (\phi_1 - \phi_2)y_{t-2} - (\phi_2 - \phi_3)y_{t-3} - \phi_3 y_{t-4} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$
     ARIMA(3,1,1) forecasts: Step 2
     $y_{T+2} = (1 + \phi_1)y_{T+1} - (\phi_1 - \phi_2)y_T - (\phi_2 - \phi_3)y_{T-1} - \phi_3 y_{T-2} + \varepsilon_{T+2} + \theta_1 \varepsilon_{T+1}$
     ARIMA(3,1,1) forecasts: Step 3
     $\hat{y}_{T+2|T} = (1 + \phi_1)\hat{y}_{T+1|T} - (\phi_1 - \phi_2)y_T - (\phi_2 - \phi_3)y_{T-1} - \phi_3 y_{T-2}$
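     A minimal sketch checking the $h = 1$ formula numerically against forecast(), using the ARIMA(3,1,1) fit to eeadj from earlier (Arima() fits no constant there, matching this derivation):

       phi <- coef(fit)[c("ar1", "ar2", "ar3")]
       theta <- coef(fit)["ma1"]
       y <- as.numeric(eeadj); T <- length(y)
       e_T <- tail(residuals(fit), 1)   # last residual stands in for epsilon_T
       yhat1 <- (1 + phi[1]) * y[T] - (phi[1] - phi[2]) * y[T - 1] -
         (phi[2] - phi[3]) * y[T - 2] - phi[3] * y[T - 3] + theta * e_T
       unname(yhat1)
       forecast(fit, h = 1)$mean   # should agree with the hand calculation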
  88. European quarterly retail trade
       euretail %>% diff(lag=4) %>% ggtsdisplay()
     [Figure: time plot, ACF, and PACF of the seasonally differenced series.]
  89. European quarterly retail trade
       euretail %>% diff(lag=4) %>% diff() %>% ggtsdisplay()
     [Figure: time plot, ACF, and PACF after an additional first difference.]
  90. European quarterly retail trade
     • $d = 1$ and $D = 1$ seem necessary.
     • The significant spike at lag 1 in the ACF suggests a non-seasonal MA(1) component.
     • The significant spike at lag 4 in the ACF suggests a seasonal MA(1) component.
     • Initial candidate model: ARIMA(0,1,1)(0,1,1)$_4$.
  91. European quarterly retail trade
       fit <- Arima(euretail, order=c(0,1,1), seasonal=c(0,1,1))
       checkresiduals(fit)
     [Figure: "Residuals from ARIMA(0,1,1)(0,1,1)[4]": residual time plot, ACF, and histogram.]
  92. European quarterly retail trade
     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(0,1,1)(0,1,1)[4]
     ## Q* = 10.654, df = 6, p-value = 0.09968
     ##
     ## Model df: 2.  Total lags used: 8
  93. European quarterly retail trade
     • The ACF and PACF of the residuals show significant spikes at lag 2, and maybe lag 3.
     • AICc of the ARIMA(0,1,2)(0,1,1)$_4$ model is 74.27.
     • AICc of the ARIMA(0,1,3)(0,1,1)$_4$ model is 68.39.
       fit <- Arima(euretail, order=c(0,1,3), seasonal=c(0,1,1))
       checkresiduals(fit)
  94. European quarterly retail trade
     ## Series: euretail
     ## ARIMA(0,1,3)(0,1,1)[4]
     ##
     ## Coefficients:
     ##          ma1     ma2     ma3     sma1
     ##       0.2630  0.3694  0.4200  -0.6636
     ## s.e.  0.1237  0.1255  0.1294   0.1545
     ##
     ## sigma^2 = 0.156:  log likelihood = -28.63
     ## AIC=67.26   AICc=68.39   BIC=77.65
  95. European quarterly retail trade
       checkresiduals(fit)
     [Figure: "Residuals from ARIMA(0,1,3)(0,1,1)[4]": residual time plot, ACF, and histogram.]
  96. European quarterly retail trade
     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(0,1,3)(0,1,1)[4]
     ## Q* = 0.51128, df = 4, p-value = 0.9724
     ##
     ## Model df: 4.  Total lags used: 8
  97. European quarterly retail trade
       autoplot(forecast(fit, h=12))
     [Figure: "Forecasts from ARIMA(0,1,3)(0,1,1)[4]" for euretail.]
  98. European quarterly retail trade
       auto.arima(euretail)

     ## Series: euretail
     ## ARIMA(0,1,3)(0,1,1)[4]
     ##
     ## Coefficients:
     ##          ma1     ma2     ma3     sma1
     ##       0.2630  0.3694  0.4200  -0.6636
     ## s.e.  0.1237  0.1255  0.1294   0.1545
     ##
     ## sigma^2 = 0.156:  log likelihood = -28.63
     ## AIC=67.26   AICc=68.39   BIC=77.65
  99. If time allows
  100. Corticosteroid drug sales
     [Figure: two panels, "H02 sales (million scripts)" and "Log H02 sales", by year.]
  101. Corticosteroid drug sales
       ggtsdisplay(diff(lh02, 12), xlab="Year",
         main="Seasonally differenced H02 scripts")
     [Figure: "Seasonally differenced H02 scripts": time plot, ACF, and PACF.]
  102. Corticosteroid drug sales
     • Choose $D = 1$ and $d = 0$.
     • Spikes in the PACF at lags 12 and 24 suggest a seasonal AR(2) term.
     • Spikes in the PACF suggest a possible non-seasonal AR(3) term.
     • Initial candidate model: ARIMA(3,0,0)(2,1,0)$_{12}$.
  103. Corticosteroid drug sales

     Model                       AICc
     ARIMA(3,0,1)(0,1,2)[12]    -485.48
     ARIMA(3,0,1)(1,1,1)[12]    -484.25
     ARIMA(3,0,1)(0,1,1)[12]    -483.67
     ARIMA(3,0,1)(2,1,0)[12]    -476.31
     ARIMA(3,0,0)(2,1,0)[12]    -475.12
     ARIMA(3,0,2)(2,1,0)[12]    -474.88
     ARIMA(3,0,1)(1,1,0)[12]    -463.40
  104. Corticosteroid drug sales
       (fit <- Arima(h02, order=c(3,0,1), seasonal=c(0,1,2), lambda=0))

     ## Series: h02
     ## ARIMA(3,0,1)(0,1,2)[12]
     ## Box Cox transformation: lambda= 0
     ##
     ## Coefficients:
     ##           ar1     ar2     ar3     ma1     sma1     sma2
     ##       -0.1603  0.5481  0.5678  0.3827  -0.5222  -0.1768
     ## s.e.   0.1636  0.0878  0.0942  0.1895   0.0861   0.0872
     ##
     ## sigma^2 = 0.004278:  log likelihood = 250.04
     ## AIC=-486.08   AICc=-485.48   BIC=-463.28
  105. Corticosteroid drug sales
       checkresiduals(fit, lag=36)
     [Figure: "Residuals from ARIMA(3,0,1)(0,1,2)[12]": residual time plot, ACF, and histogram.]

     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(3,0,1)(0,1,2)[12]
     ## Q* = 50.712, df = 30, p-value = 0.01045
  106. Corticosteroid drug sales
     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(3,0,1)(0,1,2)[12]
     ## Q* = 50.712, df = 30, p-value = 0.01045
     ##
     ## Model df: 6.  Total lags used: 36
  107. Corticosteroid drug sales
       (fit <- auto.arima(h02, lambda=0))

     ## Series: h02
     ## ARIMA(2,1,1)(0,1,2)[12]
     ## Box Cox transformation: lambda= 0
     ##
     ## Coefficients:
     ##           ar1      ar2     ma1     sma1     sma2
     ##       -1.1358  -0.5753  0.3683  -0.5318  -0.1817
     ## s.e.   0.1608   0.0965  0.1884   0.0838   0.0881
     ##
     ## sigma^2 = 0.004278:  log likelihood = 248.25
     ## AIC=-484.51   AICc=-484.05   BIC=-465
  108. Corticosteroid drug sales
       checkresiduals(fit, lag=36)
     [Figure: "Residuals from ARIMA(2,1,1)(0,1,2)[12]": residual time plot, ACF, and histogram.]

     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(2,1,1)(0,1,2)[12]
     ## Q* = 51.096, df = 31, p-value = 0.01298
  109. Corticosteroid drug sales
     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(2,1,1)(0,1,2)[12]
     ## Q* = 51.096, df = 31, p-value = 0.01298
     ##
     ## Model df: 5.  Total lags used: 36
  110. Corticosteroid drug sales
       (fit <- auto.arima(h02, lambda=0, max.order=9,
          stepwise=FALSE, approximation=FALSE))

     ## Series: h02
     ## ARIMA(4,1,1)(2,1,2)[12]
     ## Box Cox transformation: lambda= 0
     ##
     ## Coefficients:
     ##           ar1     ar2     ar3      ar4      ma1    sar1     sar2     sma1    sma2
     ##       -0.0425  0.2098  0.2017  -0.2273  -0.7424  0.6213  -0.3832  -1.2019  0.4959
     ## s.e.   0.2167  0.1813  0.1144   0.0810   0.2074  0.2421   0.1185   0.2491  0.2135
     ##
     ## sigma^2 = 0.004049:  log likelihood = 254.31
     ## AIC=-488.63   AICc=-487.4   BIC=-456.1
  111. Corticosteroid drug sales
       checkresiduals(fit, lag=36)
     [Figure: "Residuals from ARIMA(4,1,1)(2,1,2)[12]": residual time plot, ACF, and histogram.]

     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(4,1,1)(2,1,2)[12]
     ## Q* = 36.456, df = 27, p-value = 0.1057
  112. Corticosteroid drug sales
     ## Ljung-Box test
     ##
     ## data:  Residuals from ARIMA(4,1,1)(2,1,2)[12]
     ## Q* = 36.456, df = 27, p-value = 0.1057
     ##
     ## Model df: 9.  Total lags used: 36
  113. Corticosteroid drug sales
     Training data: July 1991 to June 2006. Test data: July 2006 to June 2008.

       # Fit a given ARIMA specification on all but the last h observations
       # and return the test-set RMSE of its h-step forecasts.
       getrmse <- function(x, h, ...)
       {
         train.end <- time(x)[length(x) - h]        # last time point in the training set
         test.start <- time(x)[length(x) - h + 1]   # first time point in the test set
         train <- window(x, end = train.end)
         test <- window(x, start = test.start)
         fit <- Arima(train, ...)                   # ... passes order/seasonal/lambda
         fc <- forecast(fit, h = h)
         return(accuracy(fc, test)[2, "RMSE"])      # row 2 holds test-set accuracy
       }
       getrmse(h02, h=24, order=c(3,0,0), seasonal=c(2,1,0), lambda=0)
       getrmse(h02, h=24, order=c(3,0,1), seasonal=c(2,1,0), lambda=0)
       getrmse(h02, h=24, order=c(3,0,2), seasonal=c(2,1,0), lambda=0)
       getrmse(h02, h=24, order=c(3,0,1), seasonal=c(1,1,0), lambda=0)
  114. Corticosteroid drug sales

     Model                       RMSE
     ARIMA(4,1,1)(2,1,2)[12]    0.0615
     ARIMA(3,0,1)(0,1,2)[12]    0.0622
     ARIMA(3,0,1)(1,1,1)[12]    0.0630
     ARIMA(2,1,4)(0,1,1)[12]    0.0632
     ARIMA(2,1,3)(0,1,1)[12]    0.0634
     ARIMA(3,0,3)(0,1,1)[12]    0.0638
     ARIMA(2,1,5)(0,1,1)[12]    0.0640
     ARIMA(3,0,1)(0,1,1)[12]    0.0644
     ARIMA(3,0,2)(0,1,1)[12]    0.0644
     ARIMA(3,0,2)(2,1,0)[12]    0.0645
     ARIMA(3,0,1)(2,1,0)[12]    0.0646
     ARIMA(3,0,0)(2,1,0)[12]    0.0661
     ARIMA(3,0,1)(1,1,0)[12]    0.0679
  115. Corticosteroid drug sales
     • Models with the lowest AICc values tend to give slightly better results than the other models.
     • AICc comparisons are only valid between models with the same orders of differencing, but test-set RMSE comparisons can involve any models.
     • Use the best model available, even if it does not pass all tests.
  116. Corticosteroid drug sales
       fit <- Arima(h02, order=c(3,0,1), seasonal=c(0,1,2), lambda=0)
       autoplot(forecast(fit)) +
         ylab("H02 sales (million scripts)") + xlab("Year")
     [Figure: "Forecasts from ARIMA(3,0,1)(0,1,2)[12]", H02 sales (million scripts) by year.]