DevoxxFR 2024 Reproducible Builds with Apache Maven
Time series-mining-slides
1. R and Time
Series Data
Time Series
Decomposi-
Time Series Analysis and Mining with R
tion
Time Series
Forecasting
Time Series
Yanchang Zhao
Clustering
Time Series RDataMining.com
Classification
http://www.rdatamining.com/
R Functions &
Packages for
Time Series
18 July 2011
Conclusions
1/42
2. Outline
1 R and Time Series Data
R and Time
Series Data
Time Series 2 Time Series Decomposition
Decomposi-
tion
Time Series 3 Time Series Forecasting
Forecasting
Time Series
Clustering 4 Time Series Clustering
Time Series
Classification
5 Time Series Classification
R Functions &
Packages for
Time Series
6 R Functions & Packages for Time Series
Conclusions
7 Conclusions
2/42
3. R
R and Time
Series Data
a free software environment for statistical computing and
Time Series
Decomposi- graphics
tion
Time Series
runs on Windows, Linux and MacOS
Forecasting
widely used in academia and research, as well as industrial
Time Series
Clustering applications
Time Series
Classification
over 3,000 packages
R Functions & CRAN Task View: Time Series Analysis
Packages for
Time Series http://cran.r-project.org/web/views/TimeSeries.html
Conclusions
3/42
4. Time Series Data in R
R and Time
Series Data
Time Series
Decomposi-
class ts
tion
represents data which has been sampled at equispaced
Time Series
Forecasting points in time
Time Series
Clustering frequency=7: a weekly series
Time Series frequency=12: a monthly series
Classification
R Functions & frequency=4: a quarterly series
Packages for
Time Series
Conclusions
4/42
5. Time Series Data in R
> a <- ts(1:20, frequency=12, start=c(2011,3))
> print(a)
R and Time
Series Data Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov
Time Series 2011 1 2 3 4 5 6 7 8 9
Decomposi-
tion 2012 11 12 13 14 15 16 17 18 19 20
Time Series Dec
Forecasting
Time Series
2011 10
Clustering 2012
Time Series
Classification > str(a)
R Functions & Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8
Packages for
Time Series
> attributes(a)
Conclusions
$tsp
[1] 2011.167 2012.750 12.000
$class 5/42
6. Outline
1 R and Time Series Data
R and Time
Series Data
Time Series 2 Time Series Decomposition
Decomposi-
tion
Time Series 3 Time Series Forecasting
Forecasting
Time Series
Clustering 4 Time Series Clustering
Time Series
Classification
5 Time Series Classification
R Functions &
Packages for
Time Series
6 R Functions & Packages for Time Series
Conclusions
7 Conclusions
6/42
7. What is Time Series Decomposition
R and Time
Series Data
Time Series To decompose a time series into components:
Decomposi-
tion
Trend component: long term trend
Time Series
Forecasting Seasonal component: seasonal variation
Time Series
Clustering Cyclical component: repeated but non-periodic
Time Series fluctuations
Classification
R Functions &
Irregular component: the residuals
Packages for
Time Series
Conclusions
7/42
8. Data AirPassengers
Data AirPassengers: Monthly totals of Box Jenkins
international airline passengers, 1949 to 1960. It has
R and Time 144(=12×12) values.
Series Data
Time Series > plot(AirPassengers)
Decomposi-
tion
Time Series
Forecasting
600
Time Series
Clustering
500
Time Series
Classification
AirPassengers
400
R Functions &
Packages for
Time Series
300
Conclusions
200
100
1950 1952 1954 1956 1958 1960
8/42
9. Decomposition
> apts <- ts(AirPassengers, frequency = 12)
> f <- decompose(apts)
R and Time
Series Data > # seasonal figures
Time Series > plot(f$figure,type="b")
Decomposi-
tion
Time Series
Forecasting q q
60
Time Series
Clustering
40
q
Time Series
20
Classification q
f$figure
R Functions &
0
q
q
Packages for q
Time Series
−20
q
q
q
Conclusions q
−40
q
2 4 6 8 10 12
Index
9/42
10. Decomposition
> plot(f)
Decomposition of additive time series
R and Time
Series Data
500
observed
Time Series
Decomposi-
300
tion
100
Time Series
450
Forecasting
350
trend
Time Series
250
Clustering
150
Time Series
Classification
40
seasonal
R Functions &
0
Packages for
60 −40
Time Series
Conclusions
random
0 20
−40
2 4 6 8 10 12
Time
10/42
11. Outline
1 R and Time Series Data
R and Time
Series Data
Time Series 2 Time Series Decomposition
Decomposi-
tion
Time Series 3 Time Series Forecasting
Forecasting
Time Series
Clustering 4 Time Series Clustering
Time Series
Classification
5 Time Series Classification
R Functions &
Packages for
Time Series
6 R Functions & Packages for Time Series
Conclusions
7 Conclusions
11/42
12. Time Series Forecasting
R and Time
Series Data
Time Series
Decomposi- To forecast future events based on known past data
tion
Time Series
E.g., to predict the opening price of a stock based on its
Forecasting
past performance
Time Series
Clustering Popular models
Time Series Autoregressive moving average (ARMA)
Classification
Autoregressive integrated moving average (ARIMA)
R Functions &
Packages for
Time Series
Conclusions
12/42
13. Forecasting
R and Time
> # build an ARIMA model
Series Data > fit <- arima(AirPassengers, order=c(1,0,0),
Time Series
Decomposi-
+ list(order=c(2,1,0), period=12))
tion
> fore <- predict(fit, n.ahead=24)
Time Series
Forecasting
> # error bounds at 95% confidence level
Time Series > U <- fore$pred + 2*fore$se
Clustering
> L <- fore$pred - 2*fore$se
Time Series
Classification > ts.plot(AirPassengers, fore$pred, U, L,
R Functions & + col=c(1,2,4,4), lty = c(1,1,2,2))
Packages for
Time Series > legend("topleft", col=c(1,2,4), lty=c(1,1,2),
Conclusions + c("Actual", "Forecast",
+ "Error Bounds (95% Confidence)"))
13/42
14. Forecasting
Actual
Forecast
R and Time 700 Error Bounds (95% Confidence)
Series Data
600
Time Series
Decomposi-
tion
500
Time Series
Forecasting
400
Time Series
Clustering
Time Series
300
Classification
R Functions &
Packages for
200
Time Series
Conclusions
100
1950 1952 1954 1956 1958 1960 1962
Time 14/42
15. Outline
1 R and Time Series Data
R and Time
Series Data
Time Series 2 Time Series Decomposition
Decomposi-
tion
Time Series 3 Time Series Forecasting
Forecasting
Time Series
Clustering 4 Time Series Clustering
Time Series
Classification
5 Time Series Classification
R Functions &
Packages for
Time Series
6 R Functions & Packages for Time Series
Conclusions
7 Conclusions
15/42
16. Time Series Clustering
R and Time
Series Data To partition time series data into groups based on
Time Series similarity or distance, so that time series in the same
Decomposi-
tion cluster are similar
Time Series Measure of distance/dissimilarity
Forecasting
Time Series
Euclidean distance
Clustering Manhattan distance
Time Series Maximum norm
Classification
Hamming distance
R Functions &
Packages for
The angle between two vectors (inner product)
Time Series Dynamic Time Warping (DTW) distance
Conclusions ...
16/42
17. Dynamic Time Warping (DTW)
DTW finds optimal alignment between two time series.
> library(dtw)
R and Time
Series Data
> idx <- seq(0, 2*pi, len=100)
Time Series
> a <- sin(idx) + runif(100)/10
Decomposi-
tion
> b <- cos(idx)
Time Series
> align <- dtw(a, b, step=asymmetricP1, keep=T)
Forecasting > dtwPlotTwoWay(align)
Time Series
Clustering
1.0
Time Series
Classification
0.5
R Functions &
Query value
Packages for
Time Series
0.0
Conclusions
−0.5
−1.0
0 20 40 60 80 100
Index 17/42
18. Synthetic Control Chart Time Series
The dataset contains 600 examples of control charts
R and Time
Series Data synthetically generated by the process in Alcock and
Time Series Manolopoulos (1999).
Decomposi-
tion Each control chart is a time series with 60 values.
Time Series
Forecasting
Six classes:
Time Series 1-100 Normal
Clustering 101-200 Cyclic
Time Series
Classification
201-300 Increasing trend
301-400 Decreasing trend
R Functions &
Packages for 401-500 Upward shift
Time Series
501-600 Downward shift
Conclusions
http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_
control.html
18/42
19. Synthetic Control Chart Time Series
R and Time
Series Data
> # read data into R
Time Series
Decomposi- > # sep="": the separator is white space, i.e., one
tion
> # or more spaces, tabs, newlines or carriage returns
Time Series
Forecasting > sc <- read.table("synthetic_control.data",
Time Series + header=F, sep="")
Clustering
> # show one sample from each class
Time Series
Classification > idx <- c(1,101,201,301,401,501)
R Functions & > sample1 <- t(sc[idx,])
Packages for
Time Series > plot.ts(sample1, main="")
Conclusions
19/42
20. Six Classes
36
30
34
32
20
301
1
30
R and Time
10
Series Data
28
26
Time Series
0
Decomposi-
45 24
45
tion
40
Time Series
35
Forecasting
101
401
35
Time Series
25
Clustering
30
15
Time Series
35 25
Classification
45
R Functions &
30
Packages for
40
25
201
501
Time Series
35
20
Conclusions
15
30
10
25
0 10 20 30 40 50 60 0 10 20 30 40 50 60
Time Time
20/42
21. Hierarchical Clustering with Euclidean distance
R and Time
Series Data > # sample n cases from every class
Time Series > n <- 10
Decomposi-
tion > s <- sample(1:100, n)
Time Series > idx <- c(s, 100+s, 200+s, 300+s, 400+s, 500+s)
Forecasting
Time Series
> sample2 <- sc[idx,]
Clustering > observedLabels <- c(rep(1,n), rep(2,n), rep(3,n),
Time Series
Classification
+ rep(4,n), rep(5,n), rep(6,n))
R Functions &
> # hierarchical clustering with Euclidean distance
Packages for > hc <- hclust(dist(sample2), method="ave")
Time Series
Conclusions
> plot(hc, labels=observedLabels, main="")
21/42
22. Hierarchical Clustering with Euclidean distance
R and Time 140
120
Series Data
Time Series
Decomposi-
100
tion
Time Series
Forecasting
Height
80
Time Series
Clustering
Time Series
60
Classification
22
R Functions &
Packages for
5
6
Time Series
6
40
2 2
2 2
Conclusions
4
5 2
2
66
5 5
66
6
64
55
3 5
2
2
55
44
33
1
44
4
33
4
4
3
3
20
11
5
3
3
4
11
6
6
11
3
1
1
1
22/42
23. Hierarchical Clustering with Euclidean distance
R and Time > # cut tree to get 8 clusters
Series Data
> memb <- cutree(hc, k=8)
Time Series
Decomposi- > table(observedLabels, memb)
tion
Time Series memb
Forecasting
observedLabels 1 2 3 4 5 6 7 8
Time Series
Clustering 1 10 0 0 0 0 0 0 0
Time Series 2 0 3 1 1 3 2 0 0
Classification
R Functions &
3 0 0 0 0 0 0 10 0
Packages for
Time Series
4 0 0 0 0 0 0 0 10
Conclusions
5 0 0 0 0 0 0 10 0
6 0 0 0 0 0 0 0 10
23/42
24. Hierarchical Clustering with DTW Distance
> myDist <- dist(sample2, method="DTW")
R and Time
> hc <- hclust(myDist, method="average")
Series Data
> plot(hc, labels=observedLabels, main="")
Time Series
Decomposi- > # cut tree to get 8 clusters
tion
> memb <- cutree(hc, k=8)
Time Series
Forecasting > table(observedLabels, memb)
Time Series
Clustering memb
Time Series observedLabels 1 2 3 4 5 6 7 8
Classification
1 10 0 0 0 0 0 0 0
R Functions &
Packages for 2 0 4 3 2 1 0 0 0
Time Series
3 0 0 0 0 0 6 4 0
Conclusions
4 0 0 0 0 0 0 0 10
5 0 0 0 0 0 0 10 0
6 0 0 0 0 0 0 0 10
24/42
25. Hierarchical Clustering with DTW Distance
R and Time 1000
800
Series Data
Time Series
Decomposi-
tion
600
Time Series
Forecasting
Height
Time Series
Clustering
400
Time Series
Classification
R Functions &
Packages for
Time Series
200
Conclusions
2
2 2
22
2
22
2
2
4 6
66
33
35
35
46
33
6
55
3
4
6
6
35
1
1
4
6
6
6
55
5
33
4
4
4
4
1
5
5
1
1
1
1
4
4
1
1
1
0
25/42
26. Outline
1 R and Time Series Data
R and Time
Series Data
Time Series 2 Time Series Decomposition
Decomposi-
tion
Time Series 3 Time Series Forecasting
Forecasting
Time Series
Clustering 4 Time Series Clustering
Time Series
Classification
5 Time Series Classification
R Functions &
Packages for
Time Series
6 R Functions & Packages for Time Series
Conclusions
7 Conclusions
26/42
27. Time Series Classification
Time Series Classification
To build a classification model based on labelled time
R and Time
Series Data series
Time Series
Decomposi-
and then use the model to predict the lable of unlabelled
tion time series
Time Series
Forecasting Feature Extraction
Time Series
Clustering
Singular Value Decomposition (SVD)
Time Series Discrete Fourier Transform (DFT)
Classification
R Functions & Discrete Wavelet Transform (DWT)
Packages for
Time Series Piecewise Aggregate Approximation (PAA)
Conclusions
Perpetually Important Points (PIP)
Piecewise Linear Representation
Symbolic Representation
27/42
28. Decision Tree (ctree)
R and Time
Series Data ctree from package party
Time Series
Decomposi- > classId <- c(rep("1",100), rep("2",100),
tion
+ rep("3",100), rep("4",100),
Time Series
Forecasting + rep("5",100), rep("6",100))
Time Series > newSc <- data.frame(cbind(classId, sc))
Clustering
> library(party)
Time Series
Classification > ct <- ctree(classId ~ ., data=newSc,
R Functions & + controls = ctree_control(minsplit=20,
Packages for
Time Series + minbucket=5, maxdepth=5))
Conclusions
28/42
29. Decision Tree
> pClassId <- predict(ct)
> table(classId, pClassId)
R and Time
Series Data
pClassId
Time Series
Decomposi- classId 1 2 3 4 5 6
tion
1 100 0 0 0 0 0
Time Series
Forecasting 2 1 97 2 0 0 0
Time Series
Clustering
3 0 0 99 0 1 0
Time Series
4 0 0 0 100 0 0
Classification 5 4 0 8 0 88 0
R Functions &
Packages for
6 0 3 0 90 0 7
Time Series
Conclusions
> # accuracy
> (sum(classId==pClassId)) / nrow(sc)
[1] 0.8183333
29/42
30. DWT (Discrete Wavelet Transform)
Wavelet transform provides a multi-resolution
representation using wavelets.
R and Time
Series Data Haar Wavelet Transform – the simplest DWT
Time Series
Decomposi-
http://dmr.ath.cx/gfx/haar/
tion
Time Series
Forecasting
Time Series
Clustering
Time Series
Classification
R Functions &
Packages for
Time Series
Conclusions
DFT (Discrete Fourier Transform): another popular
feature extraction technique
30/42
31. DWT (Discrete Wavelet Transform)
R and Time > # extract DWT (with Haar filter) coefficients
Series Data
Time Series
> library(wavelets)
Decomposi-
tion
> wtData <- NULL
Time Series
> for (i in 1:nrow(sc)) {
Forecasting + a <- t(sc[i,])
Time Series
Clustering
+ wt <- dwt(a, filter="haar", boundary="periodic")
Time Series + wtData <- rbind(wtData,
Classification
+ unlist(c(wt@W, wt@V[[wt@level]])))
R Functions &
Packages for + }
Time Series
> wtData <- as.data.frame(wtData)
Conclusions
> wtSc <- data.frame(cbind(classId, wtData))
31/42
32. Decision Tree with DWT
> ct <- ctree(classId ~ ., data=wtSc, controls =
+ ctree_control(minsplit=20,
R and Time
Series Data
+ minbucket=5, maxdepth=5))
Time Series > pClassId <- predict(ct)
Decomposi-
tion > table(classId, pClassId)
Time Series pClassId
Forecasting
Time Series
classId 1 2 3 4 5 6
Clustering 1 98 2 0 0 0 0
Time Series
Classification
2 1 99 0 0 0 0
R Functions &
3 0 0 81 0 19 0
Packages for 4 0 0 0 74 0 26
Time Series
Conclusions
5 0 0 16 0 84 0
6 0 0 0 3 0 97
> (sum(classId==pClassId)) / nrow(wtSc)
[1] 0.8883333
32/42
34. k-NN Classification
find the k nearest neighbours of a new instance
R and Time
label it by majority voting
Series Data
needs an efficient indexing structure for large datasets
Time Series
Decomposi- > k <- 20
tion
Time Series
> newTS <- sc[501,] + runif(100)*15
Forecasting
> distances <- dist(newTS, sc, method="DTW")
Time Series
Clustering > s <- sort(as.vector(distances), index.return=TRUE)
Time Series > # class IDs of k nearest neighbours
Classification
> table(classId[s$ix[1:k]])
R Functions &
Packages for
Time Series
4 6
Conclusions 3 17
Results of Majority Voting
Label of newTS ← class 6
34/42
35. Outline
1 R and Time Series Data
R and Time
Series Data
Time Series 2 Time Series Decomposition
Decomposi-
tion
Time Series 3 Time Series Forecasting
Forecasting
Time Series
Clustering 4 Time Series Clustering
Time Series
Classification
5 Time Series Classification
R Functions &
Packages for
Time Series
6 R Functions & Packages for Time Series
Conclusions
7 Conclusions
35/42
36. Functions - Construction, Plot & Smoothing
R and Time Construction
Series Data
Time Series ts() create time-series objects (stats)
Decomposi-
tion
Plot
Time Series
Forecasting
plot.ts() plot time-series objects (stats)
Time Series
Clustering
Smoothing & Filtering
Time Series
Classification
smoothts() time series smoothing (ast)
R Functions &
Packages for
Time Series
sfilter() remove seasonal fluctuation using moving
Conclusions
average (ast)
36/42
37. Functions - Decomposition & Forecasting
Decomposition
decomp() time series decomposition by square-root filter
R and Time
Series Data
(timsac)
Time Series decompose() classical seasonal decomposition by moving
Decomposi-
tion averages (stats)
Time Series stl() seasonal decomposition of time series by loess
Forecasting
(stats)
Time Series
Clustering tsr() time series decomposition (ast)
Time Series ardec() time series autoregressive decomposition
Classification
(ArDec)
R Functions &
Packages for
Time Series
Forecasting
Conclusions arima() fit an ARIMA model to a univariate time series
(stats)
predict.Arima() forecast from models fitted by arima
(stats)
37/42
38. Packages
Packages
timsac time series analysis and control program
R and Time
Series Data ast time series analysis
Time Series
Decomposi-
ArDec time series autoregressive-based decomposition
tion
ares a toolbox for time series analyses using generalized
Time Series
Forecasting additive models
Time Series
Clustering
dse tools for multivariate, linear, time-invariant, time
Time Series
series models
Classification
forecast displaying and analysing univariate time series
R Functions &
Packages for forecasts
Time Series
dtw Dynamic Time Warping – find optimal alignment
Conclusions
between two time series
wavelets wavelet filters, wavelet transforms and
multiresolution analyses
38/42
39. Online Resources
An R Time Series Tutorial
http://www.stat.pitt.edu/stoffer/tsa2/R_time_series_quick_fix.htm
R and Time Time Series Analysis with R
Series Data
http://www.statoek.wiso.uni-goettingen.de/veranstaltungen/zeitreihen/sommer03/ts_r_
Time Series
Decomposi- intro.pdf
tion
Using R (with applications in Time Series Analysis)
Time Series
Forecasting http://people.bath.ac.uk/masgs/time%20series/TimeSeriesR2004.pdf
Time Series CRAN Task View: Time Series Analysis
Clustering
http://cran.r-project.org/web/views/TimeSeries.html
Time Series
Classification R Functions for Time Series Analysis
R Functions & http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
Packages for
Time Series R Reference Card for Data Mining;
Conclusions R and Data Mining: Examples and Case Studies
http://www.rdatamining.com/
Time Series Analysis for Business Forecasting
http://home.ubalt.edu/ntsbarsh/stat-data/Forecast.htm
39/42
40. Outline
1 R and Time Series Data
R and Time
Series Data
Time Series 2 Time Series Decomposition
Decomposi-
tion
Time Series 3 Time Series Forecasting
Forecasting
Time Series
Clustering 4 Time Series Clustering
Time Series
Classification
5 Time Series Classification
R Functions &
Packages for
Time Series
6 R Functions & Packages for Time Series
Conclusions
7 Conclusions
40/42
41. Conclusions
Time series decomposition and forecasting: many R
functions and packages available
R and Time
Series Data Time series classification and clustering: no R functions or
Time Series
Decomposi-
packages specially for this purpose; have to work it out by
tion
yourself
Time Series
Forecasting Time series classification: extract and build features, and
Time Series then apply existing classification techniques, such as SVM,
Clustering
Time Series
k-NN, neural networks, regression and decision trees
Classification
Time series clustering: work out your own
R Functions &
Packages for distance/similarity metrics, and then use existing clustering
Time Series
techniques, such as k-means and hierarchical clustering
Conclusions
Techniques specially for classifying/clustering time series
data: a lot of research publications, but no R
implementations (as far as I know)
41/42
42. The End
Email: yanchangzhao@gmail.com
RDataMining: http://www.rdatamining.com
R and Time
Series Data Twitter: http://twitter.com/rdatamining
Time Series
Decomposi-
Group on Linkedin: http://group.rdatamining.com
tion Group on Google: http://group2.rdatamining.com
Time Series
Forecasting
Time Series
Clustering
Time Series
Classification
R Functions &
Packages for
Time Series
Conclusions
42/42