SlideShare une entreprise Scribd logo
1  sur  40
Télécharger pour lire hors ligne
Time Series Analysis with R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 
1 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
2 / 39
Time Series Analysis with R 1 
I time series data in R 
I time series decomposition, forecasting, clustering and 
classi
cation 
I autoregressive integrated moving average (ARIMA) model 
I Dynamic Time Warping (DTW) 
I Discrete Wavelet Transform (DWT) 
I k-NN classi
cation 
1Chapter 8: Time Series Analysis and Mining, in book 
R and Data Mining: Examples and Case Studies. 
http://www.rdatamining.com/docs/RDataMining.pdf 
3 / 39
R 
I a free software environment for statistical computing and 
graphics 
I runs on Windows, Linux and MacOS 
I widely used in academia and research, as well as industrial 
applications 
I 5,800+ packages (as of 13 Sept 2014) 
I CRAN Task View: Time Series Analysis 
http://cran.r-project.org/web/views/TimeSeries.html 
4 / 39
Time Series Data in R 
I class ts 
I represents data which has been sampled at equispaced points 
in time 
I frequency=7: a weekly series 
I frequency=12: a monthly series 
I frequency=4: a quarterly series 
5 / 39
Time Series Data in R 
a <- ts(1:20, frequency = 12, start = c(2011, 3)) 
print(a) 
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
## 2011 1 2 3 4 5 6 7 8 9 10 
## 2012 11 12 13 14 15 16 17 18 19 20 
str(a) 
## Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10... 
attributes(a) 
## $tsp 
## [1] 2011 2013 12 
## 
## $class 
## [1] "ts" 
6 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
7 / 39
What is Time Series Decomposition 
To decompose a time series into components: 
I Trend component: long term trend 
I Seasonal component: seasonal variation 
I Cyclical component: repeated but non-periodic 
uctuations 
I Irregular component: the residuals 
8 / 39
Data AirPassengers 
Data AirPassengers: monthly totals of Box Jenkins international 
airline passengers, 1949 to 1960. It has 144(=1212) values. 
plot(AirPassengers) 
Time 
AirPassengers 
1950 1952 1954 1956 1958 1960 
100 200 300 400 500 600 
9 / 39
Decomposition 
apts - ts(AirPassengers, frequency = 12) 
f - decompose(apts) 
plot(f$figure, type = b) # seasonal figures 
2 4 6 8 10 12 
−40 −20 0 20 40 60 
Index 
f$figure 
10 / 39
Decomposition 
plot(f) 
observed 
150 250 350 450 
100 300 500 
trend 
seasonal 
−40 0 20 40 60 
−40 0 20 40 60 
2 4 6 8 10 12 
random 
Time 
Decomposition of additive time series 
11 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
12 / 39
Time Series Forecasting 
I To forecast future events based on known past data 
I For example, to predict the price of a stock based on its past 
performance 
I Popular models 
I Autoregressive moving average (ARMA) 
I Autoregressive integrated moving average (ARIMA) 
13 / 39
Forecasting 
# build an ARIMA model 
fit - arima(AirPassengers, order = c(1, 0, 0), list(order = c(2, 
1, 0), period = 12)) 
fore - predict(fit, n.ahead = 24) 
# error bounds at 95% confidence level 
U - fore$pred + 2 * fore$se 
L - fore$pred - 2 * fore$se 
ts.plot(AirPassengers, fore$pred, U, L, 
col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2)) 
legend(topleft, col = c(1, 2, 4), lty = c(1, 1, 2), 
c(Actual, Forecast, Error Bounds (95% Confidence))) 
14 / 39
Forecasting 
1950 1952 1954 1956 1958 1960 1962 
Time 
100 200 300 400 500 600 700 
Actual 
Forecast 
Error Bounds (95% Confidence) 
15 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
16 / 39
Time Series Clustering 
I To partition time series data into groups based on similarity or 
distance, so that time series in the same cluster are similar 
I Measure of distance/dissimilarity 
I Euclidean distance 
I Manhattan distance 
I Maximum norm 
I Hamming distance 
I The angle between two vectors (inner product) 
I Dynamic Time Warping (DTW) distance 
I ... 
17 / 39
Dynamic Time Warping (DTW) 
DTW
nds optimal alignment between two time series. 
library(dtw) 
idx - seq(0, 2 * pi, len = 100) 
a - sin(idx) + runif(100)/10 
b - cos(idx) 
align - dtw(a, b, step = asymmetricP1, keep = T) 
dtwPlotTwoWay(align) 
Index 
Query value 
0 20 40 60 80 100 
−1.0 −0.5 0.0 0.5 1.0 
18 / 39
Synthetic Control Chart Time Series 
I The dataset contains 600 examples of control charts 
synthetically generated by the process in Alcock and 
Manolopoulos (1999). 
I Each control chart is a time series with 60 values. 
I Six classes: 
I 1-100 Normal 
I 101-200 Cyclic 
I 201-300 Increasing trend 
I 301-400 Decreasing trend 
I 401-500 Upward shift 
I 501-600 Downward shift 
I http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_ 
control.html 
19 / 39
Synthetic Control Chart Time Series 
# read data into R sep='': the separator is white space, i.e., one 
# or more spaces, tabs, newlines or carriage returns 
sc - read.table(./data/synthetic_control.data, header = F, sep = ) 
# show one sample from each class 
idx - c(1, 101, 201, 301, 401, 501) 
sample1 - t(sc[idx, ]) 
plot.ts(sample1, main = ) 
20 / 39
Six Classes 
1 
24 26 28 30 32 34 36 
101 
15 20 25 30 35 40 45 
25 30 35 40 45 
0 10 20 30 40 50 60 
201 
Time 
301 
0 10 20 30 
401 
25 30 35 40 45 
10 15 20 25 30 35 
0 10 20 30 40 50 60 
501 
Time 
21 / 39
Hierarchical Clustering with Euclidean distance 
# sample n cases from every class 
n - 10 
s - sample(1:100, n) 
idx - c(s, 100 + s, 200 + s, 300 + s, 400 + s, 500 + s) 
sample2 - sc[idx, ] 
observedLabels - rep(1:6, each = n) 
# hierarchical clustering with Euclidean distance 
hc - hclust(dist(sample2), method = ave) 
plot(hc, labels = observedLabels, main = ) 
22 / 39
Hierarchical Clustering with Euclidean distance 
20 40 60 80 100 120 140 
4 6 66 
1 22 11 
11 11 
1 11 66 
6 6 6 66 
4 4 44 
44 
4 44 
33 
33 33 
5 5 55 
55 
33 
33 
55 
55 
22 
2 22 
2 22 
hclust (*, average) 
dist(sample2) 
Height 
23 / 39
Hierarchical Clustering with Euclidean distance 
# cut tree to get 8 clusters 
memb - cutree(hc, k = 8) 
table(observedLabels, memb) 
## memb 
## observedLabels 1 2 3 4 5 6 7 8 
## 1 10 0 0 0 0 0 0 0 
## 2 0 3 1 1 3 2 0 0 
## 3 0 0 0 0 0 0 10 0 
## 4 0 0 0 0 0 0 0 10 
## 5 0 0 0 0 0 0 10 0 
## 6 0 0 0 0 0 0 0 10 
24 / 39
Hierarchical Clustering with DTW Distance 
myDist - dist(sample2, method = DTW) 
hc - hclust(myDist, method = average) 
plot(hc, labels = observedLabels, main = ) 
# cut tree to get 8 clusters 
memb - cutree(hc, k = 8) 
table(observedLabels, memb) 
## memb 
## observedLabels 1 2 3 4 5 6 7 8 
## 1 10 0 0 0 0 0 0 0 
## 2 0 4 3 2 1 0 0 0 
## 3 0 0 0 0 0 6 4 0 
## 4 0 0 0 0 0 0 0 10 
## 5 0 0 0 0 0 0 10 0 
## 6 0 0 0 0 0 0 0 10 
25 / 39
Hierarchical Clustering with DTW Distance 
0 200 400 600 800 1000 
3 33 
3 33 
55 
33 33 
5 55 
5 55 
55 
66 
6 46 
4 44 
44 
44 
44 
66 66 66 
11 111 
11 111 
22 22 
22 
22 
22 
hclust (*, average) 
myDist 
Height 
26 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
27 / 39
Time Series Classi
cation 
Time Series Classi
cation 
I To build a classi
cation model based on labelled time series 
I and then use the model to predict the lable of unlabelled time 
series 
Feature Extraction 
I Singular Value Decomposition (SVD) 
I Discrete Fourier Transform (DFT) 
I Discrete Wavelet Transform (DWT) 
I Piecewise Aggregate Approximation (PAA) 
I Perpetually Important Points (PIP) 
I Piecewise Linear Representation 
I Symbolic Representation 
28 / 39
Decision Tree (ctree) 
ctree from package party 
classId - rep(as.character(1:6), each = 100) 
newSc - data.frame(cbind(classId, sc)) 
library(party) 
ct - ctree(classId ~ ., data = newSc, 
controls = ctree_control(minsplit = 20, 
minbucket = 5, maxdepth = 5)) 
29 / 39

Contenu connexe

Tendances

Tendances (20)

Probability
ProbabilityProbability
Probability
 
Time Series Analysis with R
Time Series Analysis with RTime Series Analysis with R
Time Series Analysis with R
 
Time series analysis in Stata
Time series analysis in StataTime series analysis in Stata
Time series analysis in Stata
 
Arima model
Arima modelArima model
Arima model
 
Time Series Analysis.pptx
Time Series Analysis.pptxTime Series Analysis.pptx
Time Series Analysis.pptx
 
Learn 90% of Python in 90 Minutes
Learn 90% of Python in 90 MinutesLearn 90% of Python in 90 Minutes
Learn 90% of Python in 90 Minutes
 
6. R data structures
6. R data structures6. R data structures
6. R data structures
 
Time series Analysis
Time series AnalysisTime series Analysis
Time series Analysis
 
Quantitative Data Analysis using R
Quantitative Data Analysis using RQuantitative Data Analysis using R
Quantitative Data Analysis using R
 
Time Series Analysis/ Forecasting
Time Series Analysis/ Forecasting  Time Series Analysis/ Forecasting
Time Series Analysis/ Forecasting
 
R decision tree
R   decision treeR   decision tree
R decision tree
 
Time Series Analysis - 1 | Time Series in R | Time Series Forecasting | Data ...
Time Series Analysis - 1 | Time Series in R | Time Series Forecasting | Data ...Time Series Analysis - 1 | Time Series in R | Time Series Forecasting | Data ...
Time Series Analysis - 1 | Time Series in R | Time Series Forecasting | Data ...
 
Data Management in R
Data Management in RData Management in R
Data Management in R
 
ARIMA
ARIMA ARIMA
ARIMA
 
Time series
Time seriesTime series
Time series
 
Introduction to Generalized Linear Models
Introduction to Generalized Linear ModelsIntroduction to Generalized Linear Models
Introduction to Generalized Linear Models
 
Logistical Regression.pptx
Logistical Regression.pptxLogistical Regression.pptx
Logistical Regression.pptx
 
4 Descriptive Statistics with R
4 Descriptive Statistics with R4 Descriptive Statistics with R
4 Descriptive Statistics with R
 
time series analysis
time series analysistime series analysis
time series analysis
 
R code for data manipulation
R code for data manipulationR code for data manipulation
R code for data manipulation
 

En vedette

Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataYanchang Zhao
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slidesYanchang Zhao
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with RYanchang Zhao
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with RYanchang Zhao
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data MiningYanchang Zhao
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with RYanchang Zhao
 
Time Series
Time SeriesTime Series
Time Seriesyush313
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with RYanchang Zhao
 
Multivariate time series
Multivariate time seriesMultivariate time series
Multivariate time seriesLuigi Piva CQF
 
Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniquesShanmukha S. Potti
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTetiana Ivanova
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Persontyle
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with RYanchang Zhao
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With RJahnab Kumar Deka
 
Demand forecasting by time series analysis
Demand forecasting by time series analysisDemand forecasting by time series analysis
Demand forecasting by time series analysisSunny Gandhi
 

En vedette (20)

Timeseries Analysis with R
Timeseries Analysis with RTimeseries Analysis with R
Timeseries Analysis with R
 
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with R
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data Mining
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Time Series
Time SeriesTime Series
Time Series
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R
 
Multivariate time series
Multivariate time seriesMultivariate time series
Multivariate time series
 
Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniques
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and Practice
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Course - Machine Learning Basics with R
Course - Machine Learning Basics with R
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With R
 
TextMining with R
TextMining with RTextMining with R
TextMining with R
 
Time series slideshare
Time series slideshareTime series slideshare
Time series slideshare
 
Demand forecasting by time series analysis
Demand forecasting by time series analysisDemand forecasting by time series analysis
Demand forecasting by time series analysis
 

Similaire à Time Series Analysis and Mining with R

R data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RR data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RDr. Volkan OBAN
 
RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisYanchang Zhao
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.Dr. Volkan OBAN
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data ManipulationChu An
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsPeter Solymos
 
Data manipulation with dplyr
Data manipulation with dplyrData manipulation with dplyr
Data manipulation with dplyrRomain Francois
 
talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013ericupnorth
 
ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions Dr. Volkan OBAN
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
C PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAMC PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAMSaraswathiRamalingam
 
Do snow.rwn
Do snow.rwnDo snow.rwn
Do snow.rwnARUN DN
 
Assignment 5.2.pdf
Assignment 5.2.pdfAssignment 5.2.pdf
Assignment 5.2.pdfdash41
 

Similaire à Time Series Analysis and Mining with R (20)

R data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RR data mining-Time Series Analysis with R
R data mining-Time Series Analysis with R
 
RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.
 
R programming language
R programming languageR programming language
R programming language
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutions
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Data manipulation with dplyr
Data manipulation with dplyrData manipulation with dplyr
Data manipulation with dplyr
 
talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013
 
R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 
ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
C PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAMC PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAM
 
R and data mining
R and data miningR and data mining
R and data mining
 
Do snow.rwn
Do snow.rwnDo snow.rwn
Do snow.rwn
 
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
 
Assignment 5.2.pdf
Assignment 5.2.pdfAssignment 5.2.pdf
Assignment 5.2.pdf
 

Plus de Yanchang Zhao

RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rYanchang Zhao
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classificationYanchang Zhao
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programmingYanchang Zhao
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rYanchang Zhao
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationYanchang Zhao
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rYanchang Zhao
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rYanchang Zhao
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-cardYanchang Zhao
 

Plus de Yanchang Zhao (8)

RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
 

Dernier

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Dernier (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Time Series Analysis and Mining with R

  • 1. Time Series Analysis with R Yanchang Zhao http://www.RDataMining.com 30 September 2014 1 / 39
  • 2. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 4. Time Series Analysis with R 1 I time series data in R I time series decomposition, forecasting, clustering and classi
  • 5. cation I autoregressive integrated moving average (ARIMA) model I Dynamic Time Warping (DTW) I Discrete Wavelet Transform (DWT) I k-NN classi
  • 6. cation 1Chapter 8: Time Series Analysis and Mining, in book R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf 3 / 39
  • 7. R I a free software environment for statistical computing and graphics I runs on Windows, Linux and MacOS I widely used in academia and research, as well as industrial applications I 5,800+ packages (as of 13 Sept 2014) I CRAN Task View: Time Series Analysis http://cran.r-project.org/web/views/TimeSeries.html 4 / 39
  • 8. Time Series Data in R I class ts I represents data which has been sampled at equispaced points in time I frequency=7: a weekly series I frequency=12: a monthly series I frequency=4: a quarterly series 5 / 39
  • 9. Time Series Data in R a <- ts(1:20, frequency = 12, start = c(2011, 3)) print(a) ## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ## 2011 1 2 3 4 5 6 7 8 9 10 ## 2012 11 12 13 14 15 16 17 18 19 20 str(a) ## Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10... attributes(a) ## $tsp ## [1] 2011 2013 12 ## ## $class ## [1] "ts" 6 / 39
  • 10. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 12. What is Time Series Decomposition To decompose a time series into components: I Trend component: long term trend I Seasonal component: seasonal variation I Cyclical component: repeated but non-periodic uctuations I Irregular component: the residuals 8 / 39
  • 13. Data AirPassengers Data AirPassengers: monthly totals of Box Jenkins international airline passengers, 1949 to 1960. It has 144(=1212) values. plot(AirPassengers) Time AirPassengers 1950 1952 1954 1956 1958 1960 100 200 300 400 500 600 9 / 39
  • 14. Decomposition apts - ts(AirPassengers, frequency = 12) f - decompose(apts) plot(f$figure, type = b) # seasonal figures 2 4 6 8 10 12 −40 −20 0 20 40 60 Index f$figure 10 / 39
  • 15. Decomposition plot(f) observed 150 250 350 450 100 300 500 trend seasonal −40 0 20 40 60 −40 0 20 40 60 2 4 6 8 10 12 random Time Decomposition of additive time series 11 / 39
  • 16. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 18. Time Series Forecasting I To forecast future events based on known past data I For example, to predict the price of a stock based on its past performance I Popular models I Autoregressive moving average (ARMA) I Autoregressive integrated moving average (ARIMA) 13 / 39
  • 19. Forecasting # build an ARIMA model fit - arima(AirPassengers, order = c(1, 0, 0), list(order = c(2, 1, 0), period = 12)) fore - predict(fit, n.ahead = 24) # error bounds at 95% confidence level U - fore$pred + 2 * fore$se L - fore$pred - 2 * fore$se ts.plot(AirPassengers, fore$pred, U, L, col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2)) legend(topleft, col = c(1, 2, 4), lty = c(1, 1, 2), c(Actual, Forecast, Error Bounds (95% Confidence))) 14 / 39
  • 20. Forecasting 1950 1952 1954 1956 1958 1960 1962 Time 100 200 300 400 500 600 700 Actual Forecast Error Bounds (95% Confidence) 15 / 39
  • 21. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 23. Time Series Clustering I To partition time series data into groups based on similarity or distance, so that time series in the same cluster are similar I Measure of distance/dissimilarity I Euclidean distance I Manhattan distance I Maximum norm I Hamming distance I The angle between two vectors (inner product) I Dynamic Time Warping (DTW) distance I ... 17 / 39
  • 24. Dynamic Time Warping (DTW) DTW
  • 25. nds optimal alignment between two time series. library(dtw) idx - seq(0, 2 * pi, len = 100) a - sin(idx) + runif(100)/10 b - cos(idx) align - dtw(a, b, step = asymmetricP1, keep = T) dtwPlotTwoWay(align) Index Query value 0 20 40 60 80 100 −1.0 −0.5 0.0 0.5 1.0 18 / 39
  • 26. Synthetic Control Chart Time Series I The dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999). I Each control chart is a time series with 60 values. I Six classes: I 1-100 Normal I 101-200 Cyclic I 201-300 Increasing trend I 301-400 Decreasing trend I 401-500 Upward shift I 501-600 Downward shift I http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_ control.html 19 / 39
  • 27. Synthetic Control Chart Time Series # read data into R sep='': the separator is white space, i.e., one # or more spaces, tabs, newlines or carriage returns sc - read.table(./data/synthetic_control.data, header = F, sep = ) # show one sample from each class idx - c(1, 101, 201, 301, 401, 501) sample1 - t(sc[idx, ]) plot.ts(sample1, main = ) 20 / 39
  • 28. Six Classes 1 24 26 28 30 32 34 36 101 15 20 25 30 35 40 45 25 30 35 40 45 0 10 20 30 40 50 60 201 Time 301 0 10 20 30 401 25 30 35 40 45 10 15 20 25 30 35 0 10 20 30 40 50 60 501 Time 21 / 39
  • 29. Hierarchical Clustering with Euclidean distance # sample n cases from every class n - 10 s - sample(1:100, n) idx - c(s, 100 + s, 200 + s, 300 + s, 400 + s, 500 + s) sample2 - sc[idx, ] observedLabels - rep(1:6, each = n) # hierarchical clustering with Euclidean distance hc - hclust(dist(sample2), method = ave) plot(hc, labels = observedLabels, main = ) 22 / 39
  • 30. Hierarchical Clustering with Euclidean distance 20 40 60 80 100 120 140 4 6 66 1 22 11 11 11 1 11 66 6 6 6 66 4 4 44 44 4 44 33 33 33 5 5 55 55 33 33 55 55 22 2 22 2 22 hclust (*, average) dist(sample2) Height 23 / 39
  • 31. Hierarchical Clustering with Euclidean distance # cut tree to get 8 clusters memb - cutree(hc, k = 8) table(observedLabels, memb) ## memb ## observedLabels 1 2 3 4 5 6 7 8 ## 1 10 0 0 0 0 0 0 0 ## 2 0 3 1 1 3 2 0 0 ## 3 0 0 0 0 0 0 10 0 ## 4 0 0 0 0 0 0 0 10 ## 5 0 0 0 0 0 0 10 0 ## 6 0 0 0 0 0 0 0 10 24 / 39
  • 32. Hierarchical Clustering with DTW Distance myDist - dist(sample2, method = DTW) hc - hclust(myDist, method = average) plot(hc, labels = observedLabels, main = ) # cut tree to get 8 clusters memb - cutree(hc, k = 8) table(observedLabels, memb) ## memb ## observedLabels 1 2 3 4 5 6 7 8 ## 1 10 0 0 0 0 0 0 0 ## 2 0 4 3 2 1 0 0 0 ## 3 0 0 0 0 0 6 4 0 ## 4 0 0 0 0 0 0 0 10 ## 5 0 0 0 0 0 0 10 0 ## 6 0 0 0 0 0 0 0 10 25 / 39
  • 33. Hierarchical Clustering with DTW Distance 0 200 400 600 800 1000 3 33 3 33 55 33 33 5 55 5 55 55 66 6 46 4 44 44 44 44 66 66 66 11 111 11 111 22 22 22 22 22 hclust (*, average) myDist Height 26 / 39
  • 34. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 38. cation I To build a classi
  • 39. cation model based on labelled time series I and then use the model to predict the lable of unlabelled time series Feature Extraction I Singular Value Decomposition (SVD) I Discrete Fourier Transform (DFT) I Discrete Wavelet Transform (DWT) I Piecewise Aggregate Approximation (PAA) I Perpetually Important Points (PIP) I Piecewise Linear Representation I Symbolic Representation 28 / 39
  • 40. Decision Tree (ctree) ctree from package party classId - rep(as.character(1:6), each = 100) newSc - data.frame(cbind(classId, sc)) library(party) ct - ctree(classId ~ ., data = newSc, controls = ctree_control(minsplit = 20, minbucket = 5, maxdepth = 5)) 29 / 39
  • 41. Decision Tree pClassId - predict(ct) table(classId, pClassId) ## pClassId ## classId 1 2 3 4 5 6 ## 1 100 0 0 0 0 0 ## 2 1 97 2 0 0 0 ## 3 0 0 99 0 1 0 ## 4 0 0 0 100 0 0 ## 5 4 0 8 0 88 0 ## 6 0 3 0 90 0 7 # accuracy (sum(classId == pClassId))/nrow(sc) ## [1] 0.8183 30 / 39
  • 42. DWT (Discrete Wavelet Transform) I Wavelet transform provides a multi-resolution representation using wavelets. I Haar Wavelet Transform { the simplest DWT http://dmr.ath.cx/gfx/haar/ I DFT (Discrete Fourier Transform): another popular feature extraction technique 31 / 39
  • 43. DWT (Discrete Wavelet Transform) # extract DWT (with Haar filter) coefficients library(wavelets) wtData - NULL for (i in 1:nrow(sc)) f a - t(sc[i, ]) wt - dwt(a, filter = haar, boundary = periodic) wtData - rbind(wtData, unlist(c(wt@W, wt@V[[wt@level]]))) g wtData - as.data.frame(wtData) wtSc - data.frame(cbind(classId, wtData)) 32 / 39
  • 44. Decision Tree with DWT ct - ctree(classId ~ ., data = wtSc, controls = ctree_control(minsplit=20, minbucket=5, maxdepth=5)) pClassId - predict(ct) table(classId, pClassId) ## pClassId ## classId 1 2 3 4 5 6 ## 1 98 2 0 0 0 0 ## 2 1 99 0 0 0 0 ## 3 0 0 81 0 19 0 ## 4 0 0 0 74 0 26 ## 5 0 0 16 0 84 0 ## 6 0 0 0 3 0 97 (sum(classId==pClassId)) / nrow(wtSc) ## [1] 0.8883 33 / 39
  • 45. plot(ct, ip_args = list(pval = F), ep_args = list(digits = 0)) 1 V57 £ 117 117 2 W43 £ −4 −4 3 W5 £ −8 −8 Node 4 (n = 68) 123456 1 0.8 0.6 0.4 0.2 0 Node 5 (n = 6) 123456 1 0.8 0.6 0.4 0.2 0 6 W31 £ −6 −6 Node 7 (n = 9) 123456 1 0.8 0.6 0.4 0.2 0 Node 8 (n = 86) 123456 1 0.8 0.6 0.4 0.2 0 9 V57 £ 140 140 Node 10 (n = 31) 123456 1 0.8 0.6 0.4 0.2 0 11 V57 £ 178 178 12 W22 £ −6 −6 Node 13 (n = 80) 123456 1 0.8 0.6 0.4 0.2 0 14 W31 £ −13 −13 Node 15 (n = 9) 123456 1 0.8 0.6 0.4 0.2 0 Node 16 (n = 99) 123456 1 0.8 0.6 0.4 0.2 0 17 W31 £ −15 −15 Node 18 (n = 12) 123456 1 0.8 0.6 0.4 0.2 0 19 W43 £ 3 3 Node 20 (n = 103) 123456 1 0.8 0.6 0.4 0.2 0 Node 21 (n = 97) 123456 1 0.8 0.6 0.4 0.2 0 34 / 39
  • 48. nd the k nearest neighbours of a new instance I label it by majority voting I needs an ecient indexing structure for large datasets k - 20 newTS - sc[501, ] + runif(100) * 15 distances - dist(newTS, sc, method = DTW) s - sort(as.vector(distances), index.return = TRUE) # class IDs of k nearest neighbours table(classId[s$ix[1:k]]) ## ## 4 6 ## 3 17 35 / 39
  • 51. nd the k nearest neighbours of a new instance I label it by majority voting I needs an ecient indexing structure for large datasets k - 20 newTS - sc[501, ] + runif(100) * 15 distances - dist(newTS, sc, method = DTW) s - sort(as.vector(distances), index.return = TRUE) # class IDs of k nearest neighbours table(classId[s$ix[1:k]]) ## ## 4 6 ## 3 17 Results of majority voting: class 6 35 / 39
  • 52. The TSclust Package I TSclust: a package for time seriesclustering 2 I measures of dissimilarity between time series to perform time series clustering. I metrics based on raw data, on generating models and on the forecast behavior I time series clustering algorithms and cluster evaluation metrics 2http://cran.r-project.org/web/packages/TSclust/ 36 / 39
  • 53. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 55. Online Resources I Chapter 8: Time Series Analysis and Mining, in book R and Data Mining: Examples and Case Studies http://www.rdatamining.com/docs/RDataMining.pdf I R Reference Card for Data Mining http://www.rdatamining.com/docs/R-refcard-data-mining.pdf I Free online courses and documents http://www.rdatamining.com/resources/ I RDataMining Group on LinkedIn (7,000+ members) http://group.rdatamining.com I RDataMining on Twitter (1,700+ followers) @RDataMining 38 / 39
  • 56. The End Thanks! Email: yanchang(at)rdatamining.com 39 / 39