SlideShare une entreprise Scribd logo
1  sur  40
Télécharger pour lire hors ligne
Time Series Analysis with R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 
1 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
2 / 39
Time Series Analysis with R 1 
I time series data in R 
I time series decomposition, forecasting, clustering and 
classi
cation 
I autoregressive integrated moving average (ARIMA) model 
I Dynamic Time Warping (DTW) 
I Discrete Wavelet Transform (DWT) 
I k-NN classi
cation 
1Chapter 8: Time Series Analysis and Mining, in book 
R and Data Mining: Examples and Case Studies. 
http://www.rdatamining.com/docs/RDataMining.pdf 
3 / 39
R 
I a free software environment for statistical computing and 
graphics 
I runs on Windows, Linux and MacOS 
I widely used in academia and research, as well as industrial 
applications 
I 5,800+ packages (as of 13 Sept 2014) 
I CRAN Task View: Time Series Analysis 
http://cran.r-project.org/web/views/TimeSeries.html 
4 / 39
Time Series Data in R 
I class ts 
I represents data which has been sampled at equispaced points 
in time 
I frequency=7: a weekly series 
I frequency=12: a monthly series 
I frequency=4: a quarterly series 
5 / 39
Time Series Data in R 
a <- ts(1:20, frequency = 12, start = c(2011, 3)) 
print(a) 
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
## 2011 1 2 3 4 5 6 7 8 9 10 
## 2012 11 12 13 14 15 16 17 18 19 20 
str(a) 
## Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10... 
attributes(a) 
## $tsp 
## [1] 2011 2013 12 
## 
## $class 
## [1] "ts" 
6 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
7 / 39
What is Time Series Decomposition 
To decompose a time series into components: 
I Trend component: long term trend 
I Seasonal component: seasonal variation 
I Cyclical component: repeated but non-periodic 
uctuations 
I Irregular component: the residuals 
8 / 39
Data AirPassengers 
Data AirPassengers: monthly totals of Box Jenkins international 
airline passengers, 1949 to 1960. It has 144(=1212) values. 
plot(AirPassengers) 
Time 
AirPassengers 
1950 1952 1954 1956 1958 1960 
100 200 300 400 500 600 
9 / 39
Decomposition 
apts - ts(AirPassengers, frequency = 12) 
f - decompose(apts) 
plot(f$figure, type = b) # seasonal figures 
2 4 6 8 10 12 
−40 −20 0 20 40 60 
Index 
f$figure 
10 / 39
Decomposition 
plot(f) 
observed 
150 250 350 450 
100 300 500 
trend 
seasonal 
−40 0 20 40 60 
−40 0 20 40 60 
2 4 6 8 10 12 
random 
Time 
Decomposition of additive time series 
11 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
12 / 39
Time Series Forecasting 
I To forecast future events based on known past data 
I For example, to predict the price of a stock based on its past 
performance 
I Popular models 
I Autoregressive moving average (ARMA) 
I Autoregressive integrated moving average (ARIMA) 
13 / 39
Forecasting 
# build an ARIMA model 
fit - arima(AirPassengers, order = c(1, 0, 0), list(order = c(2, 
1, 0), period = 12)) 
fore - predict(fit, n.ahead = 24) 
# error bounds at 95% confidence level 
U - fore$pred + 2 * fore$se 
L - fore$pred - 2 * fore$se 
ts.plot(AirPassengers, fore$pred, U, L, 
col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2)) 
legend(topleft, col = c(1, 2, 4), lty = c(1, 1, 2), 
c(Actual, Forecast, Error Bounds (95% Confidence))) 
14 / 39
Forecasting 
1950 1952 1954 1956 1958 1960 1962 
Time 
100 200 300 400 500 600 700 
Actual 
Forecast 
Error Bounds (95% Confidence) 
15 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
16 / 39
Time Series Clustering 
I To partition time series data into groups based on similarity or 
distance, so that time series in the same cluster are similar 
I Measure of distance/dissimilarity 
I Euclidean distance 
I Manhattan distance 
I Maximum norm 
I Hamming distance 
I The angle between two vectors (inner product) 
I Dynamic Time Warping (DTW) distance 
I ... 
17 / 39
Dynamic Time Warping (DTW) 
DTW
nds optimal alignment between two time series. 
library(dtw) 
idx - seq(0, 2 * pi, len = 100) 
a - sin(idx) + runif(100)/10 
b - cos(idx) 
align - dtw(a, b, step = asymmetricP1, keep = T) 
dtwPlotTwoWay(align) 
Index 
Query value 
0 20 40 60 80 100 
−1.0 −0.5 0.0 0.5 1.0 
18 / 39
Synthetic Control Chart Time Series 
I The dataset contains 600 examples of control charts 
synthetically generated by the process in Alcock and 
Manolopoulos (1999). 
I Each control chart is a time series with 60 values. 
I Six classes: 
I 1-100 Normal 
I 101-200 Cyclic 
I 201-300 Increasing trend 
I 301-400 Decreasing trend 
I 401-500 Upward shift 
I 501-600 Downward shift 
I http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_ 
control.html 
19 / 39
Synthetic Control Chart Time Series 
# read data into R sep='': the separator is white space, i.e., one 
# or more spaces, tabs, newlines or carriage returns 
sc - read.table(./data/synthetic_control.data, header = F, sep = ) 
# show one sample from each class 
idx - c(1, 101, 201, 301, 401, 501) 
sample1 - t(sc[idx, ]) 
plot.ts(sample1, main = ) 
20 / 39
Six Classes 
1 
24 26 28 30 32 34 36 
101 
15 20 25 30 35 40 45 
25 30 35 40 45 
0 10 20 30 40 50 60 
201 
Time 
301 
0 10 20 30 
401 
25 30 35 40 45 
10 15 20 25 30 35 
0 10 20 30 40 50 60 
501 
Time 
21 / 39
Hierarchical Clustering with Euclidean distance 
# sample n cases from every class 
n - 10 
s - sample(1:100, n) 
idx - c(s, 100 + s, 200 + s, 300 + s, 400 + s, 500 + s) 
sample2 - sc[idx, ] 
observedLabels - rep(1:6, each = n) 
# hierarchical clustering with Euclidean distance 
hc - hclust(dist(sample2), method = ave) 
plot(hc, labels = observedLabels, main = ) 
22 / 39
Hierarchical Clustering with Euclidean distance 
20 40 60 80 100 120 140 
4 6 66 
1 22 11 
11 11 
1 11 66 
6 6 6 66 
4 4 44 
44 
4 44 
33 
33 33 
5 5 55 
55 
33 
33 
55 
55 
22 
2 22 
2 22 
hclust (*, average) 
dist(sample2) 
Height 
23 / 39
Hierarchical Clustering with Euclidean distance 
# cut tree to get 8 clusters 
memb - cutree(hc, k = 8) 
table(observedLabels, memb) 
## memb 
## observedLabels 1 2 3 4 5 6 7 8 
## 1 10 0 0 0 0 0 0 0 
## 2 0 3 1 1 3 2 0 0 
## 3 0 0 0 0 0 0 10 0 
## 4 0 0 0 0 0 0 0 10 
## 5 0 0 0 0 0 0 10 0 
## 6 0 0 0 0 0 0 0 10 
24 / 39
Hierarchical Clustering with DTW Distance 
myDist - dist(sample2, method = DTW) 
hc - hclust(myDist, method = average) 
plot(hc, labels = observedLabels, main = ) 
# cut tree to get 8 clusters 
memb - cutree(hc, k = 8) 
table(observedLabels, memb) 
## memb 
## observedLabels 1 2 3 4 5 6 7 8 
## 1 10 0 0 0 0 0 0 0 
## 2 0 4 3 2 1 0 0 0 
## 3 0 0 0 0 0 6 4 0 
## 4 0 0 0 0 0 0 0 10 
## 5 0 0 0 0 0 0 10 0 
## 6 0 0 0 0 0 0 0 10 
25 / 39
Hierarchical Clustering with DTW Distance 
0 200 400 600 800 1000 
3 33 
3 33 
55 
33 33 
5 55 
5 55 
55 
66 
6 46 
4 44 
44 
44 
44 
66 66 66 
11 111 
11 111 
22 22 
22 
22 
22 
hclust (*, average) 
myDist 
Height 
26 / 39
Outline 
Introduction 
Time Series Decomposition 
Time Series Forecasting 
Time Series Clustering 
Time Series Classi
cation 
Online Resources 
27 / 39
Time Series Classi
cation 
Time Series Classi
cation 
I To build a classi
cation model based on labelled time series 
I and then use the model to predict the lable of unlabelled time 
series 
Feature Extraction 
I Singular Value Decomposition (SVD) 
I Discrete Fourier Transform (DFT) 
I Discrete Wavelet Transform (DWT) 
I Piecewise Aggregate Approximation (PAA) 
I Perpetually Important Points (PIP) 
I Piecewise Linear Representation 
I Symbolic Representation 
28 / 39
Decision Tree (ctree) 
ctree from package party 
classId - rep(as.character(1:6), each = 100) 
newSc - data.frame(cbind(classId, sc)) 
library(party) 
ct - ctree(classId ~ ., data = newSc, 
controls = ctree_control(minsplit = 20, 
minbucket = 5, maxdepth = 5)) 
29 / 39

Contenu connexe

Tendances

Mba 532 2011_part_3_time_series_analysis
Mba 532 2011_part_3_time_series_analysisMba 532 2011_part_3_time_series_analysis
Mba 532 2011_part_3_time_series_analysisChandra Kodituwakku
 
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaArima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaAmrinder Arora
 
ANOVA in R by Aman Chauhan
ANOVA in R by Aman ChauhanANOVA in R by Aman Chauhan
ANOVA in R by Aman ChauhanAman Chauhan
 
Time Series - Auto Regressive Models
Time Series - Auto Regressive ModelsTime Series - Auto Regressive Models
Time Series - Auto Regressive ModelsBhaskar T
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysisSumit Das
 
Arima model (time series)
Arima model (time series)Arima model (time series)
Arima model (time series)Kumar P
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Simplilearn
 
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...Edureka!
 
Time Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and ForecastingTime Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and ForecastingMaruthi Nataraj K
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using RUmmiya Mohammedi
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation MethodSHUBHAM GUPTA
 

Tendances (20)

Resampling methods
Resampling methodsResampling methods
Resampling methods
 
Mba 532 2011_part_3_time_series_analysis
Mba 532 2011_part_3_time_series_analysisMba 532 2011_part_3_time_series_analysis
Mba 532 2011_part_3_time_series_analysis
 
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet MahanaArima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
Arima Forecasting - Presentation by Sera Cresta, Nora Alosaimi and Puneet Mahana
 
3 Data Structure in R
3 Data Structure in R3 Data Structure in R
3 Data Structure in R
 
Time series
Time seriesTime series
Time series
 
Timeseries forecasting
Timeseries forecastingTimeseries forecasting
Timeseries forecasting
 
ANOVA in R by Aman Chauhan
ANOVA in R by Aman ChauhanANOVA in R by Aman Chauhan
ANOVA in R by Aman Chauhan
 
Time Series - Auto Regressive Models
Time Series - Auto Regressive ModelsTime Series - Auto Regressive Models
Time Series - Auto Regressive Models
 
Roc auc curve
Roc auc curveRoc auc curve
Roc auc curve
 
Categorical data analysis
Categorical data analysisCategorical data analysis
Categorical data analysis
 
Arima model (time series)
Arima model (time series)Arima model (time series)
Arima model (time series)
 
Time series Analysis
Time series AnalysisTime series Analysis
Time series Analysis
 
Time series
Time seriesTime series
Time series
 
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
Time Series Analysis - 2 | Time Series in R | ARIMA Model Forecasting | Data ...
 
forecasting technique.pptx
forecasting technique.pptxforecasting technique.pptx
forecasting technique.pptx
 
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
Time Series In R | Time Series Forecasting | Time Series Analysis | Data Scie...
 
Time Series Analysis Ravi
Time Series Analysis RaviTime Series Analysis Ravi
Time Series Analysis Ravi
 
Time Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and ForecastingTime Series Analysis - Modeling and Forecasting
Time Series Analysis - Modeling and Forecasting
 
Data visualization using R
Data visualization using RData visualization using R
Data visualization using R
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
 

En vedette

Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataYanchang Zhao
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slidesYanchang Zhao
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with RYanchang Zhao
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with RYanchang Zhao
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data MiningYanchang Zhao
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with RYanchang Zhao
 
Time Series
Time SeriesTime Series
Time Seriesyush313
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with RYanchang Zhao
 
Multivariate time series
Multivariate time seriesMultivariate time series
Multivariate time seriesLuigi Piva CQF
 
Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniquesShanmukha S. Potti
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTetiana Ivanova
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Persontyle
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with RYanchang Zhao
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With RJahnab Kumar Deka
 

En vedette (20)

Timeseries Analysis with R
Timeseries Analysis with RTimeseries Analysis with R
Timeseries Analysis with R
 
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
INTRODUCTION TO TIME SERIES ANALYSIS WITH “R” JUNE 2014
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
time series analysis
time series analysistime series analysis
time series analysis
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with R
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
 
R Reference Card for Data Mining
R Reference Card for Data MiningR Reference Card for Data Mining
R Reference Card for Data Mining
 
Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Time Series
Time SeriesTime Series
Time Series
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R
 
Multivariate time series
Multivariate time seriesMultivariate time series
Multivariate time series
 
Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniques
 
Time Series Analysis: Theory and Practice
Time Series Analysis: Theory and PracticeTime Series Analysis: Theory and Practice
Time Series Analysis: Theory and Practice
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Course - Machine Learning Basics with R
Course - Machine Learning Basics with R Course - Machine Learning Basics with R
Course - Machine Learning Basics with R
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
hands on: Text Mining With R
hands on: Text Mining With Rhands on: Text Mining With R
hands on: Text Mining With R
 
TextMining with R
TextMining with RTextMining with R
TextMining with R
 
Time series slideshare
Time series slideshareTime series slideshare
Time series slideshare
 

Similaire à Time Series Analysis and Mining with R

R data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RR data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RDr. Volkan OBAN
 
RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisYanchang Zhao
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.Dr. Volkan OBAN
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data ManipulationChu An
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsPeter Solymos
 
Data manipulation with dplyr
Data manipulation with dplyrData manipulation with dplyr
Data manipulation with dplyrRomain Francois
 
talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013ericupnorth
 
ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions Dr. Volkan OBAN
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryDatabricks
 
C PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAMC PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAMSaraswathiRamalingam
 
Do snow.rwn
Do snow.rwnDo snow.rwn
Do snow.rwnARUN DN
 
Assignment 5.2.pdf
Assignment 5.2.pdfAssignment 5.2.pdf
Assignment 5.2.pdfdash41
 

Similaire à Time Series Analysis and Mining with R (20)

R data mining-Time Series Analysis with R
R data mining-Time Series Analysis with RR data mining-Time Series Analysis with R
R data mining-Time Series Analysis with R
 
RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.
 
R programming language
R programming languageR programming language
R programming language
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutions
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Data manipulation with dplyr
Data manipulation with dplyrData manipulation with dplyr
Data manipulation with dplyr
 
talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013talk at Virginia Bioinformatics Institute, December 5, 2013
talk at Virginia Bioinformatics Institute, December 5, 2013
 
R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 
ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions ggtimeseries-->ggplot2 extensions
ggtimeseries-->ggplot2 extensions
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
C PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAMC PROGRAMS - SARASWATHI RAMALINGAM
C PROGRAMS - SARASWATHI RAMALINGAM
 
R and data mining
R and data miningR and data mining
R and data mining
 
Do snow.rwn
Do snow.rwnDo snow.rwn
Do snow.rwn
 
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
 
Assignment 5.2.pdf
Assignment 5.2.pdfAssignment 5.2.pdf
Assignment 5.2.pdf
 

Plus de Yanchang Zhao

RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rYanchang Zhao
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classificationYanchang Zhao
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programmingYanchang Zhao
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rYanchang Zhao
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationYanchang Zhao
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rYanchang Zhao
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rYanchang Zhao
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-cardYanchang Zhao
 

Plus de Yanchang Zhao (8)

RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 
RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
 

Dernier

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Dernier (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Time Series Analysis and Mining with R

  • 1. Time Series Analysis with R Yanchang Zhao http://www.RDataMining.com 30 September 2014 1 / 39
  • 2. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 4. Time Series Analysis with R 1 I time series data in R I time series decomposition, forecasting, clustering and classi
  • 5. cation I autoregressive integrated moving average (ARIMA) model I Dynamic Time Warping (DTW) I Discrete Wavelet Transform (DWT) I k-NN classi
  • 6. cation 1Chapter 8: Time Series Analysis and Mining, in book R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf 3 / 39
  • 7. R I a free software environment for statistical computing and graphics I runs on Windows, Linux and MacOS I widely used in academia and research, as well as industrial applications I 5,800+ packages (as of 13 Sept 2014) I CRAN Task View: Time Series Analysis http://cran.r-project.org/web/views/TimeSeries.html 4 / 39
  • 8. Time Series Data in R I class ts I represents data which has been sampled at equispaced points in time I frequency=7: a weekly series I frequency=12: a monthly series I frequency=4: a quarterly series 5 / 39
  • 9. Time Series Data in R a <- ts(1:20, frequency = 12, start = c(2011, 3)) print(a) ## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ## 2011 1 2 3 4 5 6 7 8 9 10 ## 2012 11 12 13 14 15 16 17 18 19 20 str(a) ## Time-Series [1:20] from 2011 to 2013: 1 2 3 4 5 6 7 8 9 10... attributes(a) ## $tsp ## [1] 2011 2013 12 ## ## $class ## [1] "ts" 6 / 39
  • 10. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 12. What is Time Series Decomposition To decompose a time series into components: I Trend component: long term trend I Seasonal component: seasonal variation I Cyclical component: repeated but non-periodic uctuations I Irregular component: the residuals 8 / 39
  • 13. Data AirPassengers Data AirPassengers: monthly totals of Box Jenkins international airline passengers, 1949 to 1960. It has 144(=1212) values. plot(AirPassengers) Time AirPassengers 1950 1952 1954 1956 1958 1960 100 200 300 400 500 600 9 / 39
  • 14. Decomposition apts - ts(AirPassengers, frequency = 12) f - decompose(apts) plot(f$figure, type = b) # seasonal figures 2 4 6 8 10 12 −40 −20 0 20 40 60 Index f$figure 10 / 39
  • 15. Decomposition plot(f) observed 150 250 350 450 100 300 500 trend seasonal −40 0 20 40 60 −40 0 20 40 60 2 4 6 8 10 12 random Time Decomposition of additive time series 11 / 39
  • 16. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 18. Time Series Forecasting I To forecast future events based on known past data I For example, to predict the price of a stock based on its past performance I Popular models I Autoregressive moving average (ARMA) I Autoregressive integrated moving average (ARIMA) 13 / 39
  • 19. Forecasting # build an ARIMA model fit - arima(AirPassengers, order = c(1, 0, 0), list(order = c(2, 1, 0), period = 12)) fore - predict(fit, n.ahead = 24) # error bounds at 95% confidence level U - fore$pred + 2 * fore$se L - fore$pred - 2 * fore$se ts.plot(AirPassengers, fore$pred, U, L, col = c(1, 2, 4, 4), lty = c(1, 1, 2, 2)) legend(topleft, col = c(1, 2, 4), lty = c(1, 1, 2), c(Actual, Forecast, Error Bounds (95% Confidence))) 14 / 39
  • 20. Forecasting 1950 1952 1954 1956 1958 1960 1962 Time 100 200 300 400 500 600 700 Actual Forecast Error Bounds (95% Confidence) 15 / 39
  • 21. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 23. Time Series Clustering I To partition time series data into groups based on similarity or distance, so that time series in the same cluster are similar I Measure of distance/dissimilarity I Euclidean distance I Manhattan distance I Maximum norm I Hamming distance I The angle between two vectors (inner product) I Dynamic Time Warping (DTW) distance I ... 17 / 39
  • 24. Dynamic Time Warping (DTW) DTW
  • 25. nds optimal alignment between two time series. library(dtw) idx - seq(0, 2 * pi, len = 100) a - sin(idx) + runif(100)/10 b - cos(idx) align - dtw(a, b, step = asymmetricP1, keep = T) dtwPlotTwoWay(align) Index Query value 0 20 40 60 80 100 −1.0 −0.5 0.0 0.5 1.0 18 / 39
  • 26. Synthetic Control Chart Time Series I The dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999). I Each control chart is a time series with 60 values. I Six classes: I 1-100 Normal I 101-200 Cyclic I 201-300 Increasing trend I 301-400 Decreasing trend I 401-500 Upward shift I 501-600 Downward shift I http://kdd.ics.uci.edu/databases/synthetic_control/synthetic_ control.html 19 / 39
  • 27. Synthetic Control Chart Time Series # read data into R sep='': the separator is white space, i.e., one # or more spaces, tabs, newlines or carriage returns sc - read.table(./data/synthetic_control.data, header = F, sep = ) # show one sample from each class idx - c(1, 101, 201, 301, 401, 501) sample1 - t(sc[idx, ]) plot.ts(sample1, main = ) 20 / 39
  • 28. Six Classes 1 24 26 28 30 32 34 36 101 15 20 25 30 35 40 45 25 30 35 40 45 0 10 20 30 40 50 60 201 Time 301 0 10 20 30 401 25 30 35 40 45 10 15 20 25 30 35 0 10 20 30 40 50 60 501 Time 21 / 39
  • 29. Hierarchical Clustering with Euclidean distance # sample n cases from every class n - 10 s - sample(1:100, n) idx - c(s, 100 + s, 200 + s, 300 + s, 400 + s, 500 + s) sample2 - sc[idx, ] observedLabels - rep(1:6, each = n) # hierarchical clustering with Euclidean distance hc - hclust(dist(sample2), method = ave) plot(hc, labels = observedLabels, main = ) 22 / 39
  • 30. Hierarchical Clustering with Euclidean distance 20 40 60 80 100 120 140 4 6 66 1 22 11 11 11 1 11 66 6 6 6 66 4 4 44 44 4 44 33 33 33 5 5 55 55 33 33 55 55 22 2 22 2 22 hclust (*, average) dist(sample2) Height 23 / 39
  • 31. Hierarchical Clustering with Euclidean distance # cut tree to get 8 clusters memb - cutree(hc, k = 8) table(observedLabels, memb) ## memb ## observedLabels 1 2 3 4 5 6 7 8 ## 1 10 0 0 0 0 0 0 0 ## 2 0 3 1 1 3 2 0 0 ## 3 0 0 0 0 0 0 10 0 ## 4 0 0 0 0 0 0 0 10 ## 5 0 0 0 0 0 0 10 0 ## 6 0 0 0 0 0 0 0 10 24 / 39
  • 32. Hierarchical Clustering with DTW Distance myDist - dist(sample2, method = DTW) hc - hclust(myDist, method = average) plot(hc, labels = observedLabels, main = ) # cut tree to get 8 clusters memb - cutree(hc, k = 8) table(observedLabels, memb) ## memb ## observedLabels 1 2 3 4 5 6 7 8 ## 1 10 0 0 0 0 0 0 0 ## 2 0 4 3 2 1 0 0 0 ## 3 0 0 0 0 0 6 4 0 ## 4 0 0 0 0 0 0 0 10 ## 5 0 0 0 0 0 0 10 0 ## 6 0 0 0 0 0 0 0 10 25 / 39
  • 33. Hierarchical Clustering with DTW Distance 0 200 400 600 800 1000 3 33 3 33 55 33 33 5 55 5 55 55 66 6 46 4 44 44 44 44 66 66 66 11 111 11 111 22 22 22 22 22 hclust (*, average) myDist Height 26 / 39
  • 34. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 38. cation I To build a classi
  • 39. cation model based on labelled time series I and then use the model to predict the lable of unlabelled time series Feature Extraction I Singular Value Decomposition (SVD) I Discrete Fourier Transform (DFT) I Discrete Wavelet Transform (DWT) I Piecewise Aggregate Approximation (PAA) I Perpetually Important Points (PIP) I Piecewise Linear Representation I Symbolic Representation 28 / 39
  • 40. Decision Tree (ctree) ctree from package party classId - rep(as.character(1:6), each = 100) newSc - data.frame(cbind(classId, sc)) library(party) ct - ctree(classId ~ ., data = newSc, controls = ctree_control(minsplit = 20, minbucket = 5, maxdepth = 5)) 29 / 39
  • 41. Decision Tree pClassId - predict(ct) table(classId, pClassId) ## pClassId ## classId 1 2 3 4 5 6 ## 1 100 0 0 0 0 0 ## 2 1 97 2 0 0 0 ## 3 0 0 99 0 1 0 ## 4 0 0 0 100 0 0 ## 5 4 0 8 0 88 0 ## 6 0 3 0 90 0 7 # accuracy (sum(classId == pClassId))/nrow(sc) ## [1] 0.8183 30 / 39
  • 42. DWT (Discrete Wavelet Transform) I Wavelet transform provides a multi-resolution representation using wavelets. I Haar Wavelet Transform { the simplest DWT http://dmr.ath.cx/gfx/haar/ I DFT (Discrete Fourier Transform): another popular feature extraction technique 31 / 39
  • 43. DWT (Discrete Wavelet Transform) # extract DWT (with Haar filter) coefficients library(wavelets) wtData - NULL for (i in 1:nrow(sc)) f a - t(sc[i, ]) wt - dwt(a, filter = haar, boundary = periodic) wtData - rbind(wtData, unlist(c(wt@W, wt@V[[wt@level]]))) g wtData - as.data.frame(wtData) wtSc - data.frame(cbind(classId, wtData)) 32 / 39
  • 44. Decision Tree with DWT ct - ctree(classId ~ ., data = wtSc, controls = ctree_control(minsplit=20, minbucket=5, maxdepth=5)) pClassId - predict(ct) table(classId, pClassId) ## pClassId ## classId 1 2 3 4 5 6 ## 1 98 2 0 0 0 0 ## 2 1 99 0 0 0 0 ## 3 0 0 81 0 19 0 ## 4 0 0 0 74 0 26 ## 5 0 0 16 0 84 0 ## 6 0 0 0 3 0 97 (sum(classId==pClassId)) / nrow(wtSc) ## [1] 0.8883 33 / 39
  • 45. plot(ct, ip_args = list(pval = F), ep_args = list(digits = 0)) 1 V57 £ 117 117 2 W43 £ −4 −4 3 W5 £ −8 −8 Node 4 (n = 68) 123456 1 0.8 0.6 0.4 0.2 0 Node 5 (n = 6) 123456 1 0.8 0.6 0.4 0.2 0 6 W31 £ −6 −6 Node 7 (n = 9) 123456 1 0.8 0.6 0.4 0.2 0 Node 8 (n = 86) 123456 1 0.8 0.6 0.4 0.2 0 9 V57 £ 140 140 Node 10 (n = 31) 123456 1 0.8 0.6 0.4 0.2 0 11 V57 £ 178 178 12 W22 £ −6 −6 Node 13 (n = 80) 123456 1 0.8 0.6 0.4 0.2 0 14 W31 £ −13 −13 Node 15 (n = 9) 123456 1 0.8 0.6 0.4 0.2 0 Node 16 (n = 99) 123456 1 0.8 0.6 0.4 0.2 0 17 W31 £ −15 −15 Node 18 (n = 12) 123456 1 0.8 0.6 0.4 0.2 0 19 W43 £ 3 3 Node 20 (n = 103) 123456 1 0.8 0.6 0.4 0.2 0 Node 21 (n = 97) 123456 1 0.8 0.6 0.4 0.2 0 34 / 39
  • 48. nd the k nearest neighbours of a new instance I label it by majority voting I needs an ecient indexing structure for large datasets k - 20 newTS - sc[501, ] + runif(100) * 15 distances - dist(newTS, sc, method = DTW) s - sort(as.vector(distances), index.return = TRUE) # class IDs of k nearest neighbours table(classId[s$ix[1:k]]) ## ## 4 6 ## 3 17 35 / 39
  • 51. nd the k nearest neighbours of a new instance I label it by majority voting I needs an ecient indexing structure for large datasets k - 20 newTS - sc[501, ] + runif(100) * 15 distances - dist(newTS, sc, method = DTW) s - sort(as.vector(distances), index.return = TRUE) # class IDs of k nearest neighbours table(classId[s$ix[1:k]]) ## ## 4 6 ## 3 17 Results of majority voting: class 6 35 / 39
  • 52. The TSclust Package I TSclust: a package for time seriesclustering 2 I measures of dissimilarity between time series to perform time series clustering. I metrics based on raw data, on generating models and on the forecast behavior I time series clustering algorithms and cluster evaluation metrics 2http://cran.r-project.org/web/packages/TSclust/ 36 / 39
  • 53. Outline Introduction Time Series Decomposition Time Series Forecasting Time Series Clustering Time Series Classi
  • 55. Online Resources I Chapter 8: Time Series Analysis and Mining, in book R and Data Mining: Examples and Case Studies http://www.rdatamining.com/docs/RDataMining.pdf I R Reference Card for Data Mining http://www.rdatamining.com/docs/R-refcard-data-mining.pdf I Free online courses and documents http://www.rdatamining.com/resources/ I RDataMining Group on LinkedIn (7,000+ members) http://group.rdatamining.com I RDataMining on Twitter (1,700+ followers) @RDataMining 38 / 39
  • 56. The End Thanks! Email: yanchang(at)rdatamining.com 39 / 39