Journal of Multidisciplinary Engineering Science and Technology (JMEST)
ISSN: 3159-0040
Vol. 2 Issue 11, November - 2015
www.jmest.org
JMESTN42351212 3285
Motifs In Time Series For Prediction
A naïve approach compared to ARIMA
Nertila Ismailaja
Programme Analyst
Armundia Factory
Tirana, Albania
n.ismailaja@armundiafactory.com
Abstract: A time series couples two key elements, observations and time, with the observations depending on time. A related field of interest is motif discovery, which operates solely on subsequences of a time series. Over time, many similarity measures have been proposed for this purpose. Dhamo et al. [5] concluded that, in terms of the quality of motif discovery, the best performance was achieved by Chouakria's index with CID (Chouakria's index was proposed by Chouakria et al. [4]; CID was proposed by Batista et al. [2]). The next step is to use this distance, together with other time series features, to make predictions. Various tests are run on time series with a high level of complexity, and the results of this approach are compared with ARIMA models. All tests are performed in R.
Keywords: motif discovery; ARIMA; time series; forecasting; Chouakria with CID; R
I. INTRODUCTION
Motif discovery has been widely exploited in the biological sciences to detect anomalies in heartbeat, pressure, etc. The challenge was to deal with a large amount of data (known as big data). The first approaches reduced the dimensionality of the data [1][3], using methods such as the Discrete Fourier Transformation [1], the Discrete Wavelet Transformation [3], Piecewise Linear Approximation [12], etc. The first researchers to formally use motif discovery in time series were Agrawal et al. [1], Lin et al. [6], Mueen [7][8][9][10], etc. Since then, several strategies have been proposed. These studies have consistently relied on similarity search across the time series, adapted to the problem at hand. The most recent method is ε-queries, proposed as a probabilistic approach to motif discovery. A further line of work compares similarity measures for time series with each other on four criteria: algorithm complexity, number of discovered motifs, accuracy and quality. Dhamo et al. [5] concluded that Chouakria's index with CID gave better results than CID alone. Given that Chouakria's index with CID is a well-performing similarity measure, we take the next step: applying motif discovery to forecasting. Several well-known time series are used; for each of them a model is created and then compared with the model given by ARIMA (built with the R package "forecast"). Taking forecasting error as the criterion, the results indicate that our model performs better in the majority of the cases studied.
II. TIME SERIES
A. Basic Concepts
A time series may be defined as a collection of data in which time is a relevant component.
Definition 1 A time series T is an ordered collection of data observed at n regular intervals of time, $[T_1, T_2, \dots, T_n]$.
Definition 2 A motif M of length m in a time series T of length n is a subsequence of T that repeats itself in T.
Definition 3 In an ε-query similarity search between two subsequences $P = [P_1, P_2, \dots, P_m]$ and $Q = [Q_1, Q_2, \dots, Q_m]$ of length m of a time series T of length n, P and Q are considered similar if the distance between P and Q lies within an interval of absolute error ε.
An illustration of a motif is given in Fig. 1, using the dataset unemp.
Fig.1. Algorithm for motif discovery
B. Distances Used For Time Series
To compare two objects, a criterion is needed to decide whether or not they are the same. For time series, this criterion is the similarity measure, which is based on the concept of distance.
Definition 4 A distance is a function d that satisfies the following properties:
• Identity: $\forall x, y \in R,\ d(x, y) = 0 \leftrightarrow x = y$
• Symmetry: $\forall x, y \in R,\ d(x, y) = d(y, x)$
• Non-negativity: $\forall x, y \in R,\ d(x, y) \ge 0$
• Triangle inequality: $\forall x, y, z \in R,\ d(x, y) \le d(x, z) + d(z, y)$
Definition 5 A similarity measure is a function that fails to satisfy at least one of the conditions required of a distance.
[Fig. 1, plot residue: panel "CID" shows the time series with the motif and the similar subsequences marked (x-axis: Index; y-axis: Observed values); panel "Similar subseq" shows the values over the motif's length.]
In other words, a similarity measure quantifies the similarity between two subsequences of a time series. Several similarity measures exist, and the choice among them is reported to be a key factor in the quality and quantity of the motifs discovered. Despite its wide usage, the Euclid distance gives unsatisfactory results when used as a similarity measure.
Definition 6 The Euclid distance between two subsequences of length m, $P = [P_1, P_2, \dots, P_m]$ and $Q = [Q_1, Q_2, \dots, Q_m]$, is the index measured as below:
$$d_{Euc}(P, Q) = \sum_{i=1}^{m} (P_i - Q_i)^2 \qquad (1)$$
As mentioned above, the best-performing similarity measure is Chouakria's index with CID. It is therefore necessary to explain what CID is before giving the definition of Chouakria's index.
Definition 7 The CID distance between two subsequences of length m, $P = [P_1, P_2, \dots, P_m]$ and $Q = [Q_1, Q_2, \dots, Q_m]$, is the index measured as below:
$$d_{CID}(P, Q) = d_{Euc}(P, Q) \cdot \frac{\max\{CE(P), CE(Q)\}}{\min\{CE(P), CE(Q)\}} \qquad (2)$$
where:
$$CE(P) = \sum_{i=1}^{m-1} (P_i - P_{i+1})^2 \qquad (3)$$
This similarity measure is used for nonlinear time series subsequences. One main feature of CID is that, for two nonlinear subsequences P and Q of length m:
$$d_{CID}(P, Q) \ge d_{Euc}(P, Q) \qquad (4)$$
with equality when $CE(P) = CE(Q)$. Moreover, the closer the fraction $\frac{\max\{CE(P), CE(Q)\}}{\min\{CE(P), CE(Q)\}}$ is to 1, the more similar is the behavior of the two subsequences.
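As an illustration, Eqs. (1) to (3) can be sketched in Python (the paper's own experiments were run in R). The function names are hypothetical, the Euclid distance follows Eq. (1) as printed (a sum of squared differences, without a square root), and the handling of constant subsequences is an assumption that the paper does not discuss.

```python
def euclid(p, q):
    """Eq. (1) as printed: sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def complexity(p):
    """Eq. (3): sum of squared successive differences."""
    return sum((p[i] - p[i + 1]) ** 2 for i in range(len(p) - 1))


def cid(p, q):
    """Eq. (2): Euclid distance scaled by the complexity ratio."""
    ce_p, ce_q = complexity(p), complexity(q)
    low, high = min(ce_p, ce_q), max(ce_p, ce_q)
    if low == 0:
        # Assumption: a constant subsequence has zero complexity. The ratio
        # is taken as 1 if both are constant and as infinite otherwise.
        return euclid(p, q) if high == 0 else float("inf")
    return euclid(p, q) * high / low


p = [1.0, 2.0, 3.0, 4.0]   # smooth subsequence, CE = 3
q = [1.0, 3.0, 1.0, 3.0]   # oscillating subsequence, CE = 12
print(euclid(p, q), cid(p, q))  # 6.0 24.0
```

In this example the complexity ratio is 12 / 3 = 4, so the CID distance is four times the Euclid distance, which reflects the different behavior of the two subsequences.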
Another similarity measure is Chouakria's index. It is composed of two factors, one responsible for behavior and the other for proximity of values[].
Definition 8 Chouakria's index between two subsequences of length m, $P = [P_1, P_2, \dots, P_m]$ and $Q = [Q_1, Q_2, \dots, Q_m]$, is the index measured as below:
$$d_{Ch}(P, Q) = \frac{2}{1 + e^{k \cdot \delta(P, Q)}} \cdot CORT(P, Q) \qquad (5)$$
where:
$$CORT(P, Q) = \frac{\sum_{i=1}^{m-1} (P_i - P_{i+1})(Q_i - Q_{i+1})}{\sqrt{\sum_{i=1}^{m-1} (P_i - P_{i+1})^2}\ \sqrt{\sum_{i=1}^{m-1} (Q_i - Q_{i+1})^2}} \qquad (6)$$
$k \in R^+$, and $\delta(P, Q)$ may be the Euclid distance, etc. In their article, Dhamo et al. [] proposed CID as the distance $\delta(P, Q)$ and $k = 2$.
III. MODELLING TIME SERIES
A time series combines two main concepts: determinism and randomness. It is composed of several parts, such as trend, seasonality, cycles, etc. Once these components have been detected, what remains is considered the random part. In other words, a time series T of length n can be decomposed as below:
$$T_n = t_n + c_n + s_n + \varepsilon_n \qquad (7)$$
where $t_n$ is the trend, $c_n$ is the cycle, $s_n$ is the seasonal variation and $\varepsilon_n$ is the noise. Usually a time series can be decomposed according to these features. The only characteristic left to be modeled is the noise, which is a random variable.
A. ARIMA Models
As stated above, a time series is a regular collection of data. An ARIMA model is built from other forecasting models, namely AR and MA, together with the operator $\Delta$. Let us first define the key concepts of a stochastic process.
Definition 9 Let $\{\varepsilon_t, t \in T\}$ be a stochastic process. $\varepsilon_t \sim WN(0, \sigma^2)$ is considered to be white noise if it obeys the following rules:
$$\begin{cases} E(\varepsilon_t) = 0 \\ E(\varepsilon_t \varepsilon_s) = \begin{cases} \sigma^2, & s = t \\ 0, & s \ne t \end{cases} \end{cases} \qquad (8)$$
Definition 10 Let $\{Y_t, t \in T\}$ be a stochastic process and $\varepsilon_t \sim WN(0, \sigma^2)$. An AR(p) model is constructed as below:
$$\varepsilon_t = \sum_{i=0}^{p} \varphi_i Y_{t-i} \qquad (9)$$
where $\varphi_i$, $i = 1, \dots, p$, are coefficients determined dynamically.
Definition 11 Let $\{Y_t, t \in T\}$ be a stochastic process. Let $\omega_i \sim WN(0, \sigma^2)$, $i = 0, 1, 2, \dots$. An MA(q) model is constructed as below:
$$Y_t = \sum_{i=0}^{q} \psi_i \omega_{t-i} \qquad (10)$$
where $\psi_i$, $i = 1, \dots, q$, are coefficients determined dynamically.
Logically, an ARMA(p,q) model is constructed as below:
Definition 12 A stochastic process $\{Y_t, t \in T\}$ is said to be an ARMA(p,q) model (autoregressive moving average of order (p,q)) if it can be expressed as:
$$Y_t = \mu_t + \sum_{i=1}^{p} \varphi_i Y_{t-i} + \sum_{i=0}^{q} \psi_i \omega_{t-i} \qquad (11)$$
where $\mu_t$, $\varphi_i$, $i = 1, \dots, p$, and $\psi_i$, $i = 0, \dots, q$, are constants and $\omega_t$ is white noise.
An important parameter in ARIMA(p,d,q) is also the difference operator ($\Delta$), defined as below:
Definition 13 Let $\{Y_t, t \in T\}$ be a stochastic process. The operator $\Delta^d$ is defined as:
$$\Delta Y_t = Y_t - Y_{t-1}, \qquad \Delta^d Y_t = \Delta(\Delta^{d-1} Y_t) \qquad (12)$$
where $d = 1, 2, \dots$ and $\Delta^0 Y_t = Y_t$.
Having all the necessary concepts, we can introduce the ARIMA concept.
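The role of d can be illustrated with a short Python sketch (the paper's experiments were run in R). It reads d as the number of successive first differences, which is the usual convention for ARIMA(p,d,q); for d = 1 this is simply $Y_t - Y_{t-1}$. The function name is hypothetical.

```python
def difference(y, d=1):
    """Apply the first difference Y_t - Y_{t-1} d times in succession."""
    out = list(y)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


y = [1, 4, 9, 16, 25]      # quadratic trend
print(difference(y, 1))    # [3, 5, 7, 9]
print(difference(y, 2))    # [2, 2, 2]
```

Differencing twice removes the quadratic trend of this toy series, which is the kind of series an ARMA(p,q) model would then be fitted to.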
Definition 14 The stochastic process $\{Y_t, t \in T\}$ is considered to be an ARIMA(p,d,q) process if the process
$$X_t = \Delta^d Y_t \qquad (13)$$
is an ARMA(p,q) process.
There are several ways to determine the most efficient values of p, q and d; these are discussed in the following sections.
B. The Proposed Approach
When trying to detect a motif in a time series, it is assumed that there is a pattern that repeats itself in the time series, independently of the individual observations. An example is given in Fig. 2.
Fig. 2. The relation between time series subsequences (time series co2)
The relationship between the subsequences of length 12 is evident. It becomes visible once the subsequences are normalized, after which it is impossible to distinguish them from each other.
Definition 15 The normalization of a time series T of length n is the time series T', defined as:
$$T' = \frac{T - mean(T)}{sd(T)} \qquad (14)$$
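A minimal Python sketch of Eq. (14) follows (the paper's experiments were run in R). It assumes the sample standard deviation, which is what R's sd() computes; the function name and the error raised for constant input are illustrative choices.

```python
from statistics import mean, stdev


def normalize(t):
    """Eq. (14): subtract the mean and divide by the sample standard deviation."""
    mu, sd = mean(t), stdev(t)
    if sd == 0:
        # Assumption: a constant subsequence cannot be normalized.
        raise ValueError("constant series cannot be normalized")
    return [(x - mu) / sd for x in t]


sub = [320.0, 330.0, 340.0]
print(normalize(sub))  # [-1.0, 0.0, 1.0]
```

After normalization, two subsequences with the same shape but different levels or scales map to the same values, which is why the subsequences in Fig. 2 become indistinguishable.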
If there is such a strong relation between two subsequences $P = [T_i, T_{i+1}, \dots, T_{i+m-1}]$ and $Q = [T_j, T_{j+1}, \dots, T_{j+m-1}]$ of length m from the time series T, it is presumable that the relation remains equally strong when the length of the subsequences is increased. The situation is shown in Fig. 3.
Fig. 3. Widening the motif's length
It is clear from Fig. 3 that, even though the length of the subsequences has been enlarged from 12 to 15, the difference is still small.
C. Reasoning to Reach to Forecasting
When we say that there is a motif in a time series, we assume that there is a large similarity between different subsequences of the same time series. Moreover, these subsequences are generally spread throughout the time series. Given a fixed motif length m, the Brute-Force algorithm finds all patterns in the time series. For prediction, this algorithm is modified, both in code and in structure.
Suppose that we use a subsequence of length m in our time series T of length n, and that the most similar subsequence is found at position j. In this case, the dependency factor is calculated as below:
$$dep_{fact} = \frac{T[j+m]}{T[j+m-1]} \qquad (15)$$
Our prediction is $T[n+1]$, which is calculated as:
$$T[n+1] = dep_{fact} \cdot T[n] \qquad (16)$$
We define $a = T[(n-m+1) : (n+1)]$. We can create a 95% confidence interval for our prediction, as below:
$$CI(T[n+1]) = \left] E(a) - 1.96 \frac{sd(a)}{\sqrt{m+1}},\ E(a) + 1.96 \frac{sd(a)}{\sqrt{m+1}} \right[ \qquad (17)$$
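Eqs. (15) to (17) can be traced on a toy series with the following Python sketch (the paper's experiments were run in R). It assumes the position of the most similar subsequence is already known and uses 0-based indexing, so j0 corresponds to j - 1 in the paper's notation; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_step_forecast(t, m, j0):
    """j0 is the 0-based start of the matched subsequence, with j0 + m < len(t)."""
    # Eq. (15): ratio of the value that followed the match to its last value.
    # A zero in the denominator would need separate handling.
    dep_fact = t[j0 + m] / t[j0 + m - 1]
    prediction = dep_fact * t[-1]              # Eq. (16)
    a = list(t[-m:]) + [prediction]            # last m values plus the forecast
    half = 1.96 * stdev(a) / sqrt(m + 1)       # Eq. (17)
    return prediction, (mean(a) - half, mean(a) + half)


t = [10.0, 12.0, 18.0, 20.0, 24.0]
pred, ci = one_step_forecast(t, m=2, j0=0)
print(pred)  # 36.0
print(ci)
```

Here the matched subsequence [10, 12] was followed by 18, so the dependency factor is 1.5 and the forecast is 1.5 * 24 = 36. As Eq. (17) is written, the interval is centered on the mean of a rather than on the forecast itself.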
An example of our algorithm in R, applied to the time series AirPassengers, is given in Fig. 4.
Fig.4. Snapshot of the proposed approach
[Figs. 2 and 3, plot residue: each figure has three panels: "CO2 time series" (x-axis: Time; y-axis: Observation), "Subsequences" (x-axis: Index; y-axis: Observation) and "Normalized" (x-axis: Index; y-axis: Observation).]
Brute-force algorithm
Brute_force = function(T, m, epsilon)
  # n = length(T)
  # A_i = T[i:(i+m-1)], i = 1:(n-m+1); B_j = T[j:(j+m-1)], j = (i+1):(n-m+1)
  threshold = min{ d(A_i, B_j) }
  motif_index = argmax{ d(A_i, B_j) < threshold + epsilon }
  motif = T[motif_index:(motif_index+m-1)]
  # B_j = T[j:(j+m-1)], j > motif_index
  similar_seq = T[where(d(motif, B_j) < threshold + epsilon)]
end function
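One possible reading of the brute-force pseudocode is sketched below in Python (the paper's experiments were run in R). Two points are assumptions: the argmax step is interpreted as choosing the subsequence with the largest number of later matches, and a sum of squared differences stands in for the distance d. Indices are 0-based and all names are hypothetical.

```python
def sq_euclid(p, q):
    """Stand-in distance: sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def brute_force(t, m, epsilon, dist=sq_euclid):
    """Return (motif start, starts of similar later subsequences), 0-based.

    Requires at least two subsequences of length m, i.e. len(t) - m + 1 >= 2.
    """
    count = len(t) - m + 1
    subs = [t[i:i + m] for i in range(count)]
    # Distances between every subsequence and every later one.
    d = {(i, j): dist(subs[i], subs[j])
         for i in range(count) for j in range(i + 1, count)}
    limit = min(d.values()) + epsilon
    # For each start i, the later starts whose distance is below the limit.
    matches = {i: [j for j in range(i + 1, count) if d[(i, j)] < limit]
               for i in range(count)}
    # Assumed reading of the argmax step: most matches wins, first on ties.
    motif_index = max(range(count), key=lambda i: len(matches[i]))
    return motif_index, matches[motif_index]


t = [1, 2, 3, 9, 1, 2, 3, 7, 1, 2, 3]
print(brute_force(t, m=3, epsilon=0.5))  # (0, [4, 8])
```

The pattern [1, 2, 3] starting at index 0 recurs at indices 4 and 8, so it is reported as the motif. The search compares every pair of subsequences, so its cost grows quadratically with the length of the series.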
Short-term algorithm
Short_term = function(T, m)
1. Keep the last subsequence of length m, $T[(n-m+1) : n]$, fixed (it is also considered as the motif).
2. Choose a suitable distance for time series (Chouakria's index).
3. Normalize the subsequences of the time series, using (14).
4. Detect the strongest relationship between the motif and the other subsequences of the time series. Keep track of the initial index where the strongest relationship is found.
5. Create the dependency factor that is used for the prediction.
end function
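The five steps can be put together in a rough Python sketch (the paper's experiments were run in R). It is not the author's implementation: the CID distance of Eqs. (2) and (3) is used as a stand-in for Chouakria's index with CID, constant subsequences are skipped, ties go to the earliest match, and indices are 0-based. All function names are hypothetical.

```python
from statistics import mean, stdev


def normalize(s):
    """Eq. (14); returns None for a constant subsequence."""
    mu, sd = mean(s), stdev(s)
    return None if sd == 0 else [(x - mu) / sd for x in s]


def complexity(s):
    return sum((s[i] - s[i + 1]) ** 2 for i in range(len(s) - 1))


def cid(p, q):
    """Stand-in distance, Eqs. (2)-(3); inputs are non-constant here."""
    ce_p, ce_q = complexity(p), complexity(q)
    euc = sum((a - b) ** 2 for a, b in zip(p, q))
    return euc * max(ce_p, ce_q) / min(ce_p, ce_q)


def short_term(t, m, dist=cid):
    """One-step forecast from the subsequence most similar to the last m values."""
    n = len(t)
    motif = normalize(t[n - m:])                  # steps 1 and 3
    if motif is None:
        raise ValueError("the last m values are constant")
    best_j, best_d = None, float("inf")
    for j in range(n - m):                        # candidate needs a successor t[j + m]
        cand = normalize(t[j:j + m])
        if cand is None:
            continue
        dj = dist(motif, cand)                    # steps 2 and 4
        if dj < best_d:
            best_j, best_d = j, dj
    if best_j is None:
        raise ValueError("no usable subsequence found")
    dep_fact = t[best_j + m] / t[best_j + m - 1]  # step 5, Eq. (15)
    return dep_fact * t[-1]                       # Eq. (16)


t = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0]
print(short_term(t, m=3))
```

On this periodic toy series the last three values [2, 4, 6] match the first three exactly, the matched subsequence was followed by 4, and the forecast is therefore 6 * (4 / 6), that is, approximately 4.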
D. Forecasting in a Long-term Period
In the previous paragraph, we constructed an algorithm to predict the next observation from the previous ones, together with a confidence interval for that prediction. In practice, however, forecasts are needed over a longer horizon. To obtain long-term forecasts, we construct the Long-Term algorithm, which applies the short-term algorithm repeatedly and appends each prediction to the time series.
With this algorithm, we can keep track of both the original time series and the forecasts provided. An example is given in Fig. 5.
Fig. 5. Forecasting in long-term period
Fig. 5 clearly shows the proximity in values and even in shape, which indicates that the model built is strong.
IV. ARIMA VS PROPOSED METHOD
In many forecasting models, such as AR, MA, ARIMA, etc., the best parameters are those that give a low curve-fitting error: the lower the error, the better the parameters.
Definition 16 Given a time series T of length n and the ARIMA fitted model T', the curve-fitting error is the vector:
$$\varepsilon = T - T' \qquad (18)$$
In practice, the most commonly used measure is the sum of squared errors ($\varepsilon'\varepsilon$). Other measures also provide information about the quality of the constructed model, such as AIC (Akaike Information Criterion) and BIC (Bayes Information Criterion).
Definition 17 Given a time series T of length n and the ARIMA fitted model T', AIC is measured as below:
$$AIC = \ln\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2\right) + \frac{2(k+1)}{n} \qquad (19)$$
where $e_i^2 = (T_i - T_i')^2$ and k is the number of degrees of freedom of T.
Definition 18 Given a time series T of length n and the ARIMA fitted model T', BIC is measured as below:
$$BIC = \ln\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2\right) + \frac{(k+1)\ln(n)}{n} \qquad (20)$$
where $e_i^2 = (T_i - T_i')^2$ and k is the number of degrees of freedom of T.
Generally, of the AIC and BIC criteria, the more useful is BIC, which is considered to be more accurate.
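Eqs. (19) and (20) translate directly into a few lines of Python (the paper's experiments were run in R). The function name is hypothetical and the fitted values below are made up for illustration; a perfect fit would give a zero mean squared error, for which the logarithm is undefined.

```python
from math import log


def aic_bic(actual, fitted, k):
    """Eqs. (19) and (20) computed from the residuals of a fitted model."""
    n = len(actual)
    mse = sum((a - f) ** 2 for a, f in zip(actual, fitted)) / n
    aic = log(mse) + 2 * (k + 1) / n
    bic = log(mse) + (k + 1) * log(n) / n
    return aic, bic


actual = [1.0, 2.0, 3.0, 4.0]
fitted = [2.0, 1.0, 4.0, 3.0]   # illustrative values, every residual is 1 in size
print(aic_bic(actual, fitted, k=1))
```

With a mean squared error of 1 the logarithmic term vanishes, so the output shows only the penalty terms: 2(k+1)/n = 1 for AIC and (k+1) ln(n)/n = ln(2) for BIC. The two criteria differ only in how strongly they penalize k.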
A. Detecting Error in Forecast in the Proposed Model
As mentioned for the proposed algorithm, the method does not create a single model, because it does not estimate parameters. Our model is dynamic and, over a long-term period, changes constantly. This means that the AIC and BIC criteria are inapplicable.
To establish which model is better, and why, we use the concept of forecasting error. We hold out a certain share of the values (the last observations) and determine which algorithm gives the smaller sum of squared errors in its predictions. The number of predictions is kept constant at 10 in every case studied, in order to avoid reducing the amount of information available.
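The hold-out comparison described above can be sketched as follows in Python (the paper's experiments were run in R). The forecaster is passed in as a function, and the baseline that repeats the last training value is a hypothetical stand-in used only to make the example runnable; it is neither ARIMA nor the proposed method.

```python
def holdout_sse(t, forecaster, horizon=10):
    """Hold out the last `horizon` values and return the sum of squared errors."""
    train, test = t[:-horizon], t[-horizon:]
    preds = forecaster(train, horizon)
    return sum((a - p) ** 2 for a, p in zip(test, preds))


def naive_last(train, horizon):
    """Hypothetical baseline: repeat the last training value."""
    return [train[-1]] * horizon


t = list(range(1, 21))            # the values 1, 2, ..., 20
print(holdout_sse(t, naive_last))  # 385
```

Because both methods are scored on the same held-out values, the resulting errors are directly comparable even though the proposed method has no fitted parameters.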
Table I gives some examples of the forecasting errors obtained for several time series.
TABLE I. COMPARISON OF TWO METHODS: ARIMA VS PROPOSED APPROACH
                   Error in forecasting
Time series        ARIMA        Prop. approach
star               4184.963     4245.082
prodn              16.14206     18.56693
rosewine           Inf          768.4422
unemp              15006.22     4095.185
penguin            596901.4     321.071
Al_births          Inf          3489920
hsales             671.1774     274.1702
AirPassengers      281852.9     167063.2
Table I shows the advantage of our approach over ARIMA. In 2 cases, ARIMA could not give an appropriate model, because the AIC criterion could not be approximated. In our trials, our approach performed better than the ARIMA model in 66.7% of the cases. This percentage includes the 12.5% of cases in which ARIMA failed to provide a suitable model.
An example of how much the real values differ from the ones provided by ARIMA is given in Fig. 6.
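Long-Term Algorithm
Long_Term = function(T, m, k) {
  # n = length(T); k is the number of steps ahead to forecast
  for (i in seq(1, k)) {
    prediction = Short_Term(T, m)   # one-step forecast from the current series
    T = c(T, prediction)            # append the forecast before the next step
  }
  T   # original series followed by the k forecasts
}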
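The same loop can be written in Python (the paper's experiments were run in R), with the one-step forecaster passed in as a function so that the example is self-contained. The forecaster used here, which repeats the last observed ratio, is a hypothetical stand-in for the short-term algorithm.

```python
def long_term(t, m, k, one_step):
    """Forecast k steps ahead by feeding each prediction back into the series."""
    series = list(t)
    for _ in range(k):
        series.append(one_step(series, m))
    return series[len(t):]   # only the k forecasts


def last_ratio(series, m):
    """Hypothetical stand-in forecaster: repeat the last observed ratio."""
    return series[-1] * series[-1] / series[-2]


print(long_term([1.0, 2.0, 4.0], m=2, k=3, one_step=last_ratio))  # [8.0, 16.0, 32.0]
```

Each forecast becomes part of the input for the next step, so errors made early in the horizon can propagate to later forecasts.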
[Fig. 5, plot residue: "Comparison real vs forecast"; x-axis: Time index; y-axis: Observation; legend: Time series, Prediction.]
The time series considered is penguin, and the length of the motif is 7.
Fig. 6. ARIMA, the proposed method and real values
It is clearly visible that the shape and the values obtained by the proposed method are closer to the real ones, whereas ARIMA's prediction curve is far less realistic. In many cases, ARIMA produced central values, very close to a straight line, while the proposed method produced a nonlinear vector of forecasts with varied shapes.
B. Detecting factors that influence results
To determine where this result comes from, and in particular whether it is due to the complexity of the time series, we use a complexity measure for time series: fluctuations [9].
Definition 19 Given a time series T of length n, fluctuations are measured as below:
$$fluct(T) = \frac{1}{n-1}\sum_{i=1}^{n-1} (T_i - T_{i+1})^2 \qquad (21)$$
Fig. 7 shows the fluctuation of some time series.
Fig.7. Fluctuation of some time series
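The fluctuation measure of Eq. (21) is short enough to sketch directly in Python (the paper's experiments were run in R); the function name follows the equation and the two toy series are illustrative.

```python
def fluct(t):
    """Eq. (21): mean of the squared successive differences."""
    n = len(t)
    return sum((t[i] - t[i + 1]) ** 2 for i in range(n - 1)) / (n - 1)


print(fluct([1, 2, 3, 4]))  # 1.0
print(fluct([1, 5, 1, 5]))  # 16.0
```

A steadily rising series has a low fluctuation, while a series that oscillates with the same number of points scores much higher.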
We use the traditional definition of the correlation between two factors:
Definition 20 The correlation between two random variables P and Q of length n is the index measured as below:
$$COR(P, Q) = \frac{\sum_{i=1}^{n} (P_i - \bar{P})(Q_i - \bar{Q})}{\sqrt{\sum_{i=1}^{n} (P_i - \bar{P})^2}\ \sqrt{\sum_{i=1}^{n} (Q_i - \bar{Q})^2}} \qquad (22)$$
It results that, in both cases, the relation between fluctuation and forecasting error is strong (in each case, the correlation is greater than 99%). This means that there is a strong dependency between these two factors.
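For reference, the correlation of Eq. (22) can be computed as in the following Python sketch (the paper's experiments were run in R). The input vectors are made-up illustrations, not the fluctuation and error values of the paper, and the function name is hypothetical.

```python
from math import sqrt


def cor(p, q):
    """Eq. (22): Pearson correlation of two equally long vectors."""
    n = len(p)
    p_mean, q_mean = sum(p) / n, sum(q) / n
    num = sum((a - p_mean) * (b - q_mean) for a, b in zip(p, q))
    den = (sqrt(sum((a - p_mean) ** 2 for a in p))
           * sqrt(sum((b - q_mean) ** 2 for b in q)))
    return num / den


print(cor([1, 2, 3, 4], [2, 4, 6, 8]))   # close to 1: perfectly linear relation
print(cor([1, 2, 3, 4], [8, 6, 4, 2]))   # close to -1: perfectly inverse relation
```

A value above 0.99, as reported for fluctuation against forecasting error, means the two quantities are almost perfectly linearly related across the series tested.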
Another important factor that might influence the forecast is the quantity of data. It is naively assumed that the longer the time series, the smaller the error. Various tests were run with an increasing amount of data, on a high-complexity time series, hsales. Fig. 8 shows what happens to the forecasting error as this time series is lengthened, with the motif length fixed at m=12 and m=9.
Fig. 8. Error in forecasting for different lengths
In 50% of the cases, ARIMA (the proposed method with m=12) was better than the others. In a small percentage of the cases, the proposed method with m=9 outperformed the others. What is more, the forecasting error shows no defined trend. This means that, if we increase the amount of data, we cannot presume whether the error will decrease or not.
An important parameter of our method is the length of the motif, m. The time series selected for this evaluation is again hsales, due to its complexity. We keep the length of the time series fixed at n=170. An example is shown in Fig. 9.
Fig. 9. a) hsales b) Detecting the best value of m
We can see that the lowest prediction error is obtained for m=9, equal to the periodicity of the time series. But this contradicts the earlier results, where the proposed approach with m=12 was better than with m=9. This means that, in order to get better results, the length of the motif should be chosen after a careful study of periodicity, variations, etc.
ACKNOWLEDGMENT
I would like to thank Ms. Ilda Drini, employee at
Department of Financial Stability and Statistics, for all
the help during the creation of this method.
REFERENCES
[Figs. 7 to 9, plot residue. Fig. 7: "Comparison of fluctuations" (x-axis: Time series; y-axis: Fluctuations) for star, prodn, rosewine, unemp, penguin, Al_births, hsales and Air_pass. Fig. 8: "Comparison" (x-axis: Length of t.s.; y-axis: Error in pred.; legend: ARIMA, Pr. m. m=12, Pr. m. m=9). Fig. 9: "Time series hsales" (x-axis: Time; y-axis: Observations) and "Detect best length of m" (x-axis: Length of motif; y-axis: Error in pred.).]
All time series used in this article can be found on the following websites:
http://new.censusatschool.org.nz/resource/time-series-data-sets-2013/
www.instat.gov.al
https://datamarket.com/data/list/?q=provider%3Atsdl
http://www.cs.ucr.edu/~eamonn/discords/
http://vincentarelbundock.github.io/Rdatasets/datasets.html
http://cran.r-project.org/web/packages/astsa/index.htm
[1] Agrawal, R., Faloutsos, C., Swami, A. (1993): "Efficient similarity search in sequence databases", in: Fourth International Conference on Foundations of Data Organization, D. Lomet, Ed., Heidelberg: Springer-Verlag, pp. 69-84.
[2] Batista, G., Keogh, E. J., Tataw, O., de Souza, V.: "CID: an efficient complexity-invariant distance for time series".
[3] Chan, K., Fu, W. (1999): "Efficient time series matching by wavelets", Proceedings of the 15th IEEE International Conference on Data Engineering.
[4] Chouakria, A. D., Diallo, A., Giroud, F. (2007): "Adaptive clustering of time series", International Association for Statistical Computing (IASC), Statistics for Data Mining, Learning and Knowledge Extraction, Aveiro, Portugal.
[5] Dhamo, E., Ismailaja, N., Kalluçi, E. (2015): "Comparing the efficiency of CID distance and CORT coefficient for finding similar subsequences in time series", Sixth International Conference ISTI, 5-6 June.
[6] Lin, J., Keogh, E., Lonardi, S., Patel, P. (2002): "Finding Motifs in Time Series", in Proc. of 2nd Workshop on Temporal Data Mining.
[7] Mueen, A., Keogh, E. J. (2010A): "Online discovery and maintenance of time series motifs", KDD, pp. 1089-1098.
[8] Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B. (2009A): "Exact Discovery of Time Series Motifs", SDM, pp. 473-484.
[9] Mueen, A., Keogh, E. J., Bigdely-Shamlo, N. (2009): "Finding Time Series Motifs in Disk-Resident Data", ICDM, pp. 367-376.
[10] Mueen, A. (2013): "Enumeration of Time Series Motifs of All Lengths", ICDM, pp. 547-556.
[11] Yan, Ch., Fang, J., Wu, L., Ma, Sh. (2013): "An Approach of Time Series Piecewise Linear Representation Based on Local Maximum Minimum and Extremum", Journal of Information & Computational Science 10:9, pp. 2747-2756.
[12] Yi, B.-K., Faloutsos, Ch. (2000): "Fast time sequence indexing for arbitrary Lp norms", The VLDB Journal, pp. 385-394.

More Related Content

What's hot

MMath Paper, Canlin Zhang
MMath Paper, Canlin ZhangMMath Paper, Canlin Zhang
MMath Paper, Canlin Zhangcanlin zhang
Β 
Advanced Support Vector Machine for classification in Neural Network
Advanced Support Vector Machine for classification  in Neural NetworkAdvanced Support Vector Machine for classification  in Neural Network
Advanced Support Vector Machine for classification in Neural NetworkAshwani Jha
Β 
Icitam2019 2020 book_chapter
Icitam2019 2020 book_chapterIcitam2019 2020 book_chapter
Icitam2019 2020 book_chapterBan Bang
Β 
Applied Mathematics and Sciences: An International Journal (MathSJ)
Applied Mathematics and Sciences: An International Journal (MathSJ)Applied Mathematics and Sciences: An International Journal (MathSJ)
Applied Mathematics and Sciences: An International Journal (MathSJ)mathsjournal
Β 
QTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapQTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapHa Phuong
Β 
directed-research-report
directed-research-reportdirected-research-report
directed-research-reportRyen Krusinga
Β 
Advanced cosine measures for collaborative filtering
Advanced cosine measures for collaborative filteringAdvanced cosine measures for collaborative filtering
Advanced cosine measures for collaborative filteringLoc Nguyen
Β 
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...ijfls
Β 
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)Mohanlal Sukhadia University (MLSU)
Β 
Free Ebooks Download
Free Ebooks Download Free Ebooks Download
Free Ebooks Download Edhole.com
Β 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
Β 
A technique to construct linear trend free fractional factorial design using...
A technique to construct  linear trend free fractional factorial design using...A technique to construct  linear trend free fractional factorial design using...
A technique to construct linear trend free fractional factorial design using...Premier Publishers
Β 
Goldberg etal 2006
Goldberg etal 2006Goldberg etal 2006
Goldberg etal 2006Adilson Torres
Β 
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...TELKOMNIKA JOURNAL
Β 
Varaiational formulation fem
Varaiational formulation fem Varaiational formulation fem
Varaiational formulation fem Tushar Aneyrao
Β 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...IAEME Publication
Β 
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ο‚£ο€ 70 Lakhs!
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ο‚£ο€ 70 Lakhs!K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ο‚£ο€ 70 Lakhs!
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ο‚£ο€ 70 Lakhs!idescitation
Β 

What's hot (17)

MMath Paper, Canlin Zhang
MMath Paper, Canlin ZhangMMath Paper, Canlin Zhang
MMath Paper, Canlin Zhang
Β 
Advanced Support Vector Machine for classification in Neural Network
Advanced Support Vector Machine for classification  in Neural NetworkAdvanced Support Vector Machine for classification  in Neural Network
Advanced Support Vector Machine for classification in Neural Network
Β 
Icitam2019 2020 book_chapter
Icitam2019 2020 book_chapterIcitam2019 2020 book_chapter
Icitam2019 2020 book_chapter
Β 
Applied Mathematics and Sciences: An International Journal (MathSJ)
Applied Mathematics and Sciences: An International Journal (MathSJ)Applied Mathematics and Sciences: An International Journal (MathSJ)
Applied Mathematics and Sciences: An International Journal (MathSJ)
Β 
QTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature MapQTML2021 UAP Quantum Feature Map
QTML2021 UAP Quantum Feature Map
Β 
directed-research-report
directed-research-reportdirected-research-report
directed-research-report
Β 
Advanced cosine measures for collaborative filtering
Advanced cosine measures for collaborative filteringAdvanced cosine measures for collaborative filtering
Advanced cosine measures for collaborative filtering
Β 
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
AN ALPHA -CUT OPERATION IN A TRANSPORTATION PROBLEM USING SYMMETRIC HEXAGONAL...
Β 
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Shortest path (Dijkistra's Algorithm) & Spanning Tree (Prim's Algorithm)
Β 
Free Ebooks Download
Free Ebooks Download Free Ebooks Download
Free Ebooks Download
Β 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
Β 
A technique to construct linear trend free fractional factorial design using...
A technique to construct  linear trend free fractional factorial design using...A technique to construct  linear trend free fractional factorial design using...
A technique to construct linear trend free fractional factorial design using...
Β 
Goldberg etal 2006
Goldberg etal 2006Goldberg etal 2006
Goldberg etal 2006
Β 
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Contradictory of the Laplacian Smoothing Transform and Linear Discriminant An...
Β 
Varaiational formulation fem
Varaiational formulation fem Varaiational formulation fem
Varaiational formulation fem
Β 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...
Β 
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ο‚£ο€ 70 Lakhs!
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ο‚£ο€ 70 Lakhs!K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ο‚£ο€ 70 Lakhs!
K-Sort: A New Sorting Algorithm that Beats Heap Sort for n ο‚£ο€ 70 Lakhs!
Β 

Jmestn42351212

Journal of Multidisciplinary Engineering Science and Technology (JMEST), ISSN: 3159-0040, Vol. 2 Issue 11, November - 2015, www.jmest.org, JMESTN42351212, 3285

Motifs In Time Series For Prediction: A naïve approach compared to ARIMA

Nertila Ismailaja, Programme Analyst, Armundia Factory, Tirana, Albania, n.ismailaja@armundiafactory.com

Abstract: A time series combines two key factors, observations and time, and the observations are time-dependent. A related field of interest is motif discovery, which works solely with subsequences of a time series. Many similarity measures have been proposed for it. Dhamo et al. [5] concluded that the best motif-discovery quality was achieved by Chouakria's index with CID (Chouakria's index was proposed by Chouakria et al. [4], and CID by Batista et al. [2]). The next step is to use this distance, together with other time series features, to make predictions. Various tests are carried out on time series with a high level of complexity, and the results of this approach are compared with ARIMA models. All tests are run in R.

Keywords: motif discovery; ARIMA; time series; forecasting; Chouakria with CID; R

I. INTRODUCTION

Motif discovery has been widely exploited in the biological sciences, for example to detect anomalies in heartbeats, pressure, etc. The challenge was to deal with large amounts of data (known as big data). The first approaches reduced the dimensionality of the data [1][3], using methods such as the Discrete Fourier Transform [1], the Discrete Wavelet Transform [3] and Piecewise Linear Approximation [12]. The first researchers to formally use motif discovery in time series were Agrawal et al. [3], Lin et al. [6] and Mueen [7][8][9][10]. Since then, several strategies have been proposed. These studies have consistently relied on similarity search across the time series, adapted to the problem at hand.
The most recent method is ε-queries, proposed as a probabilistic approach to motif discovery. An extension of motif discovery is to compare time series similarity measures with one another on four criteria: algorithm complexity, number of discovered motifs, accuracy and quality. Dhamo et al. [5] concluded that Chouakria's index with CID gave better results than CID alone. Given that Chouakria's index with CID is a well-performing similarity measure, we move on to the next step: applying motif discovery to forecasting. Several well-known time series are used; for each one a model is built and then compared with the model given by ARIMA (constructed with the R package "forecast"). According to the results, using the forecasting error as the criterion, we can deduce that our model provides better performance.

II. TIME SERIES

A. Basic Concepts

A time series may be defined as a collection of data in which time is a relevant component.

Definition 1. A time series $T$ is an ordered collection of data observed at $n$ regular time intervals, $T = [T_1, T_2, \dots, T_n]$.

Definition 2. A motif $M$ of length $m$ in a time series $T$ of length $n$ is a subsequence of $T$ that repeats itself in $T$.

Definition 3. In an ε-query similarity search between two subsequences $P = [P_1, P_2, \dots, P_m]$ and $Q = [Q_1, Q_2, \dots, Q_m]$ of length $m$ in a time series $T$ of length $n$, $P$ and $Q$ are considered similar if the distance between them lies within an interval of absolute error $\varepsilon$.

An illustration of a motif is given in Fig. 1, using the dataset unemp.

Fig. 1. Algorithm for motif discovery

B. Distances Used For Time Series

When comparing two objects, a criterion is needed to decide whether or not they are the same. For time series, this criterion is the similarity measure, which is based on the concept of distance.
Definition 4. A distance is a function $d$ that satisfies the following properties:
- Identity: $\forall x, y \in R,\ d(x, y) = 0 \leftrightarrow x = y$
- Symmetry: $\forall x, y \in R,\ d(x, y) = d(y, x)$
- Non-negativity: $\forall x, y \in R,\ d(x, y) \ge 0$
- Triangle inequality: $\forall x, y, z \in R,\ d(x, y) \le d(x, z) + d(z, y)$

Definition 5. A similarity measure is a function that fails to satisfy at least one of the conditions required of a distance.

[Figure: panels "Time Series" (observed values against index, with the motif and a similar subsequence marked; CID) and "Similar subseq" (values against motif's length)]
In other words, a similarity measure quantifies how similar two subsequences of a time series are. Several similarity measures exist, and the choice among them is reported to be a key factor in the quality and quantity of discovered motifs. Despite its wide usage, the Euclidean distance gives unsatisfactory results when used as a similarity measure.

Definition 6. The Euclidean distance between two subsequences of length $m$, $P = [P_1, P_2, \dots, P_m]$ and $Q = [Q_1, Q_2, \dots, Q_m]$, is:

$$d_{Eucl}(P, Q) = \sqrt{\sum_{i=1}^{m} (P_i - Q_i)^2} \tag{1}$$

As mentioned above, the best-performing similarity measure is Chouakria's index with CID. It is therefore necessary to explain CID first and then to define Chouakria's index.

Definition 7. The CID distance between two subsequences of length $m$, $P = [P_1, P_2, \dots, P_m]$ and $Q = [Q_1, Q_2, \dots, Q_m]$, is:

$$d_{CID}(P, Q) = d_{Eucl}(P, Q)\,\frac{\max\{CE(P), CE(Q)\}}{\min\{CE(P), CE(Q)\}} \tag{2}$$

where

$$CE(P) = \sqrt{\sum_{i=1}^{m-1} (P_i - P_{i+1})^2} \tag{3}$$

This similarity measure is used for nonlinear time series subsequences. One main feature of CID is that, for two nonlinear subsequences $P$, $Q$ of length $m$:

$$d_{CID}(P, Q) \ge d_{Eucl}(P, Q) \tag{4}$$

with equality when the two complexity estimates coincide. Moreover, the closer the fraction $\max\{CE(P), CE(Q)\} / \min\{CE(P), CE(Q)\}$ is to 1, the more similar the behavior of the two subsequences.

Another similarity measure is Chouakria's index. It is composed of two factors, one responsible for behavior and the other for proximity of values [4].
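The Euclidean and CID measures defined above can be sketched in a few lines. The paper's experiments are in R; this is an illustrative Python version, not the author's code. The function names `euclidean`, `complexity` and `cid` are hypothetical, the square-root form of both the Euclidean distance and CE is assumed, and the handling of a zero complexity estimate is a choice of this sketch.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length subsequences."""
    if len(p) != len(q):
        raise ValueError("subsequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def complexity(p):
    """Complexity estimate CE: root of the summed squared successive differences."""
    return math.sqrt(sum((p[i] - p[i + 1]) ** 2 for i in range(len(p) - 1)))


def cid(p, q):
    """Complexity-invariant distance: Euclidean distance times max(CE)/min(CE)."""
    ce_p, ce_q = complexity(p), complexity(q)
    low, high = min(ce_p, ce_q), max(ce_p, ce_q)
    if low == 0.0:
        # Assumption of this sketch: a constant subsequence has CE = 0, so the
        # correction factor is undefined. Two flat subsequences get factor 1.
        return euclidean(p, q) if high == 0.0 else math.inf
    return euclidean(p, q) * high / low


smooth = [1.0, 2.0, 3.0, 4.0]
jagged = [1.0, 4.0, 1.0, 4.0]
print(euclidean(smooth, jagged))  # about 2.83
print(cid(smooth, jagged))        # about 8.49, three times the Euclidean value
```

In this example the jagged subsequence has a complexity estimate three times that of the smooth one, so CID inflates the Euclidean distance by that factor, which is the behavior stated in (4).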
Chouakria's index is defined as follows.

Definition 8. Chouakria's index between two subsequences of length $m$, $P = [P_1, P_2, \dots, P_m]$ and $Q = [Q_1, Q_2, \dots, Q_m]$, is:

$$d_{Chou}(P, Q) = \frac{2}{1 + e^{k \cdot CORT(P, Q)}} \cdot \delta(P, Q) \tag{5}$$

where

$$CORT(P, Q) = \frac{\sum_{i=1}^{m-1} (P_i - P_{i+1})(Q_i - Q_{i+1})}{\sqrt{\sum_{i=1}^{m-1} (P_i - P_{i+1})^2}\ \sqrt{\sum_{i=1}^{m-1} (Q_i - Q_{i+1})^2}} \tag{6}$$

$k \in R^+$, and $\delta(P, Q)$ is a distance between values, such as the Euclidean distance. Dhamo et al. [5] proposed CID as the distance $\delta(P, Q)$ and $k = 2$.

III. MODELLING TIME SERIES

A time series is a notion that combines two main concepts: determinism and randomness. It is composed of several parts, such as trend, seasonality and cycles. Once these components have been detected, the remainder is considered the random part. In other words, a time series $T$ of length $n$ can be decomposed as:

$$T_n = t_n + c_n + s_n + \varepsilon_n \tag{7}$$

where $t_n$ is the trend, $c_n$ the cycle, $s_n$ the seasonal variation and $\varepsilon_n$ the noise. Usually a time series can be decomposed according to these features. The only characteristic left to be modeled is the noise, which is a random variable.

A. ARIMA Models

As stated above, a time series is a collection of data observed at regular intervals. An ARIMA model is built from other forecasting models, namely AR and MA, together with the difference operator $\Delta$. Let us first define the key concepts of a stochastic process.

Definition 9. Let $\{\varepsilon_t, t \in T\}$ be a stochastic process. It is a white noise, written $\varepsilon_t \sim WN(0, \sigma^2)$, if:

$$E(\varepsilon_t) = 0, \qquad E(\varepsilon_t \varepsilon_s) = \begin{cases} \sigma^2, & s = t \\ 0, & s \ne t \end{cases} \tag{8}$$

Definition 10. Let $\{Y_t, t \in T\}$ be a stochastic process and $\varepsilon_t \sim WN(0, \sigma^2)$. An AR(p) model is constructed as:

$$\varepsilon_t = \sum_{i=0}^{p} \varphi_i Y_{t-i} \tag{9}$$

where $\varphi_i$, $i = 1, \dots, p$, are coefficients determined from the data.

Definition 11. Let $\{Y_t, t \in T\}$ be a stochastic process.
Let $\omega_i \sim WN(0, \sigma^2)$, $i = 0, 1, 2, \dots$. An MA(q) model is constructed as:

$$Y_t = \sum_{i=0}^{q} \psi_i \omega_{t-i} \tag{10}$$

where $\psi_i$, $i = 1, \dots, q$, are coefficients determined from the data. An ARMA(p,q) model then follows naturally.

Definition 12. A stochastic process $\{Y_t, t \in T\}$ is said to be an ARMA(p,q) model (autoregressive moving average of order (p,q)) if it can be expressed as:

$$Y_t = \mu_t + \sum_{i=1}^{p} \varphi_i Y_{t-i} + \sum_{i=0}^{q} \psi_i \omega_{t-i} \tag{11}$$

where $\mu_t$, $\varphi_i$ ($i = 1, \dots, p$) and $\psi_i$ ($i = 0, \dots, q$) are constants and $\omega_t$ is white noise.

An important ingredient of ARIMA(p,d,q) is the difference operator $\Delta$, defined as follows.

Definition 13. Let $\{Y_t, t \in T\}$ be a stochastic process. The difference operator of order $d$, $\Delta^d$, is defined as:

$$\Delta Y_t = Y_t - Y_{t-1}, \qquad \Delta^d Y_t = \Delta(\Delta^{d-1} Y_t) \tag{12}$$

where $d = 1, 2, \dots$ and $\Delta^0 Y_t = Y_t$.

With all the necessary concepts in place, we can introduce the ARIMA model.
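The building blocks above (white noise, a moving average of the noise, and differencing) can be illustrated with a short Python sketch. It is not the R code used in the paper. The names `white_noise`, `ma_process` and `difference` are hypothetical, Gaussian noise is an assumption (the definition only fixes mean and covariance), and `d` is taken to be the number of times the first difference is applied, as in the usual ARIMA(p,d,q) convention.

```python
import random


def white_noise(n, sigma=1.0, seed=0):
    """Draw n independent zero-mean Gaussian values (one way to obtain white noise)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def ma_process(noise, psi):
    """MA(q): Y_t = sum over i = 0..q of psi[i] * noise[t - i], for t >= q."""
    q = len(psi) - 1
    return [sum(psi[i] * noise[t - i] for i in range(q + 1))
            for t in range(q, len(noise))]


def difference(y, d=1):
    """Apply the first difference Y_t - Y_{t-1} d times."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


# A quadratic trend becomes constant after differencing twice.
trend = [t * t for t in range(6)]   # 0, 1, 4, 9, 16, 25
print(difference(trend, 1))         # [1, 3, 5, 7, 9]
print(difference(trend, 2))         # [2, 2, 2, 2]

# An MA(1) series built from simulated white noise.
print(ma_process(white_noise(5), [1.0, 0.5]))
```

The differencing example shows why the operator matters for ARIMA: a series with a polynomial trend is reduced to a trend-free one, to which an ARMA model can then be fitted.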
Definition 14. The stochastic process $\{Y_t, t \in T\}$ is considered to be an ARIMA(p,d,q) process if the process

$$X_t = \Delta^d Y_t \tag{13}$$

is an ARMA(p,q) process.

There are, of course, several ways to determine the most suitable values of p, q and d; these are discussed in the following sections.

B. The Proposed Approach

When a motif is detected in a time series, it means that a pattern repeats itself in the series, regardless of the individual observed values. An example is given in Fig. 2.

Fig. 2. The relation between time series' subsequences (time series co2)

The relationship between the subsequences of length 12 is evident. It is made visible by normalizing the subsequences, after which they are impossible to distinguish from one another.

Definition 15. The normalization of a time series $T$ of length $n$ is the time series $T'$ defined as:

$$T' = \frac{T - mean(T)}{sd(T)} \tag{14}$$

If there is such a strong relation between two subsequences $P = [T_i, T_{i+1}, \dots, T_{i+m-1}]$ and $Q = [T_j, T_{j+1}, \dots, T_{j+m-1}]$ of length $m$ from the time series $T$, it is reasonable to presume that the relation stays equally strong when the subsequences are lengthened. The situation is shown in Fig. 3.

Fig. 3. Widening the motif's length

Fig. 3 shows that, even when the length of the subsequences is increased from 12 to 15, the difference remains small.

C. Reasoning to Reach to Forecasting

When we say that there is a motif in a time series, we assume a strong similarity between different subsequences of the same series. Moreover, these subsequences are generally spread throughout the whole time series.
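The effect of the normalization in Definition 15 can be checked on a toy example. The following Python sketch is illustrative only. The function name `normalize` is hypothetical, and it assumes that sd denotes the sample standard deviation (divisor n - 1), as computed by R's `sd()`.

```python
import statistics


def normalize(t):
    """Subtract the mean and divide by the sample standard deviation."""
    mu = statistics.mean(t)
    sd = statistics.stdev(t)  # sample standard deviation, divisor n - 1
    if sd == 0:
        raise ValueError("cannot normalize a constant sequence")
    return [(x - mu) / sd for x in t]


base = [1.0, 3.0, 2.0, 5.0, 4.0]
shifted = [10.0 + 2.0 * x for x in base]  # same shape, different level and scale

print(normalize(base))
print(normalize(shifted))  # matches the line above up to rounding error
```

Two subsequences that differ only in level and scale become indistinguishable after normalization, which is the effect described for the length-12 subsequences of the co2 series.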
Given a fixed length m as motif’s length, we are able to find, according to Brute-Force algorithm, all patterns in the time series. In contrary to this algorithm, in order to predict, there are some changes being made. Those changes are reflected not only in code, but even in structure: Suppose that we require to use a subsequence of length m in our time series T of length n. Moreover, is found out that the most approximate subsequence is stated in position j. In this case, the dependency factor would be calculated, as below: π‘‘π‘’π‘π‘“π‘Žπ‘π‘‘ = 𝑇[𝑗+π‘š] 𝑇[𝑗+π‘šβˆ’1] (15) Our prediction is 𝑇[𝑛 + 1], which is calculated as: 𝑇[𝑛 + 1] = 𝑑𝑒𝑝 π‘“π‘Žπ‘π‘‘ βˆ— 𝑇[𝑛] (16) We define π‘Ž = 𝑇[(n βˆ’ m + 1) : (n + 1)]. We can create a 95% confidence interval for our prediction, as below: 𝐢𝐼(𝑇[𝑛 + 1]) =]𝐸(π‘Ž) βˆ’ 1.96 𝑠𝑑(π‘Ž) βˆšπ‘š+1 , 𝐸(π‘Ž) + 1.96 𝑠𝑑(π‘Ž) βˆšπ‘š+1 (17) An example of our algorithm in R, in time series AirPassengers, is given in Fig. 4. Fig.4. Snapshot of the proposed approach CO2 time series Time Observation 0 100 200 300 400 320350 2 320350 Subsequences Index Observation 2 -1.50.01.5 Normalized Index Observation CO2 time series Time Observation 0 100 200 300 400 320350 2 320350 Subsequences Index Observation 2 -2.0-0.51.0 Normalized Index Observation Brute -force algorithm Brute_force=function(T,m,epsilon) 1.#n=length(T) 2.threshold=min{d(Ai,Bj)}, Ai=T[i : (i+m-1)],i=1: (n-m+2) and j=(i+1): (n-m+2) 3.motif_index=argmax{d(Ai,Bj)<threshold+epsilon} 4.motif=T[motif_index: (motif_index+m-1)] 5. similar_seq=T[where(d(motif,Bj)<threshold+epsilon)], Bj=T[j : (j+m-1)] and j>motif_index end function Short-term algorithm Short_term=function(T,m) 1.Keep the last subsequence of length m constant (also considered as motif, 𝑇[(n βˆ’ m + 1) : n].). 2.Choose one suitable distance for time series (Chouakria’s index) 3. Normalize time series’ subsequences, by using (8). 
4. Detect the strongest relationship between our motif and the other subsequences of the time series, and keep track of the starting index at which it is found.
5. Create the dependency factor that is used in the prediction.
end function
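The short-term procedure, together with Eqs. (15) to (17), can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the article, and it is not the author's implementation: the Euclidean distance between normalized subsequences is swapped in as a stand-in for Chouakria's index with CID, and the function and variable names (`short_term`, `znorm`, `best_j`) are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def znorm(seq):
    """Normalize as in Eq. (14); constant subsequences map to zeros."""
    mu, sd = mean(seq), stdev(seq)
    return [0.0] * len(seq) if sd == 0 else [(v - mu) / sd for v in seq]


def euclidean(a, b):
    # Stand-in distance (assumption): the article uses Chouakria's index with CID.
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def short_term(series, m):
    """Return (prediction, confidence interval, 0-based index of the best match)."""
    n = len(series)
    if m < 2 or n <= m:
        raise ValueError("need m >= 2 and a series longer than m")
    motif = znorm(series[n - m:])           # last subsequence of length m
    best_j, best_d = 0, float("inf")
    for j in range(n - m):                  # every candidate has a successor series[j + m]
        d = euclidean(motif, znorm(series[j:j + m]))
        if d < best_d:
            best_j, best_d = j, d
    # Eq. (15): ratio of the value after the match to the last value of the match.
    # Note: this fails if series[best_j + m - 1] is 0.
    dep_fact = series[best_j + m] / series[best_j + m - 1]
    prediction = dep_fact * series[-1]      # Eq. (16)
    a = series[n - m:] + [prediction]       # last m observations plus the forecast
    half_width = 1.96 * stdev(a) / sqrt(m + 1)
    interval = (mean(a) - half_width, mean(a) + half_width)  # Eq. (17)
    return prediction, interval, best_j


# Hypothetical periodic series: the value after [1, 2, 3, 4] is always 1.
series = [1.0, 2.0, 3.0, 4.0] * 5
prediction, interval, best_j = short_term(series, m=4)
print(prediction, interval, best_j)
```

On a perfectly periodic series the closest subsequence is an exact copy of the motif, so the dependency factor reproduces the next value of the cycle. On real data the quality of the forecast depends on how well the best match resembles the motif.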
D. Forecasting in a Long-term Period

In the previous paragraph, we constructed an algorithm that predicts the next observation from the previous ones, together with a confidence interval for that prediction. In practice, however, forecasts are usually needed over a longer horizon. To obtain a long-term forecast, we construct the Long-Term algorithm, which applies the short-term prediction repeatedly and appends each forecast to the time series.

With this algorithm, we can keep track of both the original time series and the forecasts provided. An example is given in Fig. 5.

Fig. 5. Forecasting in long-term period

Fig. 5 shows that the forecasts are close to the real observations both in value and in shape. This indicates that the constructed model is strong.

IV. ARIMA VS PROPOSED METHOD

In many forecasting models, such as AR, MA and ARIMA, the best parameters are chosen on the basis of a low error in curve fitting: the lower the error, the better the parameters.

Definition 16 Given a time series T of length n and the fitted ARIMA model T', the error in curve fitting is the vector:

$$\varepsilon = T - T' \tag{18}$$

In practice, the most commonly used measure is the Sum of Squared Errors ($\varepsilon'\varepsilon$). Other measures also give information about the quality of the constructed model, such as AIC (Akaike Information Criterion) and BIC (Bayes Information Criterion).

Definition 11 Given a time series T of length n and the fitted ARIMA model T', AIC is measured as:

$$AIC = \ln\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2\right) + \frac{2(k+1)}{n} \tag{19}$$

where $e_i^2 = (T_i - T_i')^2$ and k is the degree of freedom of T.

Definition 12 Given a time series T of length n and the fitted ARIMA model T', BIC is measured as:

$$BIC = \ln\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2\right) + \frac{(k+1)\ln(n)}{n} \tag{20}$$

where $e_i^2 = (T_i - T_i')^2$ and k is the degree of freedom of T.
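As a worked illustration of Eqs. (18) to (20), the sketch below computes the Sum of Squared Errors, AIC and BIC from a series and its fitted values. It is written in Python for illustration (the article's tests were made in R); the function names and the sample values are hypothetical, and k is simply passed in as the quantity called k in the definitions above.

```python
from math import log


def sum_squared_errors(actual, fitted):
    """Sum of squared curve-fitting errors, i.e. e'e with e = T - T' (Eq. 18)."""
    return sum((t - f) ** 2 for t, f in zip(actual, fitted))


def aic(actual, fitted, k):
    """AIC as written in Eq. (19): ln(SSE / n) + 2 (k + 1) / n."""
    n = len(actual)
    # log() fails for a perfect fit (SSE = 0); real fits are assumed to have SSE > 0.
    return log(sum_squared_errors(actual, fitted) / n) + 2 * (k + 1) / n


def bic(actual, fitted, k):
    """BIC as written in Eq. (20): ln(SSE / n) + (k + 1) ln(n) / n."""
    n = len(actual)
    return log(sum_squared_errors(actual, fitted) / n) + (k + 1) * log(n) / n


# Hypothetical series and fitted values: every error is 0.5, so SSE = 1 and SSE / n = 0.25.
actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]

sse_value = sum_squared_errors(actual, fitted)
aic_value = aic(actual, fitted, k=1)
bic_value = bic(actual, fitted, k=1)
print(sse_value, aic_value, bic_value)
```

Both criteria share the same goodness-of-fit term and differ only in the penalty: for $n > e^2 \approx 7.4$ the BIC penalty $(k+1)\ln(n)/n$ exceeds the AIC penalty $2(k+1)/n$, so BIC favours models with fewer parameters on all but very short series.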
Of the two criteria, BIC is generally regarded as the more useful, since it is considered to be more accurate.

A. Detecting Error in Forecast in the Proposed Model

As mentioned, the proposed algorithm does not create a single model, since it does not estimate parameters. Our model is dynamic and, over a long-term period, changes constantly. This means that the AIC and BIC criteria are inapplicable. To establish which model is better, and why, we use the concept of error in forecasting: we keep a certain number of values unused (the last observations) and check which algorithm yields the smaller Sum of Squared Errors in its predictions. The number of predictions is fixed at 10 in every case that we studied, in order to avoid reducing the amount of information at our disposal. Table I gives some examples of the errors made during the forecast, for several time series.

TABLE I. COMPARISON OF TWO METHODS (ARIMA vs proposed approach)

                 Error in forecasting
Time series      ARIMA        Prop. approach
star             4184.963     4245.082
prodn            16.14206     18.56693
rosewine         Inf          768.4422
unemp            15006.22     4095.185
penguin          596901.4     321.071
Al_births        Inf          3489920
hsales           671.1774     274.1702
AirPassengers    281852.9     167063.2

Table I shows the advantage of our approach over ARIMA. In 2 cases, ARIMA could not give an appropriate model, because the AIC criterion could not be approximated. In our trials, our approach performed better than the ARIMA model in 66.7% of the cases. This percentage includes the 12.5% of cases in which ARIMA failed to provide a suitable model. An example of how much the real values differ from the ones provided by ARIMA is given in Fig. 6.

Long-Term Algorithm

Long_Term = function(T, m, k) {
    # n = length(T); k is the number of lags we want to forecast
    for (i in seq_len(k)) {
        prediction = Short_term(T, m)   # one-step forecast from the current series
        T = c(T, prediction)            # append the forecast to the series
    }
    T   # original series followed by the k forecasts
}

[Plot "Comparison real vs forecast": Time index vs. Observation; legend: Time series, Prediction]
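The long-term loop and the forecast error used for the comparison can be sketched together as follows. This is an illustrative Python version (the article's tests were made in R), under the same assumption as any simplified variant of the method: the Euclidean distance between normalized subsequences stands in for Chouakria's index with CID. The names `short_term`, `long_term` and `forecast_sse` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def znorm(seq):
    """Normalize a subsequence; constant subsequences map to zeros."""
    mu, sd = mean(seq), stdev(seq)
    return [0.0] * len(seq) if sd == 0 else [(v - mu) / sd for v in seq]


def short_term(series, m):
    """One-step forecast from the subsequence closest to the last m observations."""
    n = len(series)
    if n <= m:
        raise ValueError("series must be longer than the motif length m")
    motif = znorm(series[n - m:])
    best_j, best_d = 0, float("inf")
    for j in range(n - m):  # each candidate must have a successor inside the series
        cand = znorm(series[j:j + m])
        # Stand-in distance (assumption): Euclidean instead of Chouakria's index with CID.
        d = sqrt(sum((a - b) ** 2 for a, b in zip(motif, cand)))
        if d < best_d:
            best_j, best_d = j, d
    dep_fact = series[best_j + m] / series[best_j + m - 1]
    return dep_fact * series[-1]


def long_term(series, m, k):
    """Forecast k steps ahead by feeding each prediction back into the series."""
    extended = list(series)
    for _ in range(k):
        extended.append(short_term(extended, m))
    return extended[len(series):]


def forecast_sse(series, m, holdout=10):
    """Hold out the last observations and return the sum of squared forecast errors."""
    train, test = series[:-holdout], series[-holdout:]
    predictions = long_term(train, m, holdout)
    return sum((t - p) ** 2 for t, p in zip(test, predictions))


# Hypothetical periodic series; the last 10 values are held out, as in the comparison.
series = [1.0, 2.0, 3.0, 4.0] * 10
error = forecast_sse(series, m=4, holdout=10)
print(error)
```

Because every forecast is appended to the series before the next step, later predictions may be matched against earlier forecasts as well as against real observations, which is why the model is described as dynamic rather than as a single fitted model.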
The time series considered is penguin, and the length of the motif is 7.

Fig. 6. ARIMA, the prop. method and real values

It is clearly visible that the shape and the values obtained by the proposed method are closer to the real ones, whereas the ARIMA prediction curve is far less realistic. In many cases, ARIMA provided central values, very close to a straight line, while the proposed method provides forecasts of varying shape, that is, a nonlinear vector of forecasts.

B. Detecting factors that influence results

To find out where this result comes from, and in particular whether it is due to the complexity of the time series, we use a measure of time series complexity, the fluctuations [9].

Definition 17 Given a time series T of length n, its fluctuations are measured as:

$$fluct(T) = \frac{1}{n-1}\sum_{i=1}^{n-1}(T_i - T_{i+1})^2 \tag{21}$$

Fig. 7 shows the fluctuation of some time series.

Fig. 7. Fluctuation of some time series

We use the traditional definition of the correlation between two factors:

Definition 18 The correlation between two random variables P and Q of length n is the index:

$$COR(P, Q) = \frac{\sum_{i=1}^{n}(P_i - \bar{P})(Q_i - \bar{Q})}{\sqrt{\sum_{i=1}^{n}(P_i - \bar{P})^2}\,\sqrt{\sum_{i=1}^{n}(Q_i - \bar{Q})^2}} \tag{22}$$

It results that, in both cases, the relation between fluctuation and error in forecasting is strong (in each case, greater than 99%). This means that there is a strong dependency between these two factors.

Another important factor that may influence the forecast is the quantity of data. It is naively deduced that the longer the time series is, the smaller the error will be. Various tests have been made by increasing the amount of data. A high-complexity time series, hsales, was chosen for this purpose. In Fig. 8
we see what happens to the error in forecasting when this time series is enlarged, with the motif length fixed at m=12 and at m=9.

Fig. 8. Error in forecasting for different lengths

In 50% of the cases, ARIMA (the proposed method with m=12) was better than the others. In a low percentage of the cases, the proposed method with m=9 performed better than the others. What is more, we notice that the error in forecasting follows no defined trend. This means that, if we increase the amount of data, we cannot presume whether the error will decrease or not.

An important parameter of our method is the length of the motif, m. The time series selected for this evaluation is again hsales, due to its complexity. We keep the length of the time series fixed at n=170. An example is shown in Fig. 9.

Fig. 9. a) hsales b) Detecting the best value of m

We can see that the lowest error in prediction is obtained for m=9, equal to the periodicity of the time series. But this contradicts the results above, where the proposed approach with m=12 was better than the proposed approach with m=9. This means that, in order to get better results, the length of the motif should be chosen after a careful study of periodicity, variations, etc.

[Plot for Fig. 7, "Comparison of fluctuations": Time series (star, prodn, rosewine, unemp, penguin, Al_births, hsales, Air_pass) vs. Fluctuations]
[Plot for Fig. 8, "Comparison": Length of t.s. vs. Error in pred.; legend: ARIMA, Pr. m. m=12, Pr. m. m=9]
[Plots for Fig. 9: a) "Time series hsales": Time vs. Observations; b) "Detect best length of m": Length of motif vs. Error in pred.]

ACKNOWLEDGMENT

I would like to thank Ms. Ilda Drini, employee at the Department of Financial Stability and Statistics, for all her help during the creation of this method.

REFERENCES
All time series used in this article can be found on the following websites:
http://new.censusatschool.org.nz/resource/time-series-data-sets-2013/
www.instat.gov.al
https://datamarket.com/data/list/?q=provider%3Atsdl
http://www.cs.ucr.edu/~eamonn/discords/
http://vincentarelbundock.github.io/Rdatasets/datasets.html
http://cran.r-project.org/web/packages/astsa/index.htm

[1] Agrawal, R., Faloutsos, C., Swami, A., (1993): "Efficient similarity search in sequence databases", in: Fourth International Conference on Foundations of Data Organization, D. Lomet, Ed., Heidelberg: Springer-Verlag, pp. 69-84
[2] Batista, G., Keogh, E. J., Tataw, O., de Souza, V., "CID: an efficient complexity-invariant distance for time series"
[3] Chan, K., Fu, W., (1999): "Efficient time series matching by wavelets". Proceedings of the 15th IEEE International Conference on Data Engineering.
[4] Chouakria, A. D., Diallo, A., Giroud, F., (2007): "Adaptive clustering of time series". International Association for Statistical Computing (IASC), Statistics for Data Mining, Learning and Knowledge Extraction, Aveiro, Portugal
[5] Dhamo, E., Ismailaja, N., Kalluçi, E., (2015): "Comparing the efficiency of CID distance and CORT coefficient for finding similar subsequences in time series", Sixth International Conference ISTI, 5-6 June.
[6] Lin, J., Keogh, E., Lonardi, S., Patel, P., (2002): "Finding Motifs in Time Series", in Proc. of 2nd Workshop on Temporal Data Mining
[7] Mueen, A., Keogh, E. J., (2010A): "Online discovery and maintenance of time series motifs". KDD, pg. 1089-1098
[8] Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B., (2009A): "Exact Discovery of Time Series Motifs". SDM, pg. 473-484
[9] Mueen, A., Keogh, E. J., Bigdely-Shamlo, N., (2009): "Finding Time Series Motifs in Disk-Resident Data".
ICDM, pg. 367-376
[10] Mueen, A., (2013): "Enumeration of Time Series Motifs of All Lengths". ICDM, pg. 547-556
[11] Yan, Ch., Fang, J., Wu, L., Ma, Sh., (2013): "An Approach of Time Series Piecewise Linear Representation Based on Local Maximum Minimum and Extremum", Journal of Information & Computational Science 10:9, 2747-2756
[12] Yi, B.-K., Faloutsos, Ch., (2000): "Fast time sequence indexing for arbitrary Lp norms". In The VLDB Journal, pages 385-394.