University of Naples “Parthenope”
Accelerating Dynamic Time Warping
Subsequence Search with GPU
Davide Nardone
0120/131
A.A 2015/16
Summary
1. Introduction to Time Series
2. Time Series contexts and tasks
3. Basic idea: Subsequence search
4. DTW: Definition and Background
5. Parallelizing DTW Subsequence search
6. Evaluation
7. Experimental Case Studies
8. Conclusions and Future Remarks
What are Time Series?
▪ A time series is a collection of observations made sequentially in time
▪ Each ti is a real number, as shown below
T = t1, t2, t3, …, tn
[Figure: example time series T; x-axis: Time, y-axis: Value]
Time Series Contexts
EPG, ECG signal
Stock Price
Image
Video
Data Mining and Machine Learning tasks
Classification
Control
Clustering
Motif Discovery
Anomaly Detection
Basic idea: Subsequence search
Given a query series Q, find the occurrences of Q in a time series T that are most similar to it according to a distance measure.
▪ An occurrence is a subsequence whose distance to Q is smaller than a threshold
▪ The smallest computed distance identifies the best occurrence of Q in T
Finding a Similarity Measure
▪ Euclidean Distance (ED) often produces pessimistic similarity measures when it encounters distortion in the time axis. Signals that represent the same pattern but are in different relative phases must first be synchronized.
▪ Dynamic Time Warping (DTW) provides this synchronization.
[Figure: Euclidean Distance vs. Dynamic Time Warping]
DTW: Definition and Background
DTW is a distance measure for comparing two time series, say C and Q. It is defined as follows:
$$D(C,Q) = DTW(n,m)$$
$$DTW(i,j) = d_{i,j} + \min\{DTW(i-1,j),\ DTW(i,j-1),\ DTW(i-1,j-1)\}$$
where $d_{i,j}$ is a distance such as $|c_i - q_j|$ or $(c_i - q_j)^2$.
As initial condition we set $DTW(1,1) = (c_1 - q_1)^2$, and for undefined terms we assume $+\infty$.
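As an illustration of the recurrence above, here is a minimal Python sketch. It assumes the squared difference as d(i,j) and pads the matrix with an extra row and column of +infinity for the undefined terms; the function name `dtw` is hypothetical and not taken from the slides. It returns the accumulated cost DTW(n,m), without taking a square root.

```python
import math

def dtw(c, q):
    """Accumulated DTW cost between sequences c and q (squared differences)."""
    n, m = len(c), len(q)
    # Row 0 and column 0 hold +inf and stand in for the undefined terms.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # so that D[1][1] equals (c1 - q1)^2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (c[i - 1] - q[j - 1]) ** 2
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

With this sketch, `dtw([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated 2 can be absorbed by warping, whereas a point-by-point comparison is not even defined for series of different lengths.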
The DTW distance: an example
▪ Consider the following two time series:
▪ On the left is the matrix d of pointwise distances (ED); on the right is the DTW matrix.
What DTW computes
▪ Considering the distance matrix d, to align two time series we must find a warping path.
▪ Start at (1,1) and end at (n,m)
▪ Take one step at a time
▪ At each step, move only by increasing i, j, or both
▪ Sum all the distances found along the warping path
• In this random example the sum is 21
How can the optimal warping path be found?
What DTW computes (cont.)
▪ Each warping path is a way to align (match) two time series such that every sample is matched with at least one sample of the other series.
▪ The DTW distance is the square root of the cost of the optimal warping path (√15 in this example)
• The "Euclidean path" moves only along the main diagonal and costs √29 in this example.
▪ The recursive definition allows DTW to be computed in O(n×m) time, even though the number of warping paths is exponential.
• This is a classic example of a dynamic programming algorithm
▪ Once the DTW matrix has been filled, the optimal warping path can be recovered by backtracking from DTW(n,m).
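The backtracking step can be sketched as follows. This is an illustrative Python version under stated assumptions: squared differences as the local cost, 1-based (i, j) pairs in the returned path, and a hypothetical function name `dtw_path`. Ties between predecessors are broken arbitrarily.

```python
import math

def dtw_path(c, q):
    """Return (accumulated cost, optimal warping path as 1-based (i, j) pairs)."""
    n, m = len(c), len(q)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (c[i - 1] - q[j - 1]) ** 2
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Walk back from (n, m) to (1, 1), always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i, j)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (D[i - 1][j], i - 1, j),          # step in i only
            (D[i][j - 1], i, j - 1),          # step in j only
        ]
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return D[n][m], path
```

For `c = [1, 2, 3]` and `q = [1, 2, 2, 3]` the recovered path matches the second sample of c with both 2s of q, which is exactly the kind of time-axis stretching DTW is meant to allow.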
DTW visualizing process
Definition of the Problem
▪ Given a time series T = t1, t2, …, tn and a query Q = q1, q2, …, qm
▪ Find the subsequence Cs,m of T (the m contiguous samples starting at position s, i.e. ts, ts+1, …, ts+m-1) such that DTW(Cs,m, Q) is minimum.
▪ Computational time: O(n×m^2)
▪ Computational space: O(n×m), reducible to O(2×m)
Parallelizable
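The reduction of the space from O(n×m) to O(2×m) follows from the recurrence needing only the previous and the current row of the matrix. A minimal Python sketch of that idea, assuming squared differences as local cost and a hypothetical function name:

```python
import math

def dtw_two_rows(c, q):
    """Accumulated DTW cost using only two rows of length len(q) + 1."""
    m = len(q)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # virtual cell that seeds the first real cell
    for ci in c:
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            d = (ci - q[j - 1]) ** 2
            curr[j] = d + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr  # the current row becomes the previous row
    return prev[m]
```

The price of this saving is that the warping path can no longer be recovered, which is acceptable here because the subsequence search only needs the final distance.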
Why normalize?
[Figure: a query and the distance to the query along the series; Threshold = 30]
▪ Wandering baseline problem: when the baseline drifts, for example because the machinery cables move during the reading, equal patterns may not be recognized as the same.
▪ Every subsequence Cs,m of T, as well as the query Q, must therefore be normalized before they are compared.
▪ Z-normalization: (x-μ)/σ
• Shift and scale invariance
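A small Python sketch of z-normalization, to make the shift and scale invariance concrete. The function name is hypothetical, the population standard deviation is assumed, and mapping a constant window to all zeros is an assumption made here to avoid dividing by zero; the slides do not say how that case is handled.

```python
import math

def z_normalize(x):
    """Return (x - mean) / std for every sample of x."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        # Assumption: a flat window carries no shape information.
        return [0.0] * n
    return [(v - mean) / std for v in x]
```

For example, `[1, 2, 3]` and `[10, 20, 30]` normalize to the same values, so a pattern recorded on a shifted or amplified baseline still matches the query.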
Parallelizing DTW Subsequence Search
[Figure: the query Q sliding along the time series T; x-axis: time; the current window is Cs,m]
▪ Slide a window of fixed size
▪ Compute the DTW measure between the query Q and the Z-normalized subsequence Cs,m of the time series T
▪ Update the minimum if necessary
Assign each DTW computation to a single thread!
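The three steps above can be written as a sequential reference loop. The following Python sketch is not the GPU code from the slides; it only shows the work that each thread would perform for its own start position s. All names are hypothetical, squared differences are assumed as local cost, and flat windows are normalized to zeros by assumption.

```python
import math

def z_normalize(x):
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if std == 0.0 else [(v - mean) / std for v in x]

def dtw(c, q):
    # Two-row DTW, enough to obtain the final accumulated cost.
    prev = [0.0] + [math.inf] * len(q)
    for ci in c:
        curr = [math.inf] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            curr[j] = (ci - q[j - 1]) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(q)]

def best_match(t, q):
    """Return (start index, DTW cost) of the window of t closest to q."""
    m = len(q)
    qn = z_normalize(q)
    best_s, best_d = -1, math.inf
    for s in range(len(t) - m + 1):       # one iteration per sliding window
        d = dtw(z_normalize(t[s:s + m]), qn)
        if d < best_d:                    # update the minimum if necessary
            best_s, best_d = s, d
    return best_s, best_d
```

Because the iterations are independent of one another, each value of s can be handed to a different thread, leaving only the final minimum to be combined.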
CUDA Thread Organization
▪ Because of the nature of the problem, the CUDA threads of each block are distributed only along the x-axis, and the blocks are likewise allocated only along the x-axis of the grid.
▪ The number of blocks to allocate depends on the lengths of the time series T and Q; in fact
$$\mathit{Grid\_size} = \frac{\mathit{number\_of\_subsequences}}{\mathit{block\_size}}$$
where number_of_subsequences and block_size are given by n-m+1 and the number of threads per block, respectively.
These are key parameters which influence the computational time of the GP-GPU DTW!
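To make the launch arithmetic concrete, the sketch below computes the grid size and the window assigned to each thread, written in Python for easy checking. Rounding the division up and guarding threads that fall past the last subsequence are assumptions; the slides state only the ratio. The function names are hypothetical, and the index formula mirrors the usual CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def grid_size(n, m, block_size):
    """Number of 1-D blocks needed to cover the n - m + 1 subsequences."""
    number_of_subsequences = n - m + 1
    # Ceiling division, so a partially filled last block is still launched.
    return (number_of_subsequences + block_size - 1) // block_size

def subsequence_start(block_idx, block_dim, thread_idx, number_of_subsequences):
    """Start position handled by a thread, or None if it has no work."""
    s = block_idx * block_dim + thread_idx
    return s if s < number_of_subsequences else None
```

With n = 2,500, m = 100, and 64 threads per block there are 2,401 subsequences and 38 blocks, so the last block has threads without a window, which is why the guard is needed.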
Main stages
1. The CPU copies the whole time series T to the global memory of the GPU
• Since the query Q is fixed, we first copy it into global memory; then, depending on the DTW algorithm version, we keep it there or store it in shared memory.
2. The CPU calls the GPU kernel
• Every kernel thread operates on a specific sliding window in two steps:
   1. Accessing the sliding window to compute the mean and variance;
   2. Computing the normalized DTW distance to the query.
3. The CPU copies the output from the GPU
• The algorithm computes the minimum distance to obtain the subsequence of T most similar to the query Q.
DTW: Global Memory vs Shared Memory
▪ The only difference between the two kernel functions concerns how the query Q and the warping matrix are accessed and stored.
Global Memory:
float warping_mat[WS][2];
float *query;
Shared Memory:
__shared__ float warping_mat[WS][2];
extern __shared__ float query_sm[];
Note: In both versions warping_mat is allocated statically (WS is the window size) because, in doing so, the CUDA compiler is likely to store this array in the register file, which is much faster than allocating it dynamically (i.e. with cudaMalloc()).
▪ Because Q is a fixed time series whose size, by the problem definition, doesn't change during execution (and it is much smaller than T), it can fit in the shared memory of the GPU device, where read/write operations are 150x faster than in global memory.
Evaluation
▪ The processing units used are:
1. CPU: Intel Core i7-860 at 2.80 GHz;
2. GPU: NVIDIA Quadro K5000 with 1,536 cores.
▪ To obtain significant results, the execution time analysis was performed by varying three critical parameters:
1. The T time series length (2,500; 15,000; …; 1,080,000);
2. The Q time series length (100; 200; …; 1,000);
3. The number of threads per block (64; 128; …; 1,024).
Since the combinations of these parameters yield many graphs, we show only the most meaningful ones, together with a few that exhibit particular behavior.
Consideration on the GPU execution time (1)
Recall the equation
$$\mathit{Grid\_size} = \frac{\mathit{number\_of\_subsequences}}{\mathit{block\_size}}$$
As number_of_subsequences increases and block_size decreases, Grid_size increases.
▪ How does this influence the execution time?
• A growing number of blocks in the grid corresponds to a higher effort by the GPU to access all the shared data structures (DTW-shared version).
• Intuitively, this effect should show up more often as block_size gets smaller and the query Q gets longer!
Consideration on the GPU execution time (2)
[Figure: panels for Block_size=128, Block_size=64, and Block_size=32]
▪ For the DTW-shared algorithm, this phenomenon begins to appear as block_size decreases and the length of Q increases.
▪ The cause is the high overhead incurred by the CUDA kernel in accessing the shared data structures, which makes the performance of the DTW-global algorithm similar to, or even better than, that of the DTW-shared version.
Problem cases
▪ The algorithm's performance depends on three parameters:
• T series length;
• Q series length;
• Block_size.
Based on these, we make some observations about the speed-up and the parallel scalability of the DTW algorithm.
▪ From here on, we consider only the DTW-shared algorithm.
▪ In particular, we look at the problem in two different ways:
1. By fixing the T time series length and varying the Q time series length;
2. By fixing the Q time series length and varying the T time series length.
Case 1: Speed up
Average speed-up, varying the Q time series length (rows: T series length; columns: # of threads per block)
T length      64        128       256       512       1024
2,500         67.170    67.012    66.674    65.774    58.747
15,000        128.593   187.926   230.083   244.225   272.303
90,000        138.777   227.739   294.817   319.392   314.872
540,000       142.175   236.758   308.555   334.440   327.428
1,080,000     142.336   237.772   309.799   333.535   328.111
▪ In this first case, we look at the problem by fixing the T time series length and varying the Q time series length.
▪ Along the columns, the speed-up is almost always growing, indicating fair parallel scalability for the problem.
Case 1: Parallel scalability
▪ This measurement indicates how efficient an application is when using an increasing number of parallel processing elements (threads).
▪ The longer T is, the flatter its speed-up curve becomes (for any configuration)
Case 2: Speed up
▪ In this second case, we look at the problem by fixing the Q time series length and varying the T time series length.
Average speed-up, varying the T time series length (rows: Q series length; columns: # of threads per block)
Q length    64        128       256       512       1024
100         208.57    271.85    268.98    265.88    266.41
200         208.05    269.29    266.38    262.89    264.2
300         163.31    247.27    261.89    258.74    259.77
400         147.33    232.49    264.5     261.05    262.07
500         119.85    206.31    262.28    257.45    258.43
600         94.33     162.41    238.87    256.11    257.61
700         80.931    143.85    223.26    258.08    257.4
800         80.818    144.5     224.4     259.84    259.14
900         67.828    118.7     205.25    260.03    259.59
1,000       67.079    117.75    204.04    258.67    258.3
VARIANCE    3035.4    3731.4    677.5     8.0547    8.9986
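As a quick check of how the VARIANCE row relates to the columns, the snippet below recomputes it for the 512-thread column with Python's standard library. The values are copied from the table; that the row is a sample variance (n-1 denominator) is an inference from the numbers, not something the slides state, and the small residual difference is attributed here to the rounding of the displayed speed-ups.

```python
import statistics

# Speed-up values of the 512-thread column, for Q lengths 100 to 1,000.
speedup_512 = [265.88, 262.89, 258.74, 261.05, 257.45,
               256.11, 258.08, 259.84, 260.03, 258.67]

sample_var = statistics.variance(speedup_512)       # n - 1 denominator
population_var = statistics.pvariance(speedup_512)  # n denominator

print(f"sample variance:     {sample_var:.4f}")
print(f"population variance: {population_var:.4f}")
```

The sample variance lands close to the tabulated 8.0547, while the population variance does not, and a spread this small is what supports reading the 512-thread configuration as stable across query lengths.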
Case 2: Parallel scalability
▪ This measurement indicates how efficient an application is when using an increasing number of parallel processing elements (threads).
▪ The speed-up of the 512- and 1,024-thread configurations appears constant, showing neither a gain nor a loss.
Best CUDA-threads Configuration
▪ To assess the best threads-per-block configuration (for any task of the problem), we clustered and compared all the results obtained by varying the Q time series length.
Best CUDA-threads Configuration (cont.)
▪ Taking the minimum value for each cluster (over the 10 runs) made it possible to visualize the best 3D trend in the algorithm's performance and to better understand which CUDA-threads configuration best fits a particular problem.
Experimental Case Studies
▪ DTW subsequence similarity search is a key problem in many higher-level Data Mining and Machine Learning tasks, such as motif discovery, anomaly detection, association discovery, and classification.
▪ Many research projects use DTW subsequence similarity search as a subroutine, and could greatly benefit from significantly improved performance.
▪ Three case studies are considered here, each involving one of the previous tasks:
1. Case Study in Entomology (Subsequence search);
2. Case Study in Cardiology (Anomaly Detection);
3. Case Study in Astronomy (Classification).
Case Study in Entomology
▪ Many species of insect feed by inserting their stylet into a plant and sucking out sap. While this behavior in itself is generally not harmful to the plants, if one plant has a disease, the insect will transmit it from plant to plant.
▪ According to the study in [1], as soon as the insect's stylet penetrates the plant, an Electrical Penetration Graph (EPG) signal occurs, which can then be amplified and recorded.
▪ One critical task for researchers is to search for patterns in long traces whose transitions vary in time, a task for which DTW is well suited.
Case Study in Entomology (cont.)
▪ Since these time series are long, to test the GPU scalability in this domain we searched for the query shown below (of length 500) in an EPG trace of length 1,499,000.
▪ The GPU solution took only ≈12.30 seconds, while the CPU took ≈68 minutes, which is too slow for a real application in entomology. The GPU solution was therefore ≈331x faster than the CPU one.
▪ Moreover, these times are for a single query-by-content; if we needed to perform 1,000 such searches (possibly of different sizes), the CPU version would take approximately 34 days! [4]
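The speed-up figure follows directly from the two timings reported above. A minimal Python sketch of the arithmetic, with a hypothetical helper name; since both timings are approximate, the ratio is only accurate to about one unit.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Speed-up is the ratio of CPU time to GPU time."""
    return cpu_seconds / gpu_seconds

cpu_seconds = 68 * 60   # about 68 minutes on the CPU
gpu_seconds = 12.30     # about 12.30 seconds on the GPU
ratio = speedup(cpu_seconds, gpu_seconds)
print(f"speed-up: about {ratio:.1f}x")
```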
Case Study in Cardiology
▪ Congestive heart failure is a complex clinical syndrome that occurs when the heart is unable to pump sufficiently to maintain the blood flow the body needs.
▪ By constantly monitoring the electrocardiogram (ECG) of a patient, it's possible to recognize irregular patterns and prevent a possible or imminent heart failure.
▪ In Data Mining this task is called Anomaly Detection or Outlier Detection; it is illustrated here by means of DTW.
Case Study in Cardiology (cont.)
▪ To test the validity of the method, we considered the Sample and Control ECG signals of a patient affected by this pathology [2][3], and ran our GPU and CPU algorithms on them.
ALGORITHM STEPS
1. Perform the subsequence similarity search between the Control signal and all the subsequences of the Sample signal (m-n+1).
2. Threshold the results so as to keep all DTW measures under a certain value (∂=3).
3. Overlay on the Sample signal only the curves whose DTW value passed the threshold.
[Figure: panels Control, Sample, DTW graph, Outlier]
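Step 2 of the algorithm can be sketched in a few lines of Python. The function name and the input layout (one DTW value per subsequence start position) are assumptions made for illustration, as is the strict comparison; the slides only say that measures under the threshold are kept.

```python
def threshold_matches(distances, delta):
    """Return the start positions whose DTW measure is under delta."""
    return [s for s, d in enumerate(distances) if d < delta]

# Hypothetical DTW measures for four consecutive start positions.
kept = threshold_matches([5.0, 2.9, 3.0, 0.1], delta=3)
print(kept)
```

The returned positions identify which windows of the Sample signal would be overlaid in step 3.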
Case Study in Cardiology (cont.)
▪ Finally, to test scalability on long ECG sequences, the GPU version was used for both the DTW subsequence similarity search and the thresholding algorithm.
▪ For this purpose, sequences of about 1, 6, and 12 hours were used.
Subsequence similarity search
Time length   CPU        GPU
1 hr.         23 min.    4.32 sec.
6 hr.         2.3 hr.    25 sec.
12 hr.        4.7 hr.    49 sec.
Thresholding
Time length   CPU        GPU
1 hr.         0.588 ms   1.03e-2 ms
6 hr.         3.573 ms   4.83e-2 ms
12 hr.        7.149 ms   9.37e-2 ms
▪ These execution times are for a single query-by-content.
Case Study in Astronomy
▪ A star light curve is a graph which shows the brightness of a stellar object over a period of time.
▪ Reasons why stars change their brightness include planetary transits and cataclysmic or explosive events (nova or supernova).
▪ Since 1855 many star light curves have been collected, and an obvious thing to do is to classify them.
▪ Astronomers have an algorithm called universal phasing to produce a canonical alignment for the light curves, but it has some problems when applied to large datasets, and it does not work as well as believed.
▪ By using the idea of subsequence similarity search, it is nevertheless possible to solve a univariate, supervised classification problem.
▪ While it's possible to extract a single light curve cycle, there is no well-defined starting point.
Case Study in Astronomy (cont.)
▪ The basic idea is to compare each curve of the testing set with every curve of the training set using the DTW measure, and to assign to the test curve the label of the training curve whose DTW value is minimum (most similar).
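This labelling rule is a one-nearest-neighbour classifier with DTW as the distance. A minimal Python sketch under stated assumptions: squared differences as local cost, training data given as (series, label) pairs, and hypothetical function names. It is a sequential illustration, not the GPU implementation.

```python
import math

def dtw(c, q):
    # Two-row DTW returning the accumulated cost.
    prev = [0.0] + [math.inf] * len(q)
    for ci in c:
        curr = [math.inf] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            curr[j] = (ci - q[j - 1]) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(q)]

def classify_1nn(train, test):
    """Give each test series the label of its DTW-nearest training series."""
    labels = []
    for x in test:
        best_label, best_d = None, math.inf
        for series, label in train:
            d = dtw(series, x)
            if d < best_d:
                best_label, best_d = label, d
        labels.append(best_label)
    return labels
```

Every test curve requires one DTW computation per training curve, so the work grows with the product of the two set sizes, and all of these computations are independent of each other.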
Case Study in Astronomy (cont.)
▪ Following the work in [4], we used a three-class star light curve dataset that had been universally phased at the Time Series Center at Harvard University.
▪ To compare CPU and GPU performance, we created a testing set of just 128 objects and a training set of 1,024 objects.
Case Study in Astronomy (cont.)
▪ While it's possible to extract a single light curve cycle, there is no well-defined starting point; therefore the so-called Universal Phasing Assumption was also tested here.
▪ Rotation Invariant DTW: O(n^3)
• Try all possible rotations to find the minimum possible distance
• Compute the DTW between the Q series and each shift of the C time series
[Figure: series C and Q; DTW distance 53.49, rDTW distance 0]
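The rotation-invariant variant can be sketched by wrapping a plain DTW in a loop over all circular shifts. The Python below is illustrative only: the function names are hypothetical, squared differences are assumed as local cost, and the series are assumed to be non-empty lists.

```python
import math

def dtw(c, q):
    # Two-row DTW returning the accumulated cost.
    prev = [0.0] + [math.inf] * len(q)
    for ci in c:
        curr = [math.inf] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            curr[j] = (ci - q[j - 1]) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(q)]

def rotation_invariant_dtw(c, q):
    """Minimum DTW cost between q and every circular shift of the list c."""
    return min(dtw(c[k:] + c[:k], q) for k in range(len(c)))
```

With n shifts and a quadratic DTW per shift, the cubic cost quoted above follows directly. For a series and a rotated copy of itself, the plain DTW cost can be large while the rotation-invariant cost is zero, as in the figure.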
Case Study in Astronomy (cont.)
▪ This problem had never been tested before, presumably because the Rotation Invariant version of DTW (rDTW) is O(n^3), which is quite untenable for a normal CPU. It was therefore interesting to test this task on the GPU.
▪ In this case too, we created a testing set of just 128 objects and a training set of 1,024 objects.
Case Study in Astronomy (cont.)
▪ As the table below shows, the results are quite interesting
        Accuracy   Time GPU     Time CPU
ED      80.47%     <1 sec.      2.5 sec.
rED     81.25%     14.6 sec.    43.6 min.
DTW     88.28%     1.8 min.     35.4 min.
rDTW    91.4%      3.37 hours   42 days
▪ It's important to point out that the norm-2 was used to compute the distance matrix d.
▪ With a different distance measure, such as norm-1, we obtained different results (i.e. DTW 86.7% and rDTW 84.4%).
Conclusions and Future Remarks
▪ Subsequence similarity search is an important problem that has attracted a great deal of interest.
▪ CPU solutions cannot provide adequate speed for these problems, while GPU solutions have proved to be a good tool for handling problems that were previously computationally untenable.
▪ In addition, three different case studies have shown that a GP-GPU DTW version leads to very significant results, both in execution time and in accuracy.
▪ Future work includes revisiting current algorithms that use DTW as a subroutine and implementing a GP-GPU version of Multi-Dimensional Dynamic Time Warping (MD-DTW).
References
[1] McLean, D. L., & Kinsey, M. G. (1964). A technique for electronically recording aphid feeding and salivation. Nature, 202, 1358-1359.
[2] Goldberger, A. L., Amaral, L. A., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23), e215-e220.
[3] Baim, D. S., Colucci, W. S., Monrad, E. S., Smith, H. S., Wright, R. F., Lanoue, A., & Braunwald, E. (1986). Survival of patients with severe congestive heart failure treated with oral milrinone. Journal of the American College of Cardiology, 7(3), 661-670.
[4] Sart, D., et al. (2010). Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In 2010 IEEE 10th International Conference on Data Mining (ICDM). IEEE.
Thank you for your
attention!
Any questions?

amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 

Dernier (20)

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

Accelerating Dynamic Time Warping Subsequence Search with GPU

  • 1. University of Naples “Parthenope” Accelerating Dynamic Time Warping Subsequence Search with GPU Davide Nardone 0120/131 A.A 2015/16
  • 2. Summary 1. Introduction to Time Series 2. Time Series contexts and tasks 3. Basic idea: Subsequence search 4. DTW: Definition and Background 5. Parallelizing DTW Subsequence search 6. Evaluation 7. Experimental Case Studies 8. Conclusions and Future Remarks
  • 3. What are Time Series?
      - A time series is a collection of observations made sequentially in time: T = t1, t2, t3, …, tn.
      - Each ti is a real number.
      [Figure: example time series; x-axis Time (0 to 200), y-axis Value (4.5 to 5.5)]
  • 4. Time Series Contexts: EPG and ECG signals, stock prices, images, video.
  • 5. Data Mining and Machine Learning tasks: Classification, Clustering, Motif Discovery, Anomaly Detection.
      [Figure: example plots for each task; panel labels A, B, C and Control]
  • 6. Basic idea: Subsequence search
      - Given a query series Q, find the occurrences of Q in a time series T that are most similar to Q under a distance measure.
      - An occurrence is a subsequence whose distance to Q is smaller than a threshold.
      - The smallest computed distance identifies the best occurrence of Q in T.
  • 7. Finding a Similarity Measure
      - Euclidean Distance (ED) often produces pessimistic similarity measures when it encounters distortion along the time axis: two signals that represent the same pattern but are in different relative phases are scored as dissimilar.
      - In this case the signals must be synchronized first, and the way to do it is Dynamic Time Warping (DTW).
      [Figure: two panels, Euclidean Distance and Dynamic Time Warping]
  • 8. DTW: Definition and Background
      DTW is a distance measure for comparing two time series, say C and Q. It is defined as follows:
      $D(C,Q) = DTW(n,m)$
      $DTW(i,j) = d_{i,j} + \min\{DTW(i-1,j),\ DTW(i,j-1),\ DTW(i-1,j-1)\}$
      where $d_{i,j}$ is a distance such as $|c_i - q_j|$ or $(c_i - q_j)^2$. As the initial condition we set $DTW(1,1) = (c_1 - q_1)^2$, and undefined terms are assumed to be $+\infty$.
  • 9. The DTW distance: an example
      - Consider two example time series [figure not reproduced].
      - The figure shows the matrix d of pointwise distances (ED) on the left and the DTW matrix on the right.
  • 10. What DTW computes
      - Given the distance matrix d, aligning two time series means finding a warping path:
          - start from (1,1) and end at (n,m);
          - take one step at a time;
          - at each step, move only by increasing i, j, or both;
          - sum all the distances found along the warping path (21 for the random path in this example).
      - How do we find the optimal warping path?
  • 11. What DTW computes (cont.)
      - Each warping path is a way to align (match) two time series such that every sample is matched with at least one sample of the other series.
      - The DTW distance is the square root of the cost of the optimal warping path (√15 in this example). The "Euclidean path" moves only along the main diagonal and costs √29 in this example.
      - The recursive definition allows DTW to be computed in O(n×m) time, even though the number of warping paths is exponential. This is a classic example of a dynamic programming algorithm.
      - Once the DTW matrix has been filled, the optimal warping path can be recovered by backtracking from DTW(n,m).
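The recurrence from slides 8 to 11 can be sketched on the CPU as follows. This is a minimal Python illustration, not the author's CUDA code: the function names `dtw_matrix` and `dtw_distance` and the example series are assumptions made here, and the squared difference is used for d(i,j).

```python
import math

def dtw_matrix(c, q):
    """Fill the cumulative DTW matrix for series c (length n) and q (length m)."""
    n, m = len(c), len(q)
    # Row 0 and column 0 hold the +infinity boundary used for undefined terms.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # makes DTW(1,1) equal to (c1 - q1)^2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (c[i - 1] - q[j - 1]) ** 2
            D[i][j] = d + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D

def dtw_distance(c, q):
    """Square root of the cost of the optimal warping path."""
    return math.sqrt(dtw_matrix(c, q)[len(c)][len(q)])

def euclidean_distance(c, q):
    """Cost of the path that moves only along the main diagonal."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, q)))

# Hypothetical example: the same bump, shifted by one sample.
c = [0, 1, 2, 1, 0]
q = [0, 0, 1, 2, 1]
```

On this made-up pair the diagonal path pays for the phase shift at four positions, while the warped path only pays for the final sample, which is the behaviour slide 7 describes.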
  • 13. Definition of the Problem
      - Given a time series T = t1, t2, …, tn and a query Q = q1, q2, …, qm,
      - find the subsequence Cs,m of T (any contiguous run of m samples starting at position s, i.e. ts, ts+1, …, ts+m-1) such that DTW(Cs,m, Q) is minimum.
      - Computational time: O(n×m²).
      - Computational space: O(n×m), reducible to O(2×m).
      - The problem is parallelizable.
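The space reduction to O(2×m) mentioned above follows from the recurrence: each row of the DTW matrix depends only on the previous row. A minimal sketch, assuming squared differences and a hypothetical function name `dtw_two_rows`:

```python
import math

def dtw_two_rows(c, q):
    """Squared DTW cost of c vs q, keeping only two rows of length len(q) + 1."""
    m = len(q)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # boundary: makes DTW(1,1) equal to (c1 - q1)^2
    for ci in c:
        curr = [math.inf] * (m + 1)  # curr[0] stays +infinity after the first row
        for j in range(1, m + 1):
            d = (ci - q[j - 1]) ** 2
            curr[j] = d + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr  # the older row is no longer needed
    return prev[m]
```

Only the final cost is available this way; recovering the warping path itself would still need the full matrix.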
  • 14. Why normalize?
      - Wandering baseline problem: the baseline of a recording can drift, for example because the machine's cables move during the reading, so identical patterns may not be recognized as the same.
      - Every subsequence Cs,m of T, as well as the query Q, must therefore be normalized before they are compared.
      - Z-normalization: (x - μ)/σ, which makes the comparison invariant to shift and scale of the baseline.
      [Figure: a query and the distance of each subsequence to the query, with Threshold = 30]
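A minimal sketch of Z-normalization, showing why it removes shift and scale. The function name `znorm` and the handling of a flat window (σ = 0) are assumptions made here; the slides do not say how that case is treated.

```python
import math

def znorm(x):
    """Return (x - mean) / std for every sample of x."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sigma == 0.0:
        # Assumption: a constant window is mapped to all zeros.
        return [0.0] * len(x)
    return [(v - mu) / sigma for v in x]

base = znorm([1.0, 2.0, 3.0])
shifted = znorm([101.0, 102.0, 103.0])  # same pattern on a higher baseline
scaled = znorm([10.0, 20.0, 30.0])      # same pattern with a larger amplitude
```

After normalization the three windows are numerically the same series, so a distance measure no longer penalizes the drifting baseline.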
  • 15. Parallelizing DTW Subsequence Search
      - Slide a window of fixed size over T.
      - Compute the DTW measure between the query Q and the Z-normalized subsequence Cs,m of T.
      - Update the minimum if necessary.
      - Assign each DTW computation to a single thread!
      [Figure: query Q and sliding window Cs,m over the time series T]
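The sliding-window search above can be sketched sequentially as follows. This is a CPU illustration under stated assumptions, not the GPU kernel: the names `znorm`, `dtw_cost` and `best_match` are hypothetical, the query is Z-normalized as well, and each loop iteration stands for the work the slides assign to one thread.

```python
import math

def znorm(x):
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    # Assumption: flat windows (sigma == 0) become all zeros.
    return [(v - mu) / sigma if sigma > 0 else 0.0 for v in x]

def dtw_cost(c, q):
    """Squared DTW cost using two rows."""
    prev = [0.0] + [math.inf] * len(q)
    for ci in c:
        curr = [math.inf] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            curr[j] = (ci - q[j - 1]) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(q)]

def best_match(t, q):
    """Return (start index, DTW distance) of the window of t closest to q."""
    m = len(q)
    qn = znorm(q)
    best_s, best_d = -1, math.inf
    for s in range(len(t) - m + 1):       # one window per start position s
        window = znorm(t[s:s + m])        # step 1: normalize the window
        d = math.sqrt(dtw_cost(window, qn))  # step 2: DTW distance to the query
        if d < best_d:                    # update the minimum if necessary
            best_s, best_d = s, d
    return best_s, best_d

# Hypothetical data: the query shape appears at index 3, at a smaller scale.
t = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 5.0, 5.0]
q = [10.0, 30.0, 10.0]
start, dist = best_match(t, q)
```

Because the windows are independent of each other, the loop body has no cross-iteration dependency apart from the final minimum, which is what makes the one-thread-per-window mapping possible.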
  • 16. CUDA Thread Organization
      - Because of the nature of the problem, CUDA threads are laid out only along the x-axis of each block, and blocks only along the x-axis of the grid.
      - The number of blocks to allocate depends on the lengths of T and Q:
          Grid_size = number_of_subsequences / block_size
        where number_of_subsequences is n-m+1 and block_size is the number of threads per block.
      - These are key parameters that influence the computational time of the GP-GPU DTW!
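As a worked example of the grid-size formula, the sketch below computes the launch configuration for given lengths. Rounding the division up, so that every subsequence gets a thread, is an assumption made here (the slide shows a plain fraction), and the helper names are hypothetical.

```python
def launch_config(n, m, block_size):
    """Return (number_of_subsequences, grid_size) for series length n and query length m."""
    number_of_subsequences = n - m + 1
    # Assumption: round up so that the last, partially filled block is still launched.
    grid_size = (number_of_subsequences + block_size - 1) // block_size
    return number_of_subsequences, grid_size

def window_start(block_idx, block_size, thread_idx):
    """Start position handled by a thread in a 1-D grid of 1-D blocks."""
    return block_idx * block_size + thread_idx

small = launch_config(2500, 100, 128)
large = launch_config(1080000, 1000, 1024)
```

With rounding up, threads whose `window_start` is not smaller than `number_of_subsequences` have no window to process and would need to exit early.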
  • 17. Main stages
      1. The CPU copies the whole time series T to the global memory of the GPU. Since the query Q is fixed, it is first copied into global memory as well; depending on the DTW algorithm version, it either stays there or is stored in shared memory.
      2. The CPU calls the GPU kernel. Every kernel thread operates on a specific sliding window in two steps: (a) it accesses the sliding window to compute the mean and variance; (b) it computes the normalized DTW distance to the query.
      3. The CPU copies the output back from the GPU. The algorithm then computes the minimum distance to obtain the subsequence of T most similar to the query Q.
  • 18. DTW: Global Memory vs Shared Memory
      - The only difference between the two kernel functions concerns how the query Q and the warping matrix are accessed and stored.
      Global Memory:
          float warping_mat[WS][2];
          float *query;
      Shared Memory:
          __shared__ float warping_mat[WS][2];
          extern __shared__ float query_sm[];
      - Note: in both versions warping_mat is allocated statically (WS is the window size) because the CUDA compiler is then likely to place the array in the register file, which is much faster than allocating it dynamically (e.g. with cudaMalloc()).
      - Because Q is a fixed time series whose size does not change during execution (and it is much smaller than T), it fits in the shared memory of the GPU, where read/write operations are 150x faster than in global memory.
  • 19. Evaluation
      - Processing units used: (1) CPU: Intel Core i7-860 at 2.80 GHz; (2) GPU: NVIDIA Quadro K5000 with 1,536 cores.
      - To obtain significant results, the execution time was analyzed while varying three critical parameters: (1) the length of the time series T (2,500; 15,000; …; 1,080,000); (2) the length of the query Q (100, 200, …, 1,000); (3) the number of threads per block (64, 128, …, 1,024).
      - Since all combinations of these parameters yield many graphs, only the most meaningful ones are shown here, together with a few graphs in which particular circumstances occur.
  • 20. Consideration on the GPU execution time (1)
      - Recall the equation Grid_size = number_of_subsequences / block_size: Grid_size grows as number_of_subsequences increases and as block_size decreases.
      - How does this influence the execution time? A growing number of blocks in the grid means a higher effort by the GPU to access all the shared data structures (DTW-shared version).
      - Intuitively, this effect should show up more often as block_size gets smaller and the query Q gets longer!
  • 21. Consideration on the GPU execution time (2)
      - For the DTW-shared algorithm, this phenomenon begins to appear as block_size decreases and the length of Q increases.
      - It is due to the high overhead the CUDA kernel incurs when accessing the shared data structures, which makes the performance of the DTW-global algorithm similar to, or even better than, that of the DTW-shared version.
      [Figure: panels for Block_size = 128, 64 and 32]
  • 22. Problem cases
      - The algorithm's performance depends on three parameters: the length of T, the length of Q, and Block_size. Based on these, we make some considerations about the speed-up and the parallel scalability of the DTW algorithm.
      - From here on, only the DTW-shared algorithm is considered.
      - We look at the problem in two ways: (1) fixing the length of T and varying the length of Q; (2) fixing the length of Q and varying the length of T.
  • 23. Case 1: Speed up
      - In this first case, we fix the length of T and vary the length of Q.
      Average speed-up over the varying Q lengths (rows: T series length; columns: number of threads):
          T length         64       128       256       512      1024
          2,500        67.170    67.012    66.674    65.774    58.747
          15,000      128.593   187.926   230.083   244.225   272.303
          90,000      138.777   227.739   294.817   319.392   314.872
          540,000     142.175   236.758   308.555   334.440   327.428
          1,080,000   142.336   237.772   309.799   333.535   328.111
      - Along the columns, the speed-up almost always grows, indicating fair parallel scalability for the problem.
  • 24. Case 1: Parallel scalability
      - This measurement indicates how efficient an application is when using an increasing number of parallel processing elements (threads).
      - The longer T is, the more constant its speed-up curve becomes (for any configuration).
  • 25. Case 2: Speed up
      - In this second case, we fix the length of Q and vary the length of T.
      Average speed-up over the varying T lengths (rows: Q series length; columns: number of threads):
          Q length         64       128       256       512      1024
          100          208.57    271.85    268.98    265.88    266.41
          200          208.05    269.29    266.38    262.89     264.2
          300          163.31    247.27    261.89    258.74    259.77
          400          147.33    232.49     264.5    261.05    262.07
          500          119.85    206.31    262.28    257.45    258.43
          600           94.33    162.41    238.87    256.11    257.61
          700          80.931    143.85    223.26    258.08     257.4
          800          80.818     144.5     224.4    259.84    259.14
          900          67.828     118.7    205.25    260.03    259.59
          1,000        67.079    117.75    204.04    258.67     258.3
          VARIANCE     3035.4    3731.4     677.5    8.0547    8.9986
  • 26. Case 2: Parallel scalability
      - This measurement indicates how efficient an application is when using an increasing number of parallel processing elements (threads).
      - The speed-up of the 512- and 1,024-thread configurations appears to be constant, showing neither a gain nor a loss.
  • 27. Best CUDA-threads Configuration
      - To assess the best thread-per-block configuration (for any task of the problem), we clustered and compared all the results while varying the length of Q.
  • 28. Best CUDA-threads Configuration (cont.)
      - By taking the minimum value for each cluster (over the 10 runs), it was possible to visualize the best 3D trend in algorithm performance and to better understand which CUDA-thread configuration best fits a particular problem.
  • 29. Experimental Case Studies
      - DTW subsequence similarity search is a key problem in many higher-level Data Mining and Machine Learning tasks, such as motif discovery, anomaly detection, association discovery and classification.
      - Many research projects use DTW subsequence similarity search as a subroutine and could benefit greatly from significantly improved performance.
      - Three case studies involving some of these tasks are considered here: (1) Entomology (subsequence search); (2) Cardiology (anomaly detection); (3) Astronomy (classification).
  • 30. Case Study in Entomology
      - Many species of insect feed by inserting their stylet into a plant and sucking out sap. While this behavior is generally not harmful to the plants in itself, if one plant has a disease, the insect will transmit it from plant to plant.
      - According to the study in [1], as soon as the insect's stylet penetrates the plant, an Electrical Penetration Graph (EPG) signal occurs, which can then be amplified and recorded.
      - One critical task for researchers is to search long traces for patterns whose transitions vary in time, a process for which DTW is well suited.
  • 31. Case Study in Entomology (cont.)
      - Since these time series are long, GPU scalability in this domain was tested by searching for a query of length 500 [figure not reproduced] in an EPG trace of length 1,499,000.
      - The GPU solution took ≈12.30 seconds, while the CPU took ≈68 minutes, which is too slow for a real application in entomology. The GPU solution is therefore ≈331x faster than the CPU one.
      - Moreover, these times are for a single query-by-content; if we needed to perform 1,000 such searches (possibly of different sizes), the CPU version would take approximately 34 days! [4]
  • 32. Case Study in Cardiology
      - Congestive heart failure is a complex clinical syndrome that occurs when the heart is unable to pump sufficiently to maintain the blood flow the body needs.
      - By constantly monitoring the electrocardiogram (ECG) of a patient, it is possible to recognize irregular patterns and prevent a possible or imminent heart failure.
      - In Data Mining this task is called Anomaly Detection or Outlier Detection, and it is illustrated here by means of DTW.
  • 33. Case Study in Cardiology (cont.)
      - To test the validity of the method, we considered the Sample and Control ECG signals of a patient affected by this pathology [2][3], and tested our GPU and CPU algorithms.
      Algorithm steps:
      1. Perform the subsequence similarity search between the Control signal and all the subsequences of the Sample signal (n-m+1 of them).
      2. Apply a threshold so as to keep only the DTW measures under a certain value (∂ = 3).
      3. Overlay on the Sample signal only the curves that belong to the thresholded DTW values.
      [Figure: Control signal, Sample signal, DTW graph and highlighted Outlier]
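Steps 2 and 3 above can be sketched as follows, taking the DTW distance of every window as an already computed list. The function names and the toy distances are assumptions made for illustration; only the threshold value 3 comes from the slide.

```python
def matches_below(distances, threshold):
    """Step 2: start positions whose DTW distance is under the threshold."""
    return [s for s, d in enumerate(distances) if d < threshold]

def covered_samples(starts, m, n):
    """Step 3: mark the samples of a length-n signal covered by a kept window of length m."""
    mask = [False] * n
    for s in starts:
        for k in range(s, min(s + m, n)):
            mask[k] = True
    return mask

# Hypothetical distance profile for a signal of 6 samples and a query of length 2.
distances = [0.5, 4.0, 2.9, 3.0, 7.2]
kept = matches_below(distances, 3.0)
mask = covered_samples(kept, 2, 6)
```

One possible reading, which the slides do not state explicitly, is that the samples left uncovered by any kept window are the candidates for the outlier highlighted in the figure.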
  • 34. Case Study in Cardiology (cont.)
      - Finally, to test scalability on long ECG sequences, the GPU version was used for both the DTW subsequence similarity search and the thresholding algorithm.
      - For this purpose, sequences of about 1, 6 and 12 hours were used.
      Subsequence similarity search:
          Time length    CPU         GPU
          1 hr           23 min      4.32 sec
          6 hr           2.3 hr      25 sec
          12 hr          4.7 hr      49 sec
      Thresholding:
          Time length    CPU         GPU
          1 hr           0.588 ms    1.03e-2 ms
          6 hr           3.573 ms    4.83e-2 ms
          12 hr          7.149 ms    9.37e-2 ms
      - These execution times are for a single query-by-content.
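The speed-up implied by the subsequence-search timings above is simply CPU time divided by GPU time, once both are in the same unit. A small worked computation, with the helper name `speedup` chosen here for illustration:

```python
def speedup(cpu_seconds, gpu_seconds):
    """Ratio of CPU time to GPU time; both arguments in seconds."""
    return cpu_seconds / gpu_seconds

# (CPU seconds, GPU seconds) taken from the subsequence similarity search timings.
ecg_runs = {
    "1 hr": (23 * 60, 4.32),
    "6 hr": (2.3 * 3600, 25.0),
    "12 hr": (4.7 * 3600, 49.0),
}
ecg_speedups = {length: speedup(cpu, gpu) for length, (cpu, gpu) in ecg_runs.items()}
```

For these three recordings the ratio comes out at roughly 319x to 345x, in the same range as the ≈331x reported for the entomology case.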
  • 35. Case Study in Astronomy
      - A star light curve is a graph that shows the brightness of a stellar object over a period of time.
      - The reasons why stars change their brightness include planetary transits and cataclysmic or explosive events (novae or supernovae).
      - From 1855 to the present day many star light curves have been collected, and an obvious thing to do is to classify them.
      - While it is possible to extract a single light curve cycle, there is no well-defined starting point.
      - Astronomers have an algorithm called universal phasing that produces a canonical alignment of the light curves, but it has some problems when applied to large datasets and does not work as well as believed.
      - Using the idea of subsequence similarity search, it is possible to treat this as a univariate, supervised classification problem.
  • 36. Case Study in Astronomy (cont.)
      - The basic idea is to compare every curve of the testing set with each curve of the training set using the DTW measure, and to assign to the test curve the label of the training curve whose DTW value is minimum (the most similar one).
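The labelling rule above is a nearest-neighbour classifier with DTW as the distance. A minimal sketch, assuming squared differences and hypothetical names (`dtw_cost`, `classify_1nn`) and toy curves that are not from the light-curve dataset:

```python
import math

def dtw_cost(c, q):
    """Squared DTW cost using two rows."""
    prev = [0.0] + [math.inf] * len(q)
    for ci in c:
        curr = [math.inf] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            curr[j] = (ci - q[j - 1]) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(q)]

def classify_1nn(training_set, curve):
    """Return the label of the training curve with the minimum DTW cost to curve."""
    best_label, best_cost = None, math.inf
    for train_curve, label in training_set:
        cost = dtw_cost(train_curve, curve)
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label

# Hypothetical two-class training set.
training_set = [
    ([0, 1, 2, 1, 0], "peak"),
    ([2, 1, 0, 1, 2], "dip"),
]
```

Each test curve needs one DTW computation per training curve, so the pairs can be evaluated independently, which is the part a GPU version would distribute across threads.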
  • 37. Case Study in Astronomy (cont.)
      - Following the work in [4], a three-class star light curve dataset was used, which had been universally phased at the Time Series Center at Harvard University.
      - To compare CPU and GPU performance, a testing set of just 128 objects and a training set of 1,024 objects were created.
  • 38. Case Study in Astronomy (cont.)
      - While it is possible to extract a single light curve cycle, there is no well-defined starting point; therefore the so-called Universal Phasing Assumption has also been tested here.
      - Rotation Invariant DTW (rDTW), O(n³): try all possible rotations to find the minimum possible distance, i.e. compute the DTW between the series Q and each shift of the series C.
      [Figure: series C and Q; DTW distance 53.49, rDTW distance 0]
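The rotation-invariant variant can be sketched by running DTW against every circular shift of one series and keeping the minimum. The names `dtw_cost` and `rdtw_cost` and the example curves are assumptions made here; with n shifts of an O(n²) DTW the total cost is the O(n³) quoted above.

```python
import math

def dtw_cost(c, q):
    """Squared DTW cost using two rows."""
    prev = [0.0] + [math.inf] * len(q)
    for ci in c:
        curr = [math.inf] * (len(q) + 1)
        for j in range(1, len(q) + 1):
            curr[j] = (ci - q[j - 1]) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(q)]

def rdtw_cost(c, q):
    """Minimum squared DTW cost between q and every circular shift of c."""
    c = list(c)
    return min(dtw_cost(c[k:] + c[:k], q) for k in range(len(c)))

# Hypothetical curves: q is c rotated by three samples.
c = [0, 1, 5, 1, 0, 0]
q = [1, 0, 0, 0, 1, 5]
```

Here plain DTW reports a non-zero cost because the two cycles start at different points, while the rotation-invariant version finds the shift that makes them identical.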
  • 39. Case Study in Astronomy (cont.)
      - This problem had never been tested before, presumably because the Rotation Invariant version of DTW (rDTW) is O(n³), which is quite untenable on a normal CPU. It was therefore interesting to test this task on the GPU.
      - In this case too, a testing set of just 128 objects and a training set of 1,024 objects were created.
  • 40. Case Study in Astronomy (cont.)
      - As the table below shows, the results are quite interesting:
          Measure    Accuracy    Time GPU      Time CPU
          ED         80.47%      <1 sec        2.5 sec
          rED        81.25%      14.6 sec      43.6 min
          DTW        88.28%      1.8 min       35.4 min
          rDTW       91.4%       3.37 hours    42 days
      - It is important to point out that the 2-norm was used to compute the distance matrix d.
      - Using a different distance measure, such as the 1-norm, gave different results (DTW 86.7% and rDTW 84.4%).
  • 41. Conclusions and Future Remarks
      - Subsequence similarity search is an important problem that has attracted a great deal of interest.
      - CPU solutions cannot provide adequate speed for these problems, while GPU solutions have proved to be a good tool for handling problems that were previously computationally untenable.
      - In addition, three different case studies have shown that a GP-GPU version of DTW leads to very significant results, both in execution time and in accuracy.
      - Future work includes revisiting current algorithms that use DTW as a subroutine and implementing a GP-GPU version of Multi-Dimensional Dynamic Time Warping (MD-DTW).
  • 42. References
      [1] McLean, D. L., & Kinsey, M. G. (1964). A Technique for Electronically Recording Aphid Feeding and Salivation. Nature, 202, 1358-1359.
      [2] Goldberger, A. L., Amaral, L. A., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215-e220.
      [3] Baim, D. S., Colucci, W. S., Monrad, E. S., Smith, H. S., Wright, R. F., Lanoue, A., & Braunwald, E. (1986). Survival of patients with severe congestive heart failure treated with oral milrinone. Journal of the American College of Cardiology, 7(3), 661-670.
      [4] Sart, D., et al. (2010). Accelerating dynamic time warping subsequence search with GPUs and FPGAs. 2010 IEEE 10th International Conference on Data Mining (ICDM). IEEE.
  • 43. Thank you for your attention! Any questions?