SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Clustering Random Walk Time Series
GSI 2015 - Geometric Science of Information
Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat
29 October 2015
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
1 Introduction
2 Geometry of Random Walk Time Series
3 The Hierarchical Block Model
4 Conclusion
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Context (data from www.datagrapple.com)
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
What is a clustering program?
Definition
Clustering is the task of grouping a set of objects in such a way
that objects in the same group (cluster) are more similar to each
other than those in different groups.
Example of a clustering program
We aim at finding k groups by positioning k group centers
{c1, . . . , ck} such that data points {x1, . . . , xn} minimize
minc1,...,ck
n
i=1 mink
j=1 d(xi , cj )2
But, what is the distance d between two random walk time series?
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
What are clusters of Random Walk Time Series?
French banks and building materials
CDS over 2006-2015
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
What are clusters of Random Walk Time Series?
French banks and building materials
CDS over 2006-2015
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
1 Introduction
2 Geometry of Random Walk Time Series
3 The Hierarchical Block Model
4 Conclusion
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Geometry of RW TS ≡ Geometry of Random Variables
i.i.d. observations:
X1 : X1
1 , X2
1 , . . . , XT
1
X2 : X1
2 , X2
2 , . . . , XT
2
. . . , . . . , . . . , . . . , . . .
XN : X1
N, X2
N, . . . , XT
N
Which distances d(Xi , Xj ) between dependent random variables?
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Pitfalls of a basic distance
Let (X, Y ) be a bivariate Gaussian vector, with X ∼ N(µX , σ2
X ),
Y ∼ N(µY , σ2
Y ) and whose correlation is ρ(X, Y ) ∈ [−1, 1].
E[(X − Y )2
] = (µX − µY )2
+ (σX − σY )2
+ 2σX σY (1 − ρ(X, Y ))
Now, consider the following values for correlation:
ρ(X, Y ) = 0, so E[(X − Y )2] = (µX − µY )2 + σ2
X + σ2
Y .
Assume µX = µY and σX = σY . For σX = σY 1, we
obtain E[(X − Y )2] 1 instead of the distance 0, expected
from comparing two equal Gaussians.
ρ(X, Y ) = 1, so E[(X − Y )2] = (µX − µY )2 + (σX − σY )2.
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Pitfalls of a basic distance
Let (X, Y ) be a bivariate Gaussian vector, with X ∼ N (µX , σ2
X ), Y ∼ N (µY , σ2
Y ) and whose correlation is
ρ(X, Y ) ∈ [−1, 1].
E[(X − Y )
2
] = (µX − µY )
2
+ (σX − σY )
2
+ 2σX σY (1 − ρ(X, Y ))
Now, consider the following values for correlation:
ρ(X, Y ) = 0, so E[(X − Y )2
] = (µX − µY )2
+ σ2
X + σ2
Y . Assume µX = µY and σX = σY . For
σX = σY 1, we obtain E[(X − Y )2
] 1 instead of the distance 0, expected from comparing two
equal Gaussians.
ρ(X, Y ) = 1, so E[(X − Y )2
] = (µX − µY )2
+ (σX − σY )2
.
30 20 10 0 10 20 30
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0.40 Probability density functions of Gaus-
sians N(−5, 1) and N(5, 1), Gaus-
sians N(−5, 3) and N(5, 3), and
Gaussians N(−5, 10) and N(5, 10).
Green, red and blue Gaussians are
equidistant using L2 geometry on the
parameter space (µ, σ).
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Sklar’s Theorem
Theorem (Sklar’s Theorem (1959))
For any random vector X = (X1, . . . , XN) having continuous
marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative distribution P is
uniquely expressed as
P(X1, . . . , XN) = C(P1(X1), . . . , PN(XN)),
where C, the multivariate distribution of uniform marginals, is
known as the copula of X.
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Sklar’s Theorem
Theorem (Sklar’s Theorem (1959))
For any random vector X = (X1, . . . , XN ) having continuous marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative
distribution P is uniquely expressed as P(X1, . . . , XN ) = C(P1(X1), . . . , PN (XN )), where C, the multivariate
distribution of uniform marginals, is known as the copula of X.
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
The Copula Transform
Definition (The Copula Transform)
Let X = (X1, . . . , XN) be a random vector with continuous
marginal cumulative distribution functions (cdfs) Pi , 1 ≤ i ≤ N.
The random vector
U = (U1, . . . , UN) := P(X) = (P1(X1), . . . , PN(XN))
is known as the copula transform.
Ui , 1 ≤ i ≤ N, are uniformly distributed on [0, 1] (the probability
integral transform): for Pi the cdf of Xi , we have
x = Pi (Pi
−1
(x)) = Pr(Xi ≤ Pi
−1
(x)) = Pr(Pi (Xi ) ≤ x), thus
Pi (Xi ) ∼ U[0, 1].
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
The Copula Transform
Definition (The Copula Transform)
Let X = (X1, . . . , XN ) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi ,
1 ≤ i ≤ N. The random vector U = (U1, . . . , UN ) := P(X) = (P1(X1), . . . , PN (XN )) is known as the copula
transform.
0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2
X∼U[0,1]
10
8
6
4
2
0
2
Y∼ln(X)
ρ≈0.84
0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2
PX (X)
0.2
0.0
0.2
0.4
0.6
0.8
1.0
1.2
PY(Y)
ρ=1
The Copula Transform invariance to strictly increasing transformation
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Deheuvels’ Empirical Copula Transform
Let (Xt
1 , . . . , Xt
N ), 1 ≤ t ≤ T, be T observations from a random vector (X1, . . . , XN ) with continuous margins.
Since one cannot directly obtain the corresponding copula observations (Ut
1, . . . , Ut
N ) = (P1(Xt
1 ), . . . , PN (Xt
N )),
where t = 1, . . . , T, without knowing a priori (P1, . . . , PN ), one can instead
Definition (The Empirical Copula Transform)
estimate the N empirical margins PT
i (x) = 1
T
T
t=1 1(Xt
i ≤ x),
1 ≤ i ≤ N, to obtain the T empirical observations
( ˜Ut
1, . . . , ˜Ut
N ) = (PT
1 (Xt
1 ), . . . , PT
N (Xt
N )).
Equivalently, since ˜Ut
i = Rt
i /T, Rt
i being the rank of observation
Xt
i , the empirical copula transform can be considered as the
normalized rank transform.
In practice
x_transform = rankdata(x)/len(x)
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Generic Non-Parametric Distance
d2
θ (Xi , Xj ) = θ3E |Pi (Xi ) − Pj (Xj )|2
+ (1 − θ)
1
2 R
dPi
dλ
−
dPj
dλ
2
dλ
(i) 0 ≤ dθ ≤ 1, (ii) 0 < θ < 1, dθ metric,
(iii) dθ is invariant under diffeomorphism
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Generic Non-Parametric Distance
d2
0 : 1
2 R
dPi
dλ −
dPj
dλ
2
dλ = Hellinger2
d2
1 : 3E |Pi (Xi ) − Pj (Xj )|2
=
1 − ρS
2
= 2−6
1
0
1
0
C(u, v)dudv
Remark:
If f (x, θ) = cΦ(u1, . . . , uN; Σ) N
i=1 fi (xi ; νi ) then
ds2
= ds2
GaussCopula +
N
i=1
ds2
margins
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
1 Introduction
2 Geometry of Random Walk Time Series
3 The Hierarchical Block Model
4 Conclusion
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
The Hierarchical Block Model
A model of nested partitions
The nested partitions defined by the
model can be seen on the distance
matrix for a proper distance and the
right permutation of the data points
In practice, one observe and work
with the above distance matrix
which is identitical to the left one
up to a permutation of the data
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Results: Data from Hierarchical Block Model
Adjusted Rand Index
Algo. Distance Distrib Correl Correl+Distrib
HC-AL
(1 − ρ)/2 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01
E[(X − Y )2
] 0.00 ±0.00 0.09 ±0.12 0.55 ±0.05
GPR θ = 0 0.34 ±0.01 0.01 ±0.01 0.06 ±0.02
GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01
GPR θ = .5 0.34 ±0.01 0.59 ±0.12 0.57 ±0.01
GNPR θ = 0 1 0.00 ±0.00 0.17 ±0.00
GNPR θ = 1 0.00 ±0.00 1 0.57 ±0.00
GNPR θ = .5 0.99 ±0.01 0.25 ±0.20 0.95 ±0.08
AP
(1 − ρ)/2 0.00 ±0.00 0.99 ±0.07 0.48 ±0.02
E[(X − Y )2
] 0.14 ±0.03 0.94 ±0.02 0.59 ±0.00
GPR θ = 0 0.25 ±0.08 0.01 ±0.01 0.05 ±0.02
GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.48 ±0.02
GPR θ = .5 0.06 ±0.00 0.80 ±0.10 0.52 ±0.02
GNPR θ = 0 1 0.00 ±0.00 0.18 ±0.01
GNPR θ = 1 0.00 ±0.01 1 0.59 ±0.00
GNPR θ = .5 0.39 ±0.02 0.39 ±0.11 1
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Results: Application to Credit Default Swap Time Series
Distance matrices
computed on CDS
time series exhibit a
hierarchical block
structure
Marti, Very, Donnat,
Nielsen IEEE ICMLA 2015
(un)Stability of
clusters with L2
distance
Stability of clusters
with the proposed
distance
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Consistency
Definition (Consistency of a clustering algorithm)
A clustering algorithm A is consistent with respect to the Hierarchical
Block Model defining a set of nested partitions P if the probability that
the algorithm A recovers all the partitions in P converges to 1 when
T → ∞.
Definition (Space-conserving algorithm)
A space-conserving algorithm does not distort the space, i.e. the distance
Dij between two clusters Ci and Cj is such that
Dij ∈ min
x∈Ci ,y∈Cj
d(x, y), max
x∈Ci ,y∈Cj
d(x, y) .
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Consistency
Theorem (Consistency of space-conserving algorithms (Andler,
Marti, Nielsen, Donnat, 2015))
Space-conserving algorithms (e.g., Single, Average, Complete
Linkage) are consistent with respect to the Hierarchical Block
Model.
T = 100 T = 1000 T = 10000
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
1 Introduction
2 Geometry of Random Walk Time Series
3 The Hierarchical Block Model
4 Conclusion
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Discussion and questions?
Avenue for research:
distances on (copula,margins)
clustering using multivariate dependence information
clustering using multi-wise dependence information
Optimal Copula Transport for Clustering Multivariate Time Series,
Marti, Nielsen, Donnat, 2015
Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

Contenu connexe

Tendances

Tendances (20)

QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
QMC Opening Workshop, Support Points - a new way to compact distributions, wi...QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
QMC Opening Workshop, Support Points - a new way to compact distributions, wi...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)Computational Information Geometry on Matrix Manifolds (ICTP 2013)
Computational Information Geometry on Matrix Manifolds (ICTP 2013)
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Introduction to MCMC methods
Introduction to MCMC methodsIntroduction to MCMC methods
Introduction to MCMC methods
 
ABC in Venezia
ABC in VeneziaABC in Venezia
ABC in Venezia
 
Bayesian model choice in cosmology
Bayesian model choice in cosmologyBayesian model choice in cosmology
Bayesian model choice in cosmology
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
Nested sampling
Nested samplingNested sampling
Nested sampling
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Chris Sherlock's slides
Chris Sherlock's slidesChris Sherlock's slides
Chris Sherlock's slides
 

En vedette

On clustering financial time series - A need for distances between dependent ...
On clustering financial time series - A need for distances between dependent ...On clustering financial time series - A need for distances between dependent ...
On clustering financial time series - A need for distances between dependent ...
Gautier Marti
 
Market efficiency presentation
Market efficiency presentationMarket efficiency presentation
Market efficiency presentation
Mohammed Alashi
 
Market efficiency presentation
Market efficiency presentationMarket efficiency presentation
Market efficiency presentation
Mohammad Waqas
 
Fernando Imperiale - Una aguja en el pajar
Fernando Imperiale - Una aguja en el pajarFernando Imperiale - Una aguja en el pajar
Fernando Imperiale - Una aguja en el pajar
Fernando M. Imperiale
 
Bartha_Éva_Lili-A_matroid_és_gráfelmélet_összefüggései - MSc_Diplomamunka
Bartha_Éva_Lili-A_matroid_és_gráfelmélet_összefüggései - MSc_DiplomamunkaBartha_Éva_Lili-A_matroid_és_gráfelmélet_összefüggései - MSc_Diplomamunka
Bartha_Éva_Lili-A_matroid_és_gráfelmélet_összefüggései - MSc_Diplomamunka
Lili Eva Bartha
 

En vedette (20)

On clustering financial time series - A need for distances between dependent ...
On clustering financial time series - A need for distances between dependent ...On clustering financial time series - A need for distances between dependent ...
On clustering financial time series - A need for distances between dependent ...
 
A Random Walk Down Wall Street
A Random Walk Down Wall StreetA Random Walk Down Wall Street
A Random Walk Down Wall Street
 
Market efficiency presentation
Market efficiency presentationMarket efficiency presentation
Market efficiency presentation
 
A Random Walk Through Search Research
A Random Walk Through Search ResearchA Random Walk Through Search Research
A Random Walk Through Search Research
 
Market efficiency presentation
Market efficiency presentationMarket efficiency presentation
Market efficiency presentation
 
Random walk on Graphs
Random walk on GraphsRandom walk on Graphs
Random walk on Graphs
 
Random Walk Theory: 'Sistemi di grid automatizzati per Metatrader 4'
Random Walk Theory: 'Sistemi di grid automatizzati per Metatrader 4'Random Walk Theory: 'Sistemi di grid automatizzati per Metatrader 4'
Random Walk Theory: 'Sistemi di grid automatizzati per Metatrader 4'
 
Fernando Imperiale - Una aguja en el pajar
Fernando Imperiale - Una aguja en el pajarFernando Imperiale - Una aguja en el pajar
Fernando Imperiale - Una aguja en el pajar
 
On the stability of clustering financial time series
On the stability of clustering financial time seriesOn the stability of clustering financial time series
On the stability of clustering financial time series
 
Cv bank pa
Cv bank paCv bank pa
Cv bank pa
 
Carla Casilli - Cineca + open badges - May 2015
Carla Casilli - Cineca + open badges - May 2015Carla Casilli - Cineca + open badges - May 2015
Carla Casilli - Cineca + open badges - May 2015
 
Geography 372 Final Presentation
Geography 372 Final PresentationGeography 372 Final Presentation
Geography 372 Final Presentation
 
Yazeed kay-ghazi
Yazeed kay-ghaziYazeed kay-ghazi
Yazeed kay-ghazi
 
2015年3月の中国からGitHubへのDDoS攻撃(MITM)の概要
2015年3月の中国からGitHubへのDDoS攻撃(MITM)の概要2015年3月の中国からGitHubへのDDoS攻撃(MITM)の概要
2015年3月の中国からGitHubへのDDoS攻撃(MITM)の概要
 
Diapo bourse aux sports
Diapo bourse aux sportsDiapo bourse aux sports
Diapo bourse aux sports
 
Neurological considerations
Neurological considerationsNeurological considerations
Neurological considerations
 
Bartha_Éva_Lili-A_matroid_és_gráfelmélet_összefüggései - MSc_Diplomamunka
Bartha_Éva_Lili-A_matroid_és_gráfelmélet_összefüggései - MSc_DiplomamunkaBartha_Éva_Lili-A_matroid_és_gráfelmélet_összefüggései - MSc_Diplomamunka
Bartha_Éva_Lili-A_matroid_és_gráfelmélet_összefüggései - MSc_Diplomamunka
 
Magento News @ Magento Meetup Wien 17
Magento News @ Magento Meetup Wien 17Magento News @ Magento Meetup Wien 17
Magento News @ Magento Meetup Wien 17
 
integrating climate risks in agricultural value chains enamul haque
integrating climate risks in agricultural value chains   enamul haqueintegrating climate risks in agricultural value chains   enamul haque
integrating climate risks in agricultural value chains enamul haque
 
Health & safety officer performance appraisal
Health & safety officer performance appraisalHealth & safety officer performance appraisal
Health & safety officer performance appraisal
 

Similaire à Clustering Random Walk Time Series

somenath_fixedpoint_dasguptaIMF17-20-2013
somenath_fixedpoint_dasguptaIMF17-20-2013somenath_fixedpoint_dasguptaIMF17-20-2013
somenath_fixedpoint_dasguptaIMF17-20-2013
Somenath Bandyopadhyay
 
A numerical method to solve fractional Fredholm-Volterra integro-differential...
A numerical method to solve fractional Fredholm-Volterra integro-differential...A numerical method to solve fractional Fredholm-Volterra integro-differential...
A numerical method to solve fractional Fredholm-Volterra integro-differential...
OctavianPostavaru
 

Similaire à Clustering Random Walk Time Series (20)

The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability Distribution
 
A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”
 
Common fixed point theorem for occasionally weakly compatible mapping in q fu...
Common fixed point theorem for occasionally weakly compatible mapping in q fu...Common fixed point theorem for occasionally weakly compatible mapping in q fu...
Common fixed point theorem for occasionally weakly compatible mapping in q fu...
 
Stochastic Schrödinger equations
Stochastic Schrödinger equationsStochastic Schrödinger equations
Stochastic Schrödinger equations
 
Can we estimate a constant?
Can we estimate a constant?Can we estimate a constant?
Can we estimate a constant?
 
QMC: Operator Splitting Workshop, Using Sequences of Iterates in Inertial Met...
QMC: Operator Splitting Workshop, Using Sequences of Iterates in Inertial Met...QMC: Operator Splitting Workshop, Using Sequences of Iterates in Inertial Met...
QMC: Operator Splitting Workshop, Using Sequences of Iterates in Inertial Met...
 
Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4
 
Extension of Some Common Fixed Point Theorems using Compatible Mappings in Fu...
Extension of Some Common Fixed Point Theorems using Compatible Mappings in Fu...Extension of Some Common Fixed Point Theorems using Compatible Mappings in Fu...
Extension of Some Common Fixed Point Theorems using Compatible Mappings in Fu...
 
somenath_fixedpoint_dasguptaIMF17-20-2013
somenath_fixedpoint_dasguptaIMF17-20-2013somenath_fixedpoint_dasguptaIMF17-20-2013
somenath_fixedpoint_dasguptaIMF17-20-2013
 
International conference "QP 34 -- Quantum Probability and Related Topics"
International conference "QP 34 -- Quantum Probability and Related Topics"International conference "QP 34 -- Quantum Probability and Related Topics"
International conference "QP 34 -- Quantum Probability and Related Topics"
 
Graduate Econometrics Course, part 4, 2017
Graduate Econometrics Course, part 4, 2017Graduate Econometrics Course, part 4, 2017
Graduate Econometrics Course, part 4, 2017
 
A numerical method to solve fractional Fredholm-Volterra integro-differential...
A numerical method to solve fractional Fredholm-Volterra integro-differential...A numerical method to solve fractional Fredholm-Volterra integro-differential...
A numerical method to solve fractional Fredholm-Volterra integro-differential...
 
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMSSOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
 
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
MUMS Opening Workshop - Panel Discussion: Facts About Some Statisitcal Models...
 
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
 
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
 
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
 
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
COMMON FIXED POINT THEOREMS IN COMPATIBLE MAPPINGS OF TYPE (P*) OF GENERALIZE...
 
Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...Fixed points of contractive and Geraghty contraction mappings under the influ...
Fixed points of contractive and Geraghty contraction mappings under the influ...
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 

Plus de Gautier Marti

Plus de Gautier Marti (17)

Using Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeUsing Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of Code
 
What deep learning can bring to...
What deep learning can bring to...What deep learning can bring to...
What deep learning can bring to...
 
A quick demo of Top2Vec With application on 2020 10-K business descriptions
A quick demo of Top2Vec With application on 2020 10-K business descriptionsA quick demo of Top2Vec With application on 2020 10-K business descriptions
A quick demo of Top2Vec With application on 2020 10-K business descriptions
 
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...
 
How deep generative models can help quants reduce the risk of overfitting?
How deep generative models can help quants reduce the risk of overfitting?How deep generative models can help quants reduce the risk of overfitting?
How deep generative models can help quants reduce the risk of overfitting?
 
Generating Realistic Synthetic Data in Finance
Generating Realistic Synthetic Data in FinanceGenerating Realistic Synthetic Data in Finance
Generating Realistic Synthetic Data in Finance
 
Applications of GANs in Finance
Applications of GANs in FinanceApplications of GANs in Finance
Applications of GANs in Finance
 
My recent attempts at using GANs for simulating realistic stocks returns
My recent attempts at using GANs for simulating realistic stocks returnsMy recent attempts at using GANs for simulating realistic stocks returns
My recent attempts at using GANs for simulating realistic stocks returns
 
Takeaways from ICML 2019, Long Beach, California
Takeaways from ICML 2019, Long Beach, CaliforniaTakeaways from ICML 2019, Long Beach, California
Takeaways from ICML 2019, Long Beach, California
 
A review of two decades of correlations, hierarchies, networks and clustering...
A review of two decades of correlations, hierarchies, networks and clustering...A review of two decades of correlations, hierarchies, networks and clustering...
A review of two decades of correlations, hierarchies, networks and clustering...
 
Autoregressive Convolutional Neural Networks for Asynchronous Time Series
Autoregressive Convolutional Neural Networks for Asynchronous Time SeriesAutoregressive Convolutional Neural Networks for Asynchronous Time Series
Autoregressive Convolutional Neural Networks for Asynchronous Time Series
 
Some contributions to the clustering of financial time series - Applications ...
Some contributions to the clustering of financial time series - Applications ...Some contributions to the clustering of financial time series - Applications ...
Some contributions to the clustering of financial time series - Applications ...
 
Clustering CDS: algorithms, distances, stability and convergence rates
Clustering CDS: algorithms, distances, stability and convergence ratesClustering CDS: algorithms, distances, stability and convergence rates
Clustering CDS: algorithms, distances, stability and convergence rates
 
Clustering Financial Time Series using their Correlations and their Distribut...
Clustering Financial Time Series using their Correlations and their Distribut...Clustering Financial Time Series using their Correlations and their Distribut...
Clustering Financial Time Series using their Correlations and their Distribut...
 
A closer look at correlations
A closer look at correlationsA closer look at correlations
A closer look at correlations
 
Clustering Financial Time Series: How Long is Enough?
Clustering Financial Time Series: How Long is Enough?Clustering Financial Time Series: How Long is Enough?
Clustering Financial Time Series: How Long is Enough?
 
Optimal Transport between Copulas for Clustering Time Series
Optimal Transport between Copulas for Clustering Time SeriesOptimal Transport between Copulas for Clustering Time Series
Optimal Transport between Copulas for Clustering Time Series
 

Dernier

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
Lokesh Kothari
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 

Dernier (20)

COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 

Clustering Random Walk Time Series

  • 1. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Clustering Random Walk Time Series GSI 2015 - Geometric Science of Information Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat 29 October 2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 2. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 3. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Context (data from www.datagrapple.com) Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 4. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What is a clustering program? Definition Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than those in different groups. Example of a clustering program We aim at finding k groups by positioning k group centers {c1, . . . , ck} such that data points {x1, . . . , xn} minimize minc1,...,ck n i=1 mink j=1 d(xi , cj )2 But, what is the distance d between two random walk time series? Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 5. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What are clusters of Random Walk Time Series? French banks and building materials CDS over 2006-2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 6. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What are clusters of Random Walk Time Series? French banks and building materials CDS over 2006-2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 7. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 8. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Geometry of RW TS ≡ Geometry of Random Variables i.i.d. observations: X1 : X1 1 , X2 1 , . . . , XT 1 X2 : X1 2 , X2 2 , . . . , XT 2 . . . , . . . , . . . , . . . , . . . XN : X1 N, X2 N, . . . , XT N Which distances d(Xi , Xj ) between dependent random variables? Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 9. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Pitfalls of a basic distance Let (X, Y ) be a bivariate Gaussian vector, with X ∼ N(µX , σ2 X ), Y ∼ N(µY , σ2 Y ) and whose correlation is ρ(X, Y ) ∈ [−1, 1]. E[(X − Y )2 ] = (µX − µY )2 + (σX − σY )2 + 2σX σY (1 − ρ(X, Y )) Now, consider the following values for correlation: ρ(X, Y ) = 0, so E[(X − Y )2] = (µX − µY )2 + σ2 X + σ2 Y . Assume µX = µY and σX = σY . For σX = σY 1, we obtain E[(X − Y )2] 1 instead of the distance 0, expected from comparing two equal Gaussians. ρ(X, Y ) = 1, so E[(X − Y )2] = (µX − µY )2 + (σX − σY )2. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 10. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Pitfalls of a basic distance Let (X, Y ) be a bivariate Gaussian vector, with X ∼ N (µX , σ2 X ), Y ∼ N (µY , σ2 Y ) and whose correlation is ρ(X, Y ) ∈ [−1, 1]. E[(X − Y ) 2 ] = (µX − µY ) 2 + (σX − σY ) 2 + 2σX σY (1 − ρ(X, Y )) Now, consider the following values for correlation: ρ(X, Y ) = 0, so E[(X − Y )2 ] = (µX − µY )2 + σ2 X + σ2 Y . Assume µX = µY and σX = σY . For σX = σY 1, we obtain E[(X − Y )2 ] 1 instead of the distance 0, expected from comparing two equal Gaussians. ρ(X, Y ) = 1, so E[(X − Y )2 ] = (µX − µY )2 + (σX − σY )2 . 30 20 10 0 10 20 30 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 Probability density functions of Gaus- sians N(−5, 1) and N(5, 1), Gaus- sians N(−5, 3) and N(5, 3), and Gaussians N(−5, 10) and N(5, 10). Green, red and blue Gaussians are equidistant using L2 geometry on the parameter space (µ, σ). Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 11. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Sklar’s Theorem Theorem (Sklar’s Theorem (1959)) For any random vector X = (X1, . . . , XN) having continuous marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative distribution P is uniquely expressed as P(X1, . . . , XN) = C(P1(X1), . . . , PN(XN)), where C, the multivariate distribution of uniform marginals, is known as the copula of X. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 12. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Sklar’s Theorem Theorem (Sklar’s Theorem (1959)) For any random vector X = (X1, . . . , XN ) having continuous marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative distribution P is uniquely expressed as P(X1, . . . , XN ) = C(P1(X1), . . . , PN (XN )), where C, the multivariate distribution of uniform marginals, is known as the copula of X. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 13. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Copula Transform Definition (The Copula Transform) Let X = (X1, . . . , XN) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi , 1 ≤ i ≤ N. The random vector U = (U1, . . . , UN) := P(X) = (P1(X1), . . . , PN(XN)) is known as the copula transform. Ui , 1 ≤ i ≤ N, are uniformly distributed on [0, 1] (the probability integral transform): for Pi the cdf of Xi , we have x = Pi (Pi −1 (x)) = Pr(Xi ≤ Pi −1 (x)) = Pr(Pi (Xi ) ≤ x), thus Pi (Xi ) ∼ U[0, 1]. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 14. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Copula Transform Definition (The Copula Transform) Let X = (X1, . . . , XN ) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi , 1 ≤ i ≤ N. The random vector U = (U1, . . . , UN ) := P(X) = (P1(X1), . . . , PN (XN )) is known as the copula transform. 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 X∼U[0,1] 10 8 6 4 2 0 2 Y∼ln(X) ρ≈0.84 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 PX (X) 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 PY(Y) ρ=1 The Copula Transform invariance to strictly increasing transformation Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 15. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Deheuvels’ Empirical Copula Transform Let (Xt 1 , . . . , Xt N ), 1 ≤ t ≤ T, be T observations from a random vector (X1, . . . , XN ) with continuous margins. Since one cannot directly obtain the corresponding copula observations (Ut 1, . . . , Ut N ) = (P1(Xt 1 ), . . . , PN (Xt N )), where t = 1, . . . , T, without knowing a priori (P1, . . . , PN ), one can instead Definition (The Empirical Copula Transform) estimate the N empirical margins PT i (x) = 1 T T t=1 1(Xt i ≤ x), 1 ≤ i ≤ N, to obtain the T empirical observations ( ˜Ut 1, . . . , ˜Ut N ) = (PT 1 (Xt 1 ), . . . , PT N (Xt N )). Equivalently, since ˜Ut i = Rt i /T, Rt i being the rank of observation Xt i , the empirical copula transform can be considered as the normalized rank transform. In practice x_transform = rankdata(x)/len(x) Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 16. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Generic Non-Parametric Distance d2 θ (Xi , Xj ) = θ3E |Pi (Xi ) − Pj (Xj )|2 + (1 − θ) 1 2 R dPi dλ − dPj dλ 2 dλ (i) 0 ≤ dθ ≤ 1, (ii) 0 < θ < 1, dθ metric, (iii) dθ is invariant under diffeomorphism Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 17. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Generic Non-Parametric Distance d2 0 : 1 2 R dPi dλ − dPj dλ 2 dλ = Hellinger2 d2 1 : 3E |Pi (Xi ) − Pj (Xj )|2 = 1 − ρS 2 = 2−6 1 0 1 0 C(u, v)dudv Remark: If f (x, θ) = cΦ(u1, . . . , uN; Σ) N i=1 fi (xi ; νi ) then ds2 = ds2 GaussCopula + N i=1 ds2 margins Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 18. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 19. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Hierarchical Block Model A model of nested partitions The nested partitions defined by the model can be seen on the distance matrix for a proper distance and the right permutation of the data points In practice, one observe and work with the above distance matrix which is identitical to the left one up to a permutation of the data Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 20. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Results: Data from Hierarchical Block Model Adjusted Rand Index Algo. Distance Distrib Correl Correl+Distrib HC-AL (1 − ρ)/2 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01 E[(X − Y )2 ] 0.00 ±0.00 0.09 ±0.12 0.55 ±0.05 GPR θ = 0 0.34 ±0.01 0.01 ±0.01 0.06 ±0.02 GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01 GPR θ = .5 0.34 ±0.01 0.59 ±0.12 0.57 ±0.01 GNPR θ = 0 1 0.00 ±0.00 0.17 ±0.00 GNPR θ = 1 0.00 ±0.00 1 0.57 ±0.00 GNPR θ = .5 0.99 ±0.01 0.25 ±0.20 0.95 ±0.08 AP (1 − ρ)/2 0.00 ±0.00 0.99 ±0.07 0.48 ±0.02 E[(X − Y )2 ] 0.14 ±0.03 0.94 ±0.02 0.59 ±0.00 GPR θ = 0 0.25 ±0.08 0.01 ±0.01 0.05 ±0.02 GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.48 ±0.02 GPR θ = .5 0.06 ±0.00 0.80 ±0.10 0.52 ±0.02 GNPR θ = 0 1 0.00 ±0.00 0.18 ±0.01 GNPR θ = 1 0.00 ±0.01 1 0.59 ±0.00 GNPR θ = .5 0.39 ±0.02 0.39 ±0.11 1 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 21. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Results: Application to Credit Default Swap Time Series Distance matrices computed on CDS time series exhibit a hierarchical block structure Marti, Very, Donnat, Nielsen IEEE ICMLA 2015 (un)Stability of clusters with L2 distance Stability of clusters with the proposed distance Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 22. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Consistency Definition (Consistency of a clustering algorithm) A clustering algorithm A is consistent with respect to the Hierarchical Block Model defining a set of nested partitions P if the probability that the algorithm A recovers all the partitions in P converges to 1 when T → ∞. Definition (Space-conserving algorithm) A space-conserving algorithm does not distort the space, i.e. the distance Dij between two clusters Ci and Cj is such that Dij ∈ min x∈Ci ,y∈Cj d(x, y), max x∈Ci ,y∈Cj d(x, y) . Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 23. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Consistency Theorem (Consistency of space-conserving algorithms (Andler, Marti, Nielsen, Donnat, 2015)) Space-conserving algorithms (e.g., Single, Average, Complete Linkage) are consistent with respect to the Hierarchical Block Model. T = 100 T = 1000 T = 10000 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 24. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  • 25. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Discussion and questions? Avenue for research: distances on (copula,margins) clustering using multivariate dependence information clustering using multi-wise dependence information Optimal Copula Transport for Clustering Multivariate Time Series, Marti, Nielsen, Donnat, 2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series