Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Clustering Random Walk Time Serie...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
1 Introduction
2 Geometry of Rand...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Context (data from www.datagrappl...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
What is a clustering program?
Defi...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
What are clusters of Random Walk ...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
What are clusters of Random Walk ...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
1 Introduction
2 Geometry of Rand...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Geometry of RW TS ≡ Geometry of R...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Pitfalls of a basic distance
Let ...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Pitfalls of a basic distance
Let ...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Sklar’s Theorem
Theorem (Sklar’s ...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Sklar’s Theorem
Theorem (Sklar’s ...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
The Copula Transform
Definition (T...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
The Copula Transform
Definition (T...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Deheuvels’ Empirical Copula Trans...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Generic Non-Parametric Distance
d...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Generic Non-Parametric Distance
d...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
1 Introduction
2 Geometry of Rand...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
The Hierarchical Block Model
A mo...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Results: Data from Hierarchical B...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Results: Application to Credit De...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Consistency
Definition (Consistenc...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Consistency
Theorem (Consistency ...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
1 Introduction
2 Geometry of Rand...
Introduction
Geometry of Random Walk Time Series
The Hierarchical Block Model
Conclusion
Discussion and questions?
Avenue ...
Prochain SlideShare
Chargement dans…5
×

Clustering Random Walk Time Series

Talk given at Geometric Science of Information GSI 2015

  • Soyez le premier à commenter

Clustering Random Walk Time Series

  1. 1. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Clustering Random Walk Time Series GSI 2015 - Geometric Science of Information Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat 29 October 2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  2. 2. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  3. 3. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Context (data from www.datagrapple.com) Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  4. 4. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What is a clustering program? Definition Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than those in different groups. Example of a clustering program We aim at finding k groups by positioning k group centers {c1, . . . , ck} such that data points {x1, . . . , xn} minimize minc1,...,ck n i=1 mink j=1 d(xi , cj )2 But, what is the distance d between two random walk time series? Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  5. 5. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What are clusters of Random Walk Time Series? French banks and building materials CDS over 2006-2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  6. 6. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What are clusters of Random Walk Time Series? French banks and building materials CDS over 2006-2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  7. 7. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  8. 8. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Geometry of RW TS ≡ Geometry of Random Variables i.i.d. observations: X1 : X1 1 , X2 1 , . . . , XT 1 X2 : X1 2 , X2 2 , . . . , XT 2 . . . , . . . , . . . , . . . , . . . XN : X1 N, X2 N, . . . , XT N Which distances d(Xi , Xj ) between dependent random variables? Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  9. 9. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Pitfalls of a basic distance Let (X, Y ) be a bivariate Gaussian vector, with X ∼ N(µX , σ2 X ), Y ∼ N(µY , σ2 Y ) and whose correlation is ρ(X, Y ) ∈ [−1, 1]. E[(X − Y )2 ] = (µX − µY )2 + (σX − σY )2 + 2σX σY (1 − ρ(X, Y )) Now, consider the following values for correlation: ρ(X, Y ) = 0, so E[(X − Y )2] = (µX − µY )2 + σ2 X + σ2 Y . Assume µX = µY and σX = σY . For σX = σY 1, we obtain E[(X − Y )2] 1 instead of the distance 0, expected from comparing two equal Gaussians. ρ(X, Y ) = 1, so E[(X − Y )2] = (µX − µY )2 + (σX − σY )2. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  10. 10. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Pitfalls of a basic distance Let (X, Y ) be a bivariate Gaussian vector, with X ∼ N (µX , σ2 X ), Y ∼ N (µY , σ2 Y ) and whose correlation is ρ(X, Y ) ∈ [−1, 1]. E[(X − Y ) 2 ] = (µX − µY ) 2 + (σX − σY ) 2 + 2σX σY (1 − ρ(X, Y )) Now, consider the following values for correlation: ρ(X, Y ) = 0, so E[(X − Y )2 ] = (µX − µY )2 + σ2 X + σ2 Y . Assume µX = µY and σX = σY . For σX = σY 1, we obtain E[(X − Y )2 ] 1 instead of the distance 0, expected from comparing two equal Gaussians. ρ(X, Y ) = 1, so E[(X − Y )2 ] = (µX − µY )2 + (σX − σY )2 . 30 20 10 0 10 20 30 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 Probability density functions of Gaus- sians N(−5, 1) and N(5, 1), Gaus- sians N(−5, 3) and N(5, 3), and Gaussians N(−5, 10) and N(5, 10). Green, red and blue Gaussians are equidistant using L2 geometry on the parameter space (µ, σ). Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  11. 11. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Sklar’s Theorem Theorem (Sklar’s Theorem (1959)) For any random vector X = (X1, . . . , XN) having continuous marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative distribution P is uniquely expressed as P(X1, . . . , XN) = C(P1(X1), . . . , PN(XN)), where C, the multivariate distribution of uniform marginals, is known as the copula of X. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  12. 12. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Sklar’s Theorem Theorem (Sklar’s Theorem (1959)) For any random vector X = (X1, . . . , XN ) having continuous marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative distribution P is uniquely expressed as P(X1, . . . , XN ) = C(P1(X1), . . . , PN (XN )), where C, the multivariate distribution of uniform marginals, is known as the copula of X. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  13. 13. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Copula Transform Definition (The Copula Transform) Let X = (X1, . . . , XN) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi , 1 ≤ i ≤ N. The random vector U = (U1, . . . , UN) := P(X) = (P1(X1), . . . , PN(XN)) is known as the copula transform. Ui , 1 ≤ i ≤ N, are uniformly distributed on [0, 1] (the probability integral transform): for Pi the cdf of Xi , we have x = Pi (Pi −1 (x)) = Pr(Xi ≤ Pi −1 (x)) = Pr(Pi (Xi ) ≤ x), thus Pi (Xi ) ∼ U[0, 1]. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  14. 14. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Copula Transform Definition (The Copula Transform) Let X = (X1, . . . , XN ) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi , 1 ≤ i ≤ N. The random vector U = (U1, . . . , UN ) := P(X) = (P1(X1), . . . , PN (XN )) is known as the copula transform. 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 X∼U[0,1] 10 8 6 4 2 0 2 Y∼ln(X) ρ≈0.84 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 PX (X) 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 PY(Y) ρ=1 The Copula Transform invariance to strictly increasing transformation Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  15. 15. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Deheuvels’ Empirical Copula Transform Let (Xt 1 , . . . , Xt N ), 1 ≤ t ≤ T, be T observations from a random vector (X1, . . . , XN ) with continuous margins. Since one cannot directly obtain the corresponding copula observations (Ut 1, . . . , Ut N ) = (P1(Xt 1 ), . . . , PN (Xt N )), where t = 1, . . . , T, without knowing a priori (P1, . . . , PN ), one can instead Definition (The Empirical Copula Transform) estimate the N empirical margins PT i (x) = 1 T T t=1 1(Xt i ≤ x), 1 ≤ i ≤ N, to obtain the T empirical observations ( ˜Ut 1, . . . , ˜Ut N ) = (PT 1 (Xt 1 ), . . . , PT N (Xt N )). Equivalently, since ˜Ut i = Rt i /T, Rt i being the rank of observation Xt i , the empirical copula transform can be considered as the normalized rank transform. In practice x_transform = rankdata(x)/len(x) Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  16. 16. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Generic Non-Parametric Distance d2 θ (Xi , Xj ) = θ3E |Pi (Xi ) − Pj (Xj )|2 + (1 − θ) 1 2 R dPi dλ − dPj dλ 2 dλ (i) 0 ≤ dθ ≤ 1, (ii) 0 < θ < 1, dθ metric, (iii) dθ is invariant under diffeomorphism Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  17. 17. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Generic Non-Parametric Distance d2 0 : 1 2 R dPi dλ − dPj dλ 2 dλ = Hellinger2 d2 1 : 3E |Pi (Xi ) − Pj (Xj )|2 = 1 − ρS 2 = 2−6 1 0 1 0 C(u, v)dudv Remark: If f (x, θ) = cΦ(u1, . . . , uN; Σ) N i=1 fi (xi ; νi ) then ds2 = ds2 GaussCopula + N i=1 ds2 margins Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  18. 18. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  19. 19. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Hierarchical Block Model A model of nested partitions The nested partitions defined by the model can be seen on the distance matrix for a proper distance and the right permutation of the data points In practice, one observe and work with the above distance matrix which is identitical to the left one up to a permutation of the data Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  20. 20. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Results: Data from Hierarchical Block Model Adjusted Rand Index Algo. Distance Distrib Correl Correl+Distrib HC-AL (1 − ρ)/2 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01 E[(X − Y )2 ] 0.00 ±0.00 0.09 ±0.12 0.55 ±0.05 GPR θ = 0 0.34 ±0.01 0.01 ±0.01 0.06 ±0.02 GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01 GPR θ = .5 0.34 ±0.01 0.59 ±0.12 0.57 ±0.01 GNPR θ = 0 1 0.00 ±0.00 0.17 ±0.00 GNPR θ = 1 0.00 ±0.00 1 0.57 ±0.00 GNPR θ = .5 0.99 ±0.01 0.25 ±0.20 0.95 ±0.08 AP (1 − ρ)/2 0.00 ±0.00 0.99 ±0.07 0.48 ±0.02 E[(X − Y )2 ] 0.14 ±0.03 0.94 ±0.02 0.59 ±0.00 GPR θ = 0 0.25 ±0.08 0.01 ±0.01 0.05 ±0.02 GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.48 ±0.02 GPR θ = .5 0.06 ±0.00 0.80 ±0.10 0.52 ±0.02 GNPR θ = 0 1 0.00 ±0.00 0.18 ±0.01 GNPR θ = 1 0.00 ±0.01 1 0.59 ±0.00 GNPR θ = .5 0.39 ±0.02 0.39 ±0.11 1 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  21. 21. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Results: Application to Credit Default Swap Time Series Distance matrices computed on CDS time series exhibit a hierarchical block structure Marti, Very, Donnat, Nielsen IEEE ICMLA 2015 (un)Stability of clusters with L2 distance Stability of clusters with the proposed distance Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  22. 22. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Consistency Definition (Consistency of a clustering algorithm) A clustering algorithm A is consistent with respect to the Hierarchical Block Model defining a set of nested partitions P if the probability that the algorithm A recovers all the partitions in P converges to 1 when T → ∞. Definition (Space-conserving algorithm) A space-conserving algorithm does not distort the space, i.e. the distance Dij between two clusters Ci and Cj is such that Dij ∈ min x∈Ci ,y∈Cj d(x, y), max x∈Ci ,y∈Cj d(x, y) . Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  23. 23. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Consistency Theorem (Consistency of space-conserving algorithms (Andler, Marti, Nielsen, Donnat, 2015)) Space-conserving algorithms (e.g., Single, Average, Complete Linkage) are consistent with respect to the Hierarchical Block Model. T = 100 T = 1000 T = 10000 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  24. 24. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
  25. 25. Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Discussion and questions? Avenue for research: distances on (copula,margins) clustering using multivariate dependence information clustering using multi-wise dependence information Optimal Copula Transport for Clustering Multivariate Time Series, Marti, Nielsen, Donnat, 2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series

×