On the stability of clustering financial time series

Talk at IEEE ICMLA 2015 Miami

In this presentation, we suggest some data perturbations that help validate or reject a clustering methodology, while also yielding insights into the time series at hand. The study shows that Pearson correlation is poorly suited to clustering these time series because it yields unstable clusters; a more robust rank-based measure such as Spearman correlation is preferable.

Published in: Science

  1. Sections: Introduction to financial time series clustering; Empirical results from the clustering stability study; Conclusion. Title: On the Stability of Clustering Financial Time Series – How to investigate? IEEE ICMLA, Miami, Florida, USA, December 9-11, 2015. Gautier Marti, Philippe Very, Philippe Donnat, Frank Nielsen. 9 December 2015. Footer: Gautier Marti, Philippe Donnat, On the Stability of Clustering Financial Time Series.
  2. Outline: (1) Introduction to financial time series clustering; (2) Empirical results from the clustering stability study; (3) Conclusion.
  3. Financial time series (data from www.datagrapple.com).
  4. Clustering? Definition: clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in different groups. Figure: French banks (blue) and building materials (red) CDS over 2006-2015.
  5. Why clustering? Mathematical finance: use of variance-covariance matrices (e.g., Markowitz, Value-at-Risk). Stylized fact: empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory; Noise Dressing of Financial Correlation Matrices, Laloux et al., 1999). Figure: Marchenko-Pastur distribution vs. empirical eigenvalue distribution of the correlation matrix (x-axis: λ, y-axis: ρ(λ)). How to filter these variance-covariance matrices?
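
The comparison on slide 5 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes pure-noise Gaussian "returns" (the sizes `T` and `N` are hypothetical) and checks how many eigenvalues of the empirical correlation matrix fall inside the Marchenko-Pastur bulk, using the standard density with ratio `q = N / T`.

```python
import numpy as np

def marchenko_pastur_pdf(lam, q, sigma2=1.0):
    """Marchenko-Pastur density for ratio q = N / T (0 < q <= 1)."""
    lam = np.asarray(lam, dtype=float)
    lam_min = sigma2 * (1.0 - np.sqrt(q)) ** 2
    lam_max = sigma2 * (1.0 + np.sqrt(q)) ** 2
    pdf = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    pdf[inside] = np.sqrt((lam_max - lam[inside]) * (lam[inside] - lam_min)) / (
        2.0 * np.pi * sigma2 * q * lam[inside]
    )
    return pdf, lam_min, lam_max

rng = np.random.default_rng(0)
T, N = 2000, 200                        # hypothetical sample length and number of assets
returns = rng.standard_normal((T, N))   # assumption: pure noise, no true correlation
corr = np.corrcoef(returns, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)      # eigenvalues of the empirical correlation matrix

q = N / T
_, lam_min, lam_max = marchenko_pastur_pdf(eigvals, q)
frac_in_bulk = np.mean((eigvals >= lam_min) & (eigvals <= lam_max))
print(f"MP bulk: [{lam_min:.3f}, {lam_max:.3f}]; share of eigenvalues inside: {frac_in_bulk:.2f}")
```

On real financial data, eigenvalues that fall well outside the bulk are the ones usually read as carrying structure rather than noise.
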
  6. For filtering, clustering! Work of Mantegna et al. (1999). Figure: (left) empirical correlation matrix; (center) the same matrix seriated using a hierarchical clustering; (right) correlations filtered using the clustering structure. N.B. other applications: statistical arbitrage (statarb), alternative risk measures.
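
The three panels of slide 6 can be reproduced on toy data along these lines. The sketch makes several assumptions not stated on the slide: single-linkage clustering, the correlation distance sqrt(2(1 - ρ)), and filtering by replacing each distance with its cophenetic (ultrametric) value. The toy factor model and its sizes are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list, cophenet
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
T, n_per_cluster, n_clusters = 500, 10, 3   # hypothetical toy dimensions
# Toy returns: each asset = its cluster's common factor + idiosyncratic noise.
factors = rng.standard_normal((T, n_clusters))
labels_true = np.repeat(np.arange(n_clusters), n_per_cluster)
returns = factors[:, labels_true] + rng.standard_normal((T, labels_true.size))
perm = rng.permutation(labels_true.size)    # shuffle the asset order
returns, labels_true = returns[:, perm], labels_true[perm]

# (left) empirical correlation matrix
corr = np.corrcoef(returns, rowvar=False)

# Correlation distance (assumed choice) and single-linkage hierarchy
dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="single")

# (center) same matrix, rows and columns reordered by the dendrogram leaves
order = leaves_list(Z)
corr_seriated = corr[np.ix_(order, order)]

# (right) filtered correlations rebuilt from the cophenetic distances
coph = squareform(cophenet(Z))
corr_filtered = 1.0 - coph ** 2 / 2.0
```

With single linkage the cophenetic distance never exceeds the original distance, so every filtered correlation is at least as large as its empirical counterpart.
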
  7. Why stability? Statistical consistency of the clustering method requires assumptions that may not hold in practice, e.g. returns are i.i.d., the underlying copula is elliptical, enough data is available. Stability is a weaker property: reproducibility of results across a wide range of slight data perturbations. Clusters obtained at times t, t + 1, t + 2: is the difference between successive clusters a "true" signal?
  8. Is the clustering of financial time series stable? According to [2], clusters are not stable with respect to the clustering algorithm, but only a squared Euclidean distance was considered, which is not relevant for clustering assets from their returns (cf. [4]). Idea: a more relevant distance should increase stability. We investigate the clustering stability resulting from using: a Euclidean distance; a Pearson correlation distance [3]; a Spearman correlation distance; a distance for comparing two dependent random variables [4].
  9. Some usual distances for clustering financial time series. Prices $(P^i_t)_{t \ge 0}$ are turned into log-returns $S^i_{t+1} = \log P^i_{t+1} - \log P^i_t$, giving the series $(S^i_t)_{t \ge 1}$.
     Euclidean distance: $d(S^i, S^j) = \sqrt{\sum_{t=1}^{T} (S^i_t - S^j_t)^2}$
     Pearson correl.: $\rho(S^i, S^j) = \dfrac{\sum_{t=1}^{T} (S^i_t - \bar{S}^i)(S^j_t - \bar{S}^j)}{\sqrt{\sum_{t=1}^{T} (S^i_t - \bar{S}^i)^2}\,\sqrt{\sum_{t=1}^{T} (S^j_t - \bar{S}^j)^2}}$
     Spearman correl.: $\rho_S(S^i, S^j) = 1 - \dfrac{6}{T(T^2 - 1)} \sum_{t=1}^{T} \left(S^i_{(t)} - S^j_{(t)}\right)^2$, where $S^i_{(t)}$ denotes the rank of $S^i_t$.
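
The three measures on slide 9 translate directly into NumPy. The sketch below is illustrative only: the simulated series, the seed, and the injected outlier are assumptions chosen to show why a rank-based measure is less sensitive to a single extreme joint move than Pearson correlation.

```python
import numpy as np
from scipy.stats import rankdata

def log_returns(prices):
    """S_{t+1} = log P_{t+1} - log P_t."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))

def euclidean_distance(s_i, s_j):
    return float(np.sqrt(np.sum((s_i - s_j) ** 2)))

def pearson_corr(s_i, s_j):
    c_i, c_j = s_i - s_i.mean(), s_j - s_j.mean()
    return float(np.sum(c_i * c_j) / np.sqrt(np.sum(c_i ** 2) * np.sum(c_j ** 2)))

def spearman_corr(s_i, s_j):
    """Rank-difference formula from the slide; exact only when there are no ties."""
    T = len(s_i)
    r_i, r_j = rankdata(s_i), rankdata(s_j)
    return float(1.0 - 6.0 / (T * (T ** 2 - 1)) * np.sum((r_i - r_j) ** 2))

rng = np.random.default_rng(42)
T = 250                                    # hypothetical number of daily returns
r_i = 0.01 * rng.standard_normal(T)
r_j = 0.01 * rng.standard_normal(T)        # independent of r_i by construction
prices_i = 100.0 * np.exp(np.concatenate(([0.0], np.cumsum(r_i))))
s_i = log_returns(prices_i)                # recovers r_i from the price path
s_j = r_j.copy()

# Perturbation: one extreme joint move on the first day (assumed for illustration)
s_i_out, s_j_out = s_i.copy(), s_j.copy()
s_i_out[0] = s_j_out[0] = 1.0

print("Pearson  before/after outlier:", pearson_corr(s_i, s_j), pearson_corr(s_i_out, s_j_out))
print("Spearman before/after outlier:", spearman_corr(s_i, s_j), spearman_corr(s_i_out, s_j_out))
```
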
  10. Generic Non-Parametric Distance [4]:
      $d_\theta^2(X_i, X_j) = \theta \, 3 \, \mathbb{E}\left[ |P_i(X_i) - P_j(X_j)|^2 \right] + (1 - \theta) \, \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda$
      Properties: (i) $0 \le d_\theta \le 1$; (ii) for $0 < \theta < 1$, $d_\theta$ is a metric; (iii) $d_\theta$ is invariant under diffeomorphism.
  11. Generic Non-Parametric Distance [4], limiting cases:
      $d_0^2 = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda = \mathrm{Hellinger}^2$
      $d_1^2 = 3 \, \mathbb{E}\left[ |P_i(X_i) - P_j(X_j)|^2 \right] = \frac{1 - \rho_S}{2} = 2 - 6 \int_0^1 \int_0^1 C(u, v) \, du \, dv$
      Remark: if $f(x, \theta) = c(F_1(x_1; \nu_1), \ldots, F_N(x_N; \nu_N); \theta_c) \prod_{i=1}^{N} f_i(x_i; \nu_i)$, then under the CML hypothesis $ds^2 = ds^2_{\text{copula}} + \sum_{i=1}^{N} ds^2_{\text{margins}}$.
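
A rough empirical version of the distance on slides 10 and 11 is sketched below. The estimators are assumptions made for illustration and may differ from those in [4]: $P_i$ is read as the cumulative distribution function of $X_i$ and estimated by normalized ranks, and the Hellinger term is estimated from histograms on a shared grid (the `bins` parameter is hypothetical).

```python
import numpy as np
from scipy.stats import rankdata

def d1_squared(x_i, x_j):
    """Dependence term 3 * E|P_i(X_i) - P_j(X_j)|^2, using normalized ranks as empirical CDFs."""
    T = len(x_i)
    u_i, u_j = rankdata(x_i) / T, rankdata(x_j) / T
    return float(3.0 * np.mean((u_i - u_j) ** 2))

def d0_squared(x_i, x_j, bins=50):
    """Squared Hellinger distance between the two marginals, estimated with histograms."""
    lo = min(x_i.min(), x_j.min())
    hi = max(x_i.max(), x_j.max())
    p, _ = np.histogram(x_i, bins=bins, range=(lo, hi))
    q, _ = np.histogram(x_j, bins=bins, range=(lo, hi))
    p, q = p / p.sum(), q / q.sum()          # bin probabilities
    return float(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def d_theta(x_i, x_j, theta=0.5, bins=50):
    """Mix of the dependence term (theta) and the distribution term (1 - theta)."""
    d2 = theta * d1_squared(x_i, x_j) + (1.0 - theta) * d0_squared(x_i, x_j, bins)
    return float(np.sqrt(d2))

rng = np.random.default_rng(3)
x = rng.standard_normal(2000)
z = 3.0 * x                      # perfectly dependent on x, but a wider distribution
w = rng.standard_normal(2000)    # independent of x, same distribution

print("x vs z:", d1_squared(x, z), d0_squared(x, z), d_theta(x, z))
print("x vs w:", d1_squared(x, w), d0_squared(x, w), d_theta(x, w))
```

The pair (x, z) has no distance in the dependence term but a clear one in the distribution term, while (x, w) shows the opposite pattern; a pure correlation measure sees only the first of these two effects.
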
  12. Outline: (1) Introduction to financial time series clustering; (2) Empirical results from the clustering stability study; (3) Conclusion.
  13. Sliding Window. Figure: PCA stability curve (red) vs. Euclidean clusters stability curve as a function of time, using results from [1] for a fair comparison: clusters are more stable. This is the most basic perturbation: traders face it every day when monitoring their indicators. We do not want to overfit our analysis to this particular stability goal. Stability perf.: dist. [4], Spearman, Pearson, Euclidean.
  14. Odd vs. Even. A clustering algorithm applied on two samples describing the same phenomenon should yield the same results. How to obtain two of these samples? Figures: (un)stability of clusters with the L2 distance; stability of clusters with the proposed distance [4].
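
One way to run the odd vs. even check of slide 14 is sketched below, assuming the title refers to splitting observations by even and odd time index. Everything else is an illustrative choice rather than the authors' setup: the toy factor model, average linkage on a 1 - Spearman distance, and the adjusted Rand index as the agreement score.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.stats import rankdata

def adjusted_rand_index(a, b):
    """Agreement between two partitions (1.0 means identical up to relabeling)."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    cont = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(cont, (ai, bi), 1)                       # contingency table
    comb2 = lambda x: x * (x - 1) / 2.0
    sum_ij = comb2(cont).sum()
    sum_a, sum_b = comb2(cont.sum(axis=1)).sum(), comb2(cont.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(len(ai))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))

def spearman_clusters(returns, k):
    """Cut an average-linkage hierarchy built on 1 - Spearman correlation into k clusters."""
    rho = np.corrcoef(rankdata(returns, axis=0), rowvar=False)
    dist = np.clip(1.0 - rho, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(Z, t=k, criterion="maxclust")

rng = np.random.default_rng(7)
T, n_per_cluster, k = 1000, 8, 3                      # hypothetical toy dimensions
factors = rng.standard_normal((T, k))
membership = np.repeat(np.arange(k), n_per_cluster)
returns = factors[:, membership] + rng.standard_normal((T, membership.size))

labels_even = spearman_clusters(returns[0::2], k)     # observations at even time indices
labels_odd = spearman_clusters(returns[1::2], k)      # observations at odd time indices
ari = adjusted_rand_index(labels_even, labels_odd)
print("adjusted Rand index between even and odd clusterings:", ari)
```

Swapping `spearman_clusters` for a function built on another distance gives a like-for-like stability comparison on the same two samples.
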
  15. Economic Regimes. Figures: AXA 5-year CDS spread over 2006-2015; average of the pairwise correlations (correlation skyrockets during crises). Is the clustering structure persistent?
  16. Economic Regimes: Clustering Stability. Panels: Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right).
  17. Heart vs. Tails: Clustering Stability ≈ orange+red vs. green+yellow periods. Panels: Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right).
  18. Multiscale. Is the clustering structure persistent across different sampling frequencies?
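
The slide does not say how the sampling frequency is changed. One simple option, shown here as an assumption, is to sum consecutive log-returns, which gives the log-return over the longer period; the helper name `aggregate_log_returns` is hypothetical.

```python
import numpy as np

def aggregate_log_returns(returns, k):
    """Sum blocks of k consecutive rows of a (T, N) log-return array.

    Trailing rows that do not fill a complete block are dropped.
    """
    returns = np.asarray(returns, dtype=float)
    n_blocks = returns.shape[0] // k
    trimmed = returns[: n_blocks * k]
    return trimmed.reshape(n_blocks, k, returns.shape[1]).sum(axis=1)

rng = np.random.default_rng(5)
daily = 0.01 * rng.standard_normal((252, 4))     # hypothetical: 252 days, 4 assets
weekly = aggregate_log_returns(daily, 5)          # 5-day returns
monthly = aggregate_log_returns(daily, 21)        # 21-day returns
print(daily.shape, weekly.shape, monthly.shape)
```

Clustering each of `daily`, `weekly`, and `monthly` with the same method and comparing the resulting partitions is then one way to probe persistence across scales.
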
  19. Multiscale: Clustering Stability. Panels: Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right).
  20. Maturities & Term Structure. An asset is described by several time series whose dynamics are similar: Nokia Oyj is described here by the cost of insurance against its default for {1, 3, 5, 7, 10} years.
  21. Maturities & Term Structure: Clustering Stability. Panels: Pearson (top left), Spearman (top right), Euclidean (bottom left), corr+distr (bottom right).
  22. Outline: (1) Introduction to financial time series clustering; (2) Empirical results from the clustering stability study; (3) Conclusion.
  23. Discussion and questions? A given clustering algorithm yields a particular clustering structure, but with a relevant distance it can be more stable. The perturbations presented can be readily extended (e.g. using different CDS datasets). Disclosing stability results is interesting since complex models often perform poorly (the many parameters are somewhat overfitted) and cannot be used by practitioners. The correlation+distribution distance (presented in [4]) may work for your applications (which ones?).
  24. References. [1] C. Ding and X. He. K-means clustering via principal component analysis. In Proceedings of the Twenty-First International Conference on Machine Learning, page 29. ACM, 2004. [2] V. Lemieux, P. S. Rahmdel, R. Walker, B. Wong, and M. Flood. Clustering techniques and their effect on portfolio formation and risk analysis. In Proceedings of the International Workshop on Data Science for Macro-Modeling, pages 1–6. ACM, 2014. [3] R. N. Mantegna and H. E. Stanley. Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, 1999. [4] G. Marti, P. Very, and P. Donnat.
  25. References, [4] continued: Toward a generic representation of random variables for machine learning. Pattern Recognition Letters, 2015.
