Exploring and measuring non-linear correlations
G. Marti†, S. Andler†‡, F. Nielsen, P. Donnat† (presented by M. Binkows...

Poster for NIPS Time Series Analysis 2016 in Barcelona, Spain.

We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset.
The methodology leverages copulas for encoding dependence between two variables, state-of-the-art optimal transport for providing a relevant geometry to the copulas, and clustering for summarizing the main dependence patterns found between the variables.
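Sklar's theorem (stated on the poster) defines the copula as the joint distribution of $(F_i(X_i), F_j(X_j))$. The marginal CDFs are unknown in practice, so this minimal sketch assumes the common approach of replacing them with normalised ranks and binning the result on a grid. The helper names and the 10×10 grid are illustrative assumptions, not the authors' code.

```python
import numpy as np

def pseudo_observations(x):
    """Normalised ranks in (0, 1): an empirical stand-in for F(x)."""
    n = len(x)
    ranks = np.argsort(np.argsort(x))  # 0 .. n-1; assumes no ties
    return (ranks + 1) / (n + 1)

def empirical_copula_histogram(x, y, n_bins=10):
    """Probability mass of the pseudo-observations on an n_bins x n_bins grid."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    h, _, _ = np.histogram2d(u, v, bins=n_bins, range=[[0, 1], [0, 1]])
    return h / len(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h_pos = empirical_copula_histogram(x, x ** 3)  # increasing map: all mass on the diagonal
h_neg = empirical_copula_histogram(x, -x)      # decreasing map: all mass on the anti-diagonal
h_ind = empirical_copula_histogram(x, rng.standard_normal(1000))  # roughly uniform
```

Ranks are unchanged by strictly increasing maps, so `x ** 3` and `np.exp(x)` yield exactly the same copula histogram as `x` itself. The copula keeps the (possibly non-linear) dependence and discards the marginals.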
Some of the cluster centers can be used to parameterize a novel dependence coefficient that can target or forget specific dependence patterns.
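The Target/Forget Dependence Coefficient (TFDC) defined on the poster only needs a distance between copulas and two sets of reference copulas, such as cluster centers. The sketch below assumes copulas are represented as small histograms. `tfdc`, the toy 3×3 histograms and `euclid` are hypothetical. The Frobenius distance is only a stand-in for the poster's $d_M$, which is presumably an optimal-transport distance.

```python
import numpy as np

def tfdc(c, targets, forgets, dist):
    """Target/Forget Dependence Coefficient in [0, 1].

    c       -- copula of (X_i, X_j), here a histogram
    targets -- copulas C_k^+ whose pattern should score close to 1
    forgets -- copulas C_l^- whose pattern should score close to 0
    dist    -- distance between two copulas (d_M on the poster)
    """
    d_forget = min(dist(f, c) for f in forgets)
    d_target = min(dist(c, t) for t in targets)
    if d_forget + d_target == 0:
        raise ValueError("c coincides with both a target and a forget copula")
    return d_forget / (d_forget + d_target)

def euclid(a, b):
    """Hypothetical stand-in for d_M: Frobenius distance between histograms."""
    return float(np.linalg.norm(a - b))

pos = np.eye(3) / 3              # mass on the diagonal: positive dependence
neg = np.fliplr(np.eye(3)) / 3   # mass on the anti-diagonal: negative dependence
ind = np.full((3, 3), 1 / 9)     # uniform mass: independence

for name, c in [("pos", pos), ("ind", ind), ("neg", neg)]:
    print(name, tfdc(c, targets=[pos], forgets=[ind], dist=euclid))
```

Here positive dependence is targeted and independence is forgotten. The coefficient is therefore 1 for `pos` and 0 for `ind`. `neg` is neither pattern and lands strictly in between.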
Finally, we illustrate the methodology with financial time series (credit default swaps, stocks, foreign exchange rates).
Code and numerical experiments are available online at https://www.datagrapple.com/Tech for reproducible research.

G. Marti†, S. Andler†‡, F. Nielsen, P. Donnat† (presented by M. Binkowski†∗)
† Hellebore Capital Ltd, Ecole Polytechnique, ‡ ENS de Lyon, ∗ Imperial College London

Motivations
• Interpretability of pairwise dependence
• Summary of associations between many variables
• Finding abnormal dependence patterns
• Designing robust and custom dependence coefficients
• Querying the dataset for specific associations
• Realistic simulations of market variables

Copulas
Sklar's Theorem: Let $X = (X_i, X_j)$ be a random vector with joint cumulative distribution function $F$ and continuous marginal cumulative distribution functions $F_i$, $F_j$. Then there exists a unique copula $C$ such that $F(x_i, x_j) = C(F_i(x_i), F_j(x_j))$ for all $(x_i, x_j)$. The copula $C$ of $X$ is the bivariate distribution of the uniform marginals $(U_i, U_j) := (F_i(X_i), F_j(X_j))$.

Fréchet-Hoeffding copula bounds: every copula satisfies $W(u_i, u_j) := \max(u_i + u_j - 1, 0) \le C(u_i, u_j) \le \min(u_i, u_j) =: M(u_i, u_j)$, and $\Pi(u_i, u_j) := u_i u_j$ is the copula of independent variables.
[Figure 1: heatmaps over $(u_i, u_j) \in [0, 1]^2$ of $w$, $\pi$, $m$ (left) and $W$, $\Pi$, $M$ (right)]
Figure 1: Copula measures (left column) and cumulative distribution functions (right column) for negative dependence (first row), independence (second row, i.e. the uniform distribution over $[0, 1]^2$) and positive dependence (third row).

The methodology: clustering of copulas and custom dependence coefficients
The methodology leverages copulas to encode the dependence between two variables, state-of-the-art optimal transport to provide a relevant geometry for the copulas, and clustering to summarize the main dependence patterns found between the variables. Some of the cluster centers can be used to parameterize a custom dependence coefficient.

Target/Forget Dependence Coefficient: Let $\{C^-_l\}_l$ be the set of forget-dependence copulas, $\{C^+_k\}_k$ the set of target-dependence copulas, and $C$ the copula of $(X_i, X_j)$. Then
$$\mathrm{TFDC}\left(X_i, X_j; \{C^+_k\}_k, \{C^-_l\}_l\right) := \frac{\min_l d_M(C^-_l, C)}{\min_l d_M(C^-_l, C) + \min_k d_M(C, C^+_k)} \in [0, 1].$$
The coefficient equals 1 when $C$ coincides with a target copula and 0 when it coincides with a forget copula.

Which geometry for copulas?
In [1], we detail the benefit of optimal transport over information divergences for clustering copulas.
Figure 2: Copulas $C_1$, $C_2$, $C_3$ encoding a correlation of 0.5, 0.99 and 0.9999 respectively. Which pair of copulas is the nearest? For Fisher-Rao, Kullback-Leibler, Hellinger and related divergences, $D(C_1, C_2) \le D(C_2, C_3)$, whereas for the Wasserstein distance, $W_2(C_2, C_3) \le W_2(C_1, C_2)$.
We use results from [2] and [3] to speed up the computation of the distances and barycenters needed for the clustering.
[Figure 3: heatmaps over $[0, 1]^2$ of the Bregman barycenter copula (left) and the Wasserstein barycenter copula (right)]
Figure 3: Barycenter for (left) the Bregman geometry, which includes, for example, the squared Euclidean and Kullback-Leibler distances; (right) the Wasserstein geometry.

Copulas of financial time series
We apply clustering to the $N^2$ bivariate copulas of a financial time series dataset consisting of daily returns of stocks, credit default swaps and FX rates.
Figure 4: Credit default swaps: more mass in the top-right corner, i.e. upper tail dependence. The cost of insuring against company defaults tends to soar jointly in distressed markets.

Queries about dependence
[Figure 5: panels (A), (B), (C), (D)]
Figure 5: Target copulas (simulated or handcrafted) and their respective nearest copulas, which answer questions A, B, C and D:
• (A) Which pair is closest to a Gaussian copula with ρ = 0.7?
• (B) Which pairs are both positively and negatively correlated?
• (C) Which pairs show extreme returns for one variable and small returns for the other?
• (D) Which pairs are uncorrelated overall but correlated for small returns?

References
[1] G. Marti, S. Andler, F. Nielsen, P. Donnat, IEEE Statistical Signal Processing Workshop (2016), 1-5.
[2] M. Cuturi, Advances in Neural Information Processing Systems (2013), 2292-2300.
[3] M. Cuturi, A. Doucet, Proceedings of the 31st International Conference on Machine Learning (2014), 685-693.
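The Figure 2 claim can be checked numerically on Gaussian copulas with correlations 0.5, 0.99 and 0.9999. Information divergences rank $C_1$ closer to $C_2$ than $C_2$ is to $C_3$, while the Wasserstein distance ranks them the other way. This sketch rests on several assumptions of our own:

* Copulas are approximated by 10×10 histograms of normalised ranks.
* The optimal-transport side uses an entropy-regularized (log-domain Sinkhorn) transport cost in the spirit of [2], with a squared Euclidean ground cost. It therefore approximates $W_2^2$ rather than $W_2$, which gives the same ordering.
* The KL side uses the closed form for Gaussian copulas. KL is invariant under the bijective marginal transforms, so it equals the KL between the underlying bivariate normals.

Only KL is shown, not Fisher-Rao or Hellinger. All names and parameters (`eps`, `n_iter`, `n_bins`, sample size) are hypothetical choices.

```python
import numpy as np

def gaussian_copula_hist(rho, z1, z2, n_bins=10):
    """Histogram (flattened) of the empirical copula of (X, Y), corr(X, Y) = rho."""
    x, y = z1, rho * z1 + np.sqrt(1.0 - rho ** 2) * z2
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 1) / (n + 1)  # normalised ranks
    v = (np.argsort(np.argsort(y)) + 1) / (n + 1)
    h, _, _ = np.histogram2d(u, v, bins=n_bins, range=[[0, 1], [0, 1]])
    return h.ravel() / n

def ground_cost(n_bins):
    """Squared Euclidean distances between the centres of the grid cells."""
    c = (np.arange(n_bins) + 0.5) / n_bins
    gu, gv = np.meshgrid(c, c, indexing="ij")  # same cell order as h.ravel()
    pts = np.column_stack([gu.ravel(), gv.ravel()])
    return ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)

def _logsumexp(x, axis):
    m = np.max(x, axis=axis, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_cost(a, b, M, eps=0.005, n_iter=500):
    """Transport cost <P, M> of the entropic OT plan between histograms a, b."""
    keep_a, keep_b = a > 0, b > 0  # drop empty cells so log(a), log(b) are finite
    a, b, M = a[keep_a], b[keep_b], M[np.ix_(keep_a, keep_b)]
    f, g = np.zeros_like(a), np.zeros_like(b)  # dual potentials
    for _ in range(n_iter):
        f = eps * (np.log(a) - _logsumexp((g[None, :] - M) / eps, axis=1))
        g = eps * (np.log(b) - _logsumexp((f[:, None] - M) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - M) / eps)  # entropic transport plan
    return float((P * M).sum())

def kl_gaussian_copulas(r0, r1):
    """Closed-form KL(C_r0 || C_r1) between bivariate Gaussian copulas."""
    return 0.5 * (2 * (1 - r0 * r1) / (1 - r1 ** 2) - 2
                  + np.log((1 - r1 ** 2) / (1 - r0 ** 2)))

rng = np.random.default_rng(0)
z1, z2 = rng.standard_normal((2, 20000))  # shared noise for all three copulas
C1, C2, C3 = (gaussian_copula_hist(r, z1, z2) for r in (0.5, 0.99, 0.9999))
M = ground_cost(10)

ot_12, ot_23 = sinkhorn_cost(C1, C2, M), sinkhorn_cost(C2, C3, M)
kl_12, kl_23 = kl_gaussian_copulas(0.5, 0.99), kl_gaussian_copulas(0.99, 0.9999)
print(f"OT: d(C1,C2)={ot_12:.4f}  d(C2,C3)={ot_23:.4f}")
print(f"KL: D(C1,C2)={kl_12:.2f}  D(C2,C3)={kl_23:.2f}")
```

KL blows up as the correlation approaches 1, because the densities become nearly singular. The transport cost instead reflects how far the probability mass actually has to move in $[0, 1]^2$.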

×