networks and clustering in financial markets presented at Ton Duc Thang University in Ho Chi Minh City, Vietnam.

- 1. A review of two decades of correlations, hierarchies, networks and clustering in financial markets. Ton Duc Thang University, Ho Chi Minh City, Vietnam. Gautier Marti, Frank Nielsen, Mikolaj Bińkowski, Philippe Donnat. Ecole Polytechnique, Imperial College London, Hellebore Capital Ltd. 10 August 2018. Gautier Marti (Ecole Polytechnique) Correlations, Networks and Clustering 10 August 2018 1 / 64
- 2. Table of contents
  1. Introduction
  2. Correlation networks
     - The standard and widely adopted methodology
     - Concerns about the standard methodology
     - Contributions for improving the methodology: on algorithms, on distances, on other methodological aspects
  3. Other networks
  4. Dynamics of networks
  5. Applications
  6. Opinionated views on research directions
- 3. Section 1: Introduction
- 4. Introduction
  Motivation: a better understanding of financial markets using a scientific approach. Empirical studies use data to verify hypotheses and discover stylized facts.
  Examples of datasets:
  - price, volume, returns, turnover time series
  - supply chain networks
  - market (OTC, exchange) transaction data
  - retail transactional data (credit cards)
  - corporate payments networks
  - international trade (import/export) networks, ...
- 5. Introduction
  Several research fields are tackling the problem with their own tools:
  - statistical physics, econophysics:
    - Minimum Spanning Tree (MST)
    - Random Matrix Theory (RMT)
    - linear correlations
  - statistics, data mining, machine learning:
    - graph theory
    - community detection
    - clustering algorithms
    - non-linear dependence
    - alternative distances
    - statistical significance and robustness checks via bootstrapping
  - economics, finance, accounting, behavioural finance:
    - standard industry and fundamental classifications vs. statistical and text-based classifications
    - networks of trades, suppliers, consumers, competitors, investors
    - linear regressions on network statistics, statistical significance through t-stats
- 6. Section 2: Correlation networks
- 7. The standard and widely adopted methodology (Mantegna, 1999)
  - Let $N$ be the number of assets.
  - Let $P_i(t)$ be the price at time $t$ of asset $i$, $1 \le i \le N$.
  - Let $r_i(t)$ be the log-return at time $t$ of asset $i$: $r_i(t) = \log P_i(t) - \log P_i(t-1)$.
  - For each pair $i, j$ of assets, compute their correlation: $\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\left(\langle r_i^2 \rangle - \langle r_i \rangle^2\right)\left(\langle r_j^2 \rangle - \langle r_j \rangle^2\right)}}$, where $\langle \cdot \rangle$ denotes the average over time.
  - Convert the correlation coefficients $\rho_{ij}$ into distances: $d_{ij} = \sqrt{2(1 - \rho_{ij})}$.
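The three steps of this slide (log-returns, Pearson correlation, distance) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the price matrix is simulated, and the function name `correlation_distance` is hypothetical rather than taken from the talk.

```python
import numpy as np

def correlation_distance(prices):
    """prices: array of shape (T + 1, N), one column per asset."""
    log_returns = np.diff(np.log(prices), axis=0)   # r_i(t) = log P_i(t) - log P_i(t-1)
    rho = np.corrcoef(log_returns, rowvar=False)    # Pearson correlation rho_ij
    rho = np.clip(rho, -1.0, 1.0)                   # guard against rounding above 1
    return np.sqrt(2.0 * (1.0 - rho))               # d_ij = sqrt(2 (1 - rho_ij))

rng = np.random.default_rng(0)
# Assumption: simulated geometric random walks stand in for real prices
# (5 assets, 250 daily returns).
steps = rng.normal(0.0, 0.01, size=(251, 5))
prices = 100.0 * np.exp(np.cumsum(steps, axis=0))
dist = correlation_distance(prices)
```

The distance is 0 for perfectly correlated assets, $\sqrt{2}$ for uncorrelated ones, and 2 for perfectly anti-correlated ones.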
- 8. The standard and widely adopted methodology
  From all the distances $d_{ij}$, compute a minimum spanning tree (MST) using, for example, Algorithm 1:

      Algorithm 1: Kruskal's algorithm
      procedure BuildMST({d_ij}, 1 ≤ i, j ≤ N)
          ▷ Start with a fully disconnected graph G = (V, E)
          E ← ∅
          V ← {i}, 1 ≤ i ≤ N
          ▷ Try to add edges by increasing distances
          for (i, j) ∈ V² ordered by increasing d_ij do
              ▷ Verify that i and j are not already connected by a path
              if not connected(i, j) then
                  ▷ Add the edge (i, j) to connect i and j
                  E ← E ∪ {(i, j)}
          ▷ G is the resulting MST
          return G = (V, E)
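As a runnable counterpart to the pseudocode, here is a minimal pure-Python sketch of Kruskal's algorithm. The `connected(i, j)` check is implemented with a union-find structure, which is one common choice and an assumption on my part, since the slide does not say how connectivity is tested. The 4-asset distance matrix is made up for illustration.

```python
def build_mst(dist):
    """Kruskal's algorithm on a symmetric N x N distance matrix (list of lists)."""
    n = len(dist)
    parent = list(range(n))                 # union-find forest: one tree per node

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    # Candidate edges (i < j) sorted by increasing distance.
    candidates = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    edges = []
    for _, i, j in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:                # i and j are not yet connected by a path
            parent[root_i] = root_j
            edges.append((i, j))
            if len(edges) == n - 1:         # a spanning tree has N - 1 edges
                break
    return edges

# Hypothetical distances between 4 assets.
toy = [[0.0, 0.4, 1.2, 1.3],
       [0.4, 0.0, 0.9, 1.1],
       [1.2, 0.9, 0.0, 0.5],
       [1.3, 1.1, 0.5, 0.0]]
mst_edges = build_mst(toy)   # [(0, 1), (2, 3), (1, 2)]
```

The two tight pairs (0, 1) and (2, 3) are linked first, then joined by the shortest remaining edge (1, 2).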
- 9. The standard and widely adopted methodology
- 10. Concerns about the standard methodology
  - The clusters obtained from the MST (or equivalently, the Single Linkage Clustering Algorithm (SLCA)) are known to be unstable: small perturbations of the input data may cause big differences in the resulting clusters [MVDN15].
  - The clustering instability may be partly due to the algorithm (MST/Single Linkage are known for the chaining phenomenon [CM10]).
  - The clustering instability may be partly due to the correlation coefficient (Pearson linear correlation) defining the distance, which is known for being brittle to outliers and, more generally, not well suited to distributions other than the Gaussian ones [DMV16].
- 11. Single Linkage chaining problem... makes it brittle to small perturbations in the input distances. Clusters and hierarchies are skewed: it does not take into account some notion of density.
- 12. Pearson linear correlation... is too sensitive to outliers.
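A small simulation can make the outlier sensitivity concrete. The sketch below is an assumption-laden illustration, not material from the talk: it draws 500 correlated Gaussian pairs, corrupts a single observation (a hypothetical bad tick), and compares how far the Pearson and Spearman coefficients move.

```python
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    # Rank correlation computed as the Pearson correlation of the ranks
    # (valid when there are no ties, as with continuous simulated data).
    ranks_x = np.argsort(np.argsort(x))
    ranks_y = np.argsort(np.argsort(y))
    return pearson(ranks_x, ranks_y)

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.8 * x + 0.6 * rng.normal(size=500)   # population correlation of 0.8

x_out, y_out = x.copy(), y.copy()
x_out[0], y_out[0] = 40.0, -40.0           # a single corrupted observation

shift_pearson = abs(pearson(x_out, y_out) - pearson(x, y))
shift_spearman = abs(spearman(x_out, y_out) - spearman(x, y))
```

One bad point out of 500 is enough to move the Pearson estimate substantially, while the rank-based estimate barely changes, since the outlier can only shift ranks by a bounded amount.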
- 13. Concerns about the standard methodology
  - Theoretical results providing the statistical reliability of hierarchical trees and correlation-based networks are still not available [TLM10].
  - One might expect that the higher the correlation associated with a link in a correlation-based network, the higher the reliability of this link. In [TCL+07], the authors show that this is not always observed empirically.
  - Changes affecting specific links (and clusters) during prominent crises are difficult to interpret due to the high level of statistical uncertainty associated with the correlation estimation [STZM11].
  - The standard method is somewhat arbitrary: a change in the method (e.g. using a different clustering algorithm or a different correlation coefficient) may yield a huge change in the clustering results [LRW+14, MVDN15]. As a consequence, it implies huge variability in portfolio formation and perceived risk [LRW+14].
- 14. Variance of the Pearson correlation estimator
- 15. CRLB of the Pearson correlation estimator - Proof
- 16. Random Matrix Theory & Empirical correlation matrices
  Let $X$ be the matrix storing the standardized returns of $N = 560$ assets (credit default swaps) over a period of $T = 2500$ trading days. Then, the empirical correlation matrix of the returns is $C = \frac{1}{T} X X^\top$.
  We can compute the empirical density of its eigenvalues $\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda}$, where $n(\lambda)$ counts the number of eigenvalues of $C$ less than $\lambda$.
  From random matrix theory, the Marchenko-Pastur distribution gives the limit distribution as $N \to \infty$, $T \to \infty$ and $T/N$ fixed. It reads: $\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}$, where $\lambda_{\max/\min} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.
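The Marchenko-Pastur bounds and density above translate directly into code. The sketch below is an illustration under assumptions: it uses pure Gaussian noise with smaller dimensions than the slide (N = 100, T = 500, chosen only to keep it fast), so it shows the null model rather than the credit default swap data, and the function names are hypothetical.

```python
import numpy as np

def marchenko_pastur_bounds(n_assets, n_obs):
    q = n_assets / n_obs                                  # N / T
    return 1 + q - 2 * np.sqrt(q), 1 + q + 2 * np.sqrt(q)

def marchenko_pastur_density(lam, n_assets, n_obs):
    lam = np.asarray(lam, dtype=float)
    lam_min, lam_max = marchenko_pastur_bounds(n_assets, n_obs)
    inside = np.clip((lam_max - lam) * (lam - lam_min), 0.0, None)  # zero outside the support
    return (n_obs / n_assets) / (2 * np.pi) * np.sqrt(inside) / lam

N, T = 100, 500                                           # assumption: toy dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(N, T))                               # uncorrelated "returns"
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
C = X @ X.T / T                                           # empirical correlation matrix
eigenvalues = np.linalg.eigvalsh(C)

lam_min, lam_max = marchenko_pastur_bounds(N, T)
n_above = int(np.sum(eigenvalues > lam_max))              # eigenvalues that escape the noise band
```

On real returns, the eigenvalues counted by `n_above` are the ones usually interpreted as carrying information; on pure noise the spectrum should stay close to the interval $[\lambda_{\min}, \lambda_{\max}]$.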
- 17. Random Matrix Theory & Empirical correlation matrices
  Notice that the Marchenko-Pastur density fits the empirical density well, meaning that most of the information contained in the empirical correlation matrix amounts to noise: only 26 eigenvalues are greater than $\lambda_{\max}$. The highest eigenvalue corresponds to the 'market'; the 25 others can be associated with 'industrial sectors'.
  It is a known stylized fact of empirical correlation matrices between financial returns: only ≈ 5% of their eigenvalues are greater than $\lambda_{\max}$.
- 18. A somewhat arbitrary choice of methodology
  The standard method is somewhat arbitrary. Adopting another one may yield strongly different results. Which ones to trust? Are they both useful?
  Figure caption: the clusters obtained differ widely from one method to another.
- 19. Contributions on algorithms
  Several alternative algorithms have been proposed to replace the minimum spanning tree and its corresponding clusters:
  - Average Linkage Minimum Spanning Tree (ALMST) [TCL+07]: the authors introduce a spanning tree associated with the Average Linkage Clustering Algorithm (ALCA); it is designed to remedy the unwanted chaining phenomenon of MST/SLCA.
  - Planar Maximally Filtered Graph (PMFG) [ADMH05, TADMM05], which strictly contains the Minimum Spanning Tree (MST) but encodes a larger amount of information in its internal structure.
  - Directed Bubble Hierarchical Tree (DBHT) [SDMA11, SDMA12], which is designed to extract, without parameters, the deterministic clusters from the PMFG.
  - Triangulated Maximally Filtered Graph (TMFG) [MDMA16]: the authors introduce another filtered graph more suitable for big datasets.
- 20. Contributions on algorithms (cont'd)
  - Clustering using Potts super-paramagnetic transitions [KKM00]: when anti-correlations occur, the model creates repulsion between the stocks, which modifies their clustering structure.
  - Clustering using maximum likelihood [GM01, GM02]: the authors define the likelihood of a clustering based on a simple 1-factor model, then devise parameter-free methods to find a clustering with high likelihood.
  - Clustering using Random Matrix Theory (RMT) [PGR+00]: eigenvalues help to determine the number of clusters, and eigenvectors their composition. [MG15] proposes network-based community detection methods whose null hypothesis is consistent with RMT results on cross-correlation matrices for financial time series data, unlike existing community detection algorithms.
  - Clustering using the p-median problem [KBP14]: with this construction, every cluster is a star, i.e. a tree with one central node.
- 21. Planar Maximally Filtered Graph (PMFG)
  The PMFG is a compelling alternative to the MST.
  Figure caption: PMFG nodes are colored according to the clusters obtained from DBHT.
  Implementation of the PMFG in Python: https://gmarti.gitlab.io/networks/2018/06/03/pmfg-algorithm.html
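For readers without access to the linked post, the greedy construction usually described for the PMFG can be sketched as follows. This is not the author's implementation: it is a minimal illustration that assumes the `networkx` library (its `check_planarity` function), simulated returns, and a hypothetical function name `build_pmfg`.

```python
import itertools
import networkx as nx
import numpy as np

def build_pmfg(corr):
    """Greedy sketch: add edges by decreasing correlation while the graph stays planar."""
    n = corr.shape[0]
    ranked = sorted(itertools.combinations(range(n), 2),
                    key=lambda edge: corr[edge], reverse=True)
    graph = nx.Graph()
    graph.add_nodes_from(range(n))
    max_edges = 3 * (n - 2)                     # edge count of a maximal planar graph
    for i, j in ranked:
        graph.add_edge(i, j, weight=float(corr[i, j]))
        is_planar, _ = nx.check_planarity(graph)
        if not is_planar:
            graph.remove_edge(i, j)             # reject edges that break planarity
        if graph.number_of_edges() == max_edges:
            break
    return graph

rng = np.random.default_rng(0)
returns = rng.normal(size=(300, 8))             # assumption: simulated returns, 8 assets
pmfg = build_pmfg(np.corrcoef(returns, rowvar=False))
```

The result keeps 3(N - 2) edges instead of the N - 1 edges of the MST. Re-running a planarity test for every candidate edge is simple but slow, which is one motivation for faster alternatives such as the TMFG mentioned earlier.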
- 22. Contributions on distances
  At the heart of clustering algorithms is the fundamental notion of distance, which can be defined upon a proper representation of data. It is thus an obvious direction to explore. We list below what has been proposed in the literature so far.
  Distances that try to quantify how one financial instrument provides information about another instrument:
  - distance using Granger causality [BGLP12],
  - distance using partial correlation [KTM+10],
  - study of asynchronous, lead-lag relationships by using mutual information instead of Pearson's correlation coefficient [Fie14a, RTS16],
  - the correlation matrix is normalized using the affinity transformation: the correlation between each pair of stocks is normalized according to the correlations of each of the two stocks with all other stocks [KSM+10].
- 23. Contributions on distances (cont'd)
  Distances that aim at including non-linear relationships in the analysis:
  - distances using mutual information, mutual information rate, and other information-theoretic distances [Fie14b, RTS16, BP17a, BP17b, GHA18, GZT18],
  - the Brownian distance [ZPKS14],
  - copula-based [MND16, DP15, B+13] and tail dependence [DFPW15] distances.
  Distances that aim at taking into account multivariate dependence:
  - each stock is represented by a bivariate time series, its returns and traded volumes [BR08]; a distance is then applied to an ad hoc transform of the two time series into a symbolic sequence,
  - each stock is represented by a multivariate time series, for example the daily (high, low, open, close) [LD13]; the authors use Escoufier's RV coefficient (a multivariate extension of Pearson's correlation coefficient).
  A distance taking into account both the correlation between returns and their distributions [DMV16].
- 24. Contributions on distances (cont'd)
  Unlike recent studies which claim that the existence of nonlinear dependence between stock returns has effects on network characteristics, [HH18] documents that "most of the apparent nonlinearity is due to univariate non-Gaussianity. Further, strong non-stationarity in a few specific stocks may play a role. In particular, the sharp decrease of some stocks during the global financial crisis in 2008" gives rise to apparent negative tail dependence among stocks.
  When constructing unweighted stock networks, they suggest using linear correlation "on marginally normalized data", that is, Spearman's rank correlation.
  In fact, this is similar to the idea of separating the dependence information from the distribution information as in [DMV16], where Spearman's rank correlation stems from using a Euclidean distance between the uniform margins of the underlying bivariate copula. Following previous studies, and unlike in [DMV16], the distribution information is discarded when constructing the network.
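The link between Spearman's rank correlation and a Euclidean distance between uniform margins can be checked numerically. The sketch below is an illustration under assumptions: the heavy-tailed returns are simulated, the helper `uniform_margins` is hypothetical, and the identity used holds exactly only for samples without ties.

```python
import numpy as np

def uniform_margins(x):
    """Empirical probability integral transform: ranks rescaled to (0, 1]."""
    return (np.argsort(np.argsort(x)) + 1) / len(x)

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=1000)            # assumption: heavy-tailed simulated returns
y = 0.5 * x + rng.standard_t(df=3, size=1000)

u, v = uniform_margins(x), uniform_margins(y)
spearman = float(np.corrcoef(u, v)[0, 1])      # Pearson correlation on normalized margins
mean_sq_dist = float(np.mean((u - v) ** 2))    # squared Euclidean distance divided by T

# For tie-free samples: mean (u - v)^2 = 2 Var(u) (1 - spearman),
# so the distance between margins is a decreasing function of Spearman's rho.
identity_rhs = float(2 * np.var(u) * (1 - spearman))
```

Because `uniform_margins` depends only on ranks, applying any increasing transform to `x` leaves `u` unchanged: this is the sense in which the marginal distribution information is discarded.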
- 25. Dependence and marginal distribution of the returns
  Theorem (Sklar's theorem, 1959). For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cumulative distribution functions $F_i$, its joint cumulative distribution $F$ is uniquely expressed as $F(X_1, \ldots, X_N) = C(F_1(X_1), \ldots, F_N(X_N))$, where $C$, the multivariate distribution of uniform marginals, is known as the copula of $X$.
- 26. Information-theoretic distances vs. Copula-based ones?
  - Copula entropy: $H_c(x) = -\int_u c(u) \log c(u) \, du$
  - Mutual information: $I(x) = \int_x p(x) \log \frac{p(x)}{\prod_i p_i(x_i)} \, dx = \int_x c(u_x) \prod_i p_i(x_i) \log c(u_x) \, dx = \int_u c(u_x) \log c(u_x) \, du_x = -H_c(x)$
  - Entropy: $H(x) = \sum_i H(x_i) + H_c(x)$
- 27. Contributions on other methodological aspects
  Reliability and statistical uncertainty of the methods:
  - A bootstrap approach is used to estimate the statistical reliability of both hierarchical trees [TLM07a, MAND16] and correlation-based networks [TCL+07, MMMM18].
  - Consistency proof of clustering algorithms for recovering clusters defined by nested block correlation matrices; study of empirical convergence rates [MAND16].
  - Kullback-Leibler divergence is used to estimate the amount of filtered information between the sample correlation matrix and the filtered one [TLM07b].
  - Cophenetic correlation is used between the original correlation distances and the hierarchical cluster representation [PS15].
  - Several measures between successive (in time) clusters, dendrograms, and networks are used to estimate the stability of the methods, e.g. cophenetic correlation between dendrograms in [PLJ76], adjusted Rand index (ARI) between clusters in [MVDN15], mutual information (MI) of link co-occurrence between networks in [STZM11].
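As an illustration of the cophenetic correlation mentioned above, the following sketch compares single and average linkage on simulated two-block returns. It assumes SciPy's `linkage` and `cophenet` functions; the data-generating model and the choice of methods are illustrative assumptions, not the setup of [PS15].

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# Assumption: two blocks of 5 assets, each block driven by its own factor.
factors = rng.normal(size=(500, 2))
loadings = np.repeat(np.eye(2), 5, axis=0)              # shape (10, 2)
returns = factors @ loadings.T + 0.5 * rng.normal(size=(500, 10))

rho = np.clip(np.corrcoef(returns, rowvar=False), -1.0, 1.0)
dist = np.sqrt(2.0 * (1.0 - rho))
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)              # upper triangle as a vector

results = {}
for method in ("single", "average"):
    tree = linkage(condensed, method=method)
    coph_corr, _ = cophenet(tree, condensed)            # correlation with the original distances
    results[method] = float(coph_corr)
```

A value close to 1 means the dendrogram distorts the original distances very little; comparing the values across linkage methods gives one simple fidelity check.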
- 28. Contributions on other methodological aspects (cont'd)
  Preprocessing of the time series:
  - Subtract the market mode before performing a cluster or network analysis on the returns [BMM07].
  - Encode both rank statistics and a distribution histogram of the returns into a representative vector [DMV16].
  - Fit an ARMA(p,q)-FIEGARCH(1,d,1)-cDCC process (econometric preprocessing) to obtain dynamic correlations instead of the common approach of rolling window Pearson correlations [ST14].
  - Use a clustering of successive correlation matrices to infer a market state [PS15].
  Use of other types of networks: threshold networks [OKK04], influence networks [GZC15], partial-correlation networks [KTM+10, KPGGBJ12], Granger causality networks [BGLP12, VLB15], cointegration-based networks [Tu14], bipartite networks [TML+11], etc.
- 29. Consistency and empirical convergence rates [MAND16]
  Model selection: the faster the (empirical) convergence, the better.
- 30. Statistical & practical stability
  One can use the bootstrap, the block bootstrap, or other common-sense and practical perturbations of the data, as presented in [MVDN15].
- 31. Effect of a basic preprocessing: Subtract the market mode
  Figure caption: visualization of the Planar Maximally Filtered Graph (PMFG) and DBHT clusters, for both non-detrended (left) and detrended (right) log-returns [MADM15].
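One simple way to subtract a market mode is to regress each asset's returns on the cross-sectional mean return and keep the residuals. The sketch below follows that recipe as an assumption; the exact procedure used in [BMM07] or [MADM15] may differ, and the one-factor returns are simulated.

```python
import numpy as np

def remove_market_mode(returns):
    """returns: (T, N) array. Regress each asset on the mean return, keep residuals."""
    market = returns.mean(axis=1)                        # proxy for the market mode
    market_c = market - market.mean()
    centered = returns - returns.mean(axis=0)
    betas = centered.T @ market_c / (market_c @ market_c)   # OLS slope of each asset
    return centered - np.outer(market_c, betas)

def mean_offdiag_corr(r):
    c = np.corrcoef(r, rowvar=False)
    n = c.shape[0]
    return float((c.sum() - n) / (n * (n - 1)))

rng = np.random.default_rng(0)
market_factor = rng.normal(size=(1000, 1))
returns = market_factor + 0.7 * rng.normal(size=(1000, 20))  # assumption: one-factor toy model

before = mean_offdiag_corr(returns)
after = mean_offdiag_corr(remove_market_mode(returns))
```

In this toy model the average pairwise correlation is large before the subtraction and close to zero after it, which is why sector structure tends to stand out more clearly on detrended returns.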
- 32. Section 3: Other networks
- 33. Examples of other financial networks
  - supply chain networks [Wu15]
  - investor (security holdings and trading behaviour) networks [BKES18]
  - corporate board and director networks [BC04]
  - international trade networks [BFG10]
  - transaction networks [LL18]
  - sovereign debt (quarterly public debt-to-GDP ratio) networks [MO15]
  - interbank (exposures between banks) networks [SVLG13]
  These networks are built from alternative data which are often confidential, and hard or costly to obtain. Most often these studies are done in collaboration with a commercial or regulatory organization.
  Some of these datasets may contain significant alphas, and thus results are not publicly advertised: papers are relatively few in contrast to the ones on the correlation of asset returns, which are more oriented toward risk understanding.
- 34. Section 4: Dynamics of networks
- 35. Studying the dynamics of networks
  Comparing and finding the differences in a sequence of large graphs is a computationally difficult problem. In the literature, one often studies the following statistics:
  - (for networks) the normalized tree length [OCK+03], the mean occupation layer [OCK+03], the tree half-life [OCK+03], a survival ratio of the edges [OCKK02, JMS+05, ST14], node degree, strength [ST14], eigenvector, betweenness, closeness centrality [ST14], the agglomerative coefficient [MO15];
  - (for clusters) the merging, splitting, birth, death, contraction, and growth of the clusters in time [PS15].
  Remark. To the best of my knowledge, graph embeddings into vector spaces (cf. the recent Deep Learning literature, or the survey [GF18]) have not been used to study time series of financial networks. Such a vector representation would open the field to the toolbox of standard machine learning algorithms:
  - cluster networks and find those which are associated with some events (e.g. a crisis);
  - predict the future networks in a sequence of networks with an LSTM (stat arb?);
  - detect a structural break, etc.
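Two of the statistics listed above are easy to compute once an MST is available. The sketch below assumes the usual definitions, namely the normalized tree length as the mean MST edge length and the single-step survival ratio as the fraction of edges shared by consecutive trees; I have not checked these against the exact conventions of [OCK+03]. It uses SciPy's `minimum_spanning_tree` on simulated returns and two overlapping windows.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_edges(returns):
    """Return the MST edge set and total length for a (T, N) array of returns."""
    rho = np.clip(np.corrcoef(returns, rowvar=False), -1.0, 1.0)
    dist = np.sqrt(2.0 * (1.0 - rho))
    np.fill_diagonal(dist, 0.0)                     # zeros are read as "no edge"
    tree = minimum_spanning_tree(dist).tocoo()
    edges = {(int(min(i, j)), int(max(i, j))) for i, j in zip(tree.row, tree.col)}
    return edges, float(tree.data.sum())

def normalized_tree_length(returns):
    edges, total = mst_edges(returns)
    return total / len(edges)                       # mean length over the N - 1 edges

def survival_ratio(edges_t, edges_next):
    return len(edges_t & edges_next) / len(edges_t)

rng = np.random.default_rng(0)
factor = rng.normal(size=(600, 1))
returns = 0.6 * factor + rng.normal(size=(600, 12))  # assumption: simulated returns

edges_a, _ = mst_edges(returns[:300])               # first window
edges_b, _ = mst_edges(returns[150:450])            # overlapping second window
length_a = normalized_tree_length(returns[:300])
ratio = survival_ratio(edges_a, edges_b)
```

Tracking `length_a` and `ratio` over rolling windows gives scalar time series that summarize how the tree shrinks or reshuffles, for instance around a crisis.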
- 36. Section 5: Applications
- 37. Portfolio optimization
  - [OCK+03] finds that the Markowitz portfolio layer in the MST is higher than the mean layer at all times. As the stocks of the minimum risk portfolio are found on the outskirts of the tree [PDMA13, OCK+03], the authors expect larger trees to have greater diversification potential.
  - In [TLGM08, PLJ76], the authors compare the Markowitz portfolios from the filtered empirical correlation matrices using the clustering approach, the RMT approach and the shrinkage approach.
  - [RLL+16, PZ16] propose to invest in different parts of the MST depending on the estimated market conditions.
  - The authors of [HMM18] show that there is no inner-mathematical relationship between the minimum variance portfolio from Markowitz theory and the portfolios designed from the minimum spanning tree. Empirical evidence of such relations found by previous studies is essentially a stylized fact of financial returns correlations and time series, not a general property of correlation matrices.
  - [DFPW15] introduces a procedure to design portfolios which are diversified in their tail behavior by selecting only a single asset in each cluster.
- 38. Trading strategy
  - Earnings per share forecasts prepared on the basis of statistically grouped data (clusters) outperform forecasts made on data grouped on traditional industrial criteria as well as forecasts prepared by mechanical extrapolation techniques [EG71].
  - One can build a simple mean-reversion statistical arbitrage strategy whereby one assumes that stocks in a given industry move together, cross-sectionally demeans stock returns within said industry, shorts stocks with positive residual returns and goes long stocks with negative residual returns [KY16].
  - In [PS15], the authors suggest that tracking the merging, splitting, birth, and death of the clusters in time could be the basis for pairs-like reversal trading strategies, but with pairs corresponding to clusters.
  - The paper [DC05] describes methods for index tracking and enhanced index tracking based on clusters of financial time series.
  - [MADM16] finds the existence of significant relations between past changes in the market correlation structure and future changes in the market volatility.
  - In [KLT12], the authors claim that long-short strategies exploiting mispricing due to the industry categorization bias generate statistically significant and economically sizable risk-adjusted excess returns.
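The mean-reversion recipe attributed to [KY16] above can be sketched for a single day as follows. This is a toy illustration only: the returns and group labels are made up, the normalization to unit gross exposure is my assumption, and the function name `reversal_weights` is hypothetical. It ignores costs, risk limits and everything else a real strategy needs.

```python
import numpy as np

def reversal_weights(returns_today, industry):
    """Within each industry: short stocks above the group mean, long those below."""
    returns_today = np.asarray(returns_today, dtype=float)
    industry = np.asarray(industry)
    residual = np.empty_like(returns_today)
    for label in np.unique(industry):
        members = industry == label
        # Cross-sectional demeaning within the group.
        residual[members] = returns_today[members] - returns_today[members].mean()
    gross = np.abs(residual).sum()
    if gross == 0:
        return np.zeros_like(residual)
    return -residual / gross                      # unit gross exposure, group-neutral

returns_today = [0.02, -0.01, 0.005, 0.03, 0.01]          # hypothetical daily returns
industry = ["tech", "tech", "tech", "banks", "banks"]     # could also be statistical clusters
weights = reversal_weights(returns_today, industry)       # [-0.3, 0.3, 0.0, -0.2, 0.2]
```

Replacing the `industry` labels with cluster labels obtained from the correlation structure gives the statistical-classification variant of the same strategy.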
- 39. Risk
  - In [DPT14], the authors design clusters that tend to be comonotonic in their extreme low values: to avoid contagion in the portfolio during risky scenarios, an investor should diversify over these clusters.
  - In [MDMA14], the authors postulate the existence of a hierarchical structure of risks which can be deemed responsible for both stock multivariate dependency structure and univariate multifractal behaviour, and then propose a model that reproduces the empirical observations (entanglement of univariate multi-scaling and multivariate cross-correlation properties of financial time series). The interplay between multi-scaling and average cross-correlation is confirmed in [BMDM18].
  - Clusters (statistical industry classification) can be an alternative to sometimes unavailable "fundamental" industry classifications (e.g. in emerging or small markets) [KY16].
  - [HZYU16] finds that financial institutions which have, in the correlation networks, greater node strength, larger node betweenness centrality, larger node closeness centrality and larger node clustering coefficient tend to be associated with larger systemic risk contributions.
- 40. Financial policy making
  Clusters and networks can help design financial policies. Several papers propose to leverage them to detect risky market environments, develop indicators that can predict forthcoming crises or economic recovery [ZLW+11], improve economic nowcasting [EFC17], or find key markets and assets that drive a whole region, and on which stimulus can be applied effectively.
  The authors of [HSBYBY10] claim that "separation prevents failure propagation and connections increase risks of global crises", whereas the prevailing view in favor of deregulation is that banks, by investing in diverse sectors, would have greater stability. To support their argument, using financial networks, they study the aftermath of the repeal of the Glass-Steagall Act (1933) by the Clinton administration in 1999. They find that the erosion of the Glass-Steagall Act and cross-sector investments eliminated "firewalls" that could have prevented the housing sector decline from triggering a wider financial and economic crisis: "Our analysis implies that the investment across economic sectors itself creates increased cross-linking of otherwise much more weakly coupled parts of the economy, causing dependencies that increase, rather than decrease, risk."
- 41. Section 6: Opinionated views on research directions
- 42. Opinionated views on research directions
  What's missing for "financial networks" to become a mature research field? Some inspiration from the booming deep learning era:
  - Lack of reproducibility: provide code and data (at least synthetic datasets).
  - Difficulty in comparing methods, re-implementation bias: build open source libraries (standardized API, optimized code); open source software helps to engage more with practitioners.
  - Confidential data: provide synthetic datasets encoding stylized facts; propose generative models (cf. the GAN literature applied to graphs).
  - Lack of evaluation metrics, no end-to-end approach: define common tasks (e.g. evaluate the clustering or network methodology on portfolio optimization, crisis detection, mean reversion strategy) where all the details are specified (e.g. a well-chosen artificial dataset, or samples from a generative model, or public financial data).
- 43. Thank you for the attention. Questions?
  Figure caption: co-authorship network (left) and its MST (right).
- 44. References I
  - [ADMH05] Tomaso Aste, Tiziana Di Matteo, and ST Hyde, Complex networks on hyperbolic surfaces, Physica A: Statistical Mechanics and its Applications 346 (2005), no. 1, 20–26.
  - [B+13] Eike Christian Brechmann et al., Hierarchical Kendall copulas and the modeling of systemic and operational risk, Ph.D. thesis, Universitätsbibliothek der TU München, 2013.
  - [BC04] Stefano Battiston and Michele Catanzaro, Statistical properties of corporate board and director networks, The European Physical Journal B 38 (2004), no. 2, 345–352.
  - [BFG10] Matteo Barigozzi, Giorgio Fagiolo, and Diego Garlaschelli, Multinetwork of international trade: A commodity-specific analysis, Physical Review E 81 (2010), no. 4, 046104.
- 45. References II
  - [BGLP12] Monica Billio, Mila Getmansky, Andrew W Lo, and Loriana Pelizzon, Econometric measures of connectedness and systemic risk in the finance and insurance sectors, Journal of Financial Economics 104 (2012), no. 3, 535–559.
  - [BKES18] Kestutis Baltakys, Juho Kanniainen, and Frank Emmert-Streib, Multilayer aggregation with statistical validation: Application to investor networks, Scientific Reports 8 (2018), no. 1, 8198.
  - [BMDM18] RJ Buonocore, RN Mantegna, and T Di Matteo, On the interplay between multiscaling and average cross-correlation, arXiv preprint arXiv:1802.01113 (2018).
  - [BMM07] Christian Borghesi, Matteo Marsili, and Salvatore Miccichè, Emergence of time-horizon invariant correlation structure in financial returns by subtraction of the market mode, Physical Review E 76 (2007), no. 2, 026104.
- 46. References III
  - [BP17a] Eduard Baitinger and Jochen Papenbrock, Interconnectedness risk and active portfolio management: The information-theoretic perspective.
  - [BP17b] AQ Barbi and GA Prataviera, Nonlinear dependencies on Brazilian equity network from mutual information minimum spanning trees, arXiv preprint arXiv:1711.06185 (2017).
  - [BR08] Juan Gabriel Brida and Wiston Adrián Risso, Multidimensional minimal spanning tree: The Dow Jones case, Physica A: Statistical Mechanics and its Applications 387 (2008), no. 21, 5205–5210.
  - [CM10] Gunnar Carlsson and Facundo Mémoli, Characterization, stability and convergence of hierarchical clustering methods, Journal of Machine Learning Research 11 (2010), no. Apr, 1425–1470.
- 47. References IV
  - [DC05] Christian Dose and Silvano Cincotti, Clustering of financial time series with application to index and enhanced index tracking portfolio, Physica A: Statistical Mechanics and its Applications 355 (2005), no. 1, 145–151.
  - [DFPW15] Fabrizio Durante, Enrico Foscolo, Roberta Pappadà, and Hao Wang, A portfolio diversification strategy via tail dependence measures.
  - [DMV16] Philippe Donnat, Gautier Marti, and Philippe Very, Toward a generic representation of random variables for machine learning, Pattern Recognition Letters 70 (2016), 24–31.
  - [DP15] Fabrizio Durante and Roberta Pappadà, Cluster analysis of time series via Kendall distribution, Strengthening Links Between Data Analysis and Soft Computing, Springer, 2015, pp. 209–216.
- 48. References V
  - [DPT14] Fabrizio Durante, Roberta Pappadà, and Nicola Torelli, Clustering of financial time series in risky scenarios, Advances in Data Analysis and Classification 8 (2014), no. 4, 359–376.
  - [EFC17] Mohammed Elshendy and Andrea Fronzetti Colladon, Big data analysis of economic news: Hints to forecast macroeconomic indicators, International Journal of Engineering Business Management 9 (2017), 1847979017720040.
  - [EG71] Edwin J Elton and Martin J Gruber, Improved forecasting through the design of homogeneous groups, The Journal of Business 44 (1971), no. 4, 432–450.
  - [Fie14a] Pawel Fiedor, Information-theoretic approach to lead-lag effect on financial markets, The European Physical Journal B 87 (2014), no. 8, 1–9.
- 49. References VI
  - [Fie14b] Pawel Fiedor, Networks in financial markets based on the mutual information rate, Physical Review E 89 (2014), no. 5, 052801.
  - [GF18] Palash Goyal and Emilio Ferrara, Graph embedding techniques, applications, and performance: A survey, Knowledge-Based Systems 151 (2018), 78–94.
  - [GHA18] Yong Kheng Goh, Haslifah M Hasim, and Chris G Antonopoulos, Inference of financial networks using the normalised mutual information rate, PloS one 13 (2018), no. 2, e0192160.
  - [GM01] Lorenzo Giada and Matteo Marsili, Data clustering and noise undressing of correlation matrices, Physical Review E 63 (2001), no. 6, 061101.
  - [GM02] Lorenzo Giada and Matteo Marsili, Algorithms of maximum likelihood data clustering with applications, Physica A: Statistical Mechanics and its Applications 315 (2002), no. 3, 650–664.
- 50. References VII Ya-Chun Gao, Yong Zeng, and Shi-Min Cai, Influence network in the Chinese stock market, Journal of Statistical Mechanics: Theory and Experiment 2015 (2015), no. 3, P03017. Xue Guo, Hu Zhang, and Tianhai Tian, Development of stock correlation networks using mutual information and financial big data, PloS one 13 (2018), no. 4, e0195941. David Hartman and Jaroslav Hlinka, Nonlinearity in stock networks, arXiv preprint arXiv:1804.10264 (2018). Amelie Hüttner, Jan-Frederik Mai, and Stefano Mineo, Portfolio selection based on graphs: Does it align with Markowitz-optimal portfolios?, Dependence Modeling (2018).
- 51. References VIII Dion Harmon, Blake Stacey, Yavni Bar-Yam, and Yaneer Bar-Yam, Networks of economic market interdependence and systemic risk, arXiv preprint arXiv:1011.3707 (2010). Wei-Qiang Huang, Xin-Tian Zhuang, Shuang Yao, and Stan Uryasev, A financial network perspective of financial institutions' systemic risk contributions, Physica A: Statistical Mechanics and its Applications 456 (2016), 183–196. Neil F Johnson, Mark McDonald, Omer Suleman, Stacy Williams, and Sam Howison, What shakes the FX tree? understanding currency dominance, dependence, and dynamics (keynote address), SPIE Third International Symposium on Fluctuations and Noise, International Society for Optics and Photonics, 2005, pp. 86–99.
- 52. References IX Anton Kocheturov, Mikhail Batsyn, and Panos M Pardalos, Dynamics of cluster structures in a financial market network, Physica A: Statistical Mechanics and its Applications 413 (2014), 523–533. L Kullmann, J Kertész, and RN Mantegna, Identification of clusters of companies in stock indices via Potts super-paramagnetic transitions, Physica A: Statistical Mechanics and its Applications 287 (2000), no. 3, 412–419. Philipp Krüger, Augustin Landier, and David Thesmar, Categorization bias in the stock market, Available at SSRN 2034204 (2012). Dror Y Kenett, Tobias Preis, Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dependency network and node influence: application to the study of financial markets, International Journal of Bifurcation and Chaos 22 (2012), no. 07, 1250181.
- 53. References X Dror Y Kenett, Yoash Shapira, Asaf Madi, Sharron Bransburg-Zabary, Gitit Gur-Gershgoren, and Eshel Ben-Jacob, Dynamics of stock market correlations, AUCO Czech Economic Review 4 (2010), no. 3, 330–341. Dror Y Kenett, Michele Tumminello, Asaf Madi, Gitit Gur-Gershgoren, Rosario N Mantegna, and Eshel Ben-Jacob, Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market, PloS one 5 (2010), no. 12, e15032. Zura Kakushadze and Willie Yu, Statistical industry classification. Gan Siew Lee and Maman A Djauhari, Multidimensional stock network analysis: An Escoufier's RV coefficient approach, AIP Conference Proceedings, vol. 1, 2013, pp. 550–555. Elisa Letizia and Fabrizio Lillo, Corporate payments networks and credit risk rating.
- 54. References XI Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong, and Mark Flood, Clustering techniques and their effect on portfolio formation and risk analysis, Proceedings of the International Workshop on Data Science for Macro-Modeling, ACM, 2014, pp. 1–6. Nicoló Musmeci, Tomaso Aste, and Tiziana Di Matteo, Relation between financial market structure and the real economy: comparison between clustering methods, PloS one 10 (2015), no. 3, e0116201. Nicoló Musmeci, Tomaso Aste, and T Di Matteo, Interplay between past market correlation structure changes and future volatility outbursts, Scientific reports 6 (2016).
- 55. References XII Gautier Marti, Sébastien Andler, Frank Nielsen, and Philippe Donnat, Clustering financial time series: How long is enough?, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, 2016, pp. 2583–2589. Raffaello Morales, T Di Matteo, and Tomaso Aste, Dependency structure and scaling properties of financial time series are related, Scientific Reports 4 (2014), no. 4589. Guido Previde Massara, Tiziana Di Matteo, and Tomaso Aste, Network filtering for big data: triangulated maximally filtered graph, Journal of Complex Networks 5 (2016), no. 2, 161–178. Mel MacMahon and Diego Garlaschelli, Community detection for correlation matrices, Phys. Rev. X 5 (2015), 021006.
- 56. References XIII Federico Musciotto, Luca Marotta, Salvatore Miccichè, and Rosario N Mantegna, Bootstrap validation of links of a minimum spanning tree, arXiv preprint arXiv:1802.03395 (2018). Gautier Marti, Frank Nielsen, and Philippe Donnat, Optimal copula transport for clustering multivariate time series, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2016, pp. 2379–2383. David Matesanz and Guillermo J Ortega, Sovereign public debt crisis in Europe. A network analysis, Physica A: Statistical Mechanics and its Applications 436 (2015), 756–766.
- 57. References XIV Gautier Marti, Philippe Very, Philippe Donnat, and Frank Nielsen, A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series, 14th IEEE International Conference on Machine Learning and Applications, ICMLA 2015, Miami, FL, USA, December 9-11, 2015, 2015, pp. 32–37. J-P Onnela, Anirban Chakraborti, Kimmo Kaski, Janos Kertész, and Antti Kanto, Dynamics of market correlations: Taxonomy and portfolio analysis, Physical Review E 68 (2003), no. 5, 056110. J-P Onnela, A Chakraborti, K Kaski, and J Kertész, Dynamic asset trees and portfolio analysis, The European Physical Journal B-Condensed Matter and Complex Systems 30 (2002), no. 3, 285–288.
- 58. References XV J-P Onnela, Kimmo Kaski, and Janos Kertész, Clustering and information in correlation based financial networks, The European Physical Journal B-Condensed Matter and Complex Systems 38 (2004), no. 2, 353–362. Francesco Pozzi, Tiziana Di Matteo, and Tomaso Aste, Spread of risk across financial markets: better to invest in the peripheries, Scientific reports 3 (2013). Vasiliki Plerou, P Gopikrishnan, Bernd Rosenow, LA Nunes Amaral, and H Eugene Stanley, A random matrix theory approach to financial cross-correlations, Physica A: Statistical Mechanics and its Applications 287 (2000), no. 3, 374–382. Don B Panton, V Parker Lessig, and O Maurice Joy, Comovement of international equity markets: a taxonomic approach, Journal of Financial and Quantitative Analysis 11 (1976), no. 03, 415–432.
- 59. References XVI Jochen Papenbrock and Peter Schwendner, Handling risk-on/risk-off dynamics with correlation regimes and correlation networks, Financial Markets and Portfolio Management 29 (2015), no. 2, 125–147. Gustavo Peralta and Abalfazl Zareei, A network approach to portfolio selection, Journal of Empirical Finance (2016). Fei Ren, Ya-Nan Lu, Sai-Ping Li, Xiong-Fei Jiang, Li-Xin Zhong, and Tian Qiu, Dynamic portfolio strategy using clustering approach, arXiv preprint arXiv:1608.03058 (2016). Jacopo Rocchi, Enoch Yan Lok Tsui, and David Saad, Emerging interdependence between stock values during financial crashes, arXiv preprint arXiv:1611.02549 (2016).
- 60. References XVII Won-Min Song, Tiziana Di Matteo, and Tomaso Aste, Nested hierarchies in planar graphs, Discrete Applied Mathematics 159 (2011), no. 17, 2135–2146. Won-Min Song, T Di Matteo, and Tomaso Aste, Hierarchical information clustering by means of topologically embedded graphs, PLoS One 7 (2012), no. 3, e31929. Ahmet Sensoy and Benjamin M Tabak, Dynamic spanning trees in stock market networks: The case of Asia-Pacific, Physica A: Statistical Mechanics and its Applications 414 (2014), 387–402. Dong-Ming Song, Michele Tumminello, Wei-Xing Zhou, and Rosario N Mantegna, Evolution of worldwide stock markets, correlation structure, and correlation-based graphs, Physical Review E 84 (2011), no. 2, 026108.
- 61. References XVIII Tiziano Squartini, Iman Van Lelyveld, and Diego Garlaschelli, Early-warning signals of topological collapse in interbank networks, Scientific reports 3 (2013). Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and Rosario N Mantegna, A tool for filtering information in complex systems, Proceedings of the National Academy of Sciences of the United States of America 102 (2005), no. 30, 10421–10426. Michele Tumminello, Claudia Coronnello, Fabrizio Lillo, Salvatore Miccichè, and Rosario N Mantegna, Spanning trees and bootstrap reliability estimation in correlation-based networks, International Journal of Bifurcation and Chaos 17 (2007), no. 07, 2319–2329. Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N Mantegna, Cluster analysis for portfolio optimization, Journal of Economic Dynamics and Control 32 (2008), no. 1, 235–258.
- 62. References XIX Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Hierarchically nested factor model from multivariate data, EPL (Europhysics Letters) 78 (2007), no. 3, 30006. Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Kullback-Leibler distance as a measure of the information filtered from multivariate data, Physical Review E 76 (2007), no. 3, 031123. Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna, Correlation, hierarchies, and networks in financial markets, Journal of Economic Behavior & Organization 75 (2010), no. 1, 40–58. Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo, and Rosario N Mantegna, Statistically validated networks in bipartite complex systems, PloS one 6 (2011), no. 3, e17994.
- 63. References XX Chengyi Tu, Cointegration-based financial networks study in Chinese stock market, Physica A: Statistical Mechanics and its Applications 402 (2014), 245–254. Tomáš Výrost, Štefan Lyócsa, and Eduard Baumöhl, Granger causality stock market networks: Temporal proximity and preferential attachment, Physica A: Statistical Mechanics and its Applications 427 (2015), 262–276. Liuren Wu, Centrality of the supply chain network. Yiting Zhang, Gladys Hui Ting Lee, Jian Cheng Wong, Jun Liang Kok, Manamohan Prusty, and Siew Ann Cheong, Will the US economy recover in 2010? A minimal spanning tree study, Physica A: Statistical Mechanics and its Applications 390 (2011), no. 11, 2020–2050.
- 64. References XXI Xin Zhang, Boris Podobnik, Dror Y Kenett, and H Eugene Stanley, Systemic risk and causality dynamics of the world international shipping market, Physica A: Statistical Mechanics and its Applications 415 (2014), 43–53.
