SlideShare une entreprise Scribd logo

On Clustering Financial Time Series - Beyond Correlation

Financial correlation matrices are noisy. Most of their coefficients are meaningless. RMT advocates that the intrinsic dimension is much lower than O(N^2). Clustering can help to reduce the dimension. But, it can also work on other information than mere correlation...

1  sur  1
Télécharger pour lire hors ligne
ON CLUSTERING FINANCIAL TIME SERIES
GAUTIER MARTI, PHILIPPE DONNAT AND FRANK NIELSEN
NOISY CORRELATION MATRICES
Let X be the matrix storing the standardized re-
turns of N = 560 assets (credit default swaps)
over a period of T = 2500 trading days.
Then, the empirical correlation matrix of the re-
turns is
C =
1
T
XX .
We can compute the empirical density of its
eigenvalues
ρ(λ) =
1
N
dn(λ)
dλ
,
where n(λ) counts the number of eigenvalues of
C less than λ.
From random matrix theory, the Marchenko-
Pastur distribution gives the limit distribution as
N → ∞, T → ∞ and T/N fixed. It reads:
ρ(λ) =
T/N
2π
(λmax − λ)(λ − λmin)
λ
,
where λmax
min = 1 + N/T ± 2 N/T, and λ ∈
[λmin, λmax].
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0
λ
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
1.8
ρ(λ)
Figure 1: Marchenko-Pastur density vs. empirical den-
sity of the correlation matrix eigenvalues
Notice that the Marchenko-Pastur density fits
well the empirical density meaning that most of
the information contained in the empirical corre-
lation matrix amounts to noise: only 26 eigenval-
ues are greater than λmax.
The highest eigenvalue corresponds to the ‘mar-
ket’, the 25 others can be associated to ‘industrial
sectors’.
CLUSTERING TIME SERIES
Given a correlation matrix of the returns,
0 100 200 300 400 500
0
100
200
300
400
500
Figure 2: An empirical and noisy correlation matrix
one can re-order assets using a hierarchical clus-
tering algorithm to make the hierarchical correla-
tion pattern blatant,
0 100 200 300 400 500
0
100
200
300
400
500
Figure 3: The same noisy correlation matrix re-ordered
by a hierarchical clustering algorithm
and finally filter the noise according to the corre-
lation pattern:
0 100 200 300 400 500
0
100
200
300
400
500
Figure 4: The resulting filtered correlation matrix
BEYOND CORRELATION
Sklar’s Theorem. For any random vector X = (X1, . . . , XN ) having continuous marginal cumulative
distribution functions Fi, its joint cumulative distribution F is uniquely expressed as
F(X1, . . . , XN ) = C(F1(X1), . . . , FN (XN )),
where C, the multivariate distribution of uniform marginals, is known as the copula of X.
Figure 5: ArcelorMittal and Société générale prices are projected on dependence ⊕ distribution space; notice their
heavy-tailed exponential distribution.
Let θ ∈ [0, 1]. Let (X, Y ) ∈ V2
. Let G = (GX, GY ), where GX and GY are respectively X and Y marginal
cdf. We define the following distance
d2
θ(X, Y ) = θd2
1(GX(X), GY (Y )) + (1 − θ)d2
0(GX, GY ),
where d2
1(GX(X), GY (Y )) = 3E[|GX(X) − GY (Y )|2
], and d2
0(GX, GY ) = 1
2 R
dGX
dλ − dGY
dλ
2
dλ.
CLUSTERING RESULTS & STABILITY
0 5 10 15 20 25 30
Standard Deviation in basis points
0
5
10
15
20
25
30
35
Numberofoccurrences
Standard Deviations Histogram
Figure 6: (Top) The returns correlation structure ap-
pears more clearly using rank correlation; (Bottom)
Clusters of returns distributions can be partly described
by the returns volatility
Figure 7: Stability test on Odd/Even trading days sub-
sampling: our approach (GNPR) yields more stable
clusters with respect to this perturbation than standard
approaches (using Pearson correlation or L2 distances).

Recommandé

Clustering Random Walk Time Series
Clustering Random Walk Time SeriesClustering Random Walk Time Series
Clustering Random Walk Time SeriesGautier Marti
 
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsRao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithmsChristian Robert
 

Contenu connexe

Tendances

MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methodsChristian Robert
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big DataChristian Robert
 
Introduction to MCMC methods
Introduction to MCMC methodsIntroduction to MCMC methods
Introduction to MCMC methodsChristian Robert
 
A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”IOSRJM
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferenceChristian Robert
 
Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Christian Robert
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical MethodsChristian Robert
 
Stochastic Control and Information Theoretic Dualities (Complete Version)
Stochastic Control and Information Theoretic Dualities (Complete Version)Stochastic Control and Information Theoretic Dualities (Complete Version)
Stochastic Control and Information Theoretic Dualities (Complete Version)Haruki Nishimura
 
somenath_fixedpoint_dasguptaIMF17-20-2013
somenath_fixedpoint_dasguptaIMF17-20-2013somenath_fixedpoint_dasguptaIMF17-20-2013
somenath_fixedpoint_dasguptaIMF17-20-2013Somenath Bandyopadhyay
 
Hybrid Atlas Models of Financial Equity Market
Hybrid Atlas Models of Financial Equity MarketHybrid Atlas Models of Financial Equity Market
Hybrid Atlas Models of Financial Equity Markettomoyukiichiba
 
A Tau Approach for Solving Fractional Diffusion Equations using Legendre-Cheb...
A Tau Approach for Solving Fractional Diffusion Equations using Legendre-Cheb...A Tau Approach for Solving Fractional Diffusion Equations using Legendre-Cheb...
A Tau Approach for Solving Fractional Diffusion Equations using Legendre-Cheb...iosrjce
 
Random Matrix Theory and Machine Learning - Part 1
Random Matrix Theory and Machine Learning - Part 1Random Matrix Theory and Machine Learning - Part 1
Random Matrix Theory and Machine Learning - Part 1Fabian Pedregosa
 

Tendances (20)

MCMC and likelihood-free methods
MCMC and likelihood-free methodsMCMC and likelihood-free methods
MCMC and likelihood-free methods
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
QMC Opening Workshop, High Accuracy Algorithms for Interpolating and Integrat...
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
Introduction to MCMC methods
Introduction to MCMC methodsIntroduction to MCMC methods
Introduction to MCMC methods
 
A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”A Note on “   Geraghty contraction type mappings”
A Note on “   Geraghty contraction type mappings”
 
ABC in Venezia
ABC in VeneziaABC in Venezia
ABC in Venezia
 
Poster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conferencePoster for Bayesian Statistics in the Big Data Era conference
Poster for Bayesian Statistics in the Big Data Era conference
 
Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010Mark Girolami's Read Paper 2010
Mark Girolami's Read Paper 2010
 
Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Stochastic Control and Information Theoretic Dualities (Complete Version)
Stochastic Control and Information Theoretic Dualities (Complete Version)Stochastic Control and Information Theoretic Dualities (Complete Version)
Stochastic Control and Information Theoretic Dualities (Complete Version)
 
somenath_fixedpoint_dasguptaIMF17-20-2013
somenath_fixedpoint_dasguptaIMF17-20-2013somenath_fixedpoint_dasguptaIMF17-20-2013
somenath_fixedpoint_dasguptaIMF17-20-2013
 
Hybrid Atlas Models of Financial Equity Market
Hybrid Atlas Models of Financial Equity MarketHybrid Atlas Models of Financial Equity Market
Hybrid Atlas Models of Financial Equity Market
 
A Tau Approach for Solving Fractional Diffusion Equations using Legendre-Cheb...
A Tau Approach for Solving Fractional Diffusion Equations using Legendre-Cheb...A Tau Approach for Solving Fractional Diffusion Equations using Legendre-Cheb...
A Tau Approach for Solving Fractional Diffusion Equations using Legendre-Cheb...
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
Richard Everitt's slides
Richard Everitt's slidesRichard Everitt's slides
Richard Everitt's slides
 
Random Matrix Theory and Machine Learning - Part 1
Random Matrix Theory and Machine Learning - Part 1Random Matrix Theory and Machine Learning - Part 1
Random Matrix Theory and Machine Learning - Part 1
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 

En vedette

Financial Time Series: Concept and Forecast (dsth Meetup#2)
Financial Time Series: Concept and Forecast (dsth Meetup#2)Financial Time Series: Concept and Forecast (dsth Meetup#2)
Financial Time Series: Concept and Forecast (dsth Meetup#2)Data Science Thailand
 
Financial time series_forecasting_svm
Financial time series_forecasting_svmFinancial time series_forecasting_svm
Financial time series_forecasting_svmMohamed DHAOUI
 
Financial forecasting by time series 55660701
Financial forecasting by time series 55660701Financial forecasting by time series 55660701
Financial forecasting by time series 55660701Pongsiri Nontasak
 
Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs Objectivity
 
A closer look at correlations
A closer look at correlationsA closer look at correlations
A closer look at correlationsGautier Marti
 
Clustering Financial Time Series: How Long is Enough?
Clustering Financial Time Series: How Long is Enough?Clustering Financial Time Series: How Long is Enough?
Clustering Financial Time Series: How Long is Enough?Gautier Marti
 
NYC* Big Tech Day 2013: Financial Time Series
NYC* Big Tech Day 2013: Financial Time SeriesNYC* Big Tech Day 2013: Financial Time Series
NYC* Big Tech Day 2013: Financial Time SeriesCarl Yeksigian
 

En vedette (11)

trading
tradingtrading
trading
 
Financial Time Series: Concept and Forecast (dsth Meetup#2)
Financial Time Series: Concept and Forecast (dsth Meetup#2)Financial Time Series: Concept and Forecast (dsth Meetup#2)
Financial Time Series: Concept and Forecast (dsth Meetup#2)
 
quantmachine
quantmachinequantmachine
quantmachine
 
Financial time series_forecasting_svm
Financial time series_forecasting_svmFinancial time series_forecasting_svm
Financial time series_forecasting_svm
 
Financial forecasting by time series 55660701
Financial forecasting by time series 55660701Financial forecasting by time series 55660701
Financial forecasting by time series 55660701
 
Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs Managing Large Scale Financial Time-Series Data with Graphs
Managing Large Scale Financial Time-Series Data with Graphs
 
A closer look at correlations
A closer look at correlationsA closer look at correlations
A closer look at correlations
 
Clustering Financial Time Series: How Long is Enough?
Clustering Financial Time Series: How Long is Enough?Clustering Financial Time Series: How Long is Enough?
Clustering Financial Time Series: How Long is Enough?
 
Support Vector Machine
Support Vector MachineSupport Vector Machine
Support Vector Machine
 
NYC* Big Tech Day 2013: Financial Time Series
NYC* Big Tech Day 2013: Financial Time SeriesNYC* Big Tech Day 2013: Financial Time Series
NYC* Big Tech Day 2013: Financial Time Series
 
time series analysis
time series analysistime series analysis
time series analysis
 

Similaire à On Clustering Financial Time Series - Beyond Correlation

The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionPedro222284
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Alexander Litvinenko
 
Vitaly Vanchurin "General relativity from non-equilibrium thermodynamics of q...
Vitaly Vanchurin "General relativity from non-equilibrium thermodynamics of q...Vitaly Vanchurin "General relativity from non-equilibrium thermodynamics of q...
Vitaly Vanchurin "General relativity from non-equilibrium thermodynamics of q...SEENET-MTP
 
Engr 371 final exam april 1996
Engr 371 final exam april 1996Engr 371 final exam april 1996
Engr 371 final exam april 1996amnesiann
 
Multivariate Gaussin, Rayleigh & Rician distributions
Multivariate Gaussin, Rayleigh & Rician distributionsMultivariate Gaussin, Rayleigh & Rician distributions
Multivariate Gaussin, Rayleigh & Rician distributionsHAmindavarLectures
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Daisuke Yoneoka
 
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...BRNSS Publication Hub
 
Complete l fuzzy metric spaces and common fixed point theorems
Complete l fuzzy metric spaces and  common fixed point theoremsComplete l fuzzy metric spaces and  common fixed point theorems
Complete l fuzzy metric spaces and common fixed point theoremsAlexander Decker
 
A Generalized Metric Space and Related Fixed Point Theorems
A Generalized Metric Space and Related Fixed Point TheoremsA Generalized Metric Space and Related Fixed Point Theorems
A Generalized Metric Space and Related Fixed Point TheoremsIRJET Journal
 
553_Final_Project_Bokser_Litalien
553_Final_Project_Bokser_Litalien553_Final_Project_Bokser_Litalien
553_Final_Project_Bokser_LitalienRory Bokser
 
Density theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsDensity theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsVjekoslavKovac1
 
Geometric and viscosity solutions for the Cauchy problem of first order
Geometric and viscosity solutions for the Cauchy problem of first orderGeometric and viscosity solutions for the Cauchy problem of first order
Geometric and viscosity solutions for the Cauchy problem of first orderJuliho Castillo
 
IJSRED-V2I5P56
IJSRED-V2I5P56IJSRED-V2I5P56
IJSRED-V2I5P56IJSRED
 
Moment-Generating Functions and Reproductive Properties of Distributions
Moment-Generating Functions and Reproductive Properties of DistributionsMoment-Generating Functions and Reproductive Properties of Distributions
Moment-Generating Functions and Reproductive Properties of DistributionsIJSRED
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 

Similaire à On Clustering Financial Time Series - Beyond Correlation (20)

Frobenious theorem
Frobenious theoremFrobenious theorem
Frobenious theorem
 
The Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability DistributionThe Multivariate Gaussian Probability Distribution
The Multivariate Gaussian Probability Distribution
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
Vitaly Vanchurin "General relativity from non-equilibrium thermodynamics of q...
Vitaly Vanchurin "General relativity from non-equilibrium thermodynamics of q...Vitaly Vanchurin "General relativity from non-equilibrium thermodynamics of q...
Vitaly Vanchurin "General relativity from non-equilibrium thermodynamics of q...
 
Engr 371 final exam april 1996
Engr 371 final exam april 1996Engr 371 final exam april 1996
Engr 371 final exam april 1996
 
Slides ima
Slides imaSlides ima
Slides ima
 
Multivariate Gaussin, Rayleigh & Rician distributions
Multivariate Gaussin, Rayleigh & Rician distributionsMultivariate Gaussin, Rayleigh & Rician distributions
Multivariate Gaussin, Rayleigh & Rician distributions
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
02_AJMS_186_19_RA.pdf
02_AJMS_186_19_RA.pdf02_AJMS_186_19_RA.pdf
02_AJMS_186_19_RA.pdf
 
02_AJMS_186_19_RA.pdf
02_AJMS_186_19_RA.pdf02_AJMS_186_19_RA.pdf
02_AJMS_186_19_RA.pdf
 
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
On Application of the Fixed-Point Theorem to the Solution of Ordinary Differe...
 
Complete l fuzzy metric spaces and common fixed point theorems
Complete l fuzzy metric spaces and  common fixed point theoremsComplete l fuzzy metric spaces and  common fixed point theorems
Complete l fuzzy metric spaces and common fixed point theorems
 
A Generalized Metric Space and Related Fixed Point Theorems
A Generalized Metric Space and Related Fixed Point TheoremsA Generalized Metric Space and Related Fixed Point Theorems
A Generalized Metric Space and Related Fixed Point Theorems
 
553_Final_Project_Bokser_Litalien
553_Final_Project_Bokser_Litalien553_Final_Project_Bokser_Litalien
553_Final_Project_Bokser_Litalien
 
Density theorems for Euclidean point configurations
Density theorems for Euclidean point configurationsDensity theorems for Euclidean point configurations
Density theorems for Euclidean point configurations
 
Geometric and viscosity solutions for the Cauchy problem of first order
Geometric and viscosity solutions for the Cauchy problem of first orderGeometric and viscosity solutions for the Cauchy problem of first order
Geometric and viscosity solutions for the Cauchy problem of first order
 
IJSRED-V2I5P56
IJSRED-V2I5P56IJSRED-V2I5P56
IJSRED-V2I5P56
 
Moment-Generating Functions and Reproductive Properties of Distributions
Moment-Generating Functions and Reproductive Properties of DistributionsMoment-Generating Functions and Reproductive Properties of Distributions
Moment-Generating Functions and Reproductive Properties of Distributions
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 

Plus de Gautier Marti

Using Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeUsing Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeGautier Marti
 
What deep learning can bring to...
What deep learning can bring to...What deep learning can bring to...
What deep learning can bring to...Gautier Marti
 
A quick demo of Top2Vec With application on 2020 10-K business descriptions
A quick demo of Top2Vec With application on 2020 10-K business descriptionsA quick demo of Top2Vec With application on 2020 10-K business descriptions
A quick demo of Top2Vec With application on 2020 10-K business descriptionsGautier Marti
 
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...Gautier Marti
 
How deep generative models can help quants reduce the risk of overfitting?
How deep generative models can help quants reduce the risk of overfitting?How deep generative models can help quants reduce the risk of overfitting?
How deep generative models can help quants reduce the risk of overfitting?Gautier Marti
 
Generating Realistic Synthetic Data in Finance
Generating Realistic Synthetic Data in FinanceGenerating Realistic Synthetic Data in Finance
Generating Realistic Synthetic Data in FinanceGautier Marti
 
Applications of GANs in Finance
Applications of GANs in FinanceApplications of GANs in Finance
Applications of GANs in FinanceGautier Marti
 
My recent attempts at using GANs for simulating realistic stocks returns
My recent attempts at using GANs for simulating realistic stocks returnsMy recent attempts at using GANs for simulating realistic stocks returns
My recent attempts at using GANs for simulating realistic stocks returnsGautier Marti
 
Takeaways from ICML 2019, Long Beach, California
Takeaways from ICML 2019, Long Beach, CaliforniaTakeaways from ICML 2019, Long Beach, California
Takeaways from ICML 2019, Long Beach, CaliforniaGautier Marti
 
A review of two decades of correlations, hierarchies, networks and clustering...
A review of two decades of correlations, hierarchies, networks and clustering...A review of two decades of correlations, hierarchies, networks and clustering...
A review of two decades of correlations, hierarchies, networks and clustering...Gautier Marti
 
Autoregressive Convolutional Neural Networks for Asynchronous Time Series
Autoregressive Convolutional Neural Networks for Asynchronous Time SeriesAutoregressive Convolutional Neural Networks for Asynchronous Time Series
Autoregressive Convolutional Neural Networks for Asynchronous Time SeriesGautier Marti
 
Some contributions to the clustering of financial time series - Applications ...
Some contributions to the clustering of financial time series - Applications ...Some contributions to the clustering of financial time series - Applications ...
Some contributions to the clustering of financial time series - Applications ...Gautier Marti
 
Clustering CDS: algorithms, distances, stability and convergence rates
Clustering CDS: algorithms, distances, stability and convergence ratesClustering CDS: algorithms, distances, stability and convergence rates
Clustering CDS: algorithms, distances, stability and convergence ratesGautier Marti
 
Clustering Financial Time Series using their Correlations and their Distribut...
Clustering Financial Time Series using their Correlations and their Distribut...Clustering Financial Time Series using their Correlations and their Distribut...
Clustering Financial Time Series using their Correlations and their Distribut...Gautier Marti
 
Optimal Transport vs. Fisher-Rao distance between Copulas
Optimal Transport vs. Fisher-Rao distance between CopulasOptimal Transport vs. Fisher-Rao distance between Copulas
Optimal Transport vs. Fisher-Rao distance between CopulasGautier Marti
 
Optimal Transport between Copulas for Clustering Time Series
Optimal Transport between Copulas for Clustering Time SeriesOptimal Transport between Copulas for Clustering Time Series
Optimal Transport between Copulas for Clustering Time SeriesGautier Marti
 
On the stability of clustering financial time series
On the stability of clustering financial time seriesOn the stability of clustering financial time series
On the stability of clustering financial time seriesGautier Marti
 
On clustering financial time series - A need for distances between dependent ...
On clustering financial time series - A need for distances between dependent ...On clustering financial time series - A need for distances between dependent ...
On clustering financial time series - A need for distances between dependent ...Gautier Marti
 

Plus de Gautier Marti (18)

Using Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of CodeUsing Large Language Models in 10 Lines of Code
Using Large Language Models in 10 Lines of Code
 
What deep learning can bring to...
What deep learning can bring to...What deep learning can bring to...
What deep learning can bring to...
 
A quick demo of Top2Vec With application on 2020 10-K business descriptions
A quick demo of Top2Vec With application on 2020 10-K business descriptionsA quick demo of Top2Vec With application on 2020 10-K business descriptions
A quick demo of Top2Vec With application on 2020 10-K business descriptions
 
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...
cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Dist...
 
How deep generative models can help quants reduce the risk of overfitting?
How deep generative models can help quants reduce the risk of overfitting?How deep generative models can help quants reduce the risk of overfitting?
How deep generative models can help quants reduce the risk of overfitting?
 
Generating Realistic Synthetic Data in Finance
Generating Realistic Synthetic Data in FinanceGenerating Realistic Synthetic Data in Finance
Generating Realistic Synthetic Data in Finance
 
Applications of GANs in Finance
Applications of GANs in FinanceApplications of GANs in Finance
Applications of GANs in Finance
 
My recent attempts at using GANs for simulating realistic stocks returns
My recent attempts at using GANs for simulating realistic stocks returnsMy recent attempts at using GANs for simulating realistic stocks returns
My recent attempts at using GANs for simulating realistic stocks returns
 
Takeaways from ICML 2019, Long Beach, California
Takeaways from ICML 2019, Long Beach, CaliforniaTakeaways from ICML 2019, Long Beach, California
Takeaways from ICML 2019, Long Beach, California
 
A review of two decades of correlations, hierarchies, networks and clustering...
A review of two decades of correlations, hierarchies, networks and clustering...A review of two decades of correlations, hierarchies, networks and clustering...
A review of two decades of correlations, hierarchies, networks and clustering...
 
Autoregressive Convolutional Neural Networks for Asynchronous Time Series
Autoregressive Convolutional Neural Networks for Asynchronous Time SeriesAutoregressive Convolutional Neural Networks for Asynchronous Time Series
Autoregressive Convolutional Neural Networks for Asynchronous Time Series
 
Some contributions to the clustering of financial time series - Applications ...
Some contributions to the clustering of financial time series - Applications ...Some contributions to the clustering of financial time series - Applications ...
Some contributions to the clustering of financial time series - Applications ...
 
Clustering CDS: algorithms, distances, stability and convergence rates
Clustering CDS: algorithms, distances, stability and convergence ratesClustering CDS: algorithms, distances, stability and convergence rates
Clustering CDS: algorithms, distances, stability and convergence rates
 
Clustering Financial Time Series using their Correlations and their Distribut...
Clustering Financial Time Series using their Correlations and their Distribut...Clustering Financial Time Series using their Correlations and their Distribut...
Clustering Financial Time Series using their Correlations and their Distribut...
 
Optimal Transport vs. Fisher-Rao distance between Copulas
Optimal Transport vs. Fisher-Rao distance between CopulasOptimal Transport vs. Fisher-Rao distance between Copulas
Optimal Transport vs. Fisher-Rao distance between Copulas
 
Optimal Transport between Copulas for Clustering Time Series
Optimal Transport between Copulas for Clustering Time SeriesOptimal Transport between Copulas for Clustering Time Series
Optimal Transport between Copulas for Clustering Time Series
 
On the stability of clustering financial time series
On the stability of clustering financial time seriesOn the stability of clustering financial time series
On the stability of clustering financial time series
 
On clustering financial time series - A need for distances between dependent ...
On clustering financial time series - A need for distances between dependent ...On clustering financial time series - A need for distances between dependent ...
On clustering financial time series - A need for distances between dependent ...
 

Dernier

Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)CUO VEERANAN VEERANAN
 
Tips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsTips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsDataArchiva
 
A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)UNCResearchHub
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referencepriyansabari355
 
What is the value of your Data v3.0.pptx
What is the value of your Data v3.0.pptxWhat is the value of your Data v3.0.pptx
What is the value of your Data v3.0.pptxJose Briones
 
ppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptxppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptxHizkiaJastis
 
fundamentals of digital imaging - POONAM.pptx
fundamentals of digital imaging - POONAM.pptxfundamentals of digital imaging - POONAM.pptx
fundamentals of digital imaging - POONAM.pptxPoonamRijal
 
Lies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaLies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaAdrian Sanabria
 
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Thibaud Le Douarin
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referencepriyansabari355
 
AWS Identity and access management for users
AWS Identity and access management for usersAWS Identity and access management for users
AWS Identity and access management for usersStephenEfange3
 
Business Analytics _ Confidence Interval
Business Analytics _ Confidence IntervalBusiness Analytics _ Confidence Interval
Business Analytics _ Confidence IntervalRavindra Nath Shukla
 
Operations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensOperations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensKondapi V Siva Rama Brahmam
 
Industry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxIndustry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxMdRafiqulIslam403212
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023stephizcoolio
 
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfIIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfAustraliaChapterIIBA
 

Dernier (17)

Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)Big Data - large Scale data (Amazon, FB)
Big Data - large Scale data (Amazon, FB)
 
Tips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data GoalsTips to Align with Your Salesforce Data Goals
Tips to Align with Your Salesforce Data Goals
 
A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)A Gentle Introduction to Text Analysis :)
A Gentle Introduction to Text Analysis :)
 
SABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a referenceSABARI PRIYAN's self introduction as a reference
SABARI PRIYAN's self introduction as a reference
 
What is the value of your Data v3.0.pptx
What is the value of your Data v3.0.pptxWhat is the value of your Data v3.0.pptx
What is the value of your Data v3.0.pptx
 
ppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptxppt penjualan berbasis online omset.pptx
ppt penjualan berbasis online omset.pptx
 
fundamentals of digital imaging - POONAM.pptx
fundamentals of digital imaging - POONAM.pptxfundamentals of digital imaging - POONAM.pptx
fundamentals of digital imaging - POONAM.pptx
 
Lies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix EnigmaLies and Myths in InfoSec - 2023 Usenix Enigma
Lies and Myths in InfoSec - 2023 Usenix Enigma
 
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
Generative AI Rennes Meetup with OVHcloud - WAICF highlights & how to deploy ...
 
SABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as referenceSABARI PRIYAN's self introduction as reference
SABARI PRIYAN's self introduction as reference
 
AWS Identity and access management for users
AWS Identity and access management for usersAWS Identity and access management for users
AWS Identity and access management for users
 
Business Analytics _ Confidence Interval
Business Analytics _ Confidence IntervalBusiness Analytics _ Confidence Interval
Business Analytics _ Confidence Interval
 
Operations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample ScreensOperations Data On Mobile - inSis Mobile App - Sample Screens
Operations Data On Mobile - inSis Mobile App - Sample Screens
 
Industry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptxIndustry 4.0 in IoT Transforming the Future.pptx
Industry 4.0 in IoT Transforming the Future.pptx
 
Electricity Year 2023_updated_22022024.pptx
Electricity Year 2023_updated_22022024.pptxElectricity Year 2023_updated_22022024.pptx
Electricity Year 2023_updated_22022024.pptx
 
Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023Soil Health Policy Map Years 2020 to 2023
Soil Health Policy Map Years 2020 to 2023
 
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdfIIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
IIBA Adl - Being Effective on Day 1 - Slide Deck.pdf
 

On Clustering Financial Time Series - Beyond Correlation

  • 1. ON CLUSTERING FINANCIAL TIME SERIES GAUTIER MARTI, PHILIPPE DONNAT AND FRANK NIELSEN NOISY CORRELATION MATRICES Let X be the matrix storing the standardized re- turns of N = 560 assets (credit default swaps) over a period of T = 2500 trading days. Then, the empirical correlation matrix of the re- turns is C = 1 T XX . We can compute the empirical density of its eigenvalues ρ(λ) = 1 N dn(λ) dλ , where n(λ) counts the number of eigenvalues of C less than λ. From random matrix theory, the Marchenko- Pastur distribution gives the limit distribution as N → ∞, T → ∞ and T/N fixed. It reads: ρ(λ) = T/N 2π (λmax − λ)(λ − λmin) λ , where λmax min = 1 + N/T ± 2 N/T, and λ ∈ [λmin, λmax]. 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 λ 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 ρ(λ) Figure 1: Marchenko-Pastur density vs. empirical den- sity of the correlation matrix eigenvalues Notice that the Marchenko-Pastur density fits well the empirical density meaning that most of the information contained in the empirical corre- lation matrix amounts to noise: only 26 eigenval- ues are greater than λmax. The highest eigenvalue corresponds to the ‘mar- ket’, the 25 others can be associated to ‘industrial sectors’. CLUSTERING TIME SERIES Given a correlation matrix of the returns, 0 100 200 300 400 500 0 100 200 300 400 500 Figure 2: An empirical and noisy correlation matrix one can re-order assets using a hierarchical clus- tering algorithm to make the hierarchical correla- tion pattern blatant, 0 100 200 300 400 500 0 100 200 300 400 500 Figure 3: The same noisy correlation matrix re-ordered by a hierarchical clustering algorithm and finally filter the noise according to the corre- lation pattern: 0 100 200 300 400 500 0 100 200 300 400 500 Figure 4: The resulting filtered correlation matrix BEYOND CORRELATION Sklar’s Theorem. For any random vector X = (X1, . . . , XN ) having continuous marginal cumulative distribution functions Fi, its joint cumulative distribution F is uniquely expressed as F(X1, . . . , XN ) = C(F1(X1), . . . , FN (XN )), where C, the multivariate distribution of uniform marginals, is known as the copula of X. Figure 5: ArcelorMittal and Société générale prices are projected on dependence ⊕ distribution space; notice their heavy-tailed exponential distribution. Let θ ∈ [0, 1]. Let (X, Y ) ∈ V2 . Let G = (GX, GY ), where GX and GY are respectively X and Y marginal cdf. We define the following distance d2 θ(X, Y ) = θd2 1(GX(X), GY (Y )) + (1 − θ)d2 0(GX, GY ), where d2 1(GX(X), GY (Y )) = 3E[|GX(X) − GY (Y )|2 ], and d2 0(GX, GY ) = 1 2 R dGX dλ − dGY dλ 2 dλ. CLUSTERING RESULTS & STABILITY 0 5 10 15 20 25 30 Standard Deviation in basis points 0 5 10 15 20 25 30 35 Numberofoccurrences Standard Deviations Histogram Figure 6: (Top) The returns correlation structure ap- pears more clearly using rank correlation; (Bottom) Clusters of returns distributions can be partly described by the returns volatility Figure 7: Stability test on Odd/Even trading days sub- sampling: our approach (GNPR) yields more stable clusters with respect to this perturbation than standard approaches (using Pearson correlation or L2 distances).