Clustering Financial Time Series Using Noise Filtering and Copula Modeling


This document discusses clustering financial time series using correlation matrices. For 560 credit default swaps observed over 2500 trading days, the eigenvalue density of the empirical correlation matrix closely matches the theoretical Marchenko-Pastur distribution, indicating that most of the matrix is noise. Only 26 eigenvalues exceed the theoretical maximum; they may correspond to market and industry factors. Hierarchical clustering can reorder the assets to reveal the correlation pattern, and the noise can then be filtered according to that pattern. Beyond correlation, copulas represent the dependence structure, and a distance is proposed that combines a dependence term (d1) with a distribution term (d0), so that clustering uses the full distributions rather than correlations alone. A stability test on odd/even trading-day subsampling shows that the proposed approach (GNPR) yields more stable clusters than standard approaches based on Pearson correlation or L2 distances.


ON CLUSTERING FINANCIAL TIME SERIES
GAUTIER MARTI, PHILIPPE DONNAT AND FRANK NIELSEN
NOISY CORRELATION MATRICES
Let X be the matrix storing the standardized returns of N = 560 assets (credit default swaps) over a period of T = 2500 trading days. Then, the empirical correlation matrix of the returns is
$$C = \frac{1}{T} X X^\top.$$
We can compute the empirical density of its eigenvalues
$$\rho(\lambda) = \frac{1}{N} \frac{dn(\lambda)}{d\lambda},$$
where n(λ) counts the number of eigenvalues of C less than λ.
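From random matrix theory, the Marchenko-Pastur distribution gives the limit distribution as N → ∞, T → ∞ and T/N fixed. It reads:
$$\rho(\lambda) = \frac{T/N}{2\pi} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda},$$
where $\lambda_{\min}^{\max} = 1 + N/T \pm 2\sqrt{N/T}$, and $\lambda \in [\lambda_{\min}, \lambda_{\max}]$.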
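As a minimal sketch of the formula above (the function names are illustrative and not from the poster), the Marchenko-Pastur bounds and density can be evaluated with NumPy for N = 560 and T = 2500:

```python
import numpy as np

def marchenko_pastur_bounds(n_assets, n_obs):
    """Return (lambda_min, lambda_max) = 1 + N/T -/+ 2*sqrt(N/T)."""
    q = n_assets / n_obs
    return 1.0 + q - 2.0 * np.sqrt(q), 1.0 + q + 2.0 * np.sqrt(q)

def marchenko_pastur_density(lam, n_assets, n_obs):
    """Evaluate rho(lambda); the density is zero outside [lambda_min, lambda_max]."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    lam_min, lam_max = marchenko_pastur_bounds(n_assets, n_obs)
    inside = (lam >= lam_min) & (lam <= lam_max)
    density = np.zeros_like(lam)
    x = lam[inside]
    density[inside] = (
        (n_obs / n_assets) / (2.0 * np.pi)
        * np.sqrt((lam_max - x) * (x - lam_min)) / x
    )
    return density

lam_min, lam_max = marchenko_pastur_bounds(560, 2500)
print(f"lambda_min = {lam_min:.3f}, lambda_max = {lam_max:.3f}")
```

Eigenvalues of the empirical correlation matrix that fall above `lam_max` are the ones the poster treats as carrying information rather than noise.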
Figure 1: Marchenko-Pastur density vs. empirical density of the correlation matrix eigenvalues (x-axis: λ; y-axis: ρ(λ))
Notice that the Marchenko-Pastur density fits the empirical density well, meaning that most of the information contained in the empirical correlation matrix amounts to noise: only 26 eigenvalues are greater than λmax.
The highest eigenvalue corresponds to the ‘market’; the 25 others can be associated with ‘industrial sectors’.
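The eigenvalue count can be illustrated on synthetic data. The sketch below does not use the CDS data from the poster: it simulates returns driven by a single hypothetical 'market' factor plus independent noise (an assumption made purely for illustration), builds C = (1/T) X Xᵀ from the standardized returns, and counts the eigenvalues above λmax.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_obs = 112, 500  # toy sizes with the same N/T ratio as 560/2500

# Hypothetical data: one common 'market' factor plus idiosyncratic noise.
market = rng.standard_normal(n_obs)
returns = 0.5 * market + rng.standard_normal((n_assets, n_obs))

# Standardize each asset (row) to zero mean and unit variance.
X = (returns - returns.mean(axis=1, keepdims=True)) / returns.std(axis=1, keepdims=True)

# Empirical correlation matrix and its eigenvalues (ascending order).
C = X @ X.T / n_obs
eigenvalues = np.linalg.eigvalsh(C)

# Marchenko-Pastur upper edge for pure noise.
q = n_assets / n_obs
lam_max = 1.0 + q + 2.0 * np.sqrt(q)

n_above = int(np.sum(eigenvalues > lam_max))
print(f"{n_above} eigenvalue(s) above lambda_max = {lam_max:.3f}")
print(f"largest eigenvalue: {eigenvalues[-1]:.2f}")
```

In this toy setting the largest eigenvalue sits far above λmax because every asset loads on the common factor, which mirrors the 'market' interpretation given above.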
CLUSTERING TIME SERIES
Given a correlation matrix of the returns,

Figure 2: An empirical and noisy correlation matrix

one can re-order assets using a hierarchical clustering algorithm to make the hierarchical correlation pattern blatant,

Figure 3: The same noisy correlation matrix re-ordered by a hierarchical clustering algorithm

and finally filter the noise according to the correlation pattern:

Figure 4: The resulting filtered correlation matrix
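The re-ordering step can be sketched with SciPy. The poster does not say which linkage or distance it uses, so average linkage and the distance sqrt(2(1 − ρ)) are assumptions here, and the function name is illustrative. The filtering step is not shown because the poster does not spell it out.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def reorder_by_hierarchical_clustering(corr, method="average"):
    """Permute rows/columns of a correlation matrix by dendrogram leaf order."""
    # Assumed distance: d = sqrt(2 * (1 - rho)); clip guards against tiny negatives.
    dist = np.sqrt(2.0 * np.clip(1.0 - corr, 0.0, None))
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method=method)
    order = leaves_list(tree)
    return corr[np.ix_(order, order)], order

# Toy example: two blocks of five assets, shuffled so the blocks are hidden.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 5)
corr = np.where(labels[:, None] == labels[None, :], 0.8, 0.1)
np.fill_diagonal(corr, 1.0)
perm = rng.permutation(10)
shuffled = corr[np.ix_(perm, perm)]

reordered, order = reorder_by_hierarchical_clustering(shuffled)
print(labels[perm][order])  # assets from the same block end up adjacent
```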
BEYOND CORRELATION
Sklar’s Theorem. For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cumulative distribution functions $F_i$, its joint cumulative distribution F is uniquely expressed as
$$F(X_1, \ldots, X_N) = C(F_1(X_1), \ldots, F_N(X_N)),$$
where C, the multivariate distribution of uniform marginals, is known as the copula of X.
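In practice the marginals F_i are unknown, and a common way to approximate F_i(X_i) is to replace each observation by its normalized rank. The sketch below assumes that rank-based estimator (the poster does not specify one) and uses an illustrative function name. It also checks a standard property: a strictly increasing transform of a variable changes its marginal but leaves the rank-transformed sample, and hence the empirical copula, unchanged.

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(samples):
    """Map a (T, N) sample to (0, 1)^N via column-wise normalized ranks."""
    n_obs = samples.shape[0]
    return rankdata(samples, axis=0) / (n_obs + 1)

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=1000)          # heavy-tailed variable
samples = np.column_stack([x, np.exp(x)])    # second column: increasing transform of x

u = pseudo_observations(samples)
# The two columns have very different marginals but identical normalized ranks.
print(np.array_equal(u[:, 0], u[:, 1]))
```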
Figure 5: ArcelorMittal and Société générale prices are projected on dependence ⊕ distribution space; notice their
heavy-tailed exponential distribution.
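Let $\theta \in [0, 1]$. Let $(X, Y) \in V^2$. Let $G = (G_X, G_Y)$, where $G_X$ and $G_Y$ are respectively the marginal cdfs of X and Y. We define the following distance
$$d_\theta^2(X, Y) = \theta\, d_1^2(G_X(X), G_Y(Y)) + (1 - \theta)\, d_0^2(G_X, G_Y),$$
where
$$d_1^2(G_X(X), G_Y(Y)) = 3\,\mathbb{E}\left[|G_X(X) - G_Y(Y)|^2\right]$$
and
$$d_0^2(G_X, G_Y) = \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dG_X}{d\lambda}} - \sqrt{\frac{dG_Y}{d\lambda}} \right)^2 d\lambda.$$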
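A rough empirical version of this distance can be sketched as follows. Several choices are assumptions not stated on the poster: d1 is estimated from normalized ranks, d0 is read as a Hellinger distance between the marginal densities (with square roots inside the integral) and estimated from histograms on a shared grid, and the bin count and function names are arbitrary.

```python
import numpy as np
from scipy.stats import rankdata

def d1_squared(x, y):
    """Dependence term: 3 * E[|G_X(X) - G_Y(Y)|^2], estimated with normalized ranks."""
    n_obs = len(x)
    u = rankdata(x) / n_obs
    v = rankdata(y) / n_obs
    return 3.0 * np.mean((u - v) ** 2)

def d0_squared(x, y, bins=50):
    """Distribution term: Hellinger-type distance between histogram densities (assumed)."""
    edges = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), bins + 1)
    p, _ = np.histogram(x, bins=edges)
    q, _ = np.histogram(y, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

def d_theta_squared(x, y, theta=0.5, bins=50):
    """Convex combination of the dependence and distribution terms."""
    return theta * d1_squared(x, y) + (1.0 - theta) * d0_squared(x, y, bins)

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = 3.0 * x  # perfectly dependent on x, but with a wider distribution

print(d_theta_squared(x, y, theta=1.0))  # dependence only
print(d_theta_squared(x, y, theta=0.0))  # distribution only
```

With θ = 1 the two series are indistinguishable because their ranks coincide; with θ = 0 they are separated by their different volatilities. Intermediate values of θ trade off the two kinds of information.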
CLUSTERING RESULTS & STABILITY
[Plot: Standard Deviations Histogram; x-axis: standard deviation in basis points; y-axis: number of occurrences]
Figure 6: (Top) The returns correlation structure appears more clearly using rank correlation; (Bottom) Clusters of returns distributions can be partly described by the returns volatility
Figure 7: Stability test on odd/even trading days subsampling: our approach (GNPR) yields more stable clusters with respect to this perturbation than standard approaches (using Pearson correlation or L2 distances).
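The odd/even subsampling test can be sketched on synthetic data. This is not the GNPR pipeline from the poster: the example clusters on plain Pearson correlation (one of the baseline approaches named above), uses average linkage, and scores agreement with a Rand index. The data, function names and number of clusters are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_labels(returns, n_clusters):
    """Hierarchical clustering of assets (rows) on Pearson correlation distance."""
    corr = np.corrcoef(returns)
    dist = np.sqrt(2.0 * np.clip(1.0 - corr, 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

def rand_index(a, b):
    """Fraction of asset pairs on which two partitions agree (same/different cluster)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[upper] == same_b[upper]))

# Hypothetical returns: 3 groups of 10 assets, each driven by its own factor.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(3), 10)
factors = rng.standard_normal((3, 1000))
returns = factors[groups] + 0.5 * rng.standard_normal((30, 1000))

# Cluster separately on odd and even trading days, then compare the partitions.
labels_odd = cluster_labels(returns[:, 1::2], n_clusters=3)
labels_even = cluster_labels(returns[:, 0::2], n_clusters=3)
score = rand_index(labels_odd, labels_even)
print(f"Rand index between odd-day and even-day clusterings: {score:.2f}")
```

A score close to 1 means the clustering is stable under this perturbation; comparing scores across distances is one way to read the kind of result reported in Figure 7.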
