Measuring Systemic Risk with Vine Copula
models between Banks and Insurance
Companies in the UK during different time
horizons of UK GDP growth.
By: Peter Nicholas Allen
Supervisor: Dr. Eric A. Beutner
Submitted on: 20th August 2015
Acknowledgement
I would like to thank my supervisor Dr. Eric Beutner who helped me during challenging points
of my thesis. Additionally, I would like to thank J.S. Bach and Wolfgang Amadeus Mozart for
their musical contributions, which made this research thoroughly enjoyable. Finally, I would like to
thank my father, who taught me mathematics as a child and was very patient.
Abstract
In this thesis we explore how different time horizons, connected to sharp changes in UK GDP
percentage growth, affect our constructed systemic risk models for the major UK banks and
insurance companies. In particular we make use of extended GARCH models, copula functions
and the R- and C-vine copula models to illustrate their network of dependence. Stress testing is
carried out through a simulation exercise to identify influential and dependent institutions within
the financial system.
Contents
Acknowledgement
Abstract
1 Introduction
1.1 Background
1.2 Questions of Our Investigation
2 The data
3 Methods
3.1 Time Series
3.1.1 Time series preliminaries
3.1.2 ARMA Models
3.1.3 GARCH Models
3.1.4 Model Diagnostics
3.2 Dependence Measures
3.3 Copulae
3.4 Vine Copula Construction
3.5 Vine Copula Simulation
4 Application
4.1 Fitting Time Series Models
4.2 Constructing the Vine Copulae
4.3 The Vine Copula Simulation
5 Conclusion
5.1 Results
5.2 Process and Methodology
A Additional figures - Time Series Modelling
B Additional figures - Vine Copula Modelling
B.1 Scatter and Contour Matrix of Full Copula Data Set: 2008
B.2 C-Vine Copula Decomposition Matrices: 2009
B.3 R-Vine Copula Decomposition Matrices: 2008
B.4 Remaining Vine Tree Comparisons C-Vines
B.5 Remaining R-Vine Tree Comparisons
C R-code
C.1 Fitting GARCH Models and Preparing Copula Data - 2008 Data
C.2 Example Simulation Code X1 = HSBC Specific - 2008 Data
C.3 Example H-Functions Code
Bibliography
1 Introduction
1.1 Background
The aim of this paper is to analyse systemic risk during an economic recession and recovery. This
is done through vine copula models, which aim to accurately depict the interdependence between a
specified collection of banks and insurance companies. In addition, we ascertain and comment
on any changes in these measures when we re-sample from 2008 and 2009. We have chosen these
specific time horizons to test the hypothesis that the dependence amongst the institutions is far
greater during 2008 than during 2009 due to the different economic states, i.e. recession and recovery.
The percentage change of UK GDP has been used as an indicator of the change in economic state.
Flight-to-quality is a financial market phenomenon that characterises periods of financial turmoil:
risky assets such as equities are sold suddenly and the proceeds are reinvested into safer alternatives
such as bonds and other securities. As a result, private investors, banks, insurance companies, hedge
funds and other institutions generate a surge of transactions, offloading these risky assets
and moving the liquidity elsewhere. This is the fundamental behaviour that underpins the
hypothesis we will be testing: in times of economic decline, co-movements of equities should
be more severe due to the erratic behaviour of investors.
Whilst this paper looks at one particular financial system, we think it is important for the reader
to consider the wider applications of these risk measures for analysing the interdependency within
a portfolio of random variables, especially in terms of how one (a hedge fund, private investor,
risk manager, insurance company, bank, etc.) would act during and after a financial recession.
Institutions are spending more time and money on active risk management, and we believe risk
managers should be cognizant of the level of dependence and the interlinks between any portfolio
of risks.
In terms of the statistical distribution theory used, the reader will see that we step away from
the traditional and much criticised Normal distribution framework. We use the modern work done
by the academic community to uniquely specify our marginals for the univariate data sets before
building the multivariate framework. The mathematical tools we use to model the dependence
structure are copulas and pairwise vine copula constructions. Before these tools are applied,
however, we follow the traditional route of fitting GARCH models to obtain independent,
identically distributed error terms, which are in turn converted to copula data for the dependence modelling.
In order to carry out this methodology we had to use a statistical package; in this paper we make
detailed references to our chosen software, R. It allows us to fit GARCH models, fit R- and C-vine
models, and it also provides a framework in which to run the simulations. An important practical
benefit of R is that all the data is held in one place and can be referenced as the calculations
proceed from the GARCH fits through to the vine copula simulations. To give the reader a good
feel for the application, many of the graphs and much of the code used to conduct the methodology
have been included in the appendix. However, we must point out that some of the code is
custom-written, so there may be more efficient ways of coding the calculations.
1.2 Questions of Our Investigation
In this subsection we outline what we would like to investigate. We hope to answer the
following questions:
(i) First, what does the general dependence structure look like between the individual
institutions, and collectively between the UK banks and the insurance companies? Is there one
particular institution which stands out amongst the eight in terms of its level of dependence
with the rest?
(ii) Secondly, we would like to see how the dependence structure changes when we apply a
shock to each individual company, i.e. a significant drop in its share price. Which institution has
the most dramatic effect on the system when a shock is applied to its share price?
(iii) Thirdly, we re-do question (i) with the data from 2009 as opposed to 2008: do we believe
there is a link between the strength of dependence within our system of institutions and the
change in percentage GDP for the UK?
(iv) Finally, we re-do question (ii), again with 2009 as opposed to 2008: does this period
of increasing GDP suggest that dependence is not as high, and do shocks to individual
institutions have less of a domino effect on the remaining institutions?
Now that we know what it is we are looking to investigate we introduce the data we will be
using.
2 The data
The first thing we need to do in our experiment is collect and prepare the data set for analysis.
We have sourced the data from Yahoo Finance (http://finance.yahoo.com/), which allows you to
download stock market data for any given time period within a company's lifetime on the stock
market. As we are looking at systemic risk, the UK institutions chosen are deemed to be the top
four banks and top four insurers in terms of market capitalisation (MC) on the London Stock
Exchange (LSE). Below we have indicated the companies selected with their respective MC in £billions:
Company Name Stock Symbol £MC
HSBC Holdings PLC HSBA.L 109Bn
Lloyds Banking Group plc LLOY.L 56.7Bn
Barclays PLC BARC.L 30.8Bn
Standard Chartered PLC STAN.L 26.6Bn
Table 1: List of Bank Stocks from LSE
Company Name Stock Symbol £MC
Prudential PLC PRU.L 42.7Bn
Legal & General Group LGEN.L 16.4Bn
Aviva plc AV.L 16.0Bn
Old Mutual PLC OML.L 11Bn
Table 2: List of Insurance Company Stocks from LSE
The UK banking industry is notorious for being heavily dependent on a handful of banks.
Unlike the German banking industry, for example, where hundreds of different banks make up
a thoroughly diversified sector, the UK is overly dependent upon the stability and continued
success of the above institutions. After the 2008-2009 financial meltdown, stricter regulation
had to be introduced in order to prevent another financial crisis. Critics said that, despite the
existence of the Financial Services Authority (FSA), nobody knew who to blame for the crisis,
and thus three new bodies were created: the Financial Policy Committee (FPC), which took
overall responsibility for financial regulation in the UK from 2013; the Prudential Regulation
Authority (PRA), which took over responsibility for supervising the safety and soundness of
individual financial firms; and finally the Financial Conduct Authority (FCA), which was tasked
with protecting consumers from sharp practices and making sure that workers in the financial
services sector comply with the rules. With banks and insurance companies having their balance
sheets closely monitored by these authorities, the emphasis being on the capital held in relation
to exposure to risk, we expect the banking system to become more robust and stable.
When calculating our risk measures we aim to conduct the experiment over two different time
horizons; the re-sampling takes into account two sharp changes in UK GDP percentage growth.
We took 2008 and 2009 as our data sets for re-sampling, as in 2008 we saw a dramatic decline
and in 2009 we saw a reverse incline. We hope to see some interesting results with which to
investigate how the change in GDP growth may affect the dependency between the major financial
institutions. Please see figure 1 below:
Figure 1: UK % Change in GDP
GDP is commonly used as an indicator of the economic health of a country, as well as to gauge
a country's standard of living. This is why we try to make the link in this thesis between the level
of risk dependence and the change in GDP growth. As implied in our background, we would expect
to see a higher level of dependence during negative GDP growth, as this indicates the economy
is in a worse position than before: investors may become more sceptical, business with banks may
reduce and overall consumption may decrease, which may have a knock-on effect on the
institutions' overall performance and financial strength.
Once our data was downloaded we had to check its quality and make any necessary adjustments.
Where there were missing values on the same days for more than one company we removed the
entire row for those particular days. We then converted the data into log-return form,
rt = log(St) − log(St−1),
where rt is the log return for a particular stock with price St.
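As an illustration of this step (not the thesis code), the prices can be downloaded and converted to log returns in R with the quantmod package, assuming the tickers listed in tables 1 and 2:

# Minimal sketch: download 2008 prices for one ticker from Yahoo Finance with
# quantmod and convert the adjusted close to log returns.
library(quantmod)

prices <- getSymbols("HSBA.L", src = "yahoo",
                     from = "2008-01-01", to = "2008-12-31",
                     auto.assign = FALSE)
r <- na.omit(diff(log(Ad(prices))))   # r_t = log(S_t) - log(S_{t-1})
head(r)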
After finishing this process we had a sample Xn1, ..., Xn8 of stocks with n = 253 observations for 2008
and, similarly, n = 252 observations for 2009. Please note that in the Standard Chartered 2009
data we had to replace two consecutive observations, as they were disrupting any form of GARCH
model fitting; they consisted of one extreme positive value followed by one extreme negative value.
We replaced the first with the average of the preceding observations and the second with the
average of the remaining observations going forward. This method was also implemented for
Aviva, which had a severe drop in share price on one particular day. Whilst we appreciate this
reduces the accuracy of our tests, if this step is not taken the necessary models cannot be fitted,
because we need independent and identically distributed residuals before moving on to the
copula modelling. Below is a graph to illustrate HSBA.L in log-return format:
Figure 2: HSBC 2008: Top - Raw data & Bottom - Log return
When starting the application of the copula construction to our data, i.e. fitting the vine copula
tree structure and the appropriate pairwise copula distributions, we will need to transform the
innovations obtained from fitting the GARCH models. This is the probability integral
transformation and it ensures all our data belongs to the interval [0,1], a property required of
our marginal distributions. Once this transformation has been applied we refer to the data as
copula data; the process is discussed fully in section 4.1.
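As a rough illustration of the idea (the transform actually used in the thesis is set out in section 4.1), pseudo-observations can be obtained from ranks; the residual matrix below is simulated purely as a placeholder:

# Illustrative sketch only: empirical probability integral transform of
# standardised residuals to values in (0,1); z is a placeholder for the eight
# columns of GARCH residuals. The copula package's pobs() does the same job.
set.seed(1)
z <- matrix(rnorm(253 * 8), ncol = 8)      # stand-in for standardised residuals
u <- apply(z, 2, rank) / (nrow(z) + 1)     # ranks / (n + 1), strictly inside (0,1)
range(u)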
3 Methods
The order of topics in this methods section runs parallel with the order in which the application
will occur; hopefully the reader will find this easier to follow. In section 3.1 we look at fitting
time series models to best describe the dynamics of the returns, specialising in GARCH models.
In section 3.2 we introduce dependence measures, and in section 3.3 we take an introductory look
at copulae. Then in section 3.4 we look at the construction of vine copula models, and finally in
section 3.5 we look at simulation from vine copula models.
3.1 Time Series
In order to conduct the copula construction of sections 3.3 and 3.4 we need to ensure our data sets
are independent and identically distributed. To achieve this we will ultimately fit GARCH models,
see section 3.1.3, which remove trends, seasonality and serial dependence from the data. The
reader should then be able to follow the application of our GARCH models in section 4.1.
3.1.1 Time series preliminaries
As we are modelling time series data which is random we need to introduce the definition of a
stochastic process.
Definition 3.01 - Stochastic Process
A stochastic process is a collection of random variables (Xt)t∈T defined on a probability space
(Ω, F, P); that is, for each index t ∈ T, Xt is a random variable, where T is our time domain.
As our data is recorded on a daily basis we associate our stochastic process with a discrete-time
stochastic process. The data is observed at integer time steps, so we set T = N. A time series is
a set of observations (xt)t∈T, where each observation is recorded at time t and comes from a
given trajectory of the random variable Xt. We use the terms time series and stochastic process
interchangeably.
For this thesis we shall use two time series models which are very popular in the financial
industry, ARMA and GARCH models.
Reminder - Continuing on from the data section where we defined the log return: this is also
a stochastic process, where rt is the log return for a particular stock with price St:
rt = log(St/St−1) = log(St) − log(St−1), ∀t ∈ Z (3.1)
In the financial markets the prices are directly observable but it is common practice to use the
log return as it depicts the relative changes in the specific investment/asset. The log return also
possesses some useful time series characteristics which we will see later on in this section. Please
see figure 2 from the data section to view both the market prices and log return of HSBC.
In order to determine the dependence structure within a stochastic process we introduce the
concept of autocovariance and autocorrelation functions.
Definition 3.02 - Autocovariance Function
Let (Xt)t∈Z be a stochastic process with E[X²t] < ∞, ∀t ∈ Z. Then the autocovariance
function, γX, of (Xt)t∈Z is defined as
γX(r, s) = Cov(Xr, Xs), r, s ∈ Z
Definition 3.03 - Autocorrelation Function (ACF)
The autocorrelation function (ACF), ρX, of (Xt)t∈Z is defined as:
ρX(r, s) = Corr(Xr, Xs) = γX(r, s) / √(γX(r, r) γX(s, s)), r, s ∈ Z
In order to fit appropriate models to our time series sample (x1, ..., xn) we need to make some
underlying assumptions. One of the most important ones is the idea of stationarity, which
essentially means the series is well behaved for a given time horizon i.e. a bounded variance.
Although there are several forms of stationarity, for the purposes of this thesis, we need only
define wide sense (or covariance stationarity). For more detail please see J. Davidson [2000].
Definition 3.04 - Wide Sense Stationarity
A stochastic process (Xt)t∈Z is said to be stationary in the wide sense if the mean, variance and
jth-order autocovariances for j > 0 are all independent of t, that is:
(a) E[X²t] < ∞, ∀t ∈ Z,
(b) E[Xt] = m, ∀t ∈ Z for some m ∈ R, and
(c) γX(r, s) = γX(r + j, s + j), ∀r, s, j ∈ Z.
The above definition implies that the covariance of Xr and Xs depends only on |s − r|. We can
therefore simplify the autocovariance function of a stationary time series to
γX(j) := γX(j, 0) = Cov(Xt+j, Xt),
where t, j ∈ Z and j is called the lag.
Similarly, the autocorrelation function of a stationary time series at lag j is defined as
ρX(j) := γX(j)/γX(0) = Corr(Xt+j, Xt), t, j ∈ Z.
In the empirical analysis of time series data we will need to estimate both of these functions;
this is done via the sample autocovariance function and the sample autocorrelation function.
7
Definition 3.05 - Sample Autocovariance Function
For a stationary time series (Xt)t∈Z the sample autocovariance function is defined as
γ̂(j) := (1/n) Σ_{i=1}^{n−j} (x_{i+j} − x̄)(x_i − x̄), j < n,
where x̄ = (1/n) Σ_{i=1}^{n} x_i is the sample mean.
Definition 3.06 - Sample Autocorrelation Function
For a time series (Xt)t∈Z the sample autocorrelation function is defined as
ρ̂(j) := γ̂(j)/γ̂(0), j < n.
We have mentioned the ACF previously; another tool we use to explore dependence and to help
ascertain the orders in our ARMA models is the partial autocorrelation function (PACF).
Definition 3.07 - Partial Autocorrelation Function (PACF)
The PACF at lag j, ψX(j), of a stationary time series (Xt)t∈Z is defined as
ψX(1) = Corr(Xt+1, Xt) = ρX(1),
ψX(j) = Corr(Xt+j − Pt,j(Xt+j), Xt − Pt,j(Xt)), for j ≥ 2,
where Pt,j(X) denotes the projection of X onto the space spanned by (Xt+1, ..., Xt+j−1).
The PACF measures the correlation between Xt and a lagged term Xt+j, j ∈ Z\{0}, with the
effect of the intermediate observations (Xt+1, ..., Xt+j−1) removed. As before, we denote the
sample PACF by ψ̂(j); there are multiple algorithms for computing it, for which we refer the
reader to Brockwell and Davis [1991] for further information.
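Sample ACF and PACF plots such as figure 3 below can be produced directly in R; a minimal sketch, with simulated data standing in for the HSBC log returns:

# Sketch of the ACF/PACF diagnostics; r is a placeholder log-return series.
set.seed(1)
r <- rnorm(253)
op <- par(mfrow = c(1, 2))
acf(r,  lag.max = 18, main = "ACF")
pacf(r, lag.max = 18, main = "PACF")
par(op)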
Figure 3: ACF and PACF for log return HSBC 2008 data set, lag = 18
The graphs above are intended to give the reader some graphical representation of these functions
applied to real data; we have inserted the ACF and PACF for HSBC (figure 3).
This particular pair of graphs is very useful for immediately detecting any signs of
autocorrelation. If we find that the bars escape the two horizontal bounds we can be fairly sure
that there is some form of autocorrelation, and we should include some lagged terms in our
ARMA model (explained further in section 3.1.2) to remove it. If we find that the majority, if
not all, of the bars are contained within the horizontal bounds we can proceed on the assumption
that there is no sign of autocorrelation, i.e. white noise; see definition 3.08 below.
Definition 3.08 - White Noise
Let (Zt)t∈Z be a stationary stochastic process with E[Zt] = 0 ∀t ∈ Z and autocovariance function
γZ(j) = σ²Z for j = 0, and γZ(j) = 0 for j ≠ 0,
with σ²Z > 0. Then (Zt) is called a white noise process with mean 0 and variance σ²Z; thus
(Zt)t∈Z ∼ WN(0, σ²Z).
Now that we have introduced the fundamentals we can bring in the more specific time series
models. Before we do this, the general time series framework will be outlined so that the reader
can always refer back to it.
Definition 3.09 - Generalised Time Series Model
We use a time series model to describe the dynamic movements of a stochastic process, in
particular the log return process (rt)t∈Z. It has the following framework:
rt = µt + εt, (3.2)
εt = σtZt.
The conditional mean, µt, and the conditional variance, σ²t, are defined as
µt := E[rt|Ft−1] and (3.3)
σ²t := E[(rt − µt)²|Ft−1], (3.4)
where Ft is the filtration representing the information set available at time t. The Zt's are
assumed to follow a white noise representation. We call equation 3.2 the return equation; it is
made up of the following constituent parts:
(i) the conditional mean, µt, (ii) the conditional variance, σ²t,
(iii) the residuals [observed minus fitted], εt, and (iv) the white noise, Zt.
Distribution of the Zt's
More often than not the distribution of the Zt's is assumed to be standard normal, because this
makes modelling and inference more tractable. However, in real-life applications this cannot be
assumed, especially with financial data: it is well known that financial data often exhibits negative
skewness and leptokurtosis (otherwise known as fat or heavy tails). We shall therefore consider the
use of the following distributions in our time series model fitting; please note they are all standardised:
(a) The Standard Normal Distribution (NORM) is given by
fZ(z) = (1/√(2π)) exp(−z²/2),
with mean 0 and variance 1. This is the simplest distribution used to model the residuals, but it
cannot capture fat tails or skewness.
(b) The Student-t distribution (STD) is given by
fZ(z) = Γ((ν + 1)/2) / [Γ(ν/2) √((ν − 2)π)] · (1 + z²/(ν − 2))^{−(ν+1)/2},
with ν > 2 being the shape parameter and Γ(·) the Gamma function. As ν grows large this tends
towards the normal distribution. It has fat tails, which makes it a useful distribution for modelling
financial data. It should also be noted that the t-distribution is a special case of the generalised
hyperbolic distribution.
(c) The Generalised Error Distribution (GED) is given by
fZ(z) = ν exp(−(1/2)|z/λ|^ν) / (λ · 2^{1+1/ν} · Γ(1/ν)),
with shape parameter 0 < ν ≤ ∞ and λ = [2^{−2/ν} Γ(1/ν)/Γ(3/ν)]^{1/2}, where Γ(·) again denotes
the Gamma function. NB: the normal distribution is a special case of the GED (for ν = 2). The
distribution is able to account for light tails as well as heavy tails: ν < 2 gives heavier tails than
the normal and ν > 2 gives lighter tails.
(d) The Generalised Hyperbolic Distribution (GHYP) is given by
fZ(z) = [(κ/δ)^λ / (√(2π) Kλ(δκ))] · exp(β(z − µ)) · K_{λ−1/2}(α√(δ² + (z − µ)²)) · (√(δ² + (z − µ)²)/α)^{λ−1/2},
with κ := √(α² − β²), 0 ≤ |β| < α, µ, λ ∈ R, δ > 0, and Kλ(·) being the modified Bessel function
of the second kind with index λ; please see Barndorff-Nielsen [2013] for further details. The δ is
called the scale parameter.
As we are looking at the standardised version with mean 0 and variance 1, we use another
parametrisation with ν = β/α and ζ = δ√(α² − β²), the skewness and shape parameters, respectively.
This distribution is mainly applied to areas that require sufficient probability of far-field
behaviour, which it can model due to its semi-heavy tails, again a property often required for
financial market data.
(e) The Normal Inverse Gaussian distribution (NIG) is given by
fZ(z) = [αδ K₁(α√(δ² + (z − µ)²)) / (π√(δ² + (z − µ)²))] · exp(δκ + β(z − µ)),
with κ := √(α² − β²), 0 ≤ |β| ≤ α, µ ∈ R, δ > 0, and K₁(·) being the modified Bessel function of
the second kind with index 1. See Anders Eriksson [2009] for further detail. Note that the NIG
distribution is a special case of the GHYP distribution with λ = −1/2. As above, we use the
parametrisation with ζ and ν. The class of NIG distributions is a flexible system of distributions
that includes fat-tailed and skewed distributions, which is exactly the kind of distribution we
hope to implement.
3.1.2 ARMA Models
In this section we introduce univariate autoregressive moving average (ARMA) processes which
model the dynamics of a time series with a linear collection of past observations and white noise
residual terms. For a more comprehensive look at ARMA models please see J. Hamilton [1994].
Definition 3.10 - MA(q) process
A q-th order moving average process, MA(q), is characterised by
Xt = µ + εt + η₁εt−1 + ... + η_q εt−q,
where εt is the white noise of definition 3.08, i.e. εt ∼ WN(0, σ²). The coefficients η₁, ..., η_q can
be any real numbers, and q ∈ N\{0}. Below we have illustrated a simulated example of an MA(1)
process, taking η = 0.75 with 750 observations; see the figure below.
Figure 4: MA(1), η = 0.75 with 750 observations
Definition 3.11 - AR(p) process
A p-th order autoregression, AR(p), is characterised by
Xt = µ + φ₁Xt−1 + ... + φ_p Xt−p + εt,
where, similarly to the above definition, εt ∼ WN(0, σ²), φ_i ∈ R and p ∈ N\{0}.
Figure 5: AR(1), φ = 0.75 with 750 observations
Combining these two processes gives an ARMA(p,q) model.
Definition 3.12 - ARMA(p,q) Process
An ARMA(p,q) process includes both of the above processes; we characterise it as follows:
Xt − φ₁Xt−1 − ... − φ_p Xt−p = εt + η₁εt−1 + ... + η_q εt−q,
where the same criteria apply as in definitions 3.10 and 3.11.
Figure 6: ARMA(1,1), φ & η = 0.75 with 750 observations
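Simulated series like those in figures 4-6 can be generated with stats::arima.sim; a short sketch using the coefficient 0.75 and 750 observations quoted above:

# Simulate the MA(1), AR(1) and ARMA(1,1) examples shown in figures 4-6.
set.seed(42)
ma1    <- arima.sim(model = list(ma = 0.75),            n = 750)
ar1    <- arima.sim(model = list(ar = 0.75),            n = 750)
arma11 <- arima.sim(model = list(ar = 0.75, ma = 0.75), n = 750)
plot.ts(cbind(ma1, ar1, arma11), main = "Simulated MA(1), AR(1) and ARMA(1,1)")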
In the context of our log return equation (see equation 3.1), our ARMA(p,q) model looks as
follows; essentially we replace the Xt's with rt's:
rt − φ₁rt−1 − ... − φ_p rt−p = εt + η₁εt−1 + ... + η_q εt−q.
Now that we have chosen a model type for our returns we need to determine the orders p and q
before fitting the model. As the reader will see in the R code of appendix C.1, there is a useful
function, auto.arima, which we use to automatically determine the orders p and q through an
information criterion ("aic" or "bic"); further detail on these criteria is given in section 3.1.4 -
Model Diagnostics. Should the reader wish to do this manually, there is a popular graphical
approach, for which we refer to a series of step-by-step notes put together by Robert Nau of the
Fuqua School of Business, Duke University; see R. Nau.
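Before outlining the manual approach, here is a sketch of the automatic route, assuming the forecast package; the series is simulated purely as a placeholder:

# Automatic ARMA order selection via auto.arima, as described above.
library(forecast)

set.seed(1)
r <- arima.sim(model = list(ma = c(0.2, 0.3)), n = 253)   # placeholder returns
fit <- auto.arima(r, d = 0, ic = "aic", seasonal = FALSE, stationary = TRUE)
summary(fit)   # reports the selected ARMA(p,q) orders, AIC and log likelihood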
To give a brief outline we have put in a series of shortened steps:
(i) Assuming the time series is stationary we must determine the number of AR or MA terms
needed to correct any autocorrelation that remains in the series.
(ii) By looking at the ACF and PACF plots of the series, you can tentatively identify the number
of AR and/or MA terms that are needed.
NB: The ACF plot is a bar chart of the coefficients of correlation between a time series and lags
of itself. The PACF plot is a plot of the partial correlation coefficients between the series and lags
of itself. See figure 7 below for an illustration.
Figure 7: ARMA(0,2), Old Mutual PLC
In figure 7 above we can see that there are two spikes escaping the lower horizontal band in the
ACF, indicating the presence of correlation in the MA lagged terms. When you then look at the
PACF you can see these two spikes more clearly, indicating that AR(0) with MA(2), i.e.
ARMA(0,2), would be the most suitable model.
(iii) As we have discussed above when analysing the ACF and PACF the reader should firstly
be looking for a series of bars in the ACF escaping the horizontal bounds to indicate some form
of autocorrelation. Then when looking at the PACF, if the bars are outside of the upper
horizontal bound we are looking at AR terms and if we see this occurring below the lower
horizontal bound we have MA terms.
(iv) Once this is done we recommend that the chosen model is fitted and compared with any
similar models against the log likelihood measure and AIC measure. Again see section 3.1.4 for
further details.
The ARMA processes are modelled under the assumption that the variance, σ², is constant.
However, it is well known that this is not the case when working with financial data. We usually
see clusters of high and low volatility, typically visible as a period of large price movements
followed by a period of small price movements (see figure 2). Thus assuming the returns to be
independent, identically distributed noise terms would be wrong; they instead seem to depend on
past information and exhibit conditional behaviour. For these reasons we now look to model
volatility conditionally on time and past observations, using GARCH models.
3.1.3 GARCH Models
In this section we introduce univariate generalised autoregressive conditional heteroskedastic
(GARCH) processes, which are an extension of the autoregressive conditional heteroskedastic
(ARCH) processes. The GARCH model essentially models the conditional variance of a stochastic
process via a linear combination of previous squared volatilities and previous squared values of
the process.
There are multiple types of GARCH models and extended GARCH models. For the purpose of
this thesis we shall cover the basic models and finally the exponential GARCH model, as it was
used most during our application. However, we do recommend that the reader tests different
models if they are to apply this method; see J. Hamilton [1994] and A. Ghalanos [2014] for
theory and coding, respectively. Please note we consider GARCH(1,1) models only and, for
simplicity, do not cover higher-order GARCH models.
Definition 3.13 - GARCH(p,q) Process
Given a stochastic process (εt)t∈Z and an i.i.d. sequence of random variables (Zt)t∈Z with mean 0
and variance 1, we say εt ∼ GARCH(p, q) if E[εt|Ft−1] = 0 and, for every t,
εt = σtZt, (3.5)
Var[εt|Ft−1] := σ²t = ω + Σ_{i=1}^{q} αi ε²t−i + Σ_{j=1}^{p} βj σ²t−j,
with p ≥ 0, q ≥ 0, ω > 0, and α1, ..., αq, β1, ..., βp ≥ 0.
Result 3.14 - Stationarity of the GARCH(1,1) Process
εt ∼ GARCH(1, 1) is given by
εt = σtZt,
σ²t = ω + α ε²t−1 + β σ²t−1,
with ω > 0, α, β ≥ 0 and (Zt)t∈Z as in definition 3.13, and is stationary iff α + β < 1. The
unconditional variance of the process is then given by
Var[εt] = ω / (1 − α − β).
For a more extensive review of this result and more, please see page 666 of J. Hamilton [1994].
Coming back to the general model, in equation 3.5 we are assuming that the mean is constant,
i.e. µt = µ for all t. However, we would like to remove this modelling restriction by combining our
previously discussed ARMA(p,q) model with the GARCH(1,1) to give an ARMA(p,q)-GARCH(1,1)
model. This should give us more than enough modelling tools to model our returns well.
Definition 3.15 - ARMA(1,1)-GARCH(1,1) Process
Combining the ARMA(p,q) and GARCH(1,1) processes we get an ARMA(p,q)-GARCH(1,1)
process. Here we illustrate the process with (p,q) = (1,1), giving the following time series:
rt = µ + φrt−1 + ηεt−1 + εt,
εt = σtZt,
σ²t = ω + α ε²t−1 + β σ²t−1,
where φ, η ∈ R, ω > 0 and α, β ≥ 0. The distribution of the (Zt)t∈Z is chosen among those
detailed in section 3.1.1. The standardised residuals of the above model are given by
Ẑt = (rt − µ̂ − φ̂rt−1 − η̂σ̂t−1Ẑt−1) / σ̂t,
where σ̂t, φ̂, η̂ and µ̂ are all estimated.
Initial Parameter Estimation: following M. Pelagatti and F. Lisi, due to the recursive nature
of σt, R needs an initial estimate of σ̂. As we are working under the assumption of stationarity,
it is common to set the initial estimate equal to the square root of the average squared return,
σ̂1 = √((1/n) Σ_{t=1}^{n} r²t). The remaining parameters of the ARMA-GARCH model are
estimated by the traditional maximisation of the log likelihood. We recommend the reader see
the work of C. Francq and J.M. Zakoian for more information on estimation methods.
Extensions of the Standard GARCH Model
As mentioned at the beginning of this section, there are various extensions of the standard
GARCH model. Each extension is supposed to capture an inherent empirical characteristic of
financial data. For example, the GARCH-M or GARCH-in-Mean model was introduced to pick
up on correlation between risk and expected return: a conditional volatility term is added to the
return equation as an exogenous factor; see I. Panait and E. Slavescu [2012] for further details.
Other models include the Integrated GARCH, GJR GARCH, Component sGARCH and Absolute
Value GARCH, etc.; see A. Ghalanos [2014] for more detail. As we found the exponential GARCH
to be most applicable to our data, we shall outline its structure.
Definition 3.16 - Exponential GARCH (eGARCH) Model
The conditional variance equation in the eGARCH model is given by
log(σ²t) = ω + Σ_{i=1}^{q} g(zt−i) + Σ_{j=1}^{p} βj log(σ²t−j), (3.6)
where
g(zt) = γi(|zt| − E[|zt|]) + αi zt.
The function g(zt) covers two effects of the lagged shocks zt = εt−i/σt−j (where zt also depends
on i and j) on the conditional variance: γi defines the size effect and αi defines the sign effect.
Both of these effects address the asymmetric behaviour due to the leverage effect. Note how this
α differs from that of the standard GARCH models. Additionally, a useful characteristic of this
model is that there are no parameter restrictions compared with other GARCH models, so α, β
and γ can be any real numbers; this is because the logarithmic transformation ensures the
positivity of the conditional variance. Importantly, it can be shown that the process is stationary
if and only if Σ_{j=1}^{p} βj < 1. See D. Nelson [1991] for further detail on the eGARCH model.
NB: Standard GARCH models assume that positive and negative error terms have a symmetric
effect on the volatility; in other words, good and bad news have the same effect on the volatility
in the model. In practice this assumption is frequently violated, in particular by stock returns,
in that volatility increases more after bad news than after good news; this is known as the
leverage effect.
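A sketch of how such a model can be fitted in practice with the rugarch package of A. Ghalanos [2014], along the lines of the code in appendix C.1 (the return series here is a simulated placeholder):

# Fit an ARMA(1,1)-eGARCH(1,1) with GHYP innovations and extract the
# standardised residuals needed for the diagnostics of section 3.1.4.
library(rugarch)

set.seed(1)
r <- rnorm(253, sd = 0.02)                    # stand-in for a log-return series
spec <- ugarchspec(
  variance.model     = list(model = "eGARCH", garchOrder = c(1, 1)),
  mean.model         = list(armaOrder = c(1, 1), include.mean = TRUE),
  distribution.model = "ghyp")
fit <- ugarchfit(spec, data = r)
z   <- residuals(fit, standardize = TRUE)     # standardised residuals
infocriteria(fit)                             # AIC/BIC used for model comparison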
3.1.4 Model Diagnostics
Once the models have been fitted to each univariate data set we must carry out a series of tests
to compare the models selected (it is not always obvious which model to choose) and also to check
goodness-of-fit measures. The tests we carry out are outlined in this section. Please note that
some of these tests will overlap with the vine copula diagnostics.
AIC Criteria:
The Akaike information criterion (AIC) is a very popular criterion used to compare models and
select the best one. The measure is defined as
AIC := 2k − 2 Σ_{i=1}^{n} log f(xi|θ̂),
where the xi, i = 1, ..., n, are the observations and θ̂ is the maximum likelihood estimator of the
parameter vector θ = (θ1, ..., θk), with k being the number of parameters in the model. As you
can see, the measure penalises models with many parameters and gives more merit to a model
with a higher log likelihood value (a good indicator of goodness-of-fit). So when deciding which
model to select we are looking for the model with the lowest AIC and the highest log likelihood.
BIC Criteria:
The Bayesian information criterion (BIC) is very similar and is defined as
BIC := k log(n) − 2 Σ_{i=1}^{n} log f(xi|θ̂);
the difference here is that the penalty per parameter is log(n) rather than 2, so the number of
observations also enters the penalty. Again we are looking for the smaller BIC value when
choosing between models.
Both the AIC and BIC measures are used when looking at: fitting ARMA and GARCH models
and finally when looking at the vine copula density models.
NB: The next series of goodness-of-fit tests and techniques are based on the standardised
residuals (Ẑt)t∈Z of the fitted models. These tests check whether the fitted residuals are
independently and identically distributed according to the assumed distribution, i.e. the
distribution selected from section 3.1.1 - Distribution of the Zt's.
QQ plot:
The most popular and easiest way to determine whether the standardised residuals follow the
assumed underlying distribution is to analyse the quantile-quantile plots, more commonly known
as Q-Q plots. If the underlying distribution is a good fit, most of the points in the Q-Q plot
should lie on a straight line, usually at forty-five degrees, though this is not always the case. The
plot has other useful properties, such as comparing the shapes of distributions and providing a
graphical view of how properties such as location, scale and skewness are similar or different in
the two distributions. Please see the example Q-Q plot below.
Figure 8: QQ Plot of HSBC - Underlying Distribution GHYP
As you can see, aside from a few outliers the majority of the points lie on a line at approximately
forty-five degrees. This indicates that the underlying distribution is a good fit to the standardised
fitted residuals.
Ljung-Box Standardised Residuals:
To test whether or not the standardised residuals of our fitted model still exhibit serial
correlation we perform the Ljung-Box test. The null hypothesis is that the residuals behave like
white noise and the alternative is that they do not i.e. they exhibit some sort of serial
correlation. The test statistic is
Q̃m(ρ̂) = n(n + 2) Σ_{j=1}^{m} ρ̂²j / (n − j),
where the sample autocorrelation of (Ẑt)t=1,...,n is
ρ̂j = Σ_{t=j+1}^{n} Ẑt Ẑt−j / Σ_{t=1}^{n} Ẑ²t
for lags j = 1, ..., n. We reject the null at the α% level if Q̃m(ρ̂) > χ²_{m−s,1−α} (equivalent to the
p-value being smaller than α), where m − s is the number of degrees of freedom of the χ²
distribution and s is the number of parameters estimated in the model. For more details on this
test please see Ljung and Box [1978].
In the illustration below we have included the printed results for our fitted HSBC time series
model, produced in R. As you can see, all of the p-values are large, so we do not reject the null
hypothesis of no serial correlation.
Weighted Ljung-Box Test on Standardized Residuals for HSBC
------------------------------------
statistic p-value
Lag[1] 0.3228 0.5699
Lag[2*(p+q)+(p+q)-1][2] 0.3658 0.7602
Lag[4*(p+q)+(p+q)-1][5] 0.8907 0.8839
d.o.f=0
H0 : No serial correlation
Ljung-Box Squared Standardised Residuals:
In this test we aim to test for independence. This is achieved when we apply the above test to
the squared standardised residuals. Using the same data and model as used in the illustration
above we obtained the following:
Weighted Ljung-Box Test on Standardized Squared Residuals for HSBC
------------------------------------
statistic p-value
Lag[1] 7.714e-07 0.9993
Lag[2*(p+q)+(p+q)-1][5] 7.714e-01 0.9089
Lag[4*(p+q)+(p+q)-1][9] 3.047e+00 0.7511
which, as above, indicates that we do not reject the null hypothesis; there is no evidence against
the residuals being independent.
Note: The robustness of the Ljung-Box test applied in this context is frequently discussed in the
literature and several modified versions have been proposed, but this detail is not required for the
purpose of this thesis. See P. Burns [2002] for further information.
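The weighted tests shown above are printed by rugarch's fit summary; the plain (unweighted) Ljung-Box test can be reproduced with stats::Box.test, a sketch of which is given below on placeholder residuals:

# Ljung-Box tests on the standardised and squared standardised residuals.
set.seed(1)
z <- rnorm(253)                               # stand-in for standardised residuals
Box.test(z,   lag = 10, type = "Ljung-Box")   # H0: no serial correlation
Box.test(z^2, lag = 10, type = "Ljung-Box")   # H0: no remaining ARCH-type dependence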
ARCH Lagrange Multiplier (LM) test:
The purpose of this procedure is to see whether there are any remaining ARCH effects. This is
done by regressing the squared error terms ε̂²t on their own lags, i.e. we perform the linear
regression
ε̂²t = c₀ + c₁ε̂²t−1 + ... + c_p ε̂²t−p,
with H0: c₁ = ... = c_p = 0 and H1: at least one c_i ≠ 0. If we keep the null then the error terms
behave like white noise; if we reject the null, the error terms have ARCH characteristics that
would need to be modelled by an ARCH(p). Performing this test on the standardised squared
residuals of the model, a high p-value indicates the model has removed any ARCH effects.
Below we have given a practical example from our HSBC data; as you can see, the model seems
to be adequate (no ARCH effects present) given the large p-values. For more detail on this see
R. Engle [1982].
Weighted ARCH LM Tests for HSBC
------------------------------------
Statistic Shape Scale P-Value
ARCH Lag[3] 0.008469 0.500 2.000 0.9267
ARCH Lag[5] 1.584654 1.440 1.667 0.5703
ARCH Lag[7] 2.970199 2.315 1.543 0.5192
Sign Bias test:
The sign bias test is another test, introduced by R. Engle [1993], which tests for the presence of
leverage effects (or asymmetry effects) as mentioned in the eGARCH definition. Again the test is
based on the standardised squared residuals and should indicate whether our GARCH model is
misspecified; if we reject the null then we should assume the model is misspecified and try the
eGARCH model or others, see R. Engle [1993] for different model specifics. Similarly to the above,
except that we now regress the squared residuals on the lagged shocks, we have
Ẑ²t = d₀ + d₁·1{Ẑt−1<0} + d₂·1{Ẑt−1<0}·Ẑt−1 + d₃·1{Ẑt−1≥0}·Ẑt−1 + et,
where 1 is the indicator function, taking the value 1 if the subscript constraint is satisfied and 0
otherwise, and et is the error term.
We perform four simultaneous tests, written as follows:
Sign Bias test: H0 : d1 = 0
Negative Size Bias test: H0 : d2 = 0
Positive Size Bias test: H0 : d3 = 0
Joint Effect test: H0 : d1 = d2 = d3 = 0
The first three tests come in the form of a standard t-test but the last one is a standard F-test.
As the null hypothesis eludes to, we are looking to see whether our selected model can explain
the effects of positive and negative shocks on the conditional variance. Additionally, whether the
effects of large and small positive (or negative) shocks impact on the conditional variance. See
C. Brooks [2008] for more detail.
To illustrate the test we have included the test carried out for HSBC below:
Sign Bias Test for HSBC
------------------------------------
t-value prob sig
Sign Bias 0.42780 0.6692
Negative Sign Bias 1.09064 0.2765
Positive Sign Bias 0.01427 0.9886
Joint Effect 1.22214 0.7477
As you can see from the illustration, the p-values are large, indicating that there does not seem
to be any evidence of leverage effects in the data.
3.2 Dependence Measures
In this section we introduce the dependence measures which form the foundation of this thesis
and play a crucial role throughout the application in section 4. We will describe two particular,
frequently used measures, Kendall's tau and Spearman's rho. Towards the end we shall include
the theory necessary to understand our first system plot of the dependence between the chosen
institutions.
Pearson's product moment correlation coefficient is the most popular measure; however, it has
drawbacks which limit its scope for us. Thus we move on to so-called measures of association,
which allow us to avoid the limitations of the Pearson correlation: it only measures linear
dependence, it is not invariant under non-linear strictly increasing transformations, and it is
undefined for non-finite variance. See N. Chok [2008] for more information.
Measures of Association
Before we continue we must define a core concept which is used for both measures of association.
The following definitions have been sourced from R. Nelson [2006].
Definition 3.17 - Concordance
If we take two independent pairs of observations (xi, yi) and (xj, yj), i, j = 1, ..., n, from the
continuous random variables (X, Y), then they are concordant if
(xi − xj)(yi − yj) > 0,
i.e. xi < xj and yi < yj, or xi > xj and yi > yj. This looks to see whether large (small) values of
one random variable correspond to large (small) values of the other. Analogously, they are
discordant if
(xi − xj)(yi − yj) < 0,
which looks to see whether large (small) values of one random variable correspond to small
(large) values of the other.
These notions give rise to Kendall's tau.
Definition 3.18 - Kendall’s tau
Kendall’s tau is essentially the probability of concordance minus the probability of discordance.
Formally defined as: let (Xi, Yi), (Xj, Yj) ∈ (X, Y ) for i, j = 1, ..., n be two independent and
identically distributed copies of (X, Y ). Then Kendall’s tau is written as
τ(X, Y ) = P((Xi − Xj)(Yi − Yj) > 0) − P((Xi − Xj)(Yi − Yj) < 0)
As we work with real data we need to use an empirical version τ̂(X, Y), which takes values in
[−1, 1]: when we have a high number of concordant pairs the value of tau will be close to +1, and
when we have a high number of discordant pairs the value will be close to −1.
The empirical version of Kendall's tau is
τ̂(X, Y) = (Ncon − Ndis) / √[(Ncon + Ndis + Ntie,x)(Ncon + Ndis + Ntie,y)], (3.7)
where
Ncon = number of concordant pairs,
Ndis = number of discordant pairs,
Ntie,x = number of tied pairs in x (see note below),
Ntie,y = number of tied pairs in y (see note below).
NB: We are not going to find tied pairs in our data, but for consistency we have left the
definition in the paper. A pair (xi, yi), (xj, yj) is said to be tied if xi = xj or yi = yj; a tied pair
is neither concordant nor discordant.
NB: Without tied pairs the empirical Kendall's tau reduces to
τ̂(X, Y) = (Ncon − Ndis) / (n(n − 1)/2),
with n = total number of observations.
Definition 3.19 - Spearman's Rho
Similarly to Kendall's tau, Spearman's rho makes use of concordance and discordance. We use
the same terminology as before, except that we now have three i.i.d. copies of (X, Y), say
(Xi, Yi), (Xj, Yj) and (Xk, Yk). We write Spearman's rho as
ρs(X, Y) = 3[P((Xi − Xj)(Yi − Yk) > 0) − P((Xi − Xj)(Yi − Yk) < 0)].
The empirical version of Spearman's rho is defined as
ρ̂s(X, Y) = Σi (r(xi) − r̄x)(r(yi) − r̄y) / [√(Σi (r(xi) − r̄x)²) · √(Σi (r(yi) − r̄y)²)], i = 1, ..., n, (3.8)
where r(xi) is the rank of xi and r̄x = (1/n) Σ_{i=1}^{n} r(xi).
Multidimensional Scaling
To accompany the results we will obtain from the above dependence measures, we find it useful
to have a pictorial view of the dependence between our institutions. Thus we use multidimensional
scaling, which converts the dependence measure between any two institutions into a distance on
a [−1,1]×[−1,1] plot. This distance is known as a dissimilarity, i.e. the bigger the distance between
two points, the less dependence between the two firms. We define it as
dij := 1 − τ̂(Xi, Xj).
To find a set of points whose pairwise distances are approximately equal to the dissimilarities dij
in our data, we use the Kruskal-Shephard scaling method, which seeks values z1, ..., zd ∈ R²
minimising
Σ_{i≠j} (dij − ||zi − zj||)²,
with ||·|| denoting the Euclidean distance in R². The plot is shown in section 4, figure 21.
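A sketch of how such a plot can be produced, assuming the MASS package and with a simulated placeholder for the n x 8 return matrix:

# Build the dissimilarities d_ij = 1 - tau_hat(X_i, X_j) and apply
# Kruskal-Shephard (non-metric) scaling into R^2 with MASS::isoMDS.
library(MASS)

set.seed(1)
X <- matrix(rnorm(253 * 8), ncol = 8,
            dimnames = list(NULL, paste0("Inst", 1:8)))   # placeholder returns
tau <- cor(X, method = "kendall")     # pairwise Kendall's tau matrix
d   <- as.dist(1 - tau)               # dissimilarities
mds <- isoMDS(d, k = 2)
plot(mds$points, type = "n", xlab = "", ylab = "")
text(mds$points, labels = colnames(X))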
As we are about to begin discussing copulae, it is important to mention the connection both of
these measures have with copula functions.
Result 3.20 - Link with Copulae: Kendall and Spearman
Let X and Y be two continuous random variables with copula C. Then we have
τ(X, Y) = 4 ∫∫_{[0,1]²} C(u, v) dC(u, v) − 1
and
ρs(X, Y) = 12 ∫∫_{[0,1]²} uv dC(u, v) − 3.
This result is a preliminary to showing that the ratio of the population analogues of these statistics
approaches 3/2 as the joint distribution approaches that of two independent random variables;
see G.A. Fredricks and R.B. Nelsen for further information on this proof. Their paper also states
that C is Lipschitz continuous (this is defined under 3.23 - further property), which means C is
differentiable almost everywhere and hence ∂C/∂u and ∂C/∂v exist almost everywhere on [0, 1]².
Tail Dependence
Kendall's tau and Spearman's rho describe the dependence between two random variables over
the whole space [0, 1]². As our study looks into the occurrence of extreme events, we must
examine this in more detail, i.e. the dependence between extreme values of our random variables.
The idea of dependence between extreme values is still based on concordance, but we specifically
look at the lower-left and upper-right quadrants of the unit square. This again follows the book
of Nelson [2006].
Definition 3.21 - Upper and Lower Tail Dependence
Let X and Y be two continuous random variables with distribution functions F and G,
respectively. The upper tail dependence parameter, λu, is the limit (assuming it exists) of the
probability that Y is greater than the 100t-th percentile of G given that X is greater than the
100t-th percentile of F, as t approaches 1:
λu = lim_{t→1⁻} P(Y > G⁻¹(t) | X > F⁻¹(t)). (3.9)
The lower tail dependence parameter, λl, is defined as
λl = lim_{t→0⁺} P(Y ≤ G⁻¹(t) | X ≤ F⁻¹(t)). (3.10)
As with Kendall's tau and Spearman's rho, we can relate these definitions to copulae.
Result 3.22 - Upper and Lower Tail Dependence
Let X and Y be as above but with copula C. If the limits in equations 3.9 and 3.10 exist then we
get
λu = lim_{t→1⁻} (1 − 2t + C(t, t)) / (1 − t)
and
λl = lim_{t→0⁺} C(t, t) / t.
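For many parametric copula families these limits have closed forms; the VineCopula package exposes them through BiCopPar2TailDep, illustrated here for a bivariate t copula with assumed parameters ρ = 0.5 and ν = 4:

# Tail-dependence coefficients of a t copula (family 2 in VineCopula).
library(VineCopula)

BiCopPar2TailDep(family = 2, par = 0.5, par2 = 4)
# returns the lower and upper coefficients, which coincide for the t copula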
3.3 Copulae
Copulae have become more and more popular in finance, especially in risk management. The
reason for this is that they have the unique capability of decomposing a joint probability
distribution into its univariate marginal distributions and a so-called copula function, which
describes the dependence structure between the variables. In this section we aim to define a
copula function, give some examples of pair-copula decompositions and detail the different copula
functions to be used for the vine copula construction in section 3.4. The majority of the content
comes from R. Nelson [2006], K. Aas [2006] and K. Hendrich [2012].
Definition 3.23 - Copula
A copula is a multivariate distribution, C, with uniformly distributed marginals U(0,1) on [0,1].
More formally we define a copula as follows via R. Nelson [2006].
A d-dimensional copula is a multivariate cumulative distribution function C : [0, 1]^d → [0, 1]
with the following properties:
(i) For every u = (u1, ..., ud) ∈ [0, 1]^d, C(u) = 0 if at least one coordinate of u is 0.
(ii) ∀j = 1, ..., d, it holds that C(1, ..., 1, uj, 1, ..., 1) = uj.
(iii) C is d-increasing, i.e. the C-volume of every hyperrectangle [a, b] ⊆ [0, 1]^d is non-negative.
Further Property: In order to justify the use of differentiation going forward we introduce a
theorem. We reference H. Li, see bibliography.
(a) For any d-dimensional copula C,
|C(u1, ..., ud) − C(v1, ..., vd)| ≤ Σ_{i=1}^{d} |ui − vi|,
∀ (u1, ..., ud), (v1, ..., vd) ∈ [0, 1]^d.
That is, C is Lipschitz continuous with Lipschitz constant 1. Given all the criteria above, we are
able to invoke differentiability almost everywhere.
Result 3.24 - Sklar’s Theorem
For this result let X = (X1, ..., Xd) be a vector of d random variables with joint density function
f and cumulative function F. Additionally let f1, ..., fd be corresponding marginal densities and
F1, ..., Fd the strictly increasing and continuous marginal distribution functions of X1, ..., Xd.
Sklar’s theorem states that every multivariate distribution F with marginals F1, ..., Fd can be
written as
F(x1, ..., xd) = C(F1(x1), ..., Fd(xd)). (3.11)
If F1, ..., Fd are all continuous then C is unique. Conversely, if C is a d-dimensional copula and
F1, ..., Fd are distribution functions, then the function F defined by (3.11) is a d-dimensional
distribution function with margins F1, ..., Fd.
Inverting the above allows us to isolate the copula function, which is the aim of this thesis, i.e. to
isolate the dependence structure; this is why Sklar's theorem is so important. So we get
C(u) = C(u1, ..., ud) = F(F1⁻¹(u1), ..., Fd⁻¹(ud)). (3.12)
This now allows us to derive the copula density function, c, through partial differentiation:
f(x) = ∂^d C(F1(x1), ..., Fd(xd)) / (∂x1 ⋯ ∂xd) = [∂^d C(F1(x1), ..., Fd(xd)) / (∂F1(x1) ⋯ ∂Fd(xd))] · f1(x1) ⋯ fd(xd), (3.13)
therefore
c(F1(x1), ..., Fd(xd)) := ∂^d C(F1(x1), ..., Fd(xd)) / (∂F1(x1) ⋯ ∂Fd(xd)) = f(x) / (f1(x1) ⋯ fd(xd)). (3.14)
Now that we have described what a copula function is and how it relates to the marginals and
the joint probability function, we move on to discuss how to break a joint probability function
down into the constituent parts needed to fit copulae.
Pair Copula Decomposition of Multivariate Distributions
For this subsection we follow K.Aas [2006] and consider d-dimensional joint density as described
in the copula section above. We begin by breaking down the general case before working
through a case with d = 3. So we start by decomposing the joint density into its marginal
(f(xd)) and conditional ((f(xd−1|xd))) densities.
f(x1, ..., xd) = f(xd) · [f(xd−1, xd)/f(xd)] · [f(xd−2, xd−1, xd)/f(xd−1, xd)] · ... · [f(x1, ..., xd)/f(x2, ..., xd)] (3.15)
= f(xd) · f(xd−1|xd) · f(xd−2|xd−1, xd) · ... · f(x1|x2, ..., xd).
Up to a relabelling of the variables, the decomposition is unique. If we now link this with what we
defined in Sklar's theorem, we can re-write our joint density as follows:
f(x1, ..., xd) = c12...d(F1(x1), ..., Fd(xd)) · f1(x1) · · · fd(xd) (3.16)
for some unique d-variate copula density c12...d.
In the bi-variate case we would have
f(x1, x2) = c12(F1(x1), F2(x2)) · f1(x1) · f2(x2)
where c12 is an appropriate pair-copula density to describe the pair of transformed variables
F1(x1) and F2(x2).
For the conditional density it follows that
f(x1|x2) = c12(F1(x1), F2(x2)) · f1(x1) (3.17)
for the same pair copula. If we go back to our core equation 3.15, we can decompose the second
term f(xd−1|xd) into the pair copula c(d−1)d(Fd−1(xd−1), Fd(xd)) and the marginal density
fd−1(xd−1). For three random variables we construct the following:
f(x1|x2, x3) = c12|3(F1|3(x1|x3), F2|3(x2|x3)) · f(x1|x3) (3.18)
for the appropriate pair-copula c12|3, applied to transformed variables F(x1|x3) and F(x2|x3).
However, this representation is not unique; we can also write
f(x1|x2, x3) = c13|2(F1|2(x1|x2), F3|2(x3|x2)) · f(x1|x2),
where the pair-copula c13|2 is different from the c12|3 above. By substituting f(x1|x2) into the
equation above we get
f(x1|x2, x3) = c13|2(F1|2(x1|x2), F3|2(x3|x2)) · c12(F1(x1), F2(x2)) · f1(x1).
These steps are essential for breaking down the multivariate density into pair copulae acting on
conditional distributions and marginal densities; the example below should make things clearer.
Generalising this pair-copula decomposition pattern, each term in equation 3.15 can be decomposed
into the appropriate pair-copula times a conditional marginal density, using
f(xj|xB) = cjv|B−v(F(xj|xB−v), F(xv|xB−v)) · f(xj|xB−v), j = 1, ..., d, (3.19)
where B ⊂ {1, ..., d}\{j} and xB is the |B|-dimensional sub-vector of x. Here xv can be any single
element of xB, and xB−v denotes the (|B| − 1)-dimensional vector obtained when xv is removed
from xB; more simply, B−v := B\{v}. Essentially, v determines the type of the corresponding
copula, so the obtained constructions are not unique. We can therefore deduce that, under the
appropriate regularity conditions, any density of the form in 3.15 can be expressed as a product
of pair-copulae acting on several different conditional distributions. We can also see that the
process is iterative in nature and that, given a specific factorisation, there are still many
different parametrisations.
For completeness lets finish our example of our tri-variate case where we get
f(x1, x2, x3) = f(x3) · f(x2|x3) · f(x1|x2, x3)
= f(x3) · c23(F2(x2), F3(x3)) · f(x2) · f(x1|x2, x3)
= f(x3) · c23(F2(x2), F3(x3)) · f(x2)
· c12|3(F1|3(x1|x3), F2|3(x2|x3)) · f(x1|x3)
= f(x3) · c23(F2(x2), F3(x3)) · f(x2)
· c12|3(F1|3(x1|x3), F2|3(x2|x3)) · c13(F1(x1), F3(x3)) · f(x1)
We conducted the factorisation above using equation 3.15, followed by 3.17 (except that here we
have f(x2, x3) instead of f(x1, x2); the same method applies) and finally 3.18. Tidying up terms
gives us
f(x1, x2, x3) = f(x1) · f(x2) · f(x3)
· c13(F1(x1), F3(x3)) · c23(F2(x2), F3(x3))
· c12|3(F1|3(x1|x3), F2|3(x2|x3)),
As discussed above, please note that this factorisation is not unique; we could also have the
following:
f(x1, x2, x3) = f(x1) · f(x2) · f(x3)
· c12(F1(x1), F2(x2)) · c23(F2(x2), F3(x3))
· c13|2(F1|2(x1|x2), F3|2(x3|x2)).
Either way, after decomposing our multivariate distribution we finish with a product of marginal
densities and pair-copulae.
To finish off this section we return to where we left off in equation 3.19; we need to discuss the
nature of the conditional marginal distributions F(xj|xB). J. Harry [1996] showed that, for
every v ∈ B,
F(xj|xB) = ∂Cjv|B−v(F(xj|xB−v), F(xv|xB−v)) / ∂F(xv|xB−v), (3.20)
where Cjv|B−v is a bivariate copula distribution function. In the univariate case B = {v} we have
F(xj|xv) = ∂Cjv(F(xj), F(xv)) / ∂F(xv).
In section 3.5 - Vine Copula Simulation, we will make use of the so-called h-function, h(x, v, Θ),
to represent the conditional distribution function when x and v are uniform, i.e.
f(xj) = f(xv) = 1, F(xj) = xj and F(xv) = xv. This means we have
h(xj|xv, Θjv) := F(xj|xv) = ∂Cjv(F(xj), F(xv)) / ∂F(xv) = ∂Cjv(xj, xv) / ∂xv, (3.21)
where xv corresponds to the conditioning variable and Θjv is the set of parameters of the copula
of the joint distribution function of x and v. Finally, let h⁻¹(xj|xv, Θjv) be the inverse of the
h-function with respect to the first variable xj, or equivalently the inverse of the conditional
distribution function F(xj|xv).
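Appendix C.3 contains the thesis's own h-function code; an alternative is the VineCopula package's BiCopHfunc, sketched here for a bivariate Gaussian copula with an assumed parameter ρ = 0.5:

# Evaluate the two h-functions of a Gaussian copula (family 1) on simulated
# copula data.
library(VineCopula)

set.seed(1)
u <- BiCopSim(5, family = 1, par = 0.5)             # 5 draws from the copula
h <- BiCopHfunc(u[, 1], u[, 2], family = 1, par = 0.5)
h$hfunc1   # conditional distribution of one argument given the other
h$hfunc2   # the h-function with the roles of the arguments swapped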
Now that we have finished decomposing our joint density into its marginals and pairwise copulae,
we need to look at the possible copula distributions available for fitting, in order to build our vine
copulas in the next section.
The Copula Family of Functions
In this section we will outline the different copula functions available to us, which we will later
use for the vine copula models. When choosing which copula model to use we usually look to see
whether the data shows positive or negative dependence, the different copula models will be
characterised by the shape they exhibit amongst the data clustering. For this section we follow
Hendrich [2012] as it details the copula families very clearly.
Gaussian
The single parameter Gaussian copula became well known for its use in the valuation of
structured products during the financial crisis of 2007 and 2008. Its popularity stems from the
fact that it is easy to parameterise and work with. See C. Meyer [2009] for more detail.
The bivariate Gaussian copula with correlation parameter ρ ∈ (−1, 1) is defined to be

C(u_1, u_2) = \Phi_\rho\big(\Phi^{-1}(u_1), \Phi^{-1}(u_2)\big),

where Φρ(·, ·) represents the bivariate cumulative distribution function of two standard
Gaussian distributed random variables with correlation ρ, and Φ−1(·) is the inverse of the univariate
standard Gaussian distribution function. The related copula density is given by

c(u_1, u_2) = \frac{1}{\sqrt{1-\rho^2}} \exp\left( - \frac{\rho^2(x_1^2 + x_2^2) - 2\rho x_1 x_2}{2(1-\rho^2)} \right),

where we have x1 = Φ−1(u1) and similarly for x2. For ρ → 1 (ρ → −1) the Gaussian copula
shows complete positive (negative) dependence; we have the independence copula if ρ = 0.
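Since the plots below are indexed by Kendall's τ while the copula itself is parameterised by ρ, the short R sketch below (ours, with illustrative values) shows the conversion ρ = sin(πτ/2) that holds for elliptical copulas; the VineCopula package provides the same conversion via BiCopTau2Par.

# Kendall's tau values used in Figures 9-11 mapped to the Gaussian correlation parameter
tau <- c(0.8, 0.3, -0.8)
rho <- sin(pi * tau / 2)
rho
# equivalently, using the VineCopula package: BiCopTau2Par(family = 1, tau = 0.8)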
In order to give the reader some idea of this copula's graphical properties, we have plotted
scatter and contour plots for three different values of τ, illustrating three different levels
of dependence: τ = 0.8 (high positive dependence), τ = −0.8 (high negative dependence) and
τ = 0.3 (relatively weak dependence).
Figure 9: Bivariate Normal Copula: τ = 0.8
Figure 10: Bivariate Normal Copula: τ = -0.8
Figure 11: Bivariate Normal Copula: τ = 0.3
t Copula
The t copula is a two-parameter copula function defined as

C(u_1, u_2) = t_{\rho,\nu}\big(t_\nu^{-1}(u_1), t_\nu^{-1}(u_2)\big),

where tρ,ν represents the bivariate cumulative distribution function of two standard Student-t
distributed random variables with correlation parameter ρ ∈ (−1, 1) and ν > 0 degrees of
freedom. We let t_ν^{-1}(·) be the quantile function of the univariate standard Student-t
distribution with ν degrees of freedom. The copula density is given by
c(u_1, u_2) = \frac{\Gamma\!\left(\frac{\nu+2}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\, \nu\pi\, dt_\nu(x_1)\, dt_\nu(x_2)\, \sqrt{1-\rho^2}} \left(1 + \frac{x_1^2 + x_2^2 - 2\rho x_1 x_2}{\nu(1-\rho^2)}\right)^{-\frac{\nu+2}{2}},

where xi = t_ν^{-1}(ui), i = 1, 2, and dtν(xi) is the density of the univariate standard Student-t
distribution with ν degrees of freedom, so

dt_\nu(x_i) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} \left(1 + \frac{x_i^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad i = 1, 2.
The t copula differs from the Gaussian copula in the sense that it exhibits fatter tails, however,
for increasing degrees of freedom the t copula does approach the Gaussian copula. As before for
ρ → 1 (ρ → −1) the t copula shows complete positive (negative) dependence.
Figure 12: t-Student Copula contour plots ν = 4: τ = 0.8, 0.3 & − 0.8, respectively
Now we look at how the contours shape the dependence given a fixed τ = 0.3 and varying
degrees of freedom ν = 3, 7 &11. What we see is that as ν gets larger we get closer and closer to
the elliptical shape of the Gaussian copula.
Figure 13: t-Student Copula contour plots τ = 0.3: ν = 3, 7 &11, respectively
We now look at introducing a different family of copulae called Archimedean copulae. They are
very popular as they model dependence of arbitrarily high dimensions with only one parameter
to indicate the strength of dependence.
Frank Copula
The Frank copula distribution is given by

C(u_1, u_2) = -\frac{1}{\theta} \log\left(1 + \frac{\big(e^{-\theta u_1} - 1\big)\big(e^{-\theta u_2} - 1\big)}{e^{-\theta} - 1}\right),

with the single parameter θ ∈ R\{0}. The copula density is as follows

c(u_1, u_2) = \frac{\theta\,(1 - e^{-\theta})\, e^{-\theta(u_1+u_2)}}{\left[e^{-\theta} - 1 + \big(e^{-\theta u_1} - 1\big)\big(e^{-\theta u_2} - 1\big)\right]^2}.

The Frank copula is similar to the Gaussian and t copulae in the sense that we obtain complete
positive dependence for θ → +∞, independence for θ → 0 and, finally, complete negative
dependence for θ → −∞.
Figure 14: Frank Copula contour plots τ = 0.8, 0.3 & − 0.8, respectively
In figure 14 above we have illustrated the contour plots for the Frank copula with varying values
of τ, similar to the previous plots. Note the difference in the shape of the contours compared to the
Gaussian and t-Student copulas: the Frank copula concentrates the dependence in the centre of the
distribution and, unlike the t copula, it exhibits no tail dependence.
Clayton Copula
The Clayton copula distribution is given by

C(u_1, u_2) = \left(u_1^{-\theta} + u_2^{-\theta} - 1\right)^{-\frac{1}{\theta}},

with the following density

c(u_1, u_2) = (1 + \theta)\,(u_1 u_2)^{-1-\theta}\left(u_1^{-\theta} + u_2^{-\theta} - 1\right)^{-\frac{1}{\theta}-2}.

As θ > 0, we are limited to modelling only positive dependence. Hence the Clayton copula only
exhibits complete positive dependence for θ → +∞ and independence for θ → 0.
Figure 15: Clayton Copula contour plots τ = 0.8 & 0.3, respectively
In figure 15 above we can see that, in general, the Clayton copula concentrates a considerable
amount of its probability mass in one tail; the Clayton copula is characterised by its lower tail dependence.
Gumbel Copula
The Gumbel copula distribution is given by

C(u_1, u_2) = \exp\left\{ -\left[ (-\log u_1)^{\theta} + (-\log u_2)^{\theta} \right]^{\frac{1}{\theta}} \right\},

with single parameter θ ≥ 1. The density is as follows

c(u_1, u_2) = C(u_1, u_2)\, \frac{\left[1 + (\theta - 1)\,Q^{-\frac{1}{\theta}}\right] Q^{-2 + \frac{2}{\theta}}}{(u_1 u_2)\,(\log u_1 \log u_2)^{1-\theta}},

with

Q = (-\log u_1)^{\theta} + (-\log u_2)^{\theta}.
So with θ = 1 we have independence and we have complete positive dependence for θ → +∞.
Figure 16: Gumbel Copula contour plots τ = 0.8 & 0.3, respectively
Note in figure 16 the similarity to the Clayton copula: the Gumbel also concentrates its dependence
in one tail, but it is characterised by upper rather than lower tail dependence.
Note: We can see that both the Clayton and Gumbel copula distributions do not exhibit
negative dependence. In order to get around this restriction, so that we can still utilise the
fundamental characteristics of these copulas, we introduce rotations of the original functions.
This then allows us to model data with negative dependence properties with the Clayton and
Gumbel. We do not go into detail on this subset of copulas, but if the reader wishes to
pursue this further we recommend reading Hendrich [2012], starting at page 13.
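For readers who wish to reproduce contour plots of the kind shown in figures 9-16, the following R sketch (our own, not part of the thesis code) simulates from the families above with the VineCopula package and draws density contours on standard normal margins; the helper plot_copula_contour and all parameter values are purely illustrative.

library(VineCopula)
library(MASS)

# family codes in VineCopula: 1 = Gaussian, 2 = t, 3 = Clayton, 4 = Gumbel, 5 = Frank
plot_copula_contour <- function(family, par1, par2 = 0, n = 5000, main = "") {
  u <- BiCopSim(n, family, par1, par2)   # simulate copula data on [0,1]^2
  z <- qnorm(u)                          # transform to standard normal margins
  contour(kde2d(z[, 1], z[, 2], n = 50), main = main)
}

plot_copula_contour(1, BiCopTau2Par(1, 0.8), main = "Gaussian, tau = 0.8")
plot_copula_contour(2, sin(pi * 0.3 / 2), 4, main = "t copula, tau = 0.3, nu = 4")
plot_copula_contour(3, BiCopTau2Par(3, 0.8), main = "Clayton, tau = 0.8")
plot_copula_contour(4, BiCopTau2Par(4, 0.8), main = "Gumbel, tau = 0.8")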
Now that we have discussed the copula functions available to us for model fitting purposes, we
need to detail how we go about building the pairwise-copula model structures. This leads us into
our next section on Vine Copula Construction.
3.4 Vine Copula Construction
When considering a joint density with a high number of dimensions we find that there exists a
considerable number of possible pair-copula constructions. With the work done by Bedford and
Cooke [2001] we are now able to organise the different possible decompositions into a graphical
structure. In this section we explore two subcategories of the general regular vine (R-vine),
namely the canonical vine (C-vine) and the D-vine. These particular vine types are becoming
increasingly popular, with a considerable surge of work going into risk management in finance
and insurance. As we are using R software during the application process, we recommend the
reader also follows E. Brechmann and U. Schepsmeier [2013] to learn the programming tools.
After discussing the theory of C & D-vines we will look at how we select a model and the
goodness-of-fit criteria used to make comparisons between the different model selections.
D-vines
The D-vine is probably the most simplistic subgroup of R-vines and gives us a good place to
start. We shall follow the work of K. Aas [2006], who brought the advancement of statistical
inference into the C & D-vine model fitting. The graphical representation is straight forward as
can be seen in figure 17 below. Each edge corresponds to a pair copula, where by the edge label
represents the relative copula density subscript i.e. 1, 3|2 represents c13|2(·). The whole
decomposition can be described by d(d − 1)/2 edges and the d marginals for the respective d
continuous random variables. Ti s i = 1, ..., 4 are the trees which describe the break down of the
pairwise copula construction. It was also shown that there are d!/2 possible vines from a
d-dimensional joint probability density.
Figure 17: D-Vine tree for (X1, .., X5)
The density of the D-vine copula in figure 17 looks as follows:
f(x1, x2, x3, x4, x5) = f(x1) · f(x2) · f(x3) · f(x4) · f(x5)
· c12(F(x1), F(x2)) · c23(F(x2), F(x3)) · c34(F(x3), F(x4)) · c45(F(x4), F(x5))
· c13|2(F(x1|x2), F(x3|x2)) · c24|3(F(x2|x3), F(x4|x3)) · c35|4(F(x3|x4), F(x5|x4))
· c14|23(F(x1|x2, x3), F(x4|x2, x3)) · c25|34(F(x2|x3, x4), F(x5|x3, x4))
· c15|234(F(x1|x2, x3, x4), F(x5|x2, x3, x4))
We can generalise the above for the joint probability density f(x1, ..., xd), which gives us

f(x_1, \dots, x_d) = \prod_{k=1}^{d} f(x_k) \prod_{j=1}^{d-1} \prod_{i=1}^{d-j} c_{i,i+j|i+1,\dots,i+j-1}\big(F(x_i \mid x_{i+1}, \dots, x_{i+j-1}),\, F(x_{i+j} \mid x_{i+1}, \dots, x_{i+j-1})\big)    (3.22)
Canonical Vines (C-vines)
In the first C-vine tree, the dependence with respect to one particular random
variable/institution, the first root node, is modelled using bivariate copulas for each pair of
variables/institutions. Conditioned on this variable, pairwise dependencies with respect to a
second variable are modelled, the second root node. In general, a root node is chosen for each
tree and all pairwise dependencies with respect to this node are modelled conditioned on all of
the previous root nodes. This is how we obtain the C-vine star structure from the trees, as
defined by E. Brechmann and U. Schepsmeier [2013].
We again take an example of a five dimensional joint probability density with decomposition as
follows
f(x1, x2, x3, x4, x5) = f(x1) · f(x2) · f(x3) · f(x4) · f(x5)
· c12(F(x1), F(x2)) · c13(F(x1), F(x3)) · c14(F(x1), F(x4)) · c15(F(x1), F(x5))
· c23|1(F(x2|x1), F(x3|x1)) · c24|1(F(x2|x1), F(x4|x1)) · c25|1(F(x2|x1), F(x5|x1))
· c34|12(F(x3|x1, x2), F(x4|x1, x2)) · c35|12(F(x3|x1, x2), F(x5|x1, x2))
· c45|123(F(x4|x1, x2, x3), F(x5|x1, x2, x3))
The graph of this decomposition, figure 18, is illustrated on the next page.
As with the D-vine we can generalise this to the joint probability density f(x1, ..., xd), given by

f(x_1, \dots, x_d) = \prod_{k=1}^{d} f(x_k) \prod_{j=1}^{d-1} \prod_{i=1}^{d-j} c_{j,j+i|1,\dots,j-1}\big(F(x_j \mid x_1, \dots, x_{j-1}),\, F(x_{j+i} \mid x_1, \dots, x_{j-1})\big)    (3.23)

Aas [2006] was able to show that there are d!/2 possible vines.
Figure 18: C-Vine tree for (X1, .., X5)
It is important to note that fitting a canonical vine can be more advantageous when a specific
variable is known to command a lot of the interactions in the data set, as the majority of the
dependence is captured in the first tree. As we have more important material to cover we stop
here in terms of theoretical detail, but we recommend the interested reader review K. Aas
[2006] for more information.
Now that we have defined the types of vine copulas we can fit, we must discuss the process used
to select the model.
Vine Model Selection Process
With the knowledge of the different types of vine copulas available to us we will now look at the
procedure necessary to fit them and make inferences on the models. In the application in section
4 we will fit both types of vine copulas and make comparisons. For this section we will follow
the paper by E. Brechmann and U. Schepsmeier [2013] as it runs parallel with the coding. The
steps to construct a vine copula look as follows:
(i) Structure selection - First step is to decide on the structure of the decomposition as we have
shown in the previous section with the vine graphs.
(ii) Copula selection - With the structure in place we then need to choose copula functions to
model each edge of the vine copula construction.
(iii) Estimate copula parameters - Then we need to estimate the parameters for the copulae
chosen in step (ii).
(iv) Evaluation - Finally the models need to be evaluated and compared to alternatives.
(i) Structure selection: If we wanted to get a full overview of the model selection process we
would fit all possible models and compare our results. However, this is not realistic. As we have
seen in the previous section as the number of dimensions increases the possible outcomes of the
decompositions increases to a sufficiently large number. To counter this problem we need to take
a more clever approach.
C. Czado [2011] introduced a sequential approach which takes the variable with the largest value
of

\hat{S}_j := \sum_{i=1}^{d} |\hat{\tau}_{ij}|, \quad j = 1, \dots, d,    (3.24)

and allocates it to the root node for the first tree. The value represents an estimate for the
variable with the most significance in terms of interdependence. This process is repeated until
the whole tree structure is completed, i.e. move on to the second tree and find the next largest
estimated pairwise Kendall tau value. As the copulae specified in the first trees of the vine underpin
the whole dependence structure for our chosen model, we want to capture most of the
dependence in this first tree. You will see in the application section that we order the variables
for the tree structure via equation 3.24. Please see the source mentioned for more detail.
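A minimal R sketch of this root-node selection rule follows (our own illustration; u is assumed to be the n × 8 matrix of copula data constructed in section 4.1):

# equation 3.24: sum of absolute empirical Kendall's tau values for each variable
tau_mat <- cor(u, method = "kendall")
S_hat   <- rowSums(abs(tau_mat)) - 1   # drop the diagonal term tau_jj = 1 (does not change the ordering)
sort(S_hat, decreasing = TRUE)         # the largest value identifies the first root node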
Loading the VineCopula package (U. Schepsmeier et al [2015]) in R and then using the function
RVineStructureSelect, allows us to carry out the above procedure in an automated way. This
function can select optimal R-vine tree structures through maximum spanning trees with
absolute values of pairwise Kendall tau’s as weights but also includes the above method for
C-vines. See VineCopula package pdf for further details or R-code in the appendix.
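As a hedged illustration of this automated procedure (the argument values shown are ours and purely indicative), the workflow in R could look as follows:

library(VineCopula)

# joint selection of an R-vine structure, pair-copula families and parameters
rvm <- RVineStructureSelect(u, familyset = NA, type = "RVine",
                            selectioncrit = "AIC", indeptest = TRUE)

# the same function can be restricted to a C-vine structure
cvm <- RVineStructureSelect(u, type = "CVine", selectioncrit = "AIC")

rvm$Matrix   # selected tree structure
rvm$family   # chosen pair-copula families
rvm$par      # estimated copula parameters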
(ii) Copula selection: Once we have defined the vine structure we need to conduct a copula
selection process. It can be done via Goodness-of-fit tests, Independence test, AIC/BIC and
graphical tools like contour plots. We will be using the function CDVineCopSelect for the C- &
D-vine and RVineCopSelect for the R-vine copula selections. These allow the coder to decide
whether they use AIC or BIC and/or the independence test (see detail below). The program tests an
extensive range of copula functions, we refer the reader to VineCopula package for more detail.
To recap on AIC and BIC criteria see section 3.1.4.
Independence test - looks at whether two univariate copula data sets are independent or not.
The test exploits the asymptotic normality of the test statistic

T := \sqrt{\frac{9n(n-1)}{2(2n+5)}}\; |\hat{\tau}|,

where n is the number of observations for the copula data vectors and τ̂ is the estimated
Kendall tau value between the two copula data vectors u1 & u2. The p-value of the null hypothesis of
bivariate independence is hence asymptotically

p-value := 2 × (1 − Φ(T)).

This description is taken from the VineCopula package documentation.
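The test statistic above is straightforward to compute directly; the short R sketch below (ours) does so, and BiCopIndTest in the VineCopula package should produce the same result.

ind_test <- function(u1, u2) {
  n    <- length(u1)
  tau  <- cor(u1, u2, method = "kendall")
  stat <- sqrt(9 * n * (n - 1) / (2 * (2 * n + 5))) * abs(tau)
  list(statistic = stat, p.value = 2 * (1 - pnorm(stat)))
}
ind_test(u[, 1], u[, 2])
# VineCopula::BiCopIndTest(u[, 1], u[, 2]) implements the same test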
(iii) Estimate copula parameters:
Now we have chosen the copula distributions we look to estimate the parameters. Again to
conduct this we use R functions which are as follows: for C- & D-vine we have CDVineSeqEst
and for R-vine we use RVineSeqEst. The pair-copula parameter estimation is performed
tree-wise, i.e. for each C- or D-vine tree the results from the previous trees are used to calculate
the new copula parameters. The estimation method is either done by pairwise maximum
likelihood estimation (see page 16 E. Brechmann [2013] for elaboration on test details) or
inversion of Kendall's tau (this method is restricted to copula functions with only one
parameter). Referenced from VineCopula package, please consult for more detail.
(iv) Evaluation: Finally in order to evaluate and compare our selected models, we can again
use the classical AIC/BIC measure but also the Vuong test. Please see previous sections for
explanation of AIC/BIC criteria.
Vuong test - The Vuong test is a likelihood-ratio test which can be used for testing non-nested
models. It is carried out between two d-dimensional R-vine copula models. The test is as
follows: let c1&c2 be two competing vine copula models in terms of their densities and with
estimated parameter sets θ̂1 & θ̂2. We compute the standardised sum, ν, of the log differences of
the pointwise likelihoods

m_i := \log\frac{c_1(u_i \mid \hat{\theta}_1)}{c_2(u_i \mid \hat{\theta}_2)},

for observations ui with i = 1, ..., n. The statistic is as follows

\nu = \frac{\frac{1}{\sqrt{n}} \sum_{i=1}^{n} m_i}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (m_i - \bar{m})^2}}.
Vuong showed that ν is asymptotically standard normal. Under the null hypothesis

H0 : E[mi] = 0, ∀i = 1, ..., n,

we prefer vine model 1 over vine model 2 if

ν > Φ^{-1}(1 − α/2),

where Φ^{-1} denotes the inverse of the standard normal distribution function. If ν < −Φ^{-1}(1 − α/2)
we choose model 2. But if |ν| ≤ Φ^{-1}(1 − α/2), no decision can be made between the two models.
Like AIC and BIC this test can be altered to take into account the number of parameters used,
please see VineCopula package for more detail.
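For completeness, here is a hedged sketch of how such a comparison could be run in R (rvm and cvm being the two fitted vine models from the structure-selection sketch above):

vt <- RVineVuongTest(u, rvm, cvm)   # Vuong test between the two fitted vine copula models
vt$statistic                        # the standardised statistic nu
vt$p.value                          # two-sided p-value; Akaike/Schwarz corrected versions are also returned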
3.5 Vine Copula Simulation
Moving forward we now look at vine copula simulation. Here we set out the theory necessary to
simulate copula data from our C-vine models presented in the previous section. The simulation
will be done conditional on an arbitrarily chosen special variable xi+. The work done by K. Aas
only considers sequential simulation from B = {1, ..., d} in an ordered fashion starting from x1,
but we need to be able to simulate from any starting point within B. This requires a
slight alteration to K. Aas' algorithm; we will reference the paper by Hendrich [2012], who
explains this alteration. We must firstly outline the basic theory before introducing the
algorithm.
Normal Simulation
In this section we depict the work done by K. Aas. By ’Normal’ we mean that we are not
conditioning on any specially chosen variable. Going forward we will keep this labelling for
reference. Given a sample ω1, ..., ωd independently and identically distributed on uniform[0, 1]
then the general simulation procedure following a C-vine looks as follows:
x1 = ω1
x2 = F−1(ω2|x1)
x3 = F−1(ω3|x1, x2)
x4 = F−1(ω4|x1, x2, x3)
    ...
xd = F−1(ωd|x1, x2, x3, ..., xd−1)
To successfully run the simulation we need to be able to calculate the conditional distribution
functions such as F(xj|x1, ..., xj−1), j ∈ {2, ..., d}, and their inverses, respectively. For this
procedure we make critical use of the h-function mentioned in 3.21 and the pairwise
decomposing general equation 3.20. We know from 3.20 that the selection of v ∈ B determines
the copula Cjv|B−v used to calculate the conditional distribution function. We only want to
include the copulae already involved in the decomposition of the joint density. Hence, for
v = j − 1 & B = {1, ..., j − 1} this gives us

F(x_j \mid x_1, \dots, x_{j-1}) = \frac{\partial C_{j,j-1|1,\dots,j-2}\big(F(x_j \mid x_1, \dots, x_{j-2}),\, F(x_{j-1} \mid x_1, \dots, x_{j-2})\big)}{\partial F(x_{j-1} \mid x_1, \dots, x_{j-2})}
                               = h\big(F(x_j \mid x_1, \dots, x_{j-2}) \mid F(x_{j-1} \mid x_1, \dots, x_{j-2}),\, \theta_{j,j-1|1,\dots,j-2}\big),    (3.25)

we notice again that the h-function decomposes the conditional distribution function into two
lower-dimensional distribution functions. This characteristic allows us to solve the above
equation by applying the h-function iteratively on the first argument. This leads to

F(x_j \mid x_1, \dots, x_{j-2}, x_{j-1}) = h\big(F(x_j \mid x_1, \dots, x_{j-2}) \mid F(x_{j-1} \mid x_1, \dots, x_{j-2}),\, \theta_{j,j-1|1,\dots,j-2}\big)
F(x_j \mid x_1, \dots, x_{j-2}) = h\big(F(x_j \mid x_1, \dots, x_{j-3}) \mid F(x_{j-2} \mid x_1, \dots, x_{j-3}),\, \theta_{j,j-2|1,\dots,j-3}\big)
F(x_j \mid x_1, \dots, x_{j-3}) = h\big(F(x_j \mid x_1, \dots, x_{j-4}) \mid F(x_{j-3} \mid x_1, \dots, x_{j-4}),\, \theta_{j,j-3|1,\dots,j-4}\big)
    ...
F(x_j \mid x_1, x_2) = h\big(F(x_j \mid x_1) \mid F(x_2 \mid x_1),\, \theta_{j2|1}\big)
F(x_j \mid x_1) = h(x_j \mid x_1, \theta_{j1})    (3.26)
One can see from the system of equations above that equation 3.25 can essentially be written as
a nested set of h-functions. By looking at the RHS of the equations in 3.26 you can see the
subscript order is always dropping one as we move down to the next equation. This implies that
the equations can be solved sequentially. For more detail see K. Aas.
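To illustrate the recursion, the following R sketch (ours, not the thesis code) draws one observation from a three-dimensional C-vine whose pair-copulas are all Gaussian, using the closed-form h-function and inverse introduced in section 3.3; all parameter values are illustrative.

h_gauss    <- function(u, v, rho) pnorm((qnorm(u) - rho * qnorm(v)) / sqrt(1 - rho^2))
hinv_gauss <- function(w, v, rho) pnorm(qnorm(w) * sqrt(1 - rho^2) + rho * qnorm(v))

rho12 <- 0.6; rho13 <- 0.5; rho23_1 <- 0.3   # parameters of c12, c13 and c23|1

w  <- runif(3)                               # independent uniform innovations
x1 <- w[1]
x2 <- hinv_gauss(w[2], x1, rho12)            # x2 = F^{-1}(w2 | x1)
# x3 = F^{-1}(w3 | x1, x2): invert the nested h-functions, outermost first
x3 <- hinv_gauss(hinv_gauss(w[3], h_gauss(x2, x1, rho12), rho23_1), x1, rho13)
c(x1, x2, x3)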
Conditional Simulation
Now consider simulation conditional on an arbitrarily selected variable of interest
xi+, i+ ∈ {1, ..., d}. Given the same conditions as in the Normal simulation procedure, except here
we condition on x3 = a where a ∈ [0, 1], we have the following simulation procedure:

x1 = F−1(ω1|x3)
x2 = F−1(ω2|x1, x3)
x3 = a
x4 = F−1(ω4|x1, x2, x3)
    ...
xd = F−1(ωd|x1, x2, x3, ..., xd−1)

We can deduce from our simulation procedure above that we have a different sampling
procedure compared to the Normal one. Simulating values for variables with subscripts greater
than i+ is fine, i.e. j = i+ + 1, i+ + 2, ..., d; problems arise when we simulate values of the
variables with subscripts less than i+, i.e. j = 1, ..., i+ − 1.
We now define the conditioning set of the simulated variable with index j, j = 1, ..., i+ − 1, as Bj.
The system of equations in 3.26 is the same in this conditional case ∀ j ≤ i+, so again the
nested h-functions can be solved sequentially; see Hendrich [2012] for the illustration of this.
In order to clarify how the conditional simulation procedure works we give an example taken
from Hendrich [2012], please see below.
Example - Conditional Simulation for d=5 & i+ = 4
For the simulation conditional on x4 = a ∈ [0, 1] we have:

(1) ω1 = F(x1|x4) = h(x1|x4, θ14), and so
    x1 = h−1(ω1|x4, θ14)

(2) ω2 = F(x2|x1, x4) = h(F(x2|x1) | F(x4|x1), θ24|1), where F(x2|x1) = h(x2|x1, θ12) and
    F(x4|x1) = h(x4|x1, θ14), so that
    h(x2|x1, θ12) = h−1(ω2 | h(x4|x1, θ14), θ24|1) =: y1
    ⇐⇒ x2 = h−1(y1|x1, θ12)

(3) ω3 = F(x3|x1, x2, x4) = h(F(x3|x1, x2) | F(x4|x1, x2), θ34|12), where
    F(x3|x1, x2) = h(F(x3|x1) | F(x2|x1), θ23|1), and hence
    F(x4|x1, x2) = h(F(x4|x1) | F(x2|x1), θ24|1) = h(h(x4|x1, θ14) | h(x2|x1, θ12), θ24|1) =: y2
    This gives us
    h(F(x3|x1) | F(x2|x1), θ23|1) = h−1(ω3|y2, θ34|12) =: y3
    ⇐⇒ h(x3|x1, θ13) = h−1(y3 | h(x2|x1, θ12), θ23|1) =: y4
    ⇐⇒ x3 = h−1(y4|x1, θ13)

(4) As we chose at the start, x4 = a ∈ [0, 1]

(5) ω5 = F(x5|x1, x2, x3, x4) = h(F(x5|x1, x2, x3) | F(x4|x1, x2, x3), θ45|123), where
    F(x5|x1, x2, x3) = h(F(x5|x1, x2) | F(x3|x1, x2), θ35|12), with
    F(x4|x1, x2, x3) = h(F(x4|x1, x2) | F(x3|x1, x2), θ34|12) =: y5
    and additionally
    F(x3|x1, x2) = h(F(x3|x1) | F(x2|x1), θ23|1) =: y6
    so we get the following
    ⇐⇒ h(F(x5|x1, x2) | y6, θ35|12) = h−1(ω5|y5, θ45|123) =: y7
    ⇐⇒ h(F(x5|x1) | F(x2|x1), θ25|1) = h−1(y7|y6, θ35|12) =: y8
    ⇐⇒ h(x5|x1, θ15) = h−1(y8 | h(x2|x1, θ12), θ25|1) =: y9
    ⇐⇒ x5 = h−1(y9|x1, θ15)

For more detail on this please see Hendrich [2012], page 117. Now that we have the above
sampling procedure for the conditional case, we are able to alter the algorithm for the
Normal simulation procedure introduced by K. Aas to give the following (referenced from
Hendrich, page 119):
Algorithm: Conditional simulation algorithm for a C-vine. Generate one sample
x1, ..., xd from the C-vine, given that variable xi+, i+ ∈ {1, ..., d}, is equal to a pre-specified
value a.

Sample ω1, ..., ωi+−1, ωi++1, ..., ωd independent and uniformly distributed on [0,1].
xi+ = vi+,1 = a
for j ← 1, ..., i+ − 1, i+ + 1, ..., d
    vj,1 = ωj
    if j < i+ then
        vj,1 = h−1(vj,1 | vi+,j, θj,i+|1,...,j−1)
    end if
    if j > 1 then
        for k ← j − 1, ..., 1
            vj,1 = h−1(vj,1 | vk,k, θk,j|1,...,k−1)
        end for
    end if
    xj = vj,1
    if j < d then
        for l ← 1, ..., j − 1
            vj,l+1 = h(vj,l | vl,l, θl,j|1,...,l−1)
        end for
    end if
    if j < i+ then
        vi+,j+1 = h(vi+,j | vj,j, θj,i+|1,...,j−1)
    end if
end for
Summary of algorithm: The outer for-loop runs over the sampled variables. The
sampling of variable j is initialised with respect to its position in the vine: if j < i+ the
calculation depends on i+, if j > i+ it does not. The variable j is then carried forward into the
first inner for-loop. In the last steps the conditional distribution functions needed for sampling
the (j + 1)th variable are calculated. As F(xj|x1, ..., xj−1) is computed recursively in the second
inner for-loop for every j, the corresponding F(xi+|x1, ..., xj−1) is worked out in the last step
∀ j < i+.
4 Application
Now that the theory is completed we can look to apply it to our data set. There are several
steps necessary in order to acquire some meaningful results. First we need to fit the time series
models to get our standardised residuals, which then, need to be transformed into copula data
as mentioned in 2 The Data, via probability integral transform. Once this is done we move onto
fitting our copula vine structures and their respective copula distributions. Finally we look at the
copula simulation where we hope to answer our major questions mentioned in 1.2 Questions of
Our Investigation.
Through this section we also apply the procedures to the 2009 data set, for comparative purposes,
in line with our questions in section 1.2. In our analysis we focus on the 2008 data set to avoid
clouding of results and repetition of software output. For these reasons we omit the 2009 GARCH
modelling and the 2009 R-Vine software output. Please note all important comparisons will be made.
4.1 Fitting Time Series Models
As we discussed in section 3.1, in order to conduct our copula construction we need to ensure
our data sets are independent and identically distributed (i.i.d). In order to do this we are going
to fit the time series models as discussed in section 3.1 where we are able to acquire model
residuals which are i.i.d; note these residuals are modelled by the distributions chosen in section
3.1.1, 'Distribution of the Zt's'. Once we have completed this step we will look to apply the
probability integral transform to secure our copula data.
Fitting Procedure
Our 2008 data sample comes in the format of equation 3.1 (log returns). We make the assumption
that our data is weakly stationary and, with this in mind, we model each individual time series as
follows:
1. We first need to decide on the order p,q of the ARMA(p,q) model. This is done using the
auto.arima function (forecast package) in R, which selects the best fitting model via the
AIC criteria as a measure of goodness-of-fit. Note this can also be done manually using
the graphical approach, i.e. ACF and PACF, depending on the reader's preference. In this
paper we try to automate as many of these processes as possible in order to tailor this to a
computationally convenient and easy to use tool for management.
2. Now we look to specify our GARCH model, as set out in section 3.1.3 and in the paper by
A. Ghalanos [2014]. We can try a wide variety of GARCH specifications and fit their
respective models in order to find the best fitting model via the AIC/BIC and log likelihood
criteria (see 3.1.4 for a recap). For this we use the ugarchspec and ugarchfit functions
from the rugarch package; a short sketch is given after this list. Please reference Appendix C
for visibility on the code.
3. At the same time we need to look at the different types of distributions to fit to our
standardised residuals. Again we can use the criteria above for comparative purposes, but
we must also consider tests to see whether the fitted residuals are independently and
identically distributed according to the assumed distribution chosen. We use QQ-plots,
choosing the distribution which has the most points lying on a straight line.
4. The final set of tests are used to ascertain whether we have a good performing model. If,
after fitting a standard GARCH model, we find the Sign Bias test suggests a presence of
asymmetry we should fit an eGARCH model. However, if our results do not differ
significantly we should select the basic model so we avoid over-parameterisation.
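The sketch below (our own simplified illustration; the full code is in Appendix C.1) shows what steps 1-4 could look like for a single log-return series r; all specification choices shown are indicative only.

library(forecast)
library(rugarch)

# step 1: choose the ARMA(p,q) order by AIC
arma_order <- arimaorder(auto.arima(r, d = 0, seasonal = FALSE))[c(1, 3)]

# steps 2 & 3: specify and fit an eGARCH(1,1) model with Student-t innovations
spec <- ugarchspec(variance.model = list(model = "eGARCH", garchOrder = c(1, 1)),
                   mean.model     = list(armaOrder = arma_order, include.mean = FALSE),
                   distribution.model = "std")
fit <- ugarchfit(spec, data = r)

# step 4: diagnostics (Ljung-Box, ARCH-LM and sign bias tests are part of show(fit))
show(fit)
z <- residuals(fit, standardize = TRUE)   # standardised residuals for the copula step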
In table 3 and table 4 over the next pages we detail our findings from the above procedure. The
tables include the type of ARMA(p,q) model selected with its associated parameters, the type of
GARCH(1,1) model selected i.e. eGARCH or csGARCH and their associated parameters,
respective distributions for the residuals and their parameters, and finally the info criteria
consisting of log Likelihood, AIC and BIC criteria.
During the fitting process we found that the eGARCH model fitted the majority of our models.
Often when fitting the standard GARCH (sGARCH) model the parameters were either deemed
insignificant or there were clear signs of leverage effects/asymmetry effects amongst the data.
Also notable was the single use of the csGARCH (component sGARCH) model;
it seems that its property of modelling both short-run and long-run movements of volatility
fitted the OML.L data well. Additionally, the most common statistical distributions used to model
the residuals were the t-Student and Normal distributions, which was not necessarily surprising as
the t-Student distribution is notorious for capturing the leptokurtosis characteristic of financial data.
The final two distributions used were the skewed ged and standard ged.
When we look at the p-values within the table one can see that the majority of the parameters
are significant at the 5% level. There are a couple of parameter estimates that just fall outside of
this, including the alpha (p = 0.154) for the csGARCH model on OML.L and the eta parameter
(p = 0.133) for PRU.L, but aside from these we can say the majority of the parameters fit the data well.
The mean parameter is absent for the majority of the institutions which is due to the nature of
the data source i.e. log return (single difference data). All of the banks in fact showed no sign of
autocorrelation which meant they did not require an ARMA model. On the other hand, the
insurance companies seemed to illustrate some form of autocorrelation which had to be
modelled. The beta parameter was less than or equal to one for all eGARCH models ensuring
our stationarity condition had been satisfied. If we consider the distribution parameters we can
see that the majority of them do not exhibit signs of skewness. There are a couple which show
small signs such as PRU.L which showed marginal positive skewness, see Appendix A, figure 42.
Leptokurtosis does seem to be prominent within the data but we know to expect this from
financial time series and we fit eGARCH models accordingly.
Our goodness-of-fit measures are included at the end of table 4, which shows the log likelihood
number and the corresponding AIC and BIC numbers, ranging over [-6.334, -5.001] and [-6.264, -4.931],
respectively.
Moving onto our informative time series plots in Appendix A - QQ-Plots, Empirical Density of
Standardized Residuals and Residual Plots. One can see with the QQ-plots that the majority of
the points lie on the diagonal straight line. HSBA.L and LLOY.L seem to have a slight tail
moving away from the line, however, generally speaking the QQ-plots suggest the models are
adequate. The empirical density of standardised residuals illustrates the fit of the chosen
distribution against the standardised residuals. As you can see, the distributions capture the
majority of the data characteristics; the plotted bars escape the density curve only slightly in parts.
Finally we have the residual plots, which have been selected to test for some sense of
boundedness. One can see that the vast majority of the data points are all contained within the
[-2,2] horizontal bounds. This is a reassuring sign that we have i.i.d residuals.
Table 3: ARMA-GARCH table for all companies

Company  ARMA(p,q)              µ       φ1      η1      GARCH type   ω       α       β      γ      η11    η21
HSBA.L   (0,0)     estimate     -       -       -       eGARCH      -0.278  -0.218   0.970  0.154  -      -
                   p-value                                           0.000   0.001   0.000  0.020
LLOY.L   (0,0)     estimate     -       -       -       eGARCH      -0.003  -0.161   0.978  0.170  -      -
                   p-value                                           0.000   0.000   0.000  0.000
BARC.L   (0,0)     estimate     -       -       -       eGARCH      -0.179  -0.096   0.979  0.220  -      -
                   p-value                                           0.075   0.088   0.000  0.003
STAN.L   (0,0)     estimate     -       -       -       eGARCH      -0.161  -0.270   0.983  0.118  -      -
                   p-value                                           0.000   0.000   0.000  0.000
PRU.L    (0,1)     estimate    -0.002   -      -0.095   eGARCH      -0.181  -0.251   0.977  0.123  -      -
                   p-value      0.056           0.133                0.000   0.000   0.000  0.027
LGEN.L   (1,1)     estimate     -       0.634  -0.787   eGARCH      -0.014  -0.123   1.000  0.127  -      -
                   p-value              0.000   0.000                0.020   0.005   0.000  0.000
AV.L     (1,1)     estimate     -       0.748  -0.814   eGARCH      -0.105  -0.183   0.989  0.153  -      -
                   p-value              0.000   0.000                0.000   0.000   0.000  0.000
OML.L    (0,0)     estimate    -0.002   -       -       csGARCH      0.000   0.159   0.140  -      0.979  0.140
                   p-value      0.015                                0.000   0.154   0.000         0.000  0.002
Table 4: Standardised Residual Distribution table for all companies (estimates with p-values in parentheses)

Company  Distribution  ν               ζ              Log Likelihood  AIC     BIC
HSBA.L   t-student     5.149 (0.008)   -              806.242         -6.334  -6.264
LLOY.L   ged           1.418 (0.000)   -              656.687         -5.144  -5.060
BARC.L   t-student     9.125 (0.063)   -              637.949         -5.004  -4.934
STAN.L   normal        -               -              671.665         -5.278  -5.222
PRU.L    sged          0.975 (0.000)   1.437 (0.000)  668.269         -5.220  -5.108
LGEN.L   t-student     5.589 (0.006)   -              713.341         -5.584  -5.486
AV.L     normal        -               -              685.300         -5.370  -5.286
OML.L    normal        -               -              637.628         -5.001  -4.931
We proceed to analyse the goodness-of-fit through the diagnostic tests for the standardised
residuals: table 5 details the results from the tests mentioned in section 3.1.4.
We begin with the Ljung Box test which is tested against lags of 1,2 & 5 for the standard test
and lags 1,5 & 9, for the squared residual test. LGEN.L and AV.L have different lags for the
standard Ljung Box test which is due to their ARMA(p,q) orders. For the standard Ljung Box
residual test, all of the companies, with the exception of OML.L, have consistently high values for
the p-value. This implies there is sufficient evidence to keep the null hypothesis of no
autocorrelation. OML.L at lag 2 suggests we reject the null at the 5% level, but we expect this is
due to the quality of the data and it cannot be removed. Looking at the squared residual test we
see a similar picture: all of the p-values are significantly larger than 0.05, which implies there is
sufficient evidence to suggest independence between residuals. There is one particular data point
which is close to the 5% level, PRU.L at lag 9 with a p-value of 0.063, but apart from this the
tests seem to have performed in our favour.
Now we move onto the ARCH-LM test at lags 3, 5 & 7, which we have performed in order to test
for ARCH effects among the residuals. Aside from PRU.L and BARC.L, the rest of the
companies have large p-values, which indicates there is sufficient evidence to suggest
there are no ARCH effects. PRU.L has a p-value of 0.007 at a lag of 5, which would suggest at the
5% significance level we reject the null of no ARCH effects. In general there is not a lot we can
do to remove this; it is a consequence of the data. From a general view point the vast majority
of the data suggest the models fitted have removed all ARCH effects.
The last test performed was the sign bias test. This includes the Sign Bias test (SB), Negative
Size Bias test (NSB), Positive Size Bias test (PSB) and Joint Effect test (JE). We conduct these
tests to detect leverage or asymmetrical effects after the model has been fitted. Again all of the
companies exhibited large p-values, indicating that the null hypothesis is not rejected and there
is sufficient evidence to suggest the absence of leverage effects.
Thus, after fitting the different time series models to our data, we deduce from the above
discussed tests that we obtain standardized model residuals that are independent and show no
serial autocorrelation. Additionally, we conclude the absence of ARCH-effects and asymmetric
effects amongst the residuals.
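For reference, the diagnostics above can be reproduced from a fitted rugarch model along the following lines (a sketch of ours; fit and z are the fitted model and standardised residuals from the earlier sketch):

Box.test(as.numeric(z), lag = 5, type = "Ljung-Box")     # serial correlation in the residuals
Box.test(as.numeric(z)^2, lag = 9, type = "Ljung-Box")   # serial correlation in the squared residuals
signbias(fit)                                            # sign bias / asymmetry tests
# the weighted Ljung-Box and ARCH-LM statistics reported in table 5 are part of show(fit)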
We now move onto transforming the residuals using their underlying distribution to be
uniformly distributed.
Table 5: Standardised Residual Diagnostics for all companies

                      Ljung-Box (Standard)   Ljung-Box (Std. Squared)  ARCH-LM Test           Sign Bias Test
Company      lag       1      2      5        1      5      9           3      5      7        SB     NSB    PSB    JE
HSBA.L   test stat.   0.316  0.368  0.915    0.037  0.747  2.904       0.000  1.535  2.797    0.527  0.991  0.154  1.010
         p-value      0.574  0.759  0.879    0.847  0.914  0.775       0.991  0.583  0.553    0.599  0.323  0.878  0.799
LLOY.L   test stat.   0.048  0.562  1.028    0.063  1.251  2.075       0.442  0.510  0.774    0.751  0.839  1.578  3.631
         p-value      0.827  0.665  0.853    0.802  0.801  0.896       0.506  0.881  0.947    0.454  0.402  0.116  0.304
BARC.L   test stat.   0.132  1.183  2.634    0.086  4.738  7.929       0.043  6.230  6.944    0.248  0.045  0.192  0.071
         p-value      0.717  0.443  0.478    0.769  0.175  0.133       0.836  0.053  0.089    0.805  0.964  0.848  0.995
STAN.L   test stat.   2.240  2.941  4.026    0.324  2.139  4.272       2.142  3.116  4.517    0.798  0.303  0.889  0.983
         p-value      0.135  0.146  0.251    0.569  0.586  0.543       0.143  0.273  0.278    0.426  0.762  0.375  0.805
PRU.L    test stat.   0.080  1.922  4.528    0.670  6.007  9.537       1.624  10.012 10.711   0.414  0.522  1.437  2.339
         p-value      0.778  0.241  0.153    0.413  0.090  0.063       0.203  0.007  0.013    0.679  0.602  0.152  0.505
OML.L    test stat.   4.692  5.360  6.312    0.853  3.398  4.802       0.518  1.251  2.115    0.064  1.372  0.379  3.608
         p-value      0.030  0.033  0.076    0.356  0.339  0.459       0.472  0.660  0.693    0.949  0.171  0.705  0.307

Company      lag       1      5      9        1      5      9           3      5      7        SB     NSB    PSB    JE
LGEN.L   test stat.   0.021  0.380  1.072    0.570  3.743  6.335       0.429  6.177  7.004    0.725  0.174  1.255  1.607
         p-value      0.884  1.000  1.000    0.450  0.288  0.262       0.512  0.055  0.087    0.469  0.862  0.211  0.658
AV.L     test stat.   0.059  2.032  4.030    0.022  3.748  5.779       0.142  4.169  4.505    1.575  0.264  0.878  2.787
         p-value      0.808  0.952  0.684    0.883  0.287  0.324       0.707  0.159  0.280    0.117  0.792  0.381  0.426
We are almost done. As previously discussed, before we move onto the copula model constructions
we need the transformed residuals to take values in [0,1]. To achieve this we use the
probability integral transformation.
What is probability integral transformation?
Essentially what we are doing is applying the cumulative distribution function of the chosen
standardised residual distribution (selected from 'Distribution of the Zt's') to the residuals of the
eight companies; the end result is values ∈ [0, 1]. The methodology looks as follows:
Let our standardised residuals from our fitted time series model be (Zi,t)t=1,...,253, which belong
to one of the following distributions: normal, t-student, generalised error distribution etc. Then
we have for our eight institutions

Ui,t = Fi(Zi,t), i = 1, ..., 8 & t = 1, ..., 253 =⇒ Ui,t ∼ Uniform[0, 1],

where Fi is the cumulative distribution function fitted to institution i, selected depending on the
outcome of the GARCH modelling process; the distribution which yields the best goodness-of-fit
results is chosen. See Hendrich [2012] for further information. If we look at the plots in Appendix A -
Histograms of Copula Data, we can see the data transformation to copula data, distributed on
[0,1]. Going forward we now use this as our underlying data sample to conduct all the necessary
procedures and tests of dependence. The first thing we do is look at the basic measures of
association discussed in section 3.2. Hence we now look to analyse the Kendall ˆτ and Spearman
ˆρ matrices for both 2008 and 2009.
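A brief R sketch of this transformation for one institution follows (ours; it assumes a Student-t fit and reuses fit and z from the earlier GARCH sketch):

shape <- coef(fit)["shape"]                           # fitted degrees of freedom of the Student-t distribution
u1    <- pdist("std", as.numeric(z), shape = shape)   # probability integral transform to [0,1]
hist(u1)                                              # should look approximately uniform
# repeating this for all eight institutions gives the n x 8 copula data matrix used below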
Figure 19: 2008 and 2009 respectively Kendall ˆτ Matrices
Looking at figure 19 - Kendall tau matrix, we get our first indication of any dependence between
the institutions. As was described in section 3.2, Kendall’s tau is a measure of co-movements of
increases and decreases in return. We can see in 2008 all entries to the matrix are positive
indicating there is some form of dependence between all pairs of institutions. At the end of the
rows we have included the collective sum of the row entries as a measure for comparison with the
2009 data. The cells highlighted in red are also used for comparative reasons, as they indicate all
values ∈ [0.5, 0.9]. There appear to be multiple entries within this interval, the highest degree
of dependence coming from PRU.L at 0.628 with AV.L. Note PRU.L also has high dependence
relations with three other institutions. If we now cross examine both tables from 2008 to 2009,
we can see that there is a significant drop in the numbers of cells with dependence measures
belonging to [0.5,0.9]. This can also be seen generally from the total Kendall tau value for each
matrix, a drop from 34.516 to 30.568, circa 11% drop. So our first sample of evidence suggests
that the dependence between these institutions was not as strong in 2009.
Figure 20: 2008 and 2009 respectively Spearman’s ˆρ Matrices
Looking at figure 20, we consider the Spearman's rho matrix for both 2008 and 2009, Spearman's
rho being the alternative measure of dependence. Whilst the absolute values in this matrix are
larger compared to Kendall's tau, the results are the same: going from 2008 to 2009 there is a
significant drop in total dependence between our institutions. Again comparing the number of
cells highlighted in red (pairs belonging to the interval [0.7,0.9]), we can see a dramatic drop, and
the total Spearman value drops from 44.191 to 39.886, circa a 10% drop.
In order to give a graphical interpretation, we are going to illustrate the dependence
through distance in a two-dimensional space via the theory discussed on multidimensional
scaling (section 3.2); in particular we use the Kruskal-Shepard scaling method to calculate and plot
points exhibiting dependence. Naturally we have plotted the data for both 2008 and 2009 to
make comparisons. See figure 21 and figure 22 below.
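A sketch of how such a plot can be produced in R (our own; u is the copula data matrix, and MASS::isoMDS implements Kruskal's non-metric scaling):

library(MASS)
tau_mat <- cor(u, method = "kendall")    # empirical Kendall's tau matrix
d_tau   <- as.dist(1 - abs(tau_mat))     # dissimilarity: strongly dependent pairs are close together
coords  <- isoMDS(d_tau, k = 2)$points   # two-dimensional configuration
plot(coords, type = "n", xlab = "", ylab = "")
text(coords, labels = colnames(tau_mat))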
Figure 21: 2008 plot of the institutions' names after applying multidimensional scaling on the empirical values of Kendall's tau
Figure 22: 2009 plot of the institutions' names after applying multidimensional scaling on the empirical values of Kendall's tau
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)
9.8.15.thesis (3)

Contenu connexe

Similaire à 9.8.15.thesis (3)

Andrew_Hair_Dissertation
Andrew_Hair_DissertationAndrew_Hair_Dissertation
Andrew_Hair_DissertationAndrew Hair
 
Ssrn id670543
Ssrn id670543Ssrn id670543
Ssrn id670543Aslan60
 
EAD Parameter : A stochastic way to model the Credit Conversion Factor
EAD Parameter : A stochastic way to model the Credit Conversion FactorEAD Parameter : A stochastic way to model the Credit Conversion Factor
EAD Parameter : A stochastic way to model the Credit Conversion FactorGenest Benoit
 
Federico Thibaud - Capital Structure Arbitrage
Federico Thibaud - Capital Structure ArbitrageFederico Thibaud - Capital Structure Arbitrage
Federico Thibaud - Capital Structure ArbitrageFederico Thibaud
 
The value at risk
The value at risk The value at risk
The value at risk Jibin Lin
 
2012-02-17_Vojtech-Seman_Rigorous_Thesis
2012-02-17_Vojtech-Seman_Rigorous_Thesis2012-02-17_Vojtech-Seman_Rigorous_Thesis
2012-02-17_Vojtech-Seman_Rigorous_ThesisVojtech Seman
 
Back-testing of Expected Shortfall : Main challenges and methodologies
Back-testing of Expected Shortfall : Main challenges and methodologies Back-testing of Expected Shortfall : Main challenges and methodologies
Back-testing of Expected Shortfall : Main challenges and methodologies GRATeam
 
Expected shortfall-back testing
Expected shortfall-back testingExpected shortfall-back testing
Expected shortfall-back testingGenest Benoit
 
Stochastic Models of Noncontractual Consumer Relationships
Stochastic Models of Noncontractual Consumer RelationshipsStochastic Models of Noncontractual Consumer Relationships
Stochastic Models of Noncontractual Consumer RelationshipsMOSTLY AI
 
Valkhof, Aart 0182737 MSc ACT
Valkhof, Aart 0182737 MSc ACTValkhof, Aart 0182737 MSc ACT
Valkhof, Aart 0182737 MSc ACTAart Valkhof
 
Gra wp modelling perspectives
Gra wp modelling perspectivesGra wp modelling perspectives
Gra wp modelling perspectivesGenest Benoit
 
Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Ma...
Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Ma...Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Ma...
Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Ma...Manjunath Jois
 
White paper warranty_management
White paper warranty_managementWhite paper warranty_management
White paper warranty_managementSreeram Yegappan
 
Coursework- Soton (Single Index Model and CAPM)
Coursework- Soton (Single Index Model and CAPM)Coursework- Soton (Single Index Model and CAPM)
Coursework- Soton (Single Index Model and CAPM)Ece Akbulut
 
Nduati Michelle Wanjiku Undergraduate Project
Nduati Michelle Wanjiku Undergraduate ProjectNduati Michelle Wanjiku Undergraduate Project
Nduati Michelle Wanjiku Undergraduate ProjectMichelle Nduati
 

Similaire à 9.8.15.thesis (3) (20)

Andrew_Hair_Dissertation
Andrew_Hair_DissertationAndrew_Hair_Dissertation
Andrew_Hair_Dissertation
 
tese
tesetese
tese
 
Ssrn id670543
Ssrn id670543Ssrn id670543
Ssrn id670543
 
EAD Parameter : A stochastic way to model the Credit Conversion Factor
EAD Parameter : A stochastic way to model the Credit Conversion FactorEAD Parameter : A stochastic way to model the Credit Conversion Factor
EAD Parameter : A stochastic way to model the Credit Conversion Factor
 
Federico Thibaud - Capital Structure Arbitrage
Federico Thibaud - Capital Structure ArbitrageFederico Thibaud - Capital Structure Arbitrage
Federico Thibaud - Capital Structure Arbitrage
 
68
6868
68
 
The value at risk
The value at risk The value at risk
The value at risk
 
2012-02-17_Vojtech-Seman_Rigorous_Thesis
2012-02-17_Vojtech-Seman_Rigorous_Thesis2012-02-17_Vojtech-Seman_Rigorous_Thesis
2012-02-17_Vojtech-Seman_Rigorous_Thesis
 
Back-testing of Expected Shortfall : Main challenges and methodologies
Back-testing of Expected Shortfall : Main challenges and methodologies Back-testing of Expected Shortfall : Main challenges and methodologies
Back-testing of Expected Shortfall : Main challenges and methodologies
 
Expected shortfall-back testing
Expected shortfall-back testingExpected shortfall-back testing
Expected shortfall-back testing
 
Stochastic Models of Noncontractual Consumer Relationships
Stochastic Models of Noncontractual Consumer RelationshipsStochastic Models of Noncontractual Consumer Relationships
Stochastic Models of Noncontractual Consumer Relationships
 
Valkhof, Aart 0182737 MSc ACT
Valkhof, Aart 0182737 MSc ACTValkhof, Aart 0182737 MSc ACT
Valkhof, Aart 0182737 MSc ACT
 
project report(1)
project report(1)project report(1)
project report(1)
 
Gra wp modelling perspectives
Gra wp modelling perspectivesGra wp modelling perspectives
Gra wp modelling perspectives
 
Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Ma...
Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Ma...Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Ma...
Master Thesis - A Column Generation Approach to Solve Multi-Team Influence Ma...
 
vatter_pdm_1.1
vatter_pdm_1.1vatter_pdm_1.1
vatter_pdm_1.1
 
White paper warranty_management
White paper warranty_managementWhite paper warranty_management
White paper warranty_management
 
Coursework- Soton (Single Index Model and CAPM)
Coursework- Soton (Single Index Model and CAPM)Coursework- Soton (Single Index Model and CAPM)
Coursework- Soton (Single Index Model and CAPM)
 
Thesis
ThesisThesis
Thesis
 
Nduati Michelle Wanjiku Undergraduate Project
Nduati Michelle Wanjiku Undergraduate ProjectNduati Michelle Wanjiku Undergraduate Project
Nduati Michelle Wanjiku Undergraduate Project
 

9.8.15.thesis (3)

  • 1. Measuring Systemic Risk with Vine Copula models between Banks and Insurance Companies in the UK during different time horizons of UK GDP growth. By:Peter Nicholas Allen Supervisor: Dr. Eric A. Beutner Submitted on: 20th August 2015
  • 2. Acknowledgement I would like to thank my supervisor Dr. Eric Beutner who helped me during challenging points of my thesis. Additionally, I would like to thank J.S. Bach and Wolfgang Amadeus Mozart for their musical contributions which made this research thoroughly enjoyable. Finally my father who taught me mathematics as a child and was very patient. i
  • 3. Abstract In this thesis we explore how different time horizons connected to sharp changes in the UK GDP percentage growth effect our constructed systemic risk models between the major UK banks and insurance companies. In particular we make use of extended GARCH models, copula functions and the R- & C-vine copula models to illustrate their network of dependence. Stress testing is used through a simulation exercise to identify influential and dependable institutions within the financial system. ii
  • 4. Contents Acknowledgement i Abstract ii 1 Introduction 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Questions of Our Investigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 The data 3 3 Methods 6 3.1 Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1.1 Time series preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.1.2 ARMA Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.1.3 GARCH Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 3.1.4 Model Diagnostics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 3.2 Dependence Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3 Copulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.4 Vine Copula Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.5 Vine Copula Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4 Application 43 4.1 Fitting Time Series Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.2 Constructing the Vine Copulae . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.3 The Vine Copula Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5 Conclusion 67 5.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.2 Process and Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 A Additional figures - Time Series Modelling 70 B Additional figures - Vine Copula Modelling 74 B.1 Scatter and Contour Matrix of Full Copula Data Set: 2008 . . . . . . . . 74 B.2 C-Vine Copula Decomposition Matrices: 2009 . . . . . . . . . . . . . . . . . 75 B.3 R-Vine Copula Decomposition Matrices: 2008 . . . . . . . . . . . . . . . . . 77 B.4 Remaining Vine Tree Comparisons C-Vines . . . . . . . . . . . . . . . . . . 78 B.5 Remaining R-Vine Tree Comparisons . . . . . . . . . . . . . . . . . . . . . . 81 C R-code 85 C.1 Fitting GARCH Models and Preparing Copula Data - 2008 Data . . . . 85 C.2 Example Simulation Code X1 = HSBC Specific - 2008 Data . . . . . . . . 91 C.3 Example H-Functions Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 Bibliography 95
  • 5. 1 Introduction 1.1 Background The aim of this paper is to analyse systemic risk during an economic recession and recovery. This is done through vine copula models which try to accurately depict the interdependence between a specified collection of banks and insurance companies. However, we plan to ascertain and comment on any changes in these measures when we re-sample from 2008 and 2009. We have chosen these specific time horizons to build a hypothesis that the dependence amongst the institutions is far greater during 2008 compared to 2009 due to different economic states i.e. recession and recovery. The percentage change of UK GDP has been used as an indicator of the change in economic state. During financial turmoil flight-to-quality is a financial market phenomenon which characterises the sudden selling of risky assets such as equities, and reinvestment into safer alternative investments such as bonds and securities. As a result private investors, banks, insurance companies, hedge funds and other institutions supplement a surge of transactions, offloading these risky assets and moving this liquidity somewhere else. This is the fundamental behaviour that governs the hypothesis we will be testing. Given times of economic decline, co-movements of equities should be more severe due to the erratic behaviour of investors. Whilst this paper looks at the individual importance of one particular financial system, we think it is important for the reader to consider the wider applications of these different risk measures used to analyse the interdependency between a portfolios of random variables. Especially in terms of how one (hedge fund, private investor, risk manager, insurance company, bank etc) would act during and post a financial recession. Institutions are spending more time and money on active risk management and we believe risk managers should be cognizant of the level of dependence and interlinks between any portfolio of risks. In terms of the statistical distribution theory used, the reader will see that we step away from the traditional and much criticised Normal distribution framework. We use the modern work done by the academic community to uniquely specify our marginals for the univariate data sets before building the multivariate framework. The mathematical tools we use to model the dependence structure are copulas and vine copula pairwise constructions. However, before these tools are used we use the traditional method of fitting GARCH models to return independent identically distributed error terms which will in turn be converted to copula data for dependence modelling. In order to carry out all of this methodology we had to use a statistical package. In this paper we make detailed references to our chosen software, R-Software. It allows us to fit GARCH models, fit R- & C-vine models and also a framework in which to run the simulations. The most important thing about R is that all the data is in one place and can be referenced as the calculations continue from GARCH to vine copula simulations. To give the reader a good feel for the application a lot of the graphs and code used to conduct the methodology has been included in the appendix. However, we must point out that some of the code is custom so there may be more efficient methods of coding the calculations. 1
  • 6. 1.2 Questions of Our Investigation In the subsection we outline what we would like to investigate. We hope to get answers to the following questions: (i) Our first question is, what does the general dependence structure look like between the institutions? Also collectively, the UK Banks and Insurance Companies? Is there one particular institution which seems to stand out amongst the eight in terms of the level of dependence with the rest? (ii) Secondly, we would like to see how the dependence structure changes given that we apply a shock to each individual company i.e. significant drop in share price? Which institution has the most dramatic effect on the system given a shock is applied to its share price? (iii)Thirdly we redo-question (i) but with the data from 2009 as opposed to 2008, do we believe there is a link between the significance of dependence within our system of institutions and the change in percentage GDP for the UK? (iv) Finally we re-do question (ii) but again with the 2009 as opposed to 2008, does this period of increased percentage GDP suggest dependence is not as high and do shocks to individual institutions have less of a domino affect on the remaining institutions? Now that we know what it is we are looking to investigate we introduce the data we will be using. 2
  • 7. 2 The data The first thing we need to do when we conduct our experiment is collect and prepare our data set for analysis. We have sourced the data from Yahoo Finance (http://finance.yahoo.com/) which allows you to download stock market data from any given time period within the companies lifetime on the stock market. As we are looking at systemic risk the UK institutions chosen are deamed to be the top 4 in terms of market capitalisation (MC), on the London Stock Exchange (LSE). Below we have indicated the companies selected with their respective MC in £Billions: Company Name Stock Symbol £MC HSBC Holdings PLC HSBA.L 109Bn Lloyds Banking Group plc LLOY.L 56.7Bn Barclays PLC BARC.L 30.8Bn Standard Chartered PLC STAN.L 26.6Bn Table 1: List of Bank Stocks from LSE Company Name Stock Symbol £MC Prudential PLC PRU.L 42.7Bn Legal & General Group LGEN.L 16.4Bn Aviva plc AV.L 16.0Bn Old Mutual PLC OML.L 11Bn Table 2: List of Insurance Company Stocks from LSE The UK banking industry is notorious for being heavily dependent on a handful of banks. Unlike the German banking industry for example where there are hundreds of different banks making a thoroughly diversified banking industry, the UK is too dependent upon the stability and continued success of the above institutions. After the 2008-2009 financial melt down, stricter regulation had to be introduced in order to prevent another financial crisis. Critics said that despite the existence of the Financial Services Authority (FSA) nobody knew who to assign the blame of the crisis thus three new bodies were created. Financial Policy Committee (FPC) which would take overall responsibility for financial regulation in the UK from 2013. Prudential Regulation Authority (PRA) who would take over responsibility for supervising the safety and soundness of individual financial firms. Finally, the Financial Conduct Authority (FCA), which was tasked with protecting consumers from sharp practices, and making sure that workers in the financial services sector comply with rules. With banks and insurance companies having their balance sheets closely monitored by these authorities; the emphasis being on the capital held in relation to exposure to risk. We expect the banking system to become more robust and stable. 3
  • 8. When calculating our risk measures we aim to conduct the experiment over two different time horizons. The re-sampling takes into account two sharp changes in UK GDP % growth: we took 2008 and 2009 as our data sets, as in 2008 we saw a dramatic decline and in 2009 a sharp recovery. We hope to see some interesting results in order to investigate how the change in GDP growth may affect the dependency between the major financial institutions. Please see figure 1 below: Figure 1: UK % Change in GDP GDP is commonly used as an indicator of the economic health of a country, as well as to gauge a country's standard of living. This is why we try to make the link in this thesis between the level of risk dependence and the change in GDP growth. As implied in our background, we would expect to see a higher level of dependence during periods of negative GDP growth, as this indicator implies the economy is in a worse position than before: investors may become more sceptical, business with banks may reduce and overall consumption may decrease, which may have a knock-on effect on the institutions' overall performance and financial strength. Once our data was downloaded we had to check its quality and make any necessary adjustments. Where there were missing values on the same days for more than one company, we removed the entire row for those particular days. We then converted the data into log return form, rt = log(St) − log(St−1), where rt is the log return for a particular stock St. 4
  • 9. Having finished this process we had a sample of eight stock return series X1, ..., X8 with n = 253 observations for 2008 and, similarly, n = 252 observations for 2009. Please note that in the Standard Chartered 2009 data we had to replace 2 consecutive observations as they were disrupting any form of GARCH model fitting; they consisted of one extreme positive value followed by one extreme negative value. We replaced the first with the average of the preceding observations and the second with the average of the remaining observations going forward. This method was also implemented for Aviva, which had a severe drop in share price on one particular day. Whilst we appreciate this reduces the accuracy of our tests going forward, if this step is not taken the necessary models cannot be fit. This is because we need independent and identically distributed residuals before moving onto the copula modelling. Below is a graph to illustrate HSBA.L in log return format: Figure 2: HSBC 2008: Top - Raw data & Bottom - Log return When starting the application of the copula construction to our data, i.e. fitting the vine copula tree structure and the appropriate pairwise copula distributions, we will need to transform our innovations (obtained from fitting the GARCH model). This process is called the probability integral transformation and ensures all our data belongs to the interval [0,1]. This property is necessary for our marginal distributions; once this transformation has occurred we will refer to the data as copula data. The process is discussed fully in section 4.1. 5
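To make the preparation steps above concrete, the following is a minimal sketch in R of how the 2008 price series could be pulled from Yahoo Finance and turned into the log return sample described above. It assumes the quantmod package and the ticker symbols from Tables 1 and 2; the thesis itself may have downloaded and cleaned the CSV files by hand, so this is an illustration rather than the exact procedure used, and the object name logret is only introduced here for later sketches.

# Sketch only: assumes the quantmod package; tickers taken from Tables 1 and 2.
library(quantmod)
tickers <- c("HSBA.L", "LLOY.L", "BARC.L", "STAN.L",
             "PRU.L", "LGEN.L", "AV.L", "OML.L")
prices <- do.call(merge, lapply(tickers, function(tk) {
  Ad(getSymbols(tk, src = "yahoo", from = "2008-01-01",
                to = "2008-12-31", auto.assign = FALSE))
}))
colnames(prices) <- tickers
prices <- na.omit(prices)            # drop any day with a missing value
logret <- diff(log(prices))[-1, ]    # r_t = log(S_t) - log(S_{t-1})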
  • 10. 3 Methods The order of topics in the methods runs parallel with the order in which the application will occur; hopefully the reader will find this easier to follow. In section 3.1 we look at fitting time series models to best describe the dynamics of the returns, specialising in GARCH models. In section 3.2 we introduce the dependence measures we rely on, and in section 3.3 we take an introductory look at copulae. Then in section 3.4 we look at the construction of vine copula models and finally in section 3.5 we look at simulation from the vine copula models. 3.1 Time Series In order to conduct our copula construction in sections 3.3 and 3.4 we need to ensure our data sets are independent and identically distributed. To achieve this we will ultimately fit GARCH models, see section 3.1.3, which will remove trends, seasonality and serial dependence from the data. The reader should then be able to understand the application of our GARCH models in section 4.1. 3.1.1 Time series preliminaries As we are modelling time series data which is random, we need to introduce the definition of a stochastic process. Definition 3.01 - Stochastic Process A stochastic process is a collection of random variables (Xt)t∈T defined on a probability space (Ω, F, P); that is, for each index t ∈ T, Xt is a random variable, where T is our time domain. As our data is recorded on a daily basis we associate our stochastic process with a discrete time stochastic process. The data is a set of integer time steps, so we set T = N. A time series is a set of observations (xt)t∈T , where each observation is recorded at time t and comes from a given trajectory of the random variable Xt. We use the terms time series and stochastic process interchangeably. For this thesis we shall use two time series models which are very popular in the financial industry, ARMA and GARCH models. Reminder - Continuing on from our data section where we defined the log return. This is also a stochastic process, where rt is the log return for a particular stock St: rt = log(St/St−1) = log(St) − log(St−1), ∀t ∈ Z (3.1) In the financial markets the prices are directly observable, but it is common practice to use the log return as it depicts the relative changes in the specific investment/asset. The log return also possesses some useful time series characteristics which we will see later on in this section. Please see figure 2 from the data section to view both the market prices and log return of HSBC. 6
  • 11. In order to determine the dependence structure within a stochastic process we introduce the concept of autocovariance and autocorrelation functions. Definition 3.02 - Autocovariance Function Let (Xt)t∈Z be a stochastic process with E[X2 t ] < ∞, ∀t ∈ Z. Then the autocovariance function,γX, of (Xt)t∈Z is defined as γX(r, s) = Cov(Xr, Xs), r, s ∈ Z Definition 3.03 - Autocorrelation Function (ACF) The autocorrelation function (ACF), ρX, of (Xt)t∈Z is defined as: ρX(r, s) = Corr(Xr, Xs) = γX (r,s) √ γX (r,r) √ γX (s,s) , r, s ∈ Z In order to fit appropriate models to our time series sample (x1, ..., xn) we need to make some underlying assumptions. One of the most important ones is the idea of stationarity, which essentially means the series is well behaved for a given time horizon i.e. a bounded variance. Although there are several forms of stationarity, for the purposes of this thesis, we need only define wide sense (or covariance stationarity). For more detail please see J. Davidson [2000]. Definition 3.04 - Wide Sense Stationarity A stochastic process (Xt)t∈Z is said to be stationary in the wide sense if the mean, variance and jth-order autocovariances for j > 0 are all independent of t. And: (a) E[X2 t ] < ∞, ∀t ∈ Z (b) E[Xt] = m, ∀t ∈ Z and m ∈ R, and (c) γX(r, s) = γX(r + j, s + j), ∀r, s, j ∈ Z The above definition implies that the covariance of (Xr) and (Xs) only depend on |s−r|, because γX(j) := γX(j, 0) = Cov(Xt+j, Xt), t, j ∈ Z. We can simplify the autocovariance function of a stationary time series as γX(j) := γX(j, 0) = Cov(Xt+j, Xt), where t, j ∈ Z and j is called the lag. Similarly, the autocorrelation function of a stationary time series with lag j (as defined above) is defined as ρX(j) := γX (j) γX (0) = Corr(Xt+j, Xt), t, j ∈ Z In empirical analysis of the time series data we will need to estimate both of these functions, this is done via the sample autocovariance function and sample autocorrelation function. 7
  • 12. Definition 3.05 - Sample Autocovariance Function For a stationary time series (Xt)t∈Z the sample autocovariance function is defined as γ̂(j) := (1/n) Σ_{i=1}^{n−j} (x_{i+j} − x̄)(x_i − x̄), j < n, where x̄ = (1/n) Σ_{i=1}^{n} x_i is the sample mean. Definition 3.06 - Sample Autocorrelation Function For a time series (Xt)t∈Z the sample autocorrelation function is defined as ρ̂(j) := γ̂(j)/γ̂(0), j < n. We have mentioned the ACF previously; another tool we use to explore dependence and help ascertain the order of our ARMA models is the partial autocorrelation function (PACF). Definition 3.07 - Partial Autocorrelation Function (PACF) The PACF at lag j, ψ(j), of a stationary time series (Xt)t∈Z is defined as ψX(j) = Corr(Xt+1, Xt) = ρX(1) for j = 1, and ψX(j) = Corr(Xt+j − Pt,j(Xt+j), Xt − Pt,j(Xt)) for j ≥ 2, where Pt,j(X) denotes the projection of X onto the space spanned by (Xt+1, ..., Xt+j−1). The PACF measures the correlation between Xt and its lagged terms, say Xt+j for j ∈ Z\{0}, with the effect of the intermediate observations (Xt+1, ..., Xt+j−1) removed. Similarly to before, we denote the sample PACF by ψ̂(j), for which there are multiple estimation algorithms; we refer the reader to Brockwell and Davis [1991] for further information. Figure 3: ACF and PACF for log return HSBC 2008 data set, lag = 18 8
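Plots in the style of figure 3 can be produced with the base R functions acf and pacf; a short sketch follows, assuming logret is the matrix of log returns prepared in section 2 with the HSBC series stored under "HSBA.L" (both names are assumptions from the earlier sketch).

r_hsbc <- as.numeric(logret[, "HSBA.L"])
par(mfrow = c(1, 2))
acf(r_hsbc,  lag.max = 18, main = "ACF: HSBC 2008 log returns")
pacf(r_hsbc, lag.max = 18, main = "PACF: HSBC 2008 log returns")
# the sample autocovariances of definition 3.05 can also be obtained directly:
acf(r_hsbc, lag.max = 18, type = "covariance", plot = FALSE)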
  • 13. The plots in figure 3 were intended to give the reader a graphical representation of the above functions applied to real data; we have inserted the ACF and PACF for HSBC. This particular pair of graphs is very useful in immediately determining any signs of autocorrelation: if we find that the bars escape the two horizontal bounds we can be sure that there is some form of autocorrelation and we should include some lagged terms in our ARMA model (explained further in section 3.1.2) to remove this. If we find that the majority, if not all, of the bars are contained within the horizontal bounds we can proceed on the assumption that there is no sign of autocorrelation, i.e. white noise, see definition 3.08 below. Definition 3.08 - White Noise Let (Zt)t∈Z be a stationary stochastic process with E[Zt] = 0 ∀t ∈ Z and autocovariance function γZ(j) = σ²_Z for j = 0 and γZ(j) = 0 for j ≠ 0, with σ²_Z > 0. Then (Zt) is called a white noise process with mean 0 and variance σ²_Z, thus (Zt)t∈Z ∼ WN(0, σ²_Z). Now that we have introduced the fundamentals we can bring in the more specific time series models; before we do this, the general time series framework will be outlined so that the reader can always refer back to it. Definition 3.09 - Generalised Time Series Model We use a time series model to describe the dynamic movements of a stochastic process, in particular the log return process (rt)t∈Z. It has the following framework: rt = µt + εt (3.2) εt = σtZt The conditional mean, µt, and the conditional variance, σ²_t, are defined as follows: µt := E[rt|Ft−1] and (3.3) σ²_t := E[(rt − µt)²|Ft−1], (3.4) where Ft is the filtration representing the information set available at time t. The Zt's are assumed to follow a white noise representation. We call equation 3.2 the return equation; it is made up of the following constituent parts: (i) conditional mean, µt, (ii) conditional variance, σ²_t, (iii) residuals [observed minus fitted], εt, and (iv) white noise, Zt 9
  • 14. Distribution of the Zt’s More often than not the distributions of the Zt’s are assumed to be standard normal. This is because it makes modelling and inference more tractable. However, in real life application this cannot be assumed, especially with financial data. It is well known that financial data often pertains negative skewness and leptokurtosis (other wise known as fat tails or heavy tails). We shall therefore consider the use of the following distributions in our time series model fitting, please note they are all standardized: (a) Standard Normal Distribution (NORM) is given by fZ(z) = 1 √ 2π exp −z2 2 , with mean of 0 and variance = 1. This is the simplest distribution used to model the residuals. But doesn’t model fat tailed or skewed data. (b) Student-t distribution (STD) is given by fZ(z) = Γ ν + 1 2 Γ ν 2 (ν − 2)π 1 + z2 ν − 2 −( ν+1 2 ) , with ν > 2 being the shape parameter and Γ(.) the Gamma function. For large samples this will tend closer to the normal distribution. It also has the property of fat tails which makes it a useful distribution to model financial data. It should also be noted that the t-distribution is a special case of the generalised hyperbolic distribution. (c) Generalised Error Distribution (GED) is given by fZ(z) = ν exp − 1 2|z λ |ν λ · 2 1+ 1 ν · Γ 1 ν , with shape parameter 0 < ν ≤ ∞, λ = 2 −2 ν Γ(1 ν )Γ(3 ν ), and Γ(.) again denotes the Gamma function. NB: The normal distribution is a special case of the GED (for ν = 2). The distribution is able to account for light tails as well as heavy tails depending on whether ν > 2 for heavy tails and ν < 2 for light tails. 10
  • 15. (d) Generalised Hyperbolic Distribution (GHYP) is given by fZ(z) = κ δ λ exp(β(z − µ)) √ 2πKλ(δκ) Kλ−1 2 α δ2 + (z − µ)2 δ2 + (z − µ)2 α λ−1 2 , with κ := α2 − β2, 0 ≤ |β| < α, µ, λ ∈ R, δ > 0, and Kλ(.) being the modified function of the second kind with index λ please see Barndorff-Nielsen [2013] for further details on this. The δ is called the scale parameter. As we are looking at the standardised version with mean 0 and variance 1, we use another parametrisation with ν = β α and ζ = δ α2 − β2, shape and skewness parameters, respectively. This distribution is mainly applied to areas that require sufficient probability of far-field behaviour, which it can model due to its semi-heavy tails, again a property often required for financial market data. (e) Normal Inverse Gaussian distribution (NIG) is given by fZ(z) = αδK1(α δ2 + (z − µ)2) π δ2 + (z − µ) exp[δκ + β(z − µ)], with κ := α2 − β2, 0 ≤ |β| ≤ α, µ, ∈ R, δ > 0, and K1(.) being the modified Bessel function of the second kind of index 1. See Anders Eriksson [2009] for further detail. Note that the NIG distribution is a special case of the GHYP distribution with (λ = −1 2 ). As above we use the parametrisation with ζ and ν. The class of NIG distributions is a flexible system of distributions that includes fat-tailed and skewed distributions which is exactly the kind of distribution we hope to implement. 3.1.2 ARMA Models In this section we introduce univariate autoregressive moving average (ARMA) processes which model the dynamics of a time series with a linear collection of past observations and white noise residual terms. For a more comprehensive look in ARMA models please see J. Hamilton [1994]. Definition 3.10 - MA(q) process A q-th order moving average process, MA(q), is characterised by Xt = µ + t + η1 t−1 + ... + ηq t−q, where t is white noise mentioned in definition 2.08 such that t ∼ WN(0, σ2). Also the elements of (η1, ..., ηq) can be any real number, R. And q ∈ N{0}. Below we have illustrated a simulated example of a MA(1) process, we have taken η = 0.75 with 750 observations, see figure below: 11
  • 16. Figure 4: MA(1), η = 0.75 with 750 observations Definition 3.11 - AR(p) process A p-th order autoregression, AR(p), is characterised by Xt = µ + φ1Xt−1 + ... + φpXt−p + εt, where, similarly to the above definition, εt ∼ WN(0, σ²), φi ∈ R and p ∈ N\{0}. Figure 5: AR(1), φ = 0.75 with 750 observations Combining these two processes gives an ARMA(p,q) model. Definition 3.12 - ARMA(p,q) Process An ARMA(p,q) process includes both of the above processes, thus we characterise it as follows: Xt − φ1Xt−1 − ... − φpXt−p = εt + η1εt−1 + ... + ηqεt−q, where the same criteria apply as in definitions 3.10 and 3.11. 12
  • 17. Figure 6: ARMA(1,1), φ & η = 0.75 with 750 observations In the context of our log return equation (see equation 3.1). Our ARMA(p,q) should look as follows, essentially we replace the Xt’s with rt’s: rt − φ1rt−1 − ... − φprt−p = t + η1 t−1 + ... + ηq t−q, Now that we have chosen a model type for our return we need to determine the orders of p and q, before fitting the model. As you will see in my R-code appendix C.1, there is a useful function auto.arima which we use to automatically determine the orders of p and q, through a information criteria selection of ”aic” or ”bic”, further detail on these tests in section 3.1.4 - Model Diagnostics. However, should the reader wish to do this manually, there is a popular graphical approach. For this we refer to a series of step by step notes put together by Robert Nau from Fuqua School of Business, Duke University, see R. Nau. To give a brief outline we have put in a series of shortened steps: (i) Assuming the time series is stationary we must determine the number of AR or MA terms needed to correct any autocorrelation that remains in the series. (ii) By looking at the ACF and PACF plots of the series, you can tentatively identify the number of AR and/or MA terms that are needed. NB: ACF plot: Is a bar chart of the coefficients of correlation between a time series and lags of itself. The PACF plot is a plot of the partial correlation coefficients between the series and lags of itself. See figure 7 below for illustration. 13
  • 18. Figure 7: ARMA(0,2), Old Mutual PLC In figure 7 above we can see that there are two spikes escaping the lower horizontal band in the ACF, indicating the presence of correlation in the MA lagged terms. When you then look at the PACF you can see these two spikes more clearly, indicating that an AR(0) and MA(2) would collectively be the most suited model, i.e. ARMA(0,2). (iii) As we have discussed above, when analysing the ACF and PACF the reader should firstly be looking for a series of bars in the ACF escaping the horizontal bounds to indicate some form of autocorrelation. Then, when looking at the PACF, if the bars are outside of the upper horizontal bound we are looking at AR terms, and if we see this occurring below the lower horizontal bound we have MA terms. (iv) Once this is done we recommend that the chosen model is fitted and compared with any similar models against the log likelihood measure and the AIC measure; again see section 3.1.4 for further details. The ARMA processes are modelled under the assumption that the variance, σ², is constant. However, it is commonly known that this is not the case when working with financial data. We usually see clusters of high and low volatility; this is usually visible as a period of large price movements followed by a period of small price movements (see figure 2). Thus assuming the returns to be independent identically distributed noise terms would be wrong. They instead seem to depend on past information and illustrate conditional behaviour. For these reasons we now look to model volatility conditionally on time and past observations, using GARCH models. 14
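Before moving on, the automated order selection mentioned above (auto.arima from the forecast package) can be sketched as follows; the exact options used in appendix C.1 may differ, and r_hsbc (the HSBC return series from the earlier sketch) is used purely as an example.

library(forecast)
fit_arma <- auto.arima(r_hsbc,
                       d = 0,              # log returns are treated as stationary
                       max.p = 3, max.q = 3,
                       seasonal = FALSE,
                       ic = "aic")
summary(fit_arma)                          # reports the selected (p,q), log likelihood and AIC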
  • 19. 3.1.3 GARCH Models In this section we introduce univariate generalized autoregressive conditional heteroskedastic (GARCH) processes, which are an extension of the autoregressive conditional heteroskedastic (ARCH) processes. The GARCH model essentially models the conditional variance of a stochastic process via a linear combination of previous squared volatilities and previous squared values of the process. There are multiple types of GARCH models and extended GARCH models. For the purpose of this thesis we shall cover the basic models and finally the exponential GARCH model, as it was used most during my application. However, we do recommend that the reader tests different models if they are to apply this method. See J. Hamilton [1994] and A. Ghalanos [2014] for theory and coding, respectively. Please note we consider GARCH(1,1) models only and do not cover the higher order GARCH models, for simplicity purposes. Definition 3.13 - GARCH(p,q) Process Given a stochastic process (εt)t∈Z and an i.i.d. sequence of random variables (Zt)t∈Z with mean 0 and variance equal to 1, then εt ∼ GARCH(p, q) if E[εt|Ft−1] = 0 and, for every t, it satisfies εt = σtZt (3.5) Var[εt|Ft−1] := σ²_t = ω + Σ_{i=1}^{q} αi ε²_{t−i} + Σ_{j=1}^{p} βj σ²_{t−j} with p ≥ 0, q ≥ 0, ω > 0, and α1, ..., αq, β1, ..., βp ≥ 0. Result 3.14 - Stationarity of GARCH(1,1) Process εt ∼ GARCH(1, 1) is given by εt = σtZt, σ²_t = ω + α ε²_{t−1} + β σ²_{t−1} with ω > 0, α, β ≥ 0 and (Zt)t∈Z as in definition 3.13, and is stationary iff α + β < 1. The (unconditional) variance of the process is then given by Var[εt] = ω / (1 − α − β). For a more extensive review of this result and more, please see page 666 of J. Hamilton [1994]. Coming back to the general model again, in equation 3.5 we are assuming that the mean is constant, i.e. µt = µ ∀t. However, we would like to remove this modelling restriction by combining our previously discussed ARMA(p,q) model and the GARCH(1,1) to give an ARMA(p,q)-GARCH(1,1) model. This should give us more than enough modelling tools to model our returns well; a small simulated illustration of the volatility clustering a GARCH(1,1) produces is given below. 15
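The base-R simulation below generates a stationary GARCH(1,1) path with α + β = 0.95 < 1, in line with Result 3.14; the parameter values are arbitrary and chosen purely to make the volatility clustering visible.

set.seed(1)
n <- 750; omega <- 0.05; alpha <- 0.10; beta <- 0.85
z    <- rnorm(n)                               # white noise innovations
sig2 <- numeric(n); eps <- numeric(n)
sig2[1] <- omega / (1 - alpha - beta)          # start at the unconditional variance
eps[1]  <- sqrt(sig2[1]) * z[1]
for (t in 2:n) {
  sig2[t] <- omega + alpha * eps[t - 1]^2 + beta * sig2[t - 1]
  eps[t]  <- sqrt(sig2[t]) * z[t]
}
plot(eps, type = "l", ylab = "simulated returns",
     main = "GARCH(1,1), alpha + beta = 0.95")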
  • 20. Definition 3.15 - ARMA(1,1)-GARCH(1,1) Process Combining the ARMA(p,q) and GARCH(1,1) process we get a ARMA(p,q)-GARCH(1,1) process. Here we illustrate specifically the process with (p,q) = (1,1) giving the following time series rt = µ + φrt−1 + η t−1 + t t = σtZt σ2 t = ω + α 2 t−1 + βσ2 t−1, where φ, η ∈ R, ω > 0 and α, β ≥ 0. The distribution of the (Zt)t∈Z’s are chosen among those detailed in section 3.1.1. The standard residuals of the above model are given by ˆZt = 1 ˆσt rt − ˆµ − ˆφrt−1 − ˆηˆσt−1 ˆZt−1 where ˆσ, ˆφ, ˆη and ˆµ are all estimated. Initial Parameter Estimation: By reference to M. Pelagatti and F. Lisi. Due to the recursive nature of σt R-Software needs to calculate an initial estimate of ˆσ. As we are working under the assumption of stationarity it is common to set our initial estimate equal to σ1 = ˆσ = n t=1 rt 2. The remaining parameters of the ARMA-GARCH model are calculated in the traditional log likelihood maximisation method. We recommend the reader see the work of C. Francq and J.M Zakoian for more information on estimation methods. Extensions on Standard GARCH Models At the beginning of this section their are various different extension to the standard GARCH model. Each different extension is suppose to capture an inherent empirical characteristic property of the financial data. For example the GARCH-M or GARCH-in-Mean model was introduced to pick up on correlation between the risk and expected return thus a conditional volatility term was added into the return equation as an exogenous factor, see I. Panait and E. Slavescu [2012] for further details. Other models include: Integrated GARCH, GJR GARCH, Component sGARCH and Absolute Value GARCH etc, see A. Ghalanos [2014] for more detail. As we found the exponential GARCH to be most applicable for our data we shall outline its structure. Definition 3.16 - Exponential GARCH (eGARCH) Model The conditional variance equation in the eGARCH model is given by log(σ2 t ) = ω + q i=1 g(zt−i) + p j=1 βjlog(σ2 t−j), (3.6) where g(zt) = γi(|zt| − E[|zt|]) + αizt the function g(zt) covers two effects of the lagged shocks zt = t−i σt−j (were zt also depends on i and j) on the conditional variance: where the γi defines the size effect, αi defines the sign effect. Both of these effects address the asymmetric behaviour due to the leverage effect. Note how this 16
  • 21. α differs from the standard GARCH models. Additionally, a useful characteristic of this model is that there are no parameter restrictions compared to other GARCH models, so α, β and γ can be any real number; this is due to the logarithmic transformation ensuring the positivity of the conditional variance. Importantly, it can be shown that the process is stationary if and only if (iff) Σ_{j=1}^{p} βj < 1. See D. Nelson [1991] for further detail on the EGARCH model. NB: Standard GARCH models assume that positive and negative error terms have a symmetric effect on the volatility. In other words, good and bad news have the same effect on the volatility in the model. In practice this assumption is frequently violated, in particular by stock returns, in that the volatility increases more after bad news than after good news; this is known as the leverage effect. 3.1.4 Model Diagnostics Once the models have been fitted to each univariate data set we must carry out a series of tests to compare the models selected (it is not always obvious which model to select), and also to check goodness-of-fit measures. There are a series of tests we carry out which shall be outlined in this section. Please note that some of these tests will overlap with the vine copula diagnostics. AIC Criteria: The Akaike information criterion (AIC) is a very popular criterion used to compare models and select the best one. The measure is defined as follows: AIC := 2k − 2 Σ_{i=1}^{n} log f(xi|θ̂), where the xi's refer to the observations, i = 1, ..., n, θ̂ is the maximum likelihood estimator for the parameter vector θ = (θ1, ..., θk), and k is the number of parameters in the model. As you can see, the measure penalises a model with many parameters and gives more merit to a model with a higher log likelihood value (a good indicator of goodness-of-fit). So when deciding which model to select we are looking for the model with the lowest value of the AIC and the highest value of the log likelihood. BIC Criteria: The Bayesian information criterion (BIC) is very similar and is defined as BIC := k log(n) − 2 Σ_{i=1}^{n} log f(xi|θ̂); the difference here is that the parameter penalty also grows with the number of observations n, not just the number of parameters. Again we are looking for the smaller BIC value when choosing between models. Both the AIC and BIC measures are used when fitting the ARMA and GARCH models and finally when looking at the vine copula density models. NB: The next series of goodness-of-fit tests and techniques are based on the standardised residuals (Ẑt) of the fitted models. These tests are to see whether the fitted residuals are independently and identically distributed according to the assumed distribution chosen, i.e. the distribution selected from section 3.1.1 - Distribution of the Zt's. Before turning to these residual-based checks, a short example of how such a model is fitted and compared in R is given below. 17
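The sketch below uses the rugarch package (as in appendix C.1); the ARMA(1,1)-eGARCH(1,1) specification with generalised hyperbolic innovations is only one candidate, and in the application several specifications and innovation distributions were compared via these criteria. The data object r_hsbc is the HSBC return vector assumed in the earlier sketches.

library(rugarch)
spec <- ugarchspec(
  variance.model     = list(model = "eGARCH", garchOrder = c(1, 1)),
  mean.model         = list(armaOrder = c(1, 1), include.mean = TRUE),
  distribution.model = "ghyp")                 # one of the distributions of section 3.1.1
fit <- ugarchfit(spec, data = r_hsbc)
infocriteria(fit)                              # Akaike and Bayes criteria for model comparison
likelihood(fit)                                # maximised log likelihood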
  • 22. QQ plot: The most popular and easiest way to determine whether the underlying distribution follows the standardised residuals is by analysing the quantile-quantile-plots, more commonly known as Q-Q plots. If the underlying distribution is a correct suitor, most of the points in the Q-Q Plot should lie on a straight line, usually at forty-five degrees but this is not always the case. It has other useful properties such as comparing the shapes of distributions, providing a graphical view of how properties such as location, scale, and skewness are similar or different in the two distributions. Please see example of Q-Q Plot below. Figure 8: QQ Plot of HSBC - Underlying Distribution GHYP As you can see aside from a few outliers the majority of the points nest on top of a line at approximately forty-five degrees. This indicates that the underlying distribution is a good fit to the standardised fitted residuals. Ljung-Box Standardised Residuals: To test whether or not the standardised residuals of our fitted model still exhibit serial correlation we perform the Ljung-Box test. The null hypothesis is that the residuals behave like white noise and the alternative is that they do not i.e. they exhibit some sort of serial correlation. Test statistic is as follows ˜Qm(ˆρ) = n(n + 2) m j=1 ˆρ2 j n − j , where the sample autocorrelation of ( ˆZt)t=1,...,n is ˆρj = n t=j+1 ˆZt ˆZt−j n t=1 ˆZ2 t for lags j = 1, ..., n. We reject the null at the α% level if ˜Qm(ˆρ) > χ2 m−s,1−α (equivalent to the p-value being smaller than α), here m − s is the number of degrees of freedom for χ2- 18
  • 23. distribution and s = number of parameters estimated in the model. For more details on this test please see Ljung and Box [1978]. In the illustration on the next page, we have included the printed results from our HSBA fitted time series model, which was conducted in R. As you can see all of our p-values are high indicating there is sufficient evidence to suggest there is no serial correlation. Weighted Ljung-Box Test on Standardized Residuals for HSBC ------------------------------------ statistic p-value Lag[1] 0.3228 0.5699 Lag[2*(p+q)+(p+q)-1][2] 0.3658 0.7602 Lag[4*(p+q)+(p+q)-1][5] 0.8907 0.8839 d.o.f=0 H0 : No serial correlation Ljung-Box Squared Standardised Residuals: In this test we aim to test for independence. This is achieved when we apply the above test to the squared standardised residuals. Using the same data and model as used in the illustration above we obtained the following: Weighted Ljung-Box Test on Standardized Squared Residuals for HSBC ------------------------------------ statistic p-value Lag[1] 7.714e-07 0.9993 Lag[2*(p+q)+(p+q)-1][5] 7.714e-01 0.9089 Lag[4*(p+q)+(p+q)-1][9] 3.047e+00 0.7511 which as above indicates that we keep the null hypothesis and their is sufficient evidence to suggest we have independent residuals. Note: The robustness of the Ljung-Box test applied in this context is frequently discussed in literature and several modified versions have been made. But this detail is not required for the purpose of this thesis. See P. Burns [2002] for further information. ARCH Lagrange-Multiplier (LM) test: The purpose of this procedure is to see whether there exists any ARCH effects. This is done by regressing the squared error terms 2 t on their own lags, so we perform a linear regression, thus, ˆ2 t = c0 + c1ˆ2 t−1 + ... + cpˆ2 t−p with H0: c0 = c1 = ... = cp and H1: c0 ≥ 0, c1 ≥ 0, ..., cp ≥ 0, if we keep the null then there is white noise amongst the error terms. If we reject the null, the error terms have ARCH characteristics modelled by a ARCH(p). Performing this test on the standardised squared residuals of the model, a high p-value will indicate the model has removed any ARCH effects. Below we have given a practical example from our HSBC data as you can see the model seems to be adequate (no ARCH effects present) given the large p-values. For more detail on this see R. Engle [1982]. 19
  • 24. Weighted ARCH LM Tests for HSBC ------------------------------------ Statistic Shape Scale P-Value ARCH Lag[3] 0.008469 0.500 2.000 0.9267 ARCH Lag[5] 1.584654 1.440 1.667 0.5703 ARCH Lag[7] 2.970199 2.315 1.543 0.5192 Sign Bias test: The sign bias test is another test introduced by R. Engle [1993] which tests the presence of different leverage effects (or asymmetry effects) as mentioned in the eGARCH definition. Again our test is based on the standardised squared residuals and should indicate whether our GARCH model is misspecified. If we reject the null then we should assume the model is misspecified and try eGARCH model or others see R. Engle [1993] for different model specifics. Similar to the above except we now regress the squared residuals on the lagged shocks, thus we have, ˆZ2 t = d0 + d11{ ˆZt−1<0} + d21{ ˆZt−1<0} ˆZt−1 + d31{ ˆZt−1≥0} ˆZt−1 + et, where 1 is the indicator function. This means it takes the value +1 if the subscript constraint is satisfied, 0 otherwise. et is the error term. We perform four simultaneous tests, written as follows: Sign Bias test: H0 : d1 = 0 Negative Size Bias test: H0 : d2 = 0 Positive Size Bias test: H0 : d3 = 0 Joint Effect test: H0 : d1 = d2 = d3 = 0 The first three tests come in the form of a standard t-test but the last one is a standard F-test. As the null hypothesis eludes to, we are looking to see whether our selected model can explain the effects of positive and negative shocks on the conditional variance. Additionally, whether the effects of large and small positive (or negative) shocks impact on the conditional variance. See C. Brooks [2008] for more detail. To illustrate the test we have included the test carried out for HSBC below: Sign Bias Test for HSBC ------------------------------------ t-value prob sig Sign Bias 0.42780 0.6692 Negative Sign Bias 1.09064 0.2765 Positive Sign Bias 0.01427 0.9886 Joint Effect 1.22214 0.7477 As you can see from the illustration the model has large p-values indicating that there does not seem to be any evidence in the data of leverage effects. 20
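These residual-based checks can also be reproduced by hand; a rough sketch follows, assuming fit is the ugarchfit object from the earlier eGARCH example (rugarch's show(fit) prints the weighted versions of these tests automatically).

z_hat <- as.numeric(residuals(fit, standardize = TRUE))
Box.test(z_hat,   lag = 5, type = "Ljung-Box")   # serial correlation in the residuals
Box.test(z_hat^2, lag = 5, type = "Ljung-Box")   # remaining ARCH effects / independence
qqnorm(z_hat); qqline(z_hat)                     # rough Q-Q check (against normal quantiles only)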
  • 25. 3.2 Dependence Measures Within this section we look to introduce dependence measures which form the foundation of this thesis and play a crucial role throughout the application in section 4. We will describe two particular and frequently used measures, Kendall's tau and Spearman's rho. Towards the end we shall include the theory necessary to understand our first system plot of dependence between the institutions chosen. Pearson's product moment correlation coefficient is the most popular measure; however, it has drawbacks which limit its scope for us. Thus we move on to so-called measures of association, which allow us to avoid the limitations of the Pearson correlation, namely that it only measures linear dependence, is not invariant under non-linear, strictly increasing transformations and is undefined for non-finite variance. See N. Chok [2008] for more information. Measures of Association Before we continue we must define a core component which is used for both measures of association. The following definitions have been sourced from R. Nelson [2006]. Definition 3.17 - Concordance If we take two independent pairs of observations (xi, yi) and (xj, yj) for i, j = 1, ..., n from the continuous random variables (X, Y ), then they are concordant if (xi − xj)(yi − yj) > 0, i.e. either xi < xj and yi < yj, or xi > xj and yi > yj. This looks to see whether large (small) values of one random variable simultaneously correspond to large (small) values of the other. Analogously, they are discordant if (xi − xj)(yi − yj) < 0, which looks to see whether large (small) values of one random variable simultaneously correspond to small (large) values of the other. These measures give rise to Kendall's tau. Definition 3.18 - Kendall's tau Kendall's tau is essentially the probability of concordance minus the probability of discordance. Formally: let (Xi, Yi), (Xj, Yj) be two independent and identically distributed copies of (X, Y ). Then Kendall's tau is written as τ(X, Y ) = P((Xi − Xj)(Yi − Yj) > 0) − P((Xi − Xj)(Yi − Yj) < 0). As we work with real data we need to use an empirical version τ̂(X, Y ), which takes values in [-1,1]: when we have a high number of concordant pairs the value of tau will be close to +1, and when we have a high number of discordant pairs the value will be close to -1. 21
  • 26. The empirical version of Kendall tau ˆτ(X, Y ) = Ncon − Ndis Ncon + Ndis + Ntie,x Ncon + Ndis + Ntie,y (3.7) Ncon = Number of concordant pairs Ndis = Number of discordant pairs Ntie,x = Number of tied pairs for x, see note below for further explanation Ntie,y = Number of ties pairs for y, see note below for further explanation NB: We are not going to find tied pairs in our data but for consistency we have left the definition in the paper. Hence a pair (xi, yi), (xj, yj) is said to be tied if xi = xj or yi = yj; a tied pair is neither concordant nor discordant. NB: Without tied pairs the empirical Kendall tau is ˆτ(X, Y ) = Ncon − Ndis n(n − 1) 2 , with n = total number of observations. Definition 3.19 - Spearman’s Rho Similar to Kendall, Spearman’s rho makes use of concordance and discordance. We use the same terminology as before except we now have three i.i.d copies of (X,Y) say (Xi, Yi), (Xj, Yj)&(Xk, Yk) we write Spearman’s rho as follows: ρs(X, Y ) = 3[P((Xi − Xj)(Yi − Yk) > 0) − P((Xi − Xj)(Yi − Yk) < 0)]. The empirical version of Spearman’s rho is defined as ˆρs(X, Y ) = i(r(xi) − ¯rx)(r(yi) − ¯ry) ( i(r(xi) − ¯rx)2 ( i(r(xi) − ¯ry)2 , i = 1, ..., n, (3.8) where r(xi) is the rank of xi and ¯rx = 1 n n i=1 r(xi). Multidimensional Scaling To accompany the results we will obtain from the above dependence measures we find it is useful to get a pictorial view of the dependence between our institutions. Thus we use multidimensional scaling which allows us to convert the representation of the dependence measure between any two institutions into a distance on a [-1,1]x[-1,1] plot. This distance is known as dissimilarity i.e. the bigger the distance between two points the less dependence between the two firms. We define it as follows dij := 1 − ˆτ(Xi, Xj). To find a set of points such that the distances between these are approximately equal to the dissimilarities dij in our data, we use Kruskal-Shephard scaling method which seeks values z1, ..., zd ∈ R2 such that the following is minimised i=j (dij − ||zi − zj||)2 . ||.|| denoting the Euclidean distance in R2. The plot is shown in section 4, figure 21. 22
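The multidimensional scaling plot described above can be produced along the following lines, assuming logret is the matrix of log returns from the earlier sketch (in the application the copula data of section 4.1 could equally be used); MASS::isoMDS implements Kruskal's non-metric scaling, while the classical cmdscale would give a similar picture.

library(MASS)
tau  <- cor(as.matrix(logret), method = "kendall")   # pairwise Kendall tau estimates
diss <- as.dist(1 - tau)                             # dissimilarities d_ij = 1 - tau_ij
mds  <- isoMDS(diss, k = 2)                          # Kruskal-Shephard scaling into R^2
plot(mds$points, type = "n", xlab = "", ylab = "")
text(mds$points, labels = colnames(tau))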
  • 27. As we are about to begin discussing copulae it is important to mention the connection both of these measures have with copula functions. Result 3.20 - Link with Copulae: Kendall and Spearman Let X and Y be two continuous random variables with copula C. Then we have that τ(X, Y ) = 4 [0,1]2 C(u, v)dC(u, v) − 1 and ρs(X, Y ) = 12 [0,1]2 uv dC(u, v) − 3 The result is a preliminary to show that the population analogous of these statistics has a ration which approaches 3/2 as the joint distribution approaches that of two independent random variables. See G.A. Fredricks and R.B Nelsen for further information on this proof. There paper also states that C is Lipschitz continuous (this is defined under 3.23 - further property) which means C is almost differentiable everywhere and hence ∂C/∂u & ∂C/∂v exist almost everywhere on [0, 1]2. Tail Dependence Although Kendall’s tau and Spearman’s rho describe the dependence between two random variables over the whole space [0, 1]2. As our study is looking into the occurrence of extreme events we must look into this into more detail i.e. dependence between two extreme values of our random variables. The idea of dependence between extreme values is still based on concordance but we specifically look at the lower left and upper right quadrant of the unit square. This again follows the book of Nelson [2006]. Definition 3.21 - Upper and Lower Tail Dependence Let X and Y be two continuous random variables with distribution functions F and G, respectively. We have the upper tail dependence parameter, λu, is the limit of the probability (assuming its existence) that Y is greater than the 100-th percentile of G given that X is greater than 100-th percentile of F as t approaches 1, so in math speak we have λu = lim t→1− P(Y > G−1 (t)|X > F−1 (t)). (3.9) And for lower tail dependence parameter, λl, is defined as so λl = lim t→0+ P(Y ≤ G−1 (t)|X ≤ F−1 (t)). (3.10) As with Kendall and Spearman we can relate these definitions to copulae. 23
  • 28. Result 3.22 - Upper and Lower Tail Dependence Let X and Y be as above but with copula C. If the limits to equations 3.9 and 3.10 exist then we get λu = lim t→1− 1 − 2t + C(t, t) 1 − t and λl = lim t→0+ C(t, t) t . 3.3 Copulae Copulae have become more and more popular in finance, especially the subsection of risk management. The reason for this is they have the unique capability of decomposing a joint probability distribution into its univariate marginal distributions and what is called a copula function, these describe the dependence structure between variables. In this section we aim to define a copula function, give some examples of pair-copula decompositions and detail different copula functions to be used for vine copula construction, in section 3.4. The majority of the content comes from R. Nelson [2006], K.Aas [2006] and K. Hendrich [2012]. Definition 3.23 - Copula A copula is a multivariate distribution, C, with uniformly distributed marginals U(0,1) on [0,1]. More formally we define a copula as follows via R. Nelson [2006]. A d-dimensional copula is a multivariate cumulative distribution function C : [0, 1]d → [0, 1] with the following properties: (i) For every u = (u1, ..., ud) ∈ [0, 1]d, C(u) = 0 if at least one coordinate of u is 0. (ii) ∀j = 1, ..., d, it holds that C(1, ..., 1, uj, 1, ..., 1) = uj (iii) C is d-increasing i.e. C(u) ≥ 0, ∀d Further Property: In order to justify the use of differentiation going forward we introduce a theorem. We reference H. Li, see bibliography. (a) For any d-dimensional copula C, |C(u1, ..., ud) − C(v1, ..., vd)| ≤ d i=1 |ui − vi|, ∀ (u1, ..., ud) & (v1, ..., vd) ∈ [0, 1]d That is C is Lipschitz continuous with respect to Lipschitz constant 1. Now given all the criteria above we are able to invoke differentiability. 24
  • 29. Result 3.24 - Sklar’s Theorem For this result let X = (X1, ..., Xd) be a vector of d random variables with joint density function f and cumulative function F. Additionally let f1, ..., fd be corresponding marginal densities and F1, ..., Fd the strictly increasing and continuous marginal distribution functions of X1, ..., Xd. Sklar’s theorem states that every multivariate distribution F with marginals F1, ..., Fd can be written as F(x1, ..., xd) = C(F1(x1), ..., Fd(xd)). (3.11) If F1, ..., Fd are all continuous then C is unique. And conversely if C is a d-dimensional copula and F1, ..., Fd are distribution functions, then the function F defined by (3.9) is a d-dimensional distribution function with margins F1, ..., Fd. Inverting the above allows us to isolate the copula function which is the aim of this thesis i.e. isolate dependence structure that is why sklar’s theorem is so important. So we get C(u) = C(u1, ..., ud) = F(F−1 (x1), ..., F−1 (xd)). (3.12) This now allows us to derive the density copula function, c, through partial differentiation, f(x) = ∂dC(F1(x1), ..., Fd(xd)) ∂x1 · · · ∂xd = ∂dC(F1(x1), ..., Fd(xd)) ∂F1(x1) · · · ∂Fd(xd) f1(x1) · · · fd(xd) (3.13) therefore, c(F1(x1), ..., Fd(xd)) := ∂dC(F1(x1), ..., Fd(xd)) ∂F1(x1) · · · ∂Fd(xd) = f(x) f1(x1) · · · fd(xd) (3.14) Now that we have described what a copula function is and how it exists with the marginals and joint probability function, we move onto discuss how to break down a joint probability function, into its constituent parts necessary to fit copulae. Pair Copula Decomposition of Multivariate Distributions For this subsection we follow K.Aas [2006] and consider d-dimensional joint density as described in the copula section above. We begin by breaking down the general case before working through a case with d = 3. So we start by decomposing the joint density into its marginal (f(xd)) and conditional ((f(xd−1|xd))) densities. f(x1, ..., xd) = f(xd) · f(xd−1, xd) f(xd) · f(xd−2|xd−1, xd) f(xd−1, xd) · ... · f(x1|x2, ..., xd−2, xd−1, xd) f(x2, ..., xd) (3.15) = f(xd) · f(xd−1|xd) · f(xd−2|xd−1, xd) · ... · f(x1|x2, ..., xd) with the exception of relabelling the variables the decomposition is unique. If we now link what we have defined in Sklar’s theorem we can re-write our joint density as follows f(x1, ..., xd) = c12...d(F1(x1), ..., Fd(xd)) · f1(x1) · · · fd(xd) (3.16) for some unique d-variate copula density c12...d. In the bi-variate case we would have f(x1, x2) = c12(F1(x1), F2(x2)) · f1(x1) · f2(x2) 25
  • 30. where c12 is an appropriate pair-copula density to describe the pair of transformed variables F1(x1) and F2(x2). For the conditional density it follows that f(x1|x2) = c12(F1(x1), F2(x2)) · f1(x1) (3.17) for the same pair copula. If we go back to our main core equation 3.15 we can decompose our second term f(xd−1|xd) into the pair copula c(d−1)d(Fd−1(xd−1), Fd(xd)) and a marginal density fn(xn). For three random variables we construct the following f(x1|x2, x3) = c12|3(F1|3(x1|x3), F2|3(x2|x3)) · f(x1|x3) (3.18) for the appropriate pair-copula c12|3, applied to transformed variables F(x1|x3) and F(x2|x3). However, this is not unique we can also represent it as follows f(x1|x2, x3) = c13|2(F1|2(x1|x2), F3|2(x3|x2)) · f(x1|x2) where the pair-copula c13|2 is different from the c12|3 above. By way of substitution we can put f(x1|x2) into the equation above giving f(x1|x2, x3) = c13|2(F1|2(x1|x2), F3|2(x3|x2)) · c12(F1(x1), F2(x2)) · f1(x1), these steps are essential for breaking down the multivariate density into pair copulae acting on conditional distributions and the marginal densities, the example below should make things clearer. Generalising this decomposing pair-copula pattern we can say that from equation 3.15 we can decompose each term into the appropriate pair-copula times a conditional marginal density, using f(xj|xB) = cjv|B−v (F(xj|xB−v ), F(xv|xB−v )) · f(x|xB−v ), j = 1, ..., d (3.19) where B ⊂ {1, ..., d}{j}, xB is a |B|-dimensional sub vector of x. xv can be any single element of xB, therefore, xB−v denotes the (|B| − 1)-dimensional vector when xv is absent from xB, more simply B−v := B{v}. Essentially v determines the type of corresponding copula, so the obtained constructions are not unique. So we an deduce that, under the appropriate regularity conditions, any equation of the form in 3.15, can be expressed as a product of pair-copulae, acting on several different conditional distributions. We can also see that the process is iterative in nature, and given a specific factorisation, there are still many different parametrisations. For completeness lets finish our example of our tri-variate case where we get f(x1, x2, x3) = f(x3) · f(x2|x3) · f(x1|x2, x3) = f(x3) · c23(F2(x2), F3(x3)) · f(x2) · f(x1|x2, x3) = f(x3) · c23(F2(x2), F3(x3)) · f(x2) · c12|3(F1|3(x1|x3), F2|3(x2|x3)) · f(x1|x3) = f(x3) · c23(F2(x2), F3(x3)) · f(x2) · c12|3(F1|3(x1|x3), F2|3(x2|x3)) · c13(F1(x1), F3(x3)) · f(x1) 26
  • 31. We conducted the factorisation above using equation 3.15 followed by using 3.17 except here we have f(x2, x3) instead of f(x1, x2) (same method applies) and finally we use 3.18. Tiding up terms gives us, f(x1, x2, x3) = f(x1) · f(x2) · f(x3) · c13(F1(x1), F3(x3)) · c23(F2(x2), F3(x3)) · c12|3(F1|3(x1|x3), F2|3(x2|x3)), like we have discussed above please note this factorisation is not unique we could also have the following f(x1, x2, x3) = f(x1) · f(x2) · f(x3) · c12(F1(x1), F2(x2)) · c23(F2(x2), F3(x3)) · c13|2(F1|2(x1|x2), F3|2(x3|x2)). Either way after decomposing our multivariate distribution we finish with the product of marginal distributions and pair-copulae. To finish off this section we return to where we left off in equation 3.19. We need to discuss the nature of the marginal conditional distributions of F(xj|xB). J. Harry [1996] showed that for every, v ∈ B F(xj|xB) = ∂Cjv|B−v (F(xj|xB−v ), F(xv, xB−v )) ∂F(xv, xB−v ) (3.20) where Cij|k is a bivariate copula distribution function. When we look at the univariate case of B = {v} note that we have F(xj|xv) = ∂Cjv(F(xj), F(xv)) ∂F(xv) In section 3.5 - Vine Copula Simulation, we will make use of the so called h-function, h(x, v, Θ), to represent the conditional distribution function when x & v are uniform, i.e. f(xj) = f(xv) = 1 and F(xj) = xj and F(xv) = xv. This means we have, h(xj|xv, Θjv) := F(xj|xv) = ∂Cjv(F(xj), F(xv)) ∂F(xv) = ∂Cjv(xj, xv) ∂xv , (3.21) where xv corresponds to the conditioning variable and Θjv relates to the set of parameters for the copula of the joint distribution function of x and v. Finally, let h−1(xj|xv, Θjv) be the inverse of the h-function w.r.t to the first variable xj, or equivalently the inverse of the conditional distribution function F(xj|xB). Now that we have finished decomposing our joint density into its marginals and pairwise-copulae we need to look at the possible copula distributions to fit in order to build our vine copula’s in the next section. The Copula Family of Functions In this section we will outline the different copula functions available to us, which we will later use for the vine copula models. When choosing which copula model to use we usually look to see whether the data shows positive or negative dependence, the different copula models will be characterised by the shape they exhibit amongst the data clustering. For this section we follow Hendrich [2012] as it details the copula families very clearly. 27
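Before turning to the individual copula families, it may help to see the h-function of equation 3.21 written out for one concrete case. The sketch below gives the Gaussian-copula h-function together with the equivalent call in the VineCopula package; the full set of h-functions used in the simulation is in appendix C.3, and the argument values here are purely illustrative.

h_gauss <- function(u, v, rho) {
  # conditional distribution C(u | v) of the bivariate Gaussian copula, cf. equation 3.21
  pnorm((qnorm(u) - rho * qnorm(v)) / sqrt(1 - rho^2))
}
h_gauss(0.3, 0.7, rho = 0.5)
library(VineCopula)
BiCopHfunc(0.3, 0.7, family = 1, par = 0.5)$hfunc2   # same quantity, conditioning on the second argument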
  • 32. Gaussian The single parameter Gaussian copula became well known for its use in the valuation of structured products during the financial crisis of 2007 and 2008. Its popularity stems from the fact that it is easy to parameterise and work with. See for more detail C. Meyer [2009]. The bivariate Gaussian copula with correlation parameter ρ ∈ (−1, 1) is defined to be C(u1, u2) = Φρ(Φ−1 (u1), Φ−1 (u2)), where Φρ(·, ·) represents the bivariate cumulative distribution function of two standard Gaussian distributed random variables with correlation ρ. Φ−1(·) is the inverse of the univariate standard Gaussian distributed function. The related copula density is given by c(u1, u2) = 1 1 − ρ2 exp − ρ2(x2 1 + x2 2) − 2ρx1x2 2(1 − ρ2) where we have x1 = Φ−1(u1) and similarly for x2. For ρ → 1 (ρ → −1) the Gaussian copula shows complete positive (negative) dependence, we have the independent copula if ρ = 0. In order to give the reader some idea of this copulas graphical properties, we have plotted scatter and contour plots for three different values of τ, which illustrate the three different levels of dependence τ = 0.8 which is high positive dependence, τ = −0.8 which is high negative dependence and finally τ = 0.3 which is quite neutral in terms of dependence. Figure 9: Bivariate Normal Copula: τ = 0.8 28
  • 33. Figure 10: Bivariate Normal Copula: τ = -0.8 Figure 11: Bivariate Normal Copula: τ = 0.3 t Copula t copula is a two parametric copula function defined as C(u1, u2) = tρ,ν(t−1 ν (u1), t−1 ν (u2)), where tρ,ν represents the bivariate cumulative distribution function of two standard student-t distributed random variables with correlation parameter ρ ∈ (−1, 1) and ν > 0 degrees of freedom. We let t−1 ν (·) be the quantile function of the univariate standard student-t distribution function with ν degrees of freedom. Copula density is given by c(u1, u2) = Γ ν + 1 2 Γ ν 2 νπdtν(x1)dtν(x2) 1 − ρ2 1 + x2 1 + x2 2 − 2ρx1x2 ν(1 − ρ2) −ν+2 2 , where xi = t−1 ν (ui), i = 1, 2, and dtν(xi) is the density of the univariate standard Student-t 29
  • 34. distribution with ν degrees of freedom, so dtν(xi) = Γ ν + 1 2 Γ ν 2 √ πν 1 + x2 i ν −ν+2 2 , i = 1, 2. The t copula differs from the Gaussian copula in the sense that it exhibits fatter tails, however, for increasing degrees of freedom the t copula does approach the Gaussian copula. As before for ρ → 1 (ρ → −1) the t copula shows complete positive (negative) dependence. Figure 12: t-Student Copula contour plots ν = 4: τ = 0.8, 0.3 & − 0.8, respectively Now we look at how the contours shape the dependence given a fixed τ = 0.3 and varying degrees of freedom ν = 3, 7 &11. What we see is that as ν gets larger we get closer and closer to the elliptical shape of the Gaussian copula. Figure 13: t-Student Copula contour plots τ = 0.3: ν = 3, 7 &11, respectively We now look at introducing a different family of copulae called Archimedean copulae. They are very popular as they model dependence of arbitrarily high dimensions with only one parameter to indicate the strength of dependence. 30
  • 35. Frank Copula The Frank Copula distribution is given by C(u1, u2) = − 1 θ log 1 + e(−θu1) − 1 e(−θu2) − 1 e(−θ) − 1 , and the single parameter θ ∈ R{0}. The copula density is as follows c(u1, u2) = θ(eθ − 1) e−θ(u1+u2) e−θ − 1 + e−θu1 − 1 e−θu2 − 1 2 . similar to Gaussian and t copula in the sense that we ascertain complete positive dependence for θ → +∞, independence θ → 0 and finally complete negative dependence for θ → −∞. Figure 14: Frank Copula contour plots τ = 0.8, 0.3 & − 0.8, respectively In figure 14 above we have illustrated the contour plots for the Frank copula with varying values of τ similar to previous plots, note the difference with shape of the contours compared to the Gaussian and t-Student , here we have a significant heavy tail dependence. Clayton Copula The Clayton copula distribution is given by C(u1, u2) = u−θ 1 + u−θ 2 − 1 − 1 θ , with the following density c(u1, u2) = (1 + θ)(u1u2)−1−θ (u−θ 1 + u−θ 2 − 1)− 1 θ −2 . As θ > 0, we are limited to model only positive dependence. Hence the Clayton only exhibits complete positive dependence for θ → +∞ and independence for θ → 0. 31
  • 36. Figure 15: Clayton Copula contour plots τ = 0.8 & 0.3, respectively In figure 15 above we can see that, in general, the Clayton exhibits considerable clustering in the lower left quadrant, indicating heavy lower tail dependence. Gumbel Copula The Gumbel copula distribution is given by C(u1, u2) = exp{−[(− log u1)^θ + (− log u2)^θ]^(1/θ)}, with single parameter θ ≥ 1. The density is as follows: c(u1, u2) = C(u1, u2) (u1u2)^(−1) [1 + (θ − 1)Q^(−1/θ)] Q^(−2+2/θ) (log u1 log u2)^(θ−1), with Q = (− log u1)^θ + (− log u2)^θ. So with θ = 1 we have independence and we have complete positive dependence for θ → +∞. Figure 16: Gumbel Copula contour plots τ = 0.8 & 0.3, respectively Note in figure 16 the similarities to the Clayton copula; the Gumbel instead shows heavy dependence in the upper right quadrant, illustrating significant upper tail dependence. 32
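Scatter and contour plots in the style of figures 9-16 can be reproduced with the VineCopula package; the following is a sketch, where BiCopTau2Par converts the quoted Kendall tau values into copula parameters (family 1 = Gaussian, family 3 = Clayton in the package's numbering).

library(VineCopula)
par_g <- BiCopTau2Par(family = 1, tau = 0.8)     # Gaussian parameter for tau = 0.8
par_c <- BiCopTau2Par(family = 3, tau = 0.8)     # Clayton parameter for tau = 0.8
u_sim <- BiCopSim(500, family = 3, par = par_c)  # simulate from the Clayton copula
plot(u_sim, xlab = "u1", ylab = "u2")            # scatter on the copula scale
BiCopMetaContour(family = 1, par = par_g)        # theoretical contours, N(0,1) margins
BiCopMetaContour(family = 3, par = par_c)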
  • 37. Note: We can see that both the Clayton and Gumbel copula distributions do not exhibit negative dependence. In order to get around this restriction so that we can still utilise the fundamental characteristics of these copulas we introduce rotations of the original functions. This the allows us to model data with negative dependence properties with the Clayton and Gumbel. We do not go into detail with this subsection of copula’s but if the reader wishes to pursue this further we recommend reading Hendrich [2012] starting page 13. Now that we have discussed the possible copula functions available to us to for model fitting purposes, we need to detail how we go about building the pairwise-copula model structures. This leads us into our next section on Vine Copula Construction. 3.4 Vine Copula Construction When considering a joint density with a high number of dimensions we find that there exists a considerable amount of possible pair-copulae constructions. With the work done by Bedford and Cooke [2001] we are now able to organise different possible decompositions into a graphical structure. In this section we explore the two subcategories of the general regular vine (R-vines) those are conical vine (C-vine) and (D-Vine). These particular vine types are becoming increasingly popular with a considerable surge of work going into risk management in finance and insurance. As we are using R-software during the application process we recommend the reader also follows E. Brechmann and U. Schepsmeier [2013] to learn the programming tools. After discussing the theory of C & D-vines we will look at how we select a model and the goodness-of-fit criteria used to make comparisons between the different model selections. D-vines The D-vine is probably the most simplistic subgroup of R-vines and gives us a good place to start. We shall follow the work of K. Aas [2006], who brought the advancement of statistical inference into the C & D-vine model fitting. The graphical representation is straight forward as can be seen in figure 17 below. Each edge corresponds to a pair copula, where by the edge label represents the relative copula density subscript i.e. 1, 3|2 represents c13|2(·). The whole decomposition can be described by d(d − 1)/2 edges and the d marginals for the respective d continuous random variables. Ti s i = 1, ..., 4 are the trees which describe the break down of the pairwise copula construction. It was also shown that there are d!/2 possible vines from a d-dimensional joint probability density. 33
  • 38. Figure 17: D-Vine tree for (X1, .., X5) Density of the D-vine copula in figure 17 looks as follows: f(x1, x2, x3, x4, x5) = f(x1) · f(x2) · f(x3) · f(x4) · f(x5) · c12(F(x1), F(x2)) · c23(F(x2), F3(x3)) · c34(F(x3), F(x4)) · c45(F(x4), F(x5)) · c13|2(F(x1|x3), F(x3|x2)) · c24|3(F(x2|x3), F(x4|x3)) · c35|4(F(x3|x4), F(x5|x4)) · c14|23(F(x1|x2, x3), F(x4|x2, x3)) · c25|34(F(x2|x3, x4), F(x5|x3, x4)) · c15|234(F(x1|x2, x3, x4), F(x5|x2, x3, x4)) We can generalise the above for the joint probability density f(x1, ..., xd) which gives us d k=1 f(xk) d−1 j=1 d−j i=1 ci,i+j|i+1,...,i+j−1 F(xi|xi+1, ..., xi+j−1), F(xi+j|xi+1, ..., xi+j−1) (3.22) Conical Vines (C-vines) In the first C-vine tree, the dependence with respect to one particular random variable/institution, the first root node, is modelled using bivariate copulas for each pair of variables/institutions. Conditioned on this variable, pairwise dependencies with respect to a second variable are modelled, the second root node. In general, a root node is chosen for each tree and all pairwise dependencies with respect to this node are modelled conditioned on all of the previous root nodes. This is how we obtain the C-vine star structure from the trees. As defined by E. Brechmann and U. Schepsmeier [2013]. 34
  • 39. We again take an example of a five dimensional joint probability density with decomposition as follows f(x1, x2, x3, x4, x5) = f(x1) · f(x2) · f(x3) · f(x4) · f(x5) · c12(F(x1), F(x2)) · c13(F(x1), F3(x3)) · c14(F(x1), F(x4)) · c15(F(x1), F(x5)) · c23|1(F(x2|x1), F(x3|x1)) · c24|1(F(x2|x1), F(x4|x1)) · c25|1(F(x2|x1), F(x5|x1)) · c34|12(F(x3|x1, x2), F(x4|x1, x2)) · c35|12(F(x3|x1, x2), F(x5|x1, x2)) · c45|123(F(x4|x1, x2, x3), F(x5|x1, x2, x3)) The graph of this decomposition, figure 18, is illustrated on the next page. As with the D-vine we can generalise this to joint probability density f(x1, ..., xd) denoted by d k=1 f(xk) d−1 j=1 d−j i=1 cj,j+1|1,...,j−1 F(xj|x1, ..., xj−1), F(xj+i|x1, ..., xj−1) (3.23) Aas [2006] was able to show there are d!/2 possible vines Figure 18: C-Vine tree for (X1, .., X5) 35
It is important to note that fitting a canonical vine can be more advantageous when a specific variable is known to command a lot of the interactions in the data set, as the majority of the dependence is then captured in the first tree. As we have more important material to cover we stop here in terms of theoretical detail, but we recommend the interested reader reviews K. Aas [2006] for more information. Now that we have defined the types of vine copulas we can fit, we must discuss the process used to select a model.

Vine Model Selection Process

With the knowledge of the different types of vine copulas available to us, we now look at the procedure necessary to fit them and make inferences on the models. In the application in section 4 we will fit both types of vine copulas and make comparisons. For this section we follow the paper by E. Brechmann and U. Schepsmeier [2013], as it runs parallel with the coding. The steps to construct a vine copula are as follows:

(i) Structure selection - First, decide on the structure of the decomposition, as we have shown in the previous section with the vine graphs.
(ii) Copula selection - With the structure in place, choose a copula function to model each edge of the vine copula construction.
(iii) Estimate copula parameters - Estimate the parameters for the copulae chosen in step (ii).
(iv) Evaluation - Finally, evaluate the models and compare them to alternatives.

(i) Structure selection: If we wanted a full overview of the model selection problem we would fit all possible models and compare the results. However, this is not realistic: as we have seen in the previous section, the number of possible decompositions grows very quickly with the dimension. To counter this problem we need to take a more clever approach. C. Czado [2011] introduced a sequential approach which takes the variable with the largest value of

\[
\hat{S}_j := \sum_{i=1}^{d} |\hat{\tau}_{ij}|, \qquad j = 1, \dots, d, \tag{3.24}
\]

and allocates it to the root node of the first tree. This value is an estimate of which variable carries the most interdependence. The process is repeated until the whole tree structure is completed, i.e. we move on to the second tree and find the variable with the next largest sum of estimated pairwise Kendall's tau values. As the copulae specified in the first trees of the vine underpin the whole dependence structure of our chosen model, we want to capture most of the dependence in these trees. You will see in the application section that we order the variables for the tree structure via equation 3.24. Please see the source mentioned for more detail.

Loading the VineCopula package (U. Schepsmeier et al. [2015]) in R and then using the function RVineStructureSelect allows us to carry out the above procedure in an automated way. This function can select optimal R-vine tree structures through maximum spanning trees with the absolute values of pairwise Kendall's taus as weights, but it also includes the above method for C-vines. See the VineCopula package documentation for further details or the R code in the appendix.
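A minimal sketch of this root-node ordering and of the automated alternative might look as follows; it assumes u is the matrix of copula data from section 2, and familyset = NA simply lets the function consider all implemented pair-copula families:

library(VineCopula)

# Sequential C-vine root ordering via equation (3.24)
tau   <- TauMatrix(u)               # pairwise empirical Kendall's taus
S.hat <- colSums(abs(tau)) - 1      # drop the diagonal |tau_jj| = 1 (does not change the ranking)
order(S.hat, decreasing = TRUE)     # the variable with the largest S.hat becomes the first root node

# Automated structure and copula selection (type = 1 restricts the search to C-vines,
# type = 0 gives general R-vines via maximum spanning trees)
cvine.fit <- RVineStructureSelect(u, familyset = NA, type = 1,
                                  selectioncrit = "AIC", indeptest = TRUE)
cvine.fit$Matrix    # selected tree structure
cvine.fit$family    # selected pair-copula families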
(ii) Copula selection: Once we have defined the vine structure we need to conduct a copula selection process. This can be done via goodness-of-fit tests, an independence test, AIC/BIC and graphical tools such as contour plots. We will be using the function CDVineCopSelect for the C- and D-vine and RVineCopSelect for the R-vine copula selections. These allow the coder to decide whether to use AIC or BIC and/or the independence test (see details below). The functions test an extensive range of copula families; we refer the reader to the VineCopula package for more detail. To recap the AIC and BIC criteria see section 3.1.4.

Independence test - this looks at whether two univariate copula data sets are independent or not. The test exploits the asymptotic normality of the test statistic

\[
T = \sqrt{\frac{9n(n-1)}{2(2n+5)}}\; |\hat{\tau}|,
\]

where n is the number of observations in the copula data vectors and τ̂ is the estimated Kendall's tau between the two copula data vectors u1 and u2. The p-value of the null hypothesis of bivariate independence is hence asymptotically

\[
p\text{-value} = 2\big(1 - \Phi(T)\big).
\]

This was referenced from the VineCopula package.

(iii) Estimate copula parameters: Now that we have chosen the copula distributions we look to estimate the parameters. Again we use R functions, which are as follows: for the C- and D-vine we have CDVineSeqEst and for the R-vine we use RVineSeqEst. The pair-copula parameter estimation is performed tree-wise, i.e. for each C- or D-vine tree the results from the previous trees are used to calculate the new copula parameters. The estimation method is either pairwise maximum likelihood estimation (see page 16 of E. Brechmann [2013] for elaboration on the details) or inversion of Kendall's tau (this method is restricted to copula functions with only one parameter). Referenced from the VineCopula package; please consult it for more detail.

(iv) Evaluation: Finally, in order to evaluate and compare our selected models, we can again use the classical AIC/BIC measures but also the Vuong test. Please see previous sections for an explanation of the AIC/BIC criteria.

Vuong test - The Vuong test is a likelihood-ratio test which can be used for testing non-nested models. It is carried out between two d-dimensional R-vine copula models. The test is as follows: let c1 and c2 be the densities of two competing vine copula models with estimated parameter sets θ̂1 and θ̂2. We compute the standardised sum, ν, of the log differences of their pointwise likelihoods m_i := log[ c1(u_i | θ̂1) / c2(u_i | θ̂2) ] for observations u_i, i = 1, ..., n. The statistic is

\[
\nu = \frac{\sum_{i=1}^{n} m_i}{\sqrt{\sum_{i=1}^{n}\big(m_i - \bar{m}\big)^2}}, \qquad \bar{m} = \frac{1}{n}\sum_{i=1}^{n} m_i.
\]

Vuong showed that ν is asymptotically standard normal. Under the null hypothesis
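The pairwise building blocks of steps (ii) and (iii) can be tried out directly on two columns of our copula data; in the short sketch below (u1 and u2 are placeholders for two of the transformed residual series) the hand-computed independence statistic should agree with BiCopIndTest, and BiCopSelect picks a pair-copula family by AIC:

library(VineCopula)

n     <- length(u1)
tau   <- cor(u1, u2, method = "kendall")
Tstat <- sqrt(9 * n * (n - 1) / (2 * (2 * n + 5))) * abs(tau)
2 * (1 - pnorm(Tstat))      # asymptotic p-value of the independence test

BiCopIndTest(u1, u2)        # the same test via the package

# Family selection (by AIC, with a preliminary independence test) for one edge
BiCopSelect(u1, u2, familyset = NA, selectioncrit = "AIC", indeptest = TRUE)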
H0 : E[mi] = 0 for all i = 1, ..., n, we prefer vine model 1 over vine model 2 if ν > Φ^{-1}(1 − α/2), where Φ^{-1} denotes the inverse of the standard normal distribution function. If ν < −Φ^{-1}(1 − α/2) we choose model 2, and if |ν| ≤ Φ^{-1}(1 − α/2) no decision can be made between the two models. Like AIC and BIC, this test can be adjusted to take into account the number of parameters used; please see the VineCopula package for more detail.

3.5 Vine Copula Simulation

Moving forward we now look at vine copula simulation. Here we set out the theory necessary to simulate copula data from the C-vine models presented in the previous section. The simulation will be done conditional on an arbitrarily chosen special variable x_{i+}. The work done by K. Aas only considers sequential simulation of x1, ..., xd in their given order, starting from x1. We, however, need to be able to fix an arbitrary variable within B = {1, ..., d} and simulate the rest. This requires a slight alteration to K. Aas' algorithm; we reference the thesis by Hendrich [2012], who explains this alteration. We first outline the basic theory before introducing the algorithm.

Normal Simulation

In this section we depict the work done by K. Aas. By 'Normal' we mean that we are not conditioning on any specially chosen variable; going forward we keep this labelling for reference. Given a sample ω1, ..., ωd, independently and identically distributed on Uniform[0, 1], the general simulation procedure following a C-vine looks as follows:

x1 = ω1
x2 = F^{-1}(ω2 | x1)
x3 = F^{-1}(ω3 | x1, x2)
x4 = F^{-1}(ω4 | x1, x2, x3)
...
xd = F^{-1}(ωd | x1, x2, x3, ..., xd−1)

To successfully run the simulation we need to be able to calculate the conditional distribution functions F(xj | x1, x2, ..., xj−1), j ∈ {2, ..., d}, and their inverses. For this procedure we make critical use of the h-function mentioned in 3.21 and the general pair-copula decomposition equation 3.20. We know from 3.20 that the selection of v ∈ B determines the copula C_{jv|B∖v} used to calculate the conditional distribution function. We only want to
include the copulae already involved in the decomposition of the joint density. Hence, for v = j − 1 and B = {1, ..., j − 1} this gives us

\[
F(x_j \mid x_1, \dots, x_{j-1}) = \frac{\partial C_{j,j-1|1,\dots,j-2}\big(F(x_j|x_1,\dots,x_{j-2}),\, F(x_{j-1}|x_1,\dots,x_{j-2})\big)}{\partial F(x_{j-1}|x_1,\dots,x_{j-2})}
= h\big(F(x_j|x_1,\dots,x_{j-2}) \,\big|\, F(x_{j-1}|x_1,\dots,x_{j-2}),\ \theta_{j,j-1|1,\dots,j-2}\big), \tag{3.25}
\]

and we notice again that the h-function decomposes the conditional distribution function into two lower-dimensional distribution functions. This characteristic allows us to evaluate the above equation by applying the h-function iteratively to its first argument. This leads to

\[
\begin{aligned}
F(x_j|x_1,\dots,x_{j-1}) &= h\big(F(x_j|x_1,\dots,x_{j-2}) \mid F(x_{j-1}|x_1,\dots,x_{j-2}),\ \theta_{j,j-1|1,\dots,j-2}\big)\\
F(x_j|x_1,\dots,x_{j-2}) &= h\big(F(x_j|x_1,\dots,x_{j-3}) \mid F(x_{j-2}|x_1,\dots,x_{j-3}),\ \theta_{j,j-2|1,\dots,j-3}\big)\\
F(x_j|x_1,\dots,x_{j-3}) &= h\big(F(x_j|x_1,\dots,x_{j-4}) \mid F(x_{j-3}|x_1,\dots,x_{j-4}),\ \theta_{j,j-3|1,\dots,j-4}\big)\\
&\ \ \vdots\\
F(x_j|x_1,x_2) &= h\big(F(x_j|x_1) \mid F(x_2|x_1),\ \theta_{j,2|1}\big)\\
F(x_j|x_1) &= h\big(x_j \mid x_1,\ \theta_{j,1}\big) \tag{3.26}
\end{aligned}
\]

One can see from the system of equations above that equation 3.25 can essentially be written as a nested set of h-functions. Looking at the right-hand sides in 3.26, the size of the conditioning set drops by one as we move down to the next equation, which implies that the equations can be solved sequentially. For more detail see K. Aas.

Conditional Simulation

Now consider simulation conditional on an arbitrarily selected variable of interest x_{i+}, i+ ∈ {1, ..., d}. Given the same setting as in the Normal simulation procedure, except that here we condition on x3 = a, where a ∈ [0, 1], the simulation procedure is as follows:

x1 = F^{-1}(ω1 | x3)
x2 = F^{-1}(ω2 | x1, x3)
x3 = a
x4 = F^{-1}(ω4 | x1, x2, x3)
...
xd = F^{-1}(ωd | x1, x2, x3, ..., xd−1)

We can deduce from the procedure above that the sampling differs from the Normal one. Simulating values for variables with subscripts greater than i+, i.e. j = i+ + 1, i+ + 2, ..., d, works as before; the problem arises when we simulate values for the variables with subscripts less than i+, i.e. j = 1, ..., i+ − 1.
For the variables with indices j = 1, ..., i+ − 1 we denote the conditioning set by Bj = {1, ..., j − 1, i+}. The nested h-function representation in 3.26 carries over to this conditional case, so the equations can again be solved sequentially; see Hendrich [2012] for the illustration of this. In order to clarify how the conditional simulation procedure works we give an example taken from Hendrich [2012]; please see below.

Example - Conditional simulation for d = 5 and i+ = 4

For the simulation conditional on x4 = a ∈ [0, 1] we have:

(1) ω1 = F(x1|x4) = h(x1 | x4, θ14), and so x1 = h^{-1}(ω1 | x4, θ14).

(2) ω2 = F(x2|x1, x4) = h( F(x2|x1) | F(x4|x1), θ24|1 ), with F(x2|x1) = h(x2 | x1, θ12) and F(x4|x1) = h(x4 | x1, θ14). Hence

h(x2 | x1, θ12) = h^{-1}(ω2 | h(x4 | x1, θ14), θ24|1) =: y1   ⟺   x2 = h^{-1}(y1 | x1, θ12).

(3) ω3 = F(x3|x1, x2, x4) = h( F(x3|x1, x2) | F(x4|x1, x2), θ34|12 ), where F(x3|x1, x2) = h( F(x3|x1) | F(x2|x1), θ23|1 ) and

F(x4|x1, x2) = h( F(x4|x1) | F(x2|x1), θ24|1 ) =: y2,

with F(x3|x1) = h(x3 | x1, θ13), F(x4|x1) = h(x4 | x1, θ14) and F(x2|x1) = h(x2 | x1, θ12). This gives us

h( F(x3|x1) | F(x2|x1), θ23|1 ) = h^{-1}(ω3 | y2, θ34|12) =: y3
⟺ h(x3 | x1, θ13) = h^{-1}(y3 | h(x2 | x1, θ12), θ23|1) =: y4
⟺ x3 = h^{-1}(y4 | x1, θ13).

(4) As we chose at the start, x4 = a ∈ [0, 1].

(5) ω5 = F(x5|x1, x2, x3, x4) = h( F(x5|x1, x2, x3) | F(x4|x1, x2, x3), θ45|123 ), with

F(x5|x1, x2, x3) = h( F(x5|x1, x2) | F(x3|x1, x2), θ35|12 ),
F(x4|x1, x2, x3) = h( F(x4|x1, x2) | F(x3|x1, x2), θ34|12 ) =: y5,

and additionally

F(x3|x1, x2) = h( F(x3|x1) | F(x2|x1), θ23|1 ) =: y6,
so we get the following:

⟺ h( F(x5|x1, x2) | y6, θ35|12 ) = h^{-1}(ω5 | y5, θ45|123) =: y7
⟺ h( F(x5|x1) | F(x2|x1), θ25|1 ) = h^{-1}(y7 | y6, θ35|12) =: y8
⟺ h(x5 | x1, θ15) = h^{-1}(y8 | h(x2 | x1, θ12), θ25|1) =: y9
⟺ x5 = h^{-1}(y9 | x1, θ15).

For more detail on this please see Hendrich [2012], page 117. Now that we have the above sampling procedure for the conditional case, we are able to alter the algorithm for the Normal simulation procedure introduced by K. Aas, which gives the following (referenced from Hendrich, page 119); see over the page:
Algorithm: Conditional simulation algorithm for a C-vine. Generate one sample x1, ..., xd from the C-vine, given that the variable x_{i+}, i+ ∈ {1, ..., d}, is equal to a pre-specified value a.

Sample ω1, ..., ω_{i+−1}, ω_{i++1}, ..., ωd independently and uniformly distributed on [0, 1].
x_{i+} = v_{i+,1} = a
for j ← 1, ..., i+ − 1, i+ + 1, ..., d
    v_{j,1} = ω_j
    if j < i+ then
        v_{j,1} = h^{-1}(v_{j,1} | v_{i+,j}, θ_{j,i+|1,...,j−1})
    end if
    if j > 1 then
        for k ← j − 1, ..., 1
            v_{j,1} = h^{-1}(v_{j,1} | v_{k,k}, θ_{k,j|1,...,k−1})
        end for
    end if
    x_j = v_{j,1}
    if j < d then
        for l ← 1, ..., j − 1
            v_{j,l+1} = h(v_{j,l} | v_{l,l}, θ_{l,j|1,...,l−1})
        end for
    end if
    if j < i+ then
        v_{i+,j+1} = h(v_{i+,j} | v_{j,j}, θ_{j,i+|1,...,j−1})
    end if
end for

Summary of the algorithm: The outer for-loop runs over the sampled variables. The sampling of variable j is initialised with respect to its position in the vine: if j < i+ the calculation depends on x_{i+} (the first if-statement), if j > i+ it does not. The value is then carried forward into the first inner for-loop, which inverts the remaining nested h-functions. In the last steps the conditional distribution functions needed for sampling the (j + 1)th variable are calculated: F(xj | x1, ..., xl), l = 1, ..., j − 1, is computed recursively in the second inner for-loop for every j, and the corresponding F(x_{i+} | x1, ..., xj) is worked out in the final if-statement for all j < i+.
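To illustrate the algorithm in miniature, the following R sketch (our own toy example, not the thesis code in Appendix C) conditions a three-dimensional Gaussian C-vine on its second variable; h and hinv are the Gaussian-copula h-function and its inverse in the first argument, and the parameters are arbitrary illustrative values:

# Gaussian copula h-function and its inverse (in the first argument)
h    <- function(u, v, rho) pnorm((qnorm(u) - rho * qnorm(v)) / sqrt(1 - rho^2))
hinv <- function(w, v, rho) pnorm(qnorm(w) * sqrt(1 - rho^2) + rho * qnorm(v))

# C-vine with root 1: pair copulas c12, c13 and c23|1 (illustrative parameters)
rho12 <- 0.6; rho13 <- 0.4; rho23.1 <- 0.3

# One draw conditional on x2 = a (so here i+ = 2)
a  <- 0.9
w1 <- runif(1); w3 <- runif(1)

x2 <- a
x1 <- hinv(w1, x2, rho12)                     # invert F(x1|x2) = h(x1|x2)
y  <- hinv(w3, h(x2, x1, rho12), rho23.1)     # strip the outer h of F(x3|x1,x2)
x3 <- hinv(y, x1, rho13)                      # then the inner one, F(x3|x1)
c(x1, x2, x3)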
4 Application

Now that the theory is complete we can apply it to our data set. Several steps are necessary in order to acquire meaningful results. First we need to fit the time series models to obtain our standardised residuals, which then need to be transformed into copula data, as mentioned in section 2 The Data, via the probability integral transform. Once this is done we move on to fitting our vine copula structures and their respective copula distributions. Finally we look at the copula simulation, where we hope to answer the major questions raised in section 1.2 Questions of Our Investigation. Throughout this section we also apply the procedures to the 2009 data set for comparative purposes, in line with our questions in section 1.2. In our analysis we focus on the 2008 data set to avoid clouding of results and repetition of software output. For these reasons we omit the 2009 GARCH modelling and the 2009 R-vine software output. Please note that all important comparisons will be made.

4.1 Fitting Time Series Models

As discussed in section 3.1, in order to conduct our copula construction we need independent and identically distributed (i.i.d.) observations. To achieve this we fit the time series models of section 3.1, from which we acquire standardised model residuals that are approximately i.i.d.; note that these residuals are modelled by the distributions chosen in section 3.1.1, 'Distribution of the Zt's'. Once this step is complete we apply the probability integral transform to obtain our copula data.

Fitting Procedure

Our 2008 data sample comes in the format of equation 3.1 (log returns), and we make the assumption that our data are weakly stationary. With this in mind we model each individual time series as follows:

1. We first need to decide on the order (p, q) of the ARMA(p, q) model. This is done using the auto.arima function (forecast package) in R, which selects the best fitting model via the AIC criterion as a measure of goodness-of-fit. Note that this can also be done manually using the graphical approach, i.e. the ACF and PACF, depending on the reader's preference. In this paper we try to automate as many of these processes as possible in order to produce a computationally convenient and easy-to-use tool for management.

2. Next we specify our GARCH model, following section 3.1.3 and the paper by A. Ghalanos [2014]. We can try a wide variety of GARCH specifications and fit their respective models in order to find the best fitting model via the AIC/BIC and log-likelihood criteria (see 3.1.4 for a recap). For this we use the ugarchspec and ugarchfit functions from the rugarch package. Please reference Appendix C for visibility on the code, and see the short sketch after the residual-plot discussion below.

3. At the same time we need to look at the different types of distributions to fit to our standardised residuals. Again we can use the criteria above for comparative purposes, but we must also test whether the fitted residuals are independently and identically distributed according to the assumed distribution. We use QQ-plots, choosing the distribution for which the most points lie on a straight line.
4. The final set of tests is used to ascertain whether we have a well-performing model. If, after fitting a standard GARCH model, we find that the Sign Bias test suggests a presence of asymmetry, we should fit an eGARCH model. However, if the results do not differ significantly we should select the basic model so that we avoid over-parameterisation.

In table 3 and table 4 over the next pages we detail our findings from the above procedure. The tables include the type of ARMA(p, q) model selected with its associated parameters, the type of GARCH(1,1) model selected, i.e. eGARCH or csGARCH, and its associated parameters, the respective distributions for the residuals and their parameters, and finally the information criteria consisting of the log-likelihood, AIC and BIC.

During the fitting process we found that the eGARCH model fitted the majority of the series best. Often when fitting the standard GARCH (sGARCH) model the parameters were either deemed insignificant or there were clear signs of leverage/asymmetry effects in the data. One institution called for the csGARCH (component sGARCH) model: its ability to capture both short-run and long-run movements of volatility fitted the OML.L data well. Additionally, the most common statistical distributions for the residuals were the Student-t and normal distributions, which is not necessarily surprising as the Student-t distribution is well known for capturing the leptokurtosis characteristic of financial data. The remaining two distributions used were the skewed GED and the standard GED.

When we look at the p-values within the table, one can see that the majority of the parameters are significant at the 5% level. There are a couple of parameter estimates that fall just outside of this, including the alpha (p = 0.154) for the csGARCH model on OML.L and the eta parameter (p = 0.133) for PRU.L, but aside from these we can say the majority of the parameters fit the data well. The mean parameter is absent for the majority of the institutions, which is due to the nature of the data source, i.e. log returns (single-differenced data). All of the banks in fact showed no sign of autocorrelation, which meant they did not require an ARMA model. On the other hand, the insurance companies seemed to exhibit some form of autocorrelation which had to be modelled. The beta parameter was less than or equal to one for all eGARCH models, ensuring our stationarity condition was satisfied. If we consider the distribution parameters we can see that the majority of them do not exhibit signs of skewness; there are a couple which show small signs, such as PRU.L, which showed marginal positive skewness, see Appendix A, figure 42. Leptokurtosis does seem to be prominent within the data, but we know to expect this from financial time series and we fit the models accordingly. Our goodness-of-fit measures are included at the end of table 4, which shows the log-likelihood and the corresponding AIC and BIC values, ranging over [-6.334, -5.001] and [-6.264, -4.9312], respectively.

Moving on to our informative time series plots in Appendix A - QQ-Plots, Empirical Density of Standardised Residuals and Residual Plots. One can see from the QQ-plots that the majority of the points lie on the diagonal straight line. HSBA.L and LLOY.L seem to have a slight tail moving away from the line; however, generally speaking the QQ-plots suggest the models are adequate.
The empirical density plots of the standardised residuals illustrate the fit of the chosen distribution against the standardised residuals. As you can see, the fitted densities capture the
majority of the data characteristics, with the plotted bars escaping the density curve only slightly in parts. Finally we have the residual plots, which we use to check for a sense of boundedness. One can see that the vast majority of the data points are contained within the [-2, 2] horizontal bounds. This is a reassuring sign that we have i.i.d. residuals.
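For reference, a minimal sketch of steps 1-3 for a single return series might look as follows; the eGARCH/Student-t specification shown is just one of the candidate specifications compared in tables 3 and 4, and x stands for one column of log returns:

library(forecast)
library(rugarch)

arma <- arimaorder(auto.arima(x))             # step 1: ARMA order selected by AIC

spec <- ugarchspec(                           # step 2: one candidate specification
  mean.model     = list(armaOrder = c(arma["p"], arma["q"]), include.mean = FALSE),
  variance.model = list(model = "eGARCH", garchOrder = c(1, 1)),
  distribution.model = "std")                 # Student-t residuals

fit <- ugarchfit(spec, data = x)
infocriteria(fit)                             # AIC/BIC for model comparison
z   <- residuals(fit, standardize = TRUE)     # standardised residuals

# step 3 / PIT: transform to copula data with the fitted residual distribution
u <- pdist("std", q = as.numeric(z), mu = 0, sigma = 1,
           shape = coef(fit)["shape"])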
Company   ARMA(p,q)            µ       φ1      η1     GARCH type      ω       α       β       γ      η11     η21
HSBA.L    (0,0)   estimate     -       -       -      eGARCH       -0.278  -0.218   0.970   0.154
                  p-value                                           0.000   0.001   0.000   0.020
LLOY.L    (0,0)   estimate     -       -       -      eGARCH       -0.003  -0.161   0.978   0.170
                  p-value                                           0.000   0.000   0.000   0.000
BARC.L    (0,0)   estimate     -       -       -      eGARCH       -0.179  -0.096   0.979   0.220
                  p-value                                           0.075   0.088   0.000   0.003
STAN.L    (0,0)   estimate     -       -       -      eGARCH       -0.161  -0.270   0.983   0.118
                  p-value                                           0.000   0.000   0.000   0.000
PRU.L     (0,1)   estimate  -0.002     -    -0.095    eGARCH       -0.181  -0.251   0.977   0.123
                  p-value    0.056          0.133                   0.000   0.000   0.000   0.027
LGEN.L    (1,1)   estimate     -     0.634  -0.787    eGARCH       -0.014  -0.123   1.000   0.127
                  p-value            0.000   0.000                  0.020   0.005   0.000   0.000
AV.L      (1,1)   estimate     -     0.748  -0.814    eGARCH       -0.105  -0.183   0.989   0.153
                  p-value            0.000   0.000                  0.000   0.000   0.000   0.000
OML.L     (0,0)   estimate  -0.002     -       -      csGARCH       0.000   0.159   0.140          0.979   0.140
                  p-value    0.015                                  0.000   0.154   0.000          0.000   0.002

Table 3: ARMA-GARCH table for all companies

Company   Distribution    ν               ζ             Log Likelihood    AIC      BIC
HSBA.L    Student-t       5.149 (0.008)   -             806.242          -6.334   -6.264
LLOY.L    GED             1.418 (0.000)   -             656.687          -5.144   -5.060
BARC.L    Student-t       9.125 (0.063)   -             637.949          -5.004   -4.934
STAN.L    normal          -               -             671.665          -5.278   -5.222
PRU.L     skewed GED      0.975 (0.000)   1.437 (0.000) 668.269          -5.220   -5.108
LGEN.L    Student-t       5.589 (0.006)   -             713.341          -5.584   -5.486
AV.L      normal          -               -             685.300          -5.370   -5.286
OML.L     normal          -               -             637.628          -5.001   -4.931

Table 4: Standardised residual distribution table for all companies
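The entries of tables 3 and 4 are read off the fitted rugarch objects; for a single fit, the extractors one would typically use are (fit as in the sketch above, shown here only as an assumed example):

round(coef(fit), 3)       # ARMA, GARCH and distribution parameter estimates
fit@fit$matcoef           # estimates together with standard errors and p-values
likelihood(fit)           # log-likelihood (table 4)
infocriteria(fit)         # includes the Akaike (AIC) and Bayes (BIC) criteria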
We proceed to analyse the goodness-of-fit through the diagnostic tests for the standardised residuals; if you look on the next page, table 5 details the results from the tests mentioned in section 3.1.4.

We begin with the Ljung-Box test, which is carried out at lags 1, 2 & 5 for the standard test and lags 1, 5 & 9 for the squared-residual test. LGEN.L and AV.L have different lags for the standard Ljung-Box test, which is due to their ARMA(p, q) orders. For the standard Ljung-Box residual test, all of the companies, with the exception of OML.L, have consistently high p-values. This implies there is insufficient evidence to reject the null hypothesis of no autocorrelation. OML.L at lag 2 suggests we reject the null at the 5% level, but we expect this is due to the quality of the data and cannot remove it. Looking at the squared-residual test we see a similar picture: all of the p-values are clearly larger than 0.05, which implies there is no evidence against independence of the residuals. There is one particular data point which is close to the 5% level, PRU.L at lag 9 with a p-value of 0.063, but apart from this the tests seem to have performed in our favour.

Now we move on to the ARCH-LM test at lags 3, 5 & 7, which we perform in order to test for remaining ARCH effects among the residuals. Aside from PRU.L and BARC.L, the companies have large p-values, which indicates there is no evidence of remaining ARCH effects. PRU.L has a p-value of 0.007 at lag 5, which would suggest that at the 5% significance level we reject the null of no ARCH effects. In general there is not a lot we can do to remove this; it is a consequence of the data. From a general viewpoint the vast majority of the results suggest the fitted models have removed the ARCH effects.

The last test performed was the sign bias test. This includes the Sign Bias test (SB), the Negative Size Bias test (NSB), the Positive Size Bias test (PSB) and the Joint Effect test (JE). We conduct these tests to detect leverage or asymmetry effects remaining after the model has been fitted. Again all of the companies exhibited large p-values, indicating that the null hypothesis is not rejected and there is no evidence of remaining leverage effects.

Thus, after fitting the different time series models to our data, we deduce from the tests discussed above that we obtain standardised model residuals that are independent and show no serial autocorrelation. Additionally, we conclude the absence of ARCH effects and asymmetric effects amongst the residuals. We now move on to transforming the residuals, using their underlying distributions, so that they are uniformly distributed.
                       Ljung-Box (standard)        Ljung-Box (squared)         ARCH-LM test            Sign Bias test
Company                lag 1   lag 2   lag 5       lag 1   lag 5   lag 9       lag 3   lag 5   lag 7   SB      NSB     PSB     JE
HSBA.L   test stat.    0.316   0.368   0.915       0.037   0.747   2.904       0.000   1.535   2.797   0.527   0.991   0.154   1.010
         p-value       0.574   0.759   0.879       0.847   0.914   0.775       0.991   0.583   0.553   0.599   0.323   0.878   0.799
LLOY.L   test stat.    0.048   0.562   1.028       0.063   1.251   2.075       0.442   0.510   0.774   0.751   0.839   1.578   3.631
         p-value       0.827   0.665   0.853       0.802   0.801   0.896       0.506   0.881   0.947   0.454   0.402   0.116   0.304
BARC.L   test stat.    0.132   1.183   2.634       0.086   4.738   7.929       0.043   6.230   6.944   0.248   0.045   0.192   0.071
         p-value       0.717   0.443   0.478       0.769   0.175   0.133       0.836   0.053   0.089   0.805   0.964   0.848   0.995
STAN.L   test stat.    2.240   2.941   4.026       0.324   2.139   4.272       2.142   3.116   4.517   0.798   0.303   0.889   0.983
         p-value       0.135   0.146   0.251       0.569   0.586   0.543       0.143   0.273   0.278   0.426   0.762   0.375   0.805
PRU.L    test stat.    0.080   1.922   4.528       0.670   6.007   9.537       1.624  10.012  10.711   0.414   0.522   1.437   2.339
         p-value       0.778   0.241   0.153       0.413   0.090   0.063       0.203   0.007   0.013   0.679   0.602   0.152   0.505
OML.L    test stat.    4.692   5.360   6.312       0.853   3.398   4.802       0.518   1.251   2.115   0.064   1.372   0.379   3.608
         p-value       0.030   0.033   0.076       0.356   0.339   0.459       0.472   0.660   0.693   0.949   0.171   0.705   0.307

                       lag 1   lag 5   lag 9       lag 1   lag 5   lag 9       lag 3   lag 5   lag 7   SB      NSB     PSB     JE
LGEN.L   test stat.    0.021   0.380   1.072       0.570   3.743   6.335       0.429   6.177   7.004   0.725   0.174   1.255   1.607
         p-value       0.884   1.000   1.000       0.450   0.288   0.262       0.512   0.055   0.087   0.469   0.862   0.211   0.658
AV.L     test stat.    0.059   2.032   4.030       0.022   3.748   5.779       0.142   4.169   4.505   1.575   0.264   0.878   2.787
         p-value       0.808   0.952   0.684       0.883   0.287   0.324       0.707   0.159   0.280   0.117   0.792   0.381   0.426

Table 5: Standardised residual diagnostics for all companies
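The diagnostics in table 5 come from the fitted rugarch objects; the tables appear to follow the rugarch summary output, which uses weighted versions of the Ljung-Box and ARCH-LM statistics, so the plain versions sketched below (z is a vector of standardised residuals, fit the fitted object from the earlier sketch) give only a rough, approximate cross-check rather than an exact reproduction:

# Plain Ljung-Box tests on the standardised and squared standardised residuals
Box.test(z,   lag = 5, type = "Ljung-Box")
Box.test(z^2, lag = 5, type = "Ljung-Box")

# ARCH-LM test (FinTS package) and the sign bias tests from rugarch
FinTS::ArchTest(z, lags = 5)
rugarch::signbias(fit)      # SB, NSB, PSB and joint effect, as in table 5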
We are almost done. As previously discussed, before we move on to the copula model constructions we need the margins of our residuals to lie on [0, 1]. To achieve this we use the probability integral transform. What is the probability integral transform? Essentially, we apply the fitted cumulative distribution function of the standardised residuals (selected from 'Distribution of the Zt's') to each of the eight companies' residual series, and the end result is values in [0, 1]. The methodology looks as follows: let the standardised residuals from our fitted time series models be (Zi,t), t = 1, ..., 253, which follow one of the fitted distributions (normal, Student-t, generalised error distribution, etc.). Then, for our eight institutions,

Ui,t = Fi(Zi,t),   i = 1, ..., 8,  t = 1, ..., 253   ⟹   Ui,t ∼ Uniform[0, 1],

where Fi is the cumulative distribution function selected for institution i in the GARCH modelling process, i.e. the distribution which yielded the best goodness-of-fit results. See Hendrich [2012] for further information.

If we look at the plots in Appendix A - Histograms of Copula Data, we can see the transformation to copula data distributed on [0, 1]. Going forward we use this as our underlying data sample to conduct all of the necessary procedures and tests of dependence. The first thing we do is look at the basic measures of association discussed in section 3.2. Hence we now analyse the Kendall τ̂ and Spearman ρ̂ matrices for both 2008 and 2009.

Figure 19: 2008 and 2009 Kendall τ̂ matrices, respectively

Looking at figure 19, the Kendall tau matrices, we get our first indication of any dependence between the institutions. As was described in section 3.2, Kendall's tau is a measure of co-movements of
increases and decreases in returns. We can see that in 2008 all entries of the matrix are positive, indicating there is some form of positive dependence between all pairs of institutions. At the end of each row we have added the sum of the row entries as a measure for comparison with the 2009 data. The cells highlighted in red are also used for comparative reasons, as they indicate values in [0.5, 0.9]. There are multiple entries in this interval, the highest degree of dependence being between PRU.L and AV.L at 0.628; note that PRU.L also has high dependence relations with three other institutions. If we cross-examine the tables from 2008 and 2009, we can see that there is a significant drop in the number of cells with dependence measures belonging to [0.5, 0.9]. This can also be seen from the total Kendall tau value for each matrix, a drop from 34.516 to 30.568, circa an 11% drop. So our first piece of evidence suggests that the dependence between these institutions was not as strong in 2009.

Figure 20: 2008 and 2009 Spearman's ρ̂ matrices, respectively

In figure 20 we look at the Spearman's rho matrices for both 2008 and 2009, Spearman's rho being our alternative measure of dependence. Whilst the absolute values in the matrix are larger than those of Kendall's tau, the conclusions are the same: going from 2008 to 2009 there is a significant drop in the total dependence between our institutions. Again comparing the number of cells highlighted in red (pairs belonging to the interval [0.7, 0.9]), we see a dramatic drop, and the total Spearman value falls from 44.191 to 39.886, circa a 10% drop.

In order to give a graphical interpretation we illustrate the dependence through distance in a two-dimensional space via the theory discussed on multidimensional scaling (section 3.2); in particular we use the Kruskal-Shephard scaling method to calculate and plot points exhibiting dependence. Naturally we have plotted the data for both 2008 and 2009 to make comparisons. See figure 21 and figure 22 over the next couple of pages.
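A minimal sketch of such a scaling plot might look as follows; the dissimilarity 1 − |τ̂| is our own illustrative choice, since the exact distance used is not spelled out here, and U stands for the 253 × 8 matrix of copula data with the institution tickers as column names:

library(VineCopula)
library(MASS)

tau <- TauMatrix(U)              # empirical Kendall's tau matrix (as in figure 19)
d   <- as.dist(1 - abs(tau))     # turn association into a dissimilarity

mds <- isoMDS(d, k = 2)          # Kruskal's non-metric scaling in two dimensions
plot(mds$points, type = "n", xlab = "", ylab = "",
     main = "Dependence map of the institutions")
text(mds$points, labels = colnames(U))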