Probabilistic Modelling with Information Filtering Networks

May 2018
Tomaso Aste
&
Tiziana Di Matteo
Probabilistic Modeling with
information filtering networks
T Aste

Financial Computing and Analytics Group
T Aste UCL Sept 2018
Tomaso Aste
Philip TreleavenDenise Gorse
Fabio CaccioliChristopher ClackGuido Germano Giacomo Livan Daniel Fricke Julius Bonart Robert Elliott Smith
Tiziana Di Matteo Paolo Tasca Ariane Chapelle
http://fincomp.cs.ucl.ac.uk/
Nick Firoozye
Daniel Hulme
Jessica James
• Centre for Doctoral Training
• in Financial Computing & Analytics
• Financial Risk Management MSc
• Computational Finance MSc
• Business Analytics MSc
• Centre for Bolckchain Technologies
Mircea Zloteanu

• A lot of them
• Interrelated
• Complex
• Non Stationary
• Unstructured
• Inconsistent
Ø High Dimensions,
hard to parametrize
Ø Multivariate models
Ø Heterogeneous: hard to put
together
Ø Small reliable observation set
Ø Hard to make predictions
and validate
Ø Hard to establish the “truth”
Artificial Intelligence
Sparse multivariate probabilistic
predictive models adaptable and
trainable from few training sets
Networks
Complex multilayered networks
to filter information for sparse
modeling with heterogeneous
data
Blockchain
Community consensus to
establish data consistency
Data Filtering Model & Prediction
T Aste

Machine Learning
Machine learning can be described as a set of
methodologies to automatize the process of
observation/modeling/predicting/testing
To discover laws automatically form (large amount of) data
If something
happens here
What are the
consequences
there?
Prediction
Parsimony
Artificial intelligence aims to find automatically
maps between sets of variables.T Aste UCL Sept 2018

A BW
Predictive modeling
In complex systems these maps are conditional
probabilities.
Prediction is the estimation of the conditional
probability ! "# "$, "& of a (future) event given
the available information about other (past) events

Predictive modeling
The conditional probability
is a tool for:
- test hypothesis
- quantify risk
- stress testing
- analyze scenarios
XA
− XB
p(XB
| XA
,XW
)

Predictive modeling
The predicted value is the expectation value
E[XB
| XA
−
,XW
−
]= XB
XB
∑ p(XB
| XA
−
,XW
−
)
The predicted uncertainty is the conditional entropy
H(XB
|XA
−
,XW
−
)=− p(XB
|XA
−
,XW
−
)logp(XB
|XA
−
,XW
−
)
XA ,XB
∑
This is the regression and for linear models (multivariate
Gaussian) this is the linear regression formula

Predictive modeling
Causality
Is the reduction of uncertainty on variables XB given
the knowledge of the past of variables X
-
A and X
-
W
discounting for the past X-
B
H(XB
|XB
−
,XW
−
)− H(XB
|XA
−
,XB
−
,XW
−
)=TE(XA
→ XB
|XW
−
)
B
B-
A-
W-
Thisis the tranfer entropy that for liner models
(multivariate Gaussians) coincides with
Granger causality

Predictive modeling
Big Data – High Dimensions
! "# "$, "& =
! "#, "$, "&
! "$, "&
We must estimate from data the most likely joint
probability ! "#, "$, "& distribution of the system
of events

To estimate probability we must estimate a
number of quantities between
High dimensional problem!
N2
2
≤ Quantities to Estimate ≤ N!
Information increases linearly with observations
But model-parameters increase at least with the
square!
“The more you look the less you see”T Aste UCL Sept 2018

Information filtering
We must construct models that
require the estimation of a
smaller number of quantities

Graphical models
To construct the joint multivariate distribution we can
make use of the structure of conditional dependency
p(XA
,XB
|XW
)≠ p(XA
|XW
)p(XB
|XW
)
XW
= XAll Variables
{XA
,XB
}
A B
dependent
p(XA
,XB
|XW
)= p(XA
|XW
)p(XB
|XW
)
A B
independent

Graphical models
If these inference networks are chordal (or
decomposable) we then have
p(X) =
p(Xcliques
)
cliques
∏
p(Xseparators
)
ks−1
separators
∏
The joint probability distribution can
be estimated form the probability of
cliques and separators
IF THE NETWORK IS SPARSE
Quantities to estimate scale
linearly with the number of
variables!
S. L. Lauritzen, Graphical Models (Oxford:Clarendon, 1996)T Aste UCL Sept 2018

This is great… however to establish conditional
dependency
p(XA
,XB
| XW
) ≠ p(XA
| XW
)p(XB
| XW
)
requires the estimation of the same number of quantities
as computing the entiere joint distribution function!
Building the exact inference network is an impossible
task for a large number of variables
Graphical models

Information filtering networks
To solve this problem we propose to build the inference
structure for the graphical model from
Information filtering network
• Massara, Guido Previde, Tiziana Di Matteo, and TA. "Network Filtering for Big Data: Triangulated Maximally Filtered Graph"
Journal of Comlex Networks (2016) arXiv preprint arXiv:1505.02445 (2015).
• Nicoló Musmeci,, Tomaso Aste, and Tiziana Di Matteo. "Relation between financial market structure and the real economy:
comparison between clustering methods." PloS one 10.3 (2015): e0116201.
• F. Pozzi, T. Di Matteo, and TA , “Spread of risk across financial markets: better to invest in the peripheries”, Scientific Reports 3
(2013) 1665.
• W.M. Song, T. Di Matteo and T. Aste, “Hierarchical information clustering by means of topologically embedded graphs”, PLoS
ONE, 7 (2012) e31929
- TA, T. Di Matteo and S. T. Hyde, Complex networks on hyperbolic surfaces Physica A 346 (2005) 20-26.
- M. Tumminello, TA, T. Di Matteo, and R. N. Mantegna, “A tool for filtering information in complex systems”
Proceedings of the National Academy of Sciences of the United States of America 102 (2005) 10421 .

Information filtering networks
Connect the nearest vertices
eucleadean distance = most correlated
hyperbolic distance = mutual information
Keep the graph chordal
clique forests
Add other constraints
max clique size (2 = MST)
planarity (TMFG)
information criteria (e.g. Akaike)
These are fast algorithms < O(N2)
(topological & homological measures, betty numbers, cycles and cliques retrieved from construction)
Massara, Guido Previde, Tiziana Di Matteo, and TA. "Network Filtering for Big Data: Triangulated Maximally Filtered Graph"
Journal of Complex Networks (2016) arXiv preprint arXiv:1505.02445 (2015).T Aste UCL Sept 2018

O
bservation
M
odel
Prediction
Parsimony
Predictive modeling
p(XAll
) =
p(Xcliques
)
cliques
∏
p(Xseparators
)
ks−1
separators
∏

By constraining the model to reproduce observed
moments while maximizing Shannon-Gibbs entropy
(maximum Entropy method), at the second order, we
have that the model must be a multivariate Gaussian:
LoGo
p(X1
,...XN
) =
1
Z
exp(− Xi
Ji,j
Xj
i, j
∑ )
We keep only the significant interactions
and set to zero (Max Ent.) the uncertain
ones: Ji,j = 0 iff Xi , Xj conditionally
independent
Ji,j is sparse and it has the structure given
by the information filtering network

Ji,j is computed form local inversion of the covariance
matrix over the clique forest
Ji,j
= Σ(C)−1
Cliques C
∑ − (kS
−1)Σ(S)−1
Separators S
∑
We obtain a sparse inverse covariance (our graphical
model) by doing local inversions only
Super-fast algorithm O(N) even O(logN) if parallelized
LoGo
W. Barfuss, GP Massara, T Di Matteo & TA “Parsimonious modeling with Information Filtering Networks” Phys Rev E 94, 062306
(2016). arXiv preprint arXiv:1602.07349 (2016).

SPARSE CAUSALITY NETWORK RETRIEVAL FROM SHORT TIME SERIES
RidgeSignificant
G-lassoSigificant
LoGoSignificant
0 1 2 3 4 5
p/q
0
0.1
0.2
0.3
0.4
0.5
γ
Ridge
Glasso
LoGo
p=100 variables
Test: Network Retrieval
From simulated time
series with a known
causality structure we
test how LoGo can
discover the causality
links
is our model,
constructed from
observations, able to
discover the ‘real’
causality structure?

From simulated time
series with a known
test how LoGo can
links
p=100 variables
8 SPARSE CAUSALITY NETWORK RETRIEVAL FROM SHORT TIME SERIES
0 2 4 6 8 10
0
0.5
γ
ridge
0
0.2
0.4
0 2 4 6 8 10
0
0.5
γ
Glasso
0
0.2
0.4
0 1 2 3 4 5
p/q
0
0.5
γ
LoGo
0
0.2
0.4
> 0.5
> 0.5
> 0.5
TP/n
TP/n
TP/n
is our model,
constructed from

p=100 variables
From simulated time
series with a known
test how LoGo can
links
is our model,
constructed from

Observation
Model
Prediction & Testing
p(System)=
p(cliques)
cliques
∏
p(separators)
ks−1
separators
∏
P a s t
F u t u r e
Statistical description
Test: Prediction Likelihood
is our model, constructed from past observations, associated
with a large likelihood for future observations?

10-3
10-2
Uncertanty (= 1/data series length)
650
700
750
800
Prediction(=Log-Likelihood)
LoGo - TMFG
State-of-the-art sparse model (G-lasso)
LoGo - MST
Linear Model
Test: Prediction Likelihood
is our model, constructed from past observations, associated
with a large likelihood for future observations?

Predictive modeling:
can we predict uncertainty propagation in the global
banking system?
10 Banking Regions
(World Bank Organization)
North America
South America
Africa
Europe
West Asia
Middle East
South Asia
East Asia
Southeast Asia
Australia
4 Industry
classifications
Banks
Diversified
Financial
Insurance
Real Estate

The predicted volatility on variables B caused by past of
variables A discounting the contribution from past of B
H(XB
|XB
−
)−H(XB
|XB
−
,XA
−
)=TE(XA
→XB
)
banking system?
TRANSFER ENTROPY
TE(XA
→XB
)=
1
2
log
Var(XB
|XB
−
)
Var(XB
|XB
−
,XA
−
)
For linear modeling it is the contribution of variable A (past) to the
variance of variable B unexplained form past B

banking system?
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1990-1992

banking system?
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1990
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1995-1997

banking system?
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1990
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1995
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2000-2002

banking system?
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1990
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1995
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2000
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2005-2007

banking system?
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1990
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1995
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2000
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2005
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2010-2012

banking system?
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1990
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
1995
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2000
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2005
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2010
N America
S America
Africa
Europe W Asia
Mid East
S Asia
E Asia
SE Asia
Aus/NZ
N America
S America
2014-2016

1995 2000 2005 2010 2015
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
NA
Change of influence
North America
Centrality proxy computed
as the average fraction
between strength of
vertices in the regions
outside NA associated
with vertices in NA (black
arcs) with respect to the
total strength associated
with vertices outside the
firm’s regions (black and
red arcs)
NA

1995 2000 2005 2010 2015
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
EU
NA
Change of influence
Europe
North America

1995 2000 2005 2010 2015
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
ASE
EU
NA
Change of influence
East Asia
Europe
North America

1995 2000 2005 2010 2015
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
ASE
ASS
EU
ME
NA
Change of influence
East Asia
Europe
North America
South Asia
Middle East

1995 2000 2005 2010 2015
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
ASE
ASS
ASSE
EU
ME
NA
Change of influence
East Asia
Europe
North America
South Asia
Middle East
South East Asia

1995 2000 2005 2010 2015
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
AF
ASE
ASS
ASSE
AUS
EU
ME
NA
RU
SA
Change of influence
East Asia
Europe
North America
South Asia
Middle East
South East Asia

2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
0
0.002
0.004
0.006
0.008
0.01
0.012
0.014
0.016
Average TE Region -> Bank: LoGo, w 300 3 lagsSubprimemortgagecrisis
Securitizationmarketclosedown
Globalstockmarketsharpfall
USnear-recorddeficit$410billion
NationalizationofNorthernRock
FannieMae/FreddieMacrescue
LehmanBrothersfiledforbankruptcy
BankofEnglandssurprisedinterestratecut
RescueofRBS,Lloyds,andHBOS
IMFapproved$2.1bnloanforIceland
USgovtgaveBankofAmerica$20bnaid
RBSreported£2.1bnloss
12.5%economiccontractioninJapanEuropeansovereigndebtcrisis
UScreditdowngradefromA+toALIBORscandal
CityofDetroitbankruptcyUkranian/Syrian/EgyptunrestEbolaepidemic
BearSternscollapse
IndyMaccollapse
CreditdowngradeofFranceandAustriafromAAAtoAA+
PortugalreceivedbailoutLiquidityinjectionbyUS,CAN,JAP,UK,Swisscentralbanks
NA
EU
AS
Dynamic impact

Influence by activity
1995 2000 2005 2010 2015 2020
0
0.2
0.4
0.6
0.8
1
Banks
Diversified Financials
Insurance
real estate
NA
1995 2000 2005 2010 2015 2020
0
0.1
0.2
0.3
0.4
0.5
Banks
Insurance
real estate
EU
NA EU

Influence by activity
1995 2000 2005 2010 2015 2020
0
0.02
0.04
0.06
0.08
0.1
0.12
Banks
Insurance
real estate
ME
1995 2000 2005 2010 2015 2020
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
Banks
Insurance
real estate
ASS
Middle East South Asia

Networks for Prediction
With p(XB|XA-) we can quantify probability of future events
With p(XB|XA) we can predict impact of unobserved scenarios
and test hypothesis
p(XA,XB) can be constructed from local
probability estimations over an
information filtering network (low dimension problem)
Nodes and edges can be added or removed with local moves
only
Aggregation of risk is straightforward
LoGo works better than state-of-the-art sparse graphical
models and it is faster

Si l’ordre satisfait la raison, le désordre fait les délices de l’imagination
http://www.cs.ucl.ac.uk/staff/tomaso_aste/
Thank YOU!
http://fincomp.cs.ucl.ac.uk/
W. Barfuss, GP Massara, T Di Matteo & TA
“Parsimonious modeling with Information Filtering Networks” Phys Rev E 94,
062306 (2016) arXiv preprint arXiv:1602.07349 (2016).
Massara, Guido Previde, Tiziana Di Matteo, and TA. "Network Filtering for Big
Data: Triangulated Maximally Filtered Graph" Journal of Comlex Networks
(2016) arXiv preprint arXiv:1505.02445 (2015).
http://blockchain.cs.ucl.ac.uk/
Tiziana Di Matteo, and TA. ”Sparse causality network retrieval from short time
series” Complexity (2017).

Probabilistic Modelling with Information Filtering Networks

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Probabilistic Modelling with Information Filtering Networks

Similaire à Probabilistic Modelling with Information Filtering Networks (20)

Dernier

Dernier (20)

Probabilistic Modelling with Information Filtering Networks