This short talk, given in Stockholm, Sweden, explains how algorithmic complexity measures, notably Kolmogorov complexity approximated both by lossless compression algorithms and by the Block Decomposition Method (BDM), can characterize graphs and networks by some of their group-theoretic and topological properties, notably graph automorphism group size and the clustering coefficients of complex networks. The method distinguishes between network models such as regular, random, small-world and scale-free graphs.
1. Information Content of Complex Networks
Hector Zenil
Based on the results reported in:
H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Graph
Automorphisms and Topological Characterization of Complex
Networks by Algorithmic Information Content
April 28, 2013
2. A biological motivation: data and network correspondence.
Figure: Biological networks.
Figure: Which interaction network corresponds to the biological data?
If data and the associated network can be derived from each other, then they should have about the same information content.
3. Graph automorphism
Definition. An automorphism of a graph g is a permutation λ of the vertex set V such that the pair of vertices (i, j) forms an edge if and only if the pair (λ(i), λ(j)) also forms an edge.
Figure: Example of a non-trivial graph automorphism (in mathematics, "graph" and "network" are synonyms).
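The definition above can be checked mechanically. A minimal sketch in Python (the helper names are ours, not from the paper):

```python
# Minimal sketch: check whether a vertex permutation is an automorphism,
# i.e. whether it maps edges to edges and non-edges to non-edges.
def is_automorphism(edges, perm):
    """edges: iterable of vertex pairs; perm: dict mapping vertex -> vertex."""
    edge_set = {frozenset(e) for e in edges}
    mapped = {frozenset({perm[i], perm[j]}) for i, j in edges}
    return mapped == edge_set

# The 4-cycle 0-1-2-3-0: rotating every vertex by one preserves all edges.
c4 = {(0, 1), (1, 2), (2, 3), (3, 0)}
rotation = {0: 1, 1: 2, 2: 3, 3: 0}   # an automorphism
swap_one = {0: 1, 1: 0, 2: 2, 3: 3}   # maps edge (1,2) to non-edge (0,2): not one
```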
4. Automorphism group
The set of all automorphisms of an object forms a group, called the automorphism group. The size A(g) of this group provides an indication of a formal type of symmetry of the graph.
Figure: Elements of a graph automorphism group.
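For small graphs, A(g) can be computed by brute force over all vertex permutations; a hedged sketch (practical tools such as nauty do this far more efficiently):

```python
from itertools import permutations

# Count |A(g)| by exhaustively testing every vertex permutation.
# Feasible only for very small graphs (n! permutations to check).
def automorphism_group_size(n, edges):
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(n)):
        mapped = {frozenset({perm[i], perm[j]}) for i, j in edge_set}
        if mapped == edge_set:
            count += 1
    return count
```

The path 0-1-2 has 2 automorphisms (identity and reversal), the triangle has 3! = 6, and the 4-cycle has 8 (its dihedral group).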
5. Graph embeddings
Figure: (Petersen graph) Automorphisms are not ways to embed (plot) graphs. We are interested in topological properties (how nodes are connected), not geometrical ones (how nodes and links are laid out in the plane or in space).
6. Clustering coefficient
A clustering coefficient is a measure of the degree to which nodes
in a graph tend to cluster together (for example, friends in social
networks [2]).
Definition
C(v_i) = 2|E(N_i)| / (n_i(n_i − 1))
where N_i is the set of neighbours of v_i, n_i = |N_i|, and E(N_i) denotes the set of edges with both endpoints in N_i.
Figure: Some topological properties of graphs.
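The definition translates directly into code; a sketch (helper names are ours):

```python
# Local clustering coefficient C(v_i) = 2|E(N_i)| / (n_i (n_i - 1)),
# where N_i is the neighbourhood of vertex i and n_i = |N_i|.
def clustering_coefficient(adj, i):
    """adj: dict mapping each vertex to the set of its neighbours."""
    neighbours = adj[i]
    n_i = len(neighbours)
    if n_i < 2:
        return 0.0  # coefficient undefined for fewer than 2 neighbours; 0 by convention
    # count edges with both endpoints inside the neighbourhood of i
    links = sum(1 for u in neighbours for v in adj[u] if v in neighbours) // 2
    return 2 * links / (n_i * (n_i - 1))

triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # every vertex has C = 1
path = {0: {1}, 1: {0, 2}, 2: {1}}            # middle vertex has C = 0
```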
7. Adjacency matrix
A graph g = (V , E) consists of a set of vertices V (also called
nodes) and a set of edges E. Two vertices, i and j, form an edge
of the graph if (i, j) ∈ E.
A graph can be represented by its adjacency matrix. Assuming that the vertices are indexed from 1 to n, that is, V = {1, 2, . . . , n}, the adjacency matrix of g is an n × n matrix with entries a_{i,j} = 1 if (i, j) ∈ E and 0 otherwise.
Figure: A graph and its adjacency matrix.
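A sketch of this representation (using 0-based indices for convenience):

```python
# Build the n x n 0/1 adjacency matrix of an undirected graph from its edge list.
def adjacency_matrix(n, edges):
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1  # undirected graph: the matrix is symmetric
    return a

# Path graph 0-1-2 and its adjacency matrix.
path_matrix = adjacency_matrix(3, [(0, 1), (1, 2)])
```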
8. Founders of Algorithmic Information Theory (60s)
Figure: A. Kolmogorov, R. Solomonoff (here with C.S. Calude) and G. Chaitin (2007).
9. Kolmogorov complexity
K(s) is the length of the shortest program p that outputs the string s when run on a universal Turing machine U. Formally [3, 1],
Definition
K_U(s) = min{|p| : U(p) = s}   (1)
Figure: Giving directions in the most compact/easiest way (small K).
Figure: Compression is another way to understand Kolmogorov complexity: if a string is compressible it has small K; otherwise it is random data.
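The compression view can be sketched with any lossless compressor; here zlib stands in for the compressors used in the work (an assumption for illustration):

```python
import random
import zlib

# Upper-bound K(s) by the length of a lossless compression of s.
# A regular string compresses far below its length; a (pseudo)random
# string barely compresses at all.
def compressed_length(s: bytes) -> int:
    return len(zlib.compress(s, 9))

regular = b"01" * 500  # 1000 bytes, periodic: compresses to a few dozen bytes
random.seed(0)
noisy = bytes(random.randrange(256) for _ in range(1000))  # stays near 1000 bytes
```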
10. Turing machine
By the invariance theorem [4], K_U depends on U only up to an additive constant, so, as is conventional, we drop the subscript and write simply K.
Figure: The simple concept of a “silly” machine started an entire field: Computer Science. This “silly” machine helped us understand the seminal concept of Computation Universality.
K as a measure is not computable! There is no algorithm (Turing machine) that, given a string, retrieves the shortest program producing it. It has been proven that only a non-computable function can be a universal complexity measure, so we have to live with that.
11. Complexity and edge density
These observations show that our measure is behaving as expected
from theory.
Figure: Estimated (normalised) Kolmogorov complexity for an increasing number of edges in random graphs of 50 nodes each. The minimum complexity (Left) and the standard deviation (Right) are shown for 100 random permutations of 20 graphs in each group.
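The qualitative trend in the figure can be reproduced with the compression stand-in for K (the generator and parameters below are ours, for illustration only):

```python
import random
import zlib

# Flatten the upper triangle of a random graph's adjacency matrix into a
# 0/1 byte string, then compress it. More edges (up to density 0.5) make
# the matrix less compressible, i.e. of higher estimated complexity.
def random_graph_bits(n, m, rng):
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    chosen = set(rng.sample(pairs, m))
    return bytes(1 if p in chosen else 0 for p in pairs)

rng = random.Random(1)
sparse = len(zlib.compress(random_graph_bits(50, 50, rng), 9))   # 50 edges
half = len(zlib.compress(random_graph_bits(50, 600, rng), 9))    # ~0.5 density
```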
12. Graph duality
Figure: The dual graph of a plane graph G is a graph that has a vertex corresponding to each face of G.
Even if they look very different, dual graphs should have about the same information content, because there is a program of constant length that transforms any graph into its dual and vice versa.
13. Complexity of graph duality
Figure: Graphs ranked by Kolmogorov complexity approximated by two different methods: (Top) lossless compression and (Bottom) BDM.
These results are important because a network and its dual are
shown to have about the same information content even when they
may superficially look very different.
14. Graph automorphisms and Kolmogorov complexity
Figure: Graph automorphism group size A(g) (y-axis) of connected regular graphs of size 20 versus K complexity (x-axis). A(g) decays with increasing K.
Figure: For V(g) = 20, the complete graph, the (4,5)-lattice and the (20,46)-noncayley transitive graph, found at the boundaries of the regular graphs.
15. Figure: Plots of the number of graph automorphisms normalised by the maximum possible group size V(g)!, that is A(g)/V(g)! (y-axis), versus (normalised) Kolmogorov complexity (x-axis) estimated by NBDM for connected regular graphs found in Mathematica (GraphData[]) with sizes V(g) = 20 to 36 nodes (only vertex counts for which at least 20 graphs were found in the dataset are plotted). The decay can be witnessed even if noisy.
16. Applying BDM to real-world networks
Figure: Real-world networks also display a connection between Kolmogorov complexity and automorphism group size A(g). Networks with fewer symmetries have greater estimated Kolmogorov complexity. The automorphism count is normalised by network size.
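A hedged sketch of BDM itself: partition the adjacency matrix into small blocks and sum, over distinct blocks, a complexity estimate plus log2 of the block's multiplicity. The real method uses precomputed CTM values for each block; here zlib-compressed length stands in as the per-block complexity, purely for illustration.

```python
import math
import zlib
from collections import Counter

def block_complexity(b):
    """Stand-in for a CTM lookup: compress the flattened block (our assumption)."""
    flat = bytes(x for row in b for x in row)
    return len(zlib.compress(flat, 9))

def bdm(matrix, block=4):
    """BDM(X) = sum over distinct blocks b of C(b) + log2(multiplicity of b)."""
    n = len(matrix)
    blocks = Counter()
    for r in range(0, n, block):
        for c in range(0, n, block):
            sub = tuple(tuple(row[c:c + block]) for row in matrix[r:r + block])
            blocks[sub] += 1
    return sum(block_complexity(b) + math.log2(m) for b, m in blocks.items())

zeros = [[0] * 8 for _ in range(8)]                                # one repeated block
ident = [[1 if i == j else 0 for j in range(8)] for i in range(8)]  # two distinct blocks
```

A matrix built from one repeated block scores lower than one with several distinct blocks, since repeated blocks contribute only log2 of their multiplicity.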
17. Information-content characterisation of topological
properties of complex networks
Figure: Example of the Watts–Strogatz rewiring algorithm for n = 30-vertex graphs and rewiring probabilities p = 0, 0.1 and 1, starting from a regular ring lattice.
18. A network is said to have the “small-world” property of complex
networks if the average graph distance D grows no faster than the
log of the number of nodes: D ∼ log(V (g)).
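The average distance D can be computed by breadth-first search; a sketch for connected graphs (helper names are ours):

```python
from collections import deque

# Average shortest-path distance over all vertex pairs, via BFS from every
# vertex. Assumes the graph is connected.
def average_distance(adj):
    n = len(adj)
    total = pairs = 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def ring(n):
    """Cycle graph: D grows like n/4, much faster than log n."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
```

A 100-node cycle already has D ≈ 25, far above log(100): it is not small-world, while rewiring a few of its edges collapses D toward the logarithmic regime.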
Figure: The Barabási-Albert model is an algorithm for generating random scale-free networks using a preferential attachment mechanism (here for n = 30). A new vertex with s edges is added at each step.
19. Figure : Kolmogorov complexity of the Watts-Strogatz model as a
function of the rewiring probability on a 1000-node network starting from
a regular graph. Both the number of nodes and the number of links are
kept constant, while p varies; Kolmogorov complexity increases with p.
This demonstrates that information content is sensitive to topological properties of complex networks, because the size of the network, in both nodes and links, remains the same.
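The trend can be sketched with the compression stand-in for K: rewiring a ring lattice with probability p destroys the regularity of the adjacency matrix, so its compressed size grows with p (generator and parameters below are ours):

```python
import random
import zlib

# Watts-Strogatz-style rewiring sketch: start from a ring lattice where each
# node links to its k nearest neighbours on each side, then rewire each edge
# with probability p. Returns the flattened 0/1 upper-triangle bytes.
def ws_adjacency_bits(n, k, p, rng):
    edges = {frozenset({i, (i + d) % n}) for i in range(n) for d in range(1, k + 1)}
    rewired = set()
    for e in edges:
        if rng.random() < p:
            i = min(e)
            j = rng.randrange(n)
            rewired.add(frozenset({i, j}) if j != i else e)  # redirect one endpoint
        else:
            rewired.add(e)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return bytes(1 if frozenset(q) in rewired else 0 for q in pairs)

rng = random.Random(0)
k_lattice = len(zlib.compress(ws_adjacency_bits(200, 4, 0.0, rng), 9))  # p = 0
k_rewired = len(zlib.compress(ws_adjacency_bits(200, 4, 1.0, rng), 9))  # p = 1
```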
20. Topological characterization of complex networks
Figure : Network topology characterisation by Kolmogorov complexity.
Km as approximated by NBDM applied to 792 networks with V(g) = 20 nodes each: 198 connected regular graphs (e.g. Haar, circulant, noncayley transitive, snark, cubic, lattice, book, Andrásfai and resistance graphs, etc.), 198 random graphs with edge density 0.5, 198 Barabási-Albert networks and 198 Watts-Strogatz networks (with rewiring probability 0.5).
23. Conclusions and remarks
K is applicable even if only semi-computable.
Information-theoretic measures are sensitive to graph and
network group-theoretic and topological properties.
The adjacency matrix of a graph/network is a reasonable representation of its information content.
Approaches to K for graph complexity yield results in
agreement with theory and intuition.
The Block Decomposition Method (BDM) is complementary to lossless compression algorithms: it is more accurate for non-random cases and less accurate for random ones.
BDM runs in linear time, characterising network properties that otherwise require exponential time. (BDM requires CTM, which runs in exponential time, but this latter calculation needs to be run only once, not every time it is required.)
The results can easily be applied to directed networks.
24. References I
H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Graph Automorphisms and Topological Characterization of Complex Networks by Algorithmic Information Content, 2013.
H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit,
Two-Dimensional Kolmogorov Complexity and Validation of
the Coding Theorem Method by Compressibility, 2013.
F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit,
Correspondence and Independence of Numerical Evaluations of
Algorithmic Information Measures
F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit,
Calculating Kolmogorov Complexity from the Frequency
Output Distributions of Small Turing Machines.
25. G. J. Chaitin.
On the length of programs for computing finite binary
sequences: Statistical considerations.
Journal of the ACM, 16(1):145–159, 1969.
M. Girvan and M. E. J. Newman.
Community structure in social and biological networks.
Proceedings of the National Academy of Sciences,
99(12):7821–7826, 2002.
A. N. Kolmogorov.
Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1(1):1–7, 1965.
M. Li and P. Vitányi.
An Introduction to Kolmogorov Complexity and Its Applications.
Springer, Heidelberg, 2008.