1. Lille – 24/06/2009 – CPM'09
The Structure of Level-k
Phylogenetic Networks
Philippe Gambette
in collaboration with
Vincent Berry, Christophe Paul
2. Outline
• Phylogenetic networks
• Decomposition of level-k networks
• Construction of level-k generators
• Number of level-k generators
• Simulated level-k networks
3. Outline
• Phylogenetic networks
• Decomposition of level-k networks
• Construction of level-k generators
• Number of level-k generators
• Simulated level-k networks
6. Abstract or explicit networks
An explicit phylogenetic network is a phyogenetic
network where all reticulations can be interpreted as precise
biological events.
An abstract network reflects some phylogenetic signals
rather than explicitly displaying biological reticulation events.
Doolittle : Uprooting the Tree of Life, Scientific American (Feb..2000)
7. Abstract or explicit networks
An explicit phylogenetic network is a phyogenetic
network where all reticulations can be interpreted as precise
biological events.
An abstract network reflects some phylogenetic signals
rather than explicitly displaying biological reticulation events.
Abstract phylogenetic network of
mushroom species, www.splitstree.org
8. Hierarchy of network subclasses
http://www.lirmm.fr/~gambette/RePhylogeneticNetworks.php
10. Level-k phylogenetic networks
Motivation to generalize “galled trees” (= level-1) :
Arenas, Valiente, Posada :
Characterization of
Phylogenetic Reticulate
Networks based on the
Coalescent with
Recombination, Molecular
Biology and Evolution, to
appear.
11. Level-k phylogenetic networks
A level-k phylogenetic network N on a set X of n taxa is a
multidigraph in which:
- exactly one vertex has indegree 0 and outdegree 2: the root,
- all other vertices have either:
- indegree 1 and outdegree 2: split vertices,
- indegree 2 and outdegree ≤ 1: reticulation vertices,
- or indegree 1 and outdegree 0: leaves labeled by X,
- any blob has at most k reticulation vertices.
N r
r2
r4 All arcs are
r1 r3 oriented
downwards
a b c d e f g h i j k
12. Level-k phylogenetic networks
A level-k phylogenetic network N on a set X of n taxa is a
multidigraph in which:
- exactly one vertex has indegree 0 and outdegree 2: the root,
- all other vertices have either:
- indegree 1 and outdegree 2: split vertices,
- indegree 2 and outdegree ≤ 1: reticulation vertices,
- or indegree 1 and outdegree 0: leaves labeled by X,
- any blob has at most k reticulation vertices.
N r A blob is a maximal
induced connected
subgraph with no cut
r2 arc.
r4
r1 r3
A cut arc is an arc
a b c d e f g h i j k which disconnects the
graph.
13. Level-k phylogenetic networks
A level-k phylogenetic network N on a set X of n taxa is a
multidigraph in which:
- exactly one vertex has indegree 0 and outdegree 2: the root,
- all other vertices have either:
- indegree 1 and outdegree 2: split vertices,
- indegree 2 and outdegree ≤ 1: reticulation vertices,
- or indegree 1 and outdegree 0: leaves labeled by X,
- any blob has at most k reticulation vertices.
N r
A blob is a maximal
induced connected
r2 subgraph with no cut
r4 arc.
r1 r3
A cut arc is an arc
a b c d e f g h i j k which disconnects the
graph.
N has level 2.
14. Outline
• Phylogenetic networks
• Decomposition of level-k networks
• Construction of level-k generators
• Number of level-k generators
• Simulated level-k networks
15. Decomposition of level-k networks
We formalize the decomposition into blobs:
a b c d e f g h i j k a b c d e f g h i j k
N, a level-k network. N decomposed as a
tree of simple graph
patterns: generators.
Generators were introduced by van Iersel & al (Recomb 2008)
for the restricted class of simple level-k networks.
16. Level-k generators
A level-k generator is a level-k network with no cut arc.
G0 G1 2a 2b 2c 2d
The sides of the generator are:
- its arcs
- its reticulation vertices of outdegree 0
17. Decomposition theorem of level-k networks
N is a level-k network
iff
there exists a sequence (lj)jϵ[1,r] of r locations
(arcs or reticulation vertices of outdegree 0)
and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that:
- N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)),
- or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)).
18. Decomposition theorem of level-k networks
N is a level-k network
iff
there exists a sequence (lj)jϵ[1,r] of r locations
(arcs or reticulation vertices of outdegree 0)
and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that:
- N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)),
- or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)).
SplitRootk(G1,G0)
G0 G1
G0 G1
19. Decomposition theorem of level-k networks
N is a level-k network
iff
there exists a sequence (lj)jϵ[1,r] of r locations
(arcs or reticulation vertices of outdegree 0)
and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that:
- N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)),
- or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)).
li is an arc of N Attachk(li,Gi,N)
N
Gi
li
Gi
20. Decomposition theorem of level-k networks
N is a level-k network
iff
there exists a sequence (lj)jϵ[1,r] of r locations
(arcs or reticulation vertices of outdegree 0)
and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that:
- N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)),
- or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)).
li is a reticulation vertex of N Attachk(li,Gi,N)
N
Gi
li
Gi
21. Decomposition theorem of level-k networks
N is a level-k network
iff
there exists a sequence (lj)jϵ[1,r] of r locations
(arcs or reticulation vertices of outdegree 0)
and a sequence (Gj)jϵ[0,r] of generators of level at most k, such that:
- N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,Attachk(l1,G1,G0))...)),
- or N = Attachk(lr,Gr,Attachk(... Attachk(l2,G2,SplitRootk(G1,G0))...)).
This decomposition is not unique!
22. Decomposition theorem of level-k networks
Unique “graph-labeled tree” decomposition:
ρ 1
1
N 3 2
h1 h2
1
h3 h4
a b c d e f g h i a b c d e f g h i
Possible applications:
- exhaustive generation of level-k networks
- counting of level-k networks
23. Outline
• Phylogenetic networks
• Decomposition of level-k networks
• Construction of level-k generators
• Number of level-k generators
• Simulated level-k networks
24. Construction of the generators
Van Iersel & al build the 4 level-2 generators by a case
analysis, generalized by Steven Kelk into an exponential
algorithm to find all 65 level-3 generators.
25. Construction of the generators
Van Iersel & al give a simple case analysis for level-2.
We give rules to build level-(k+1) from level-k generators.
26. Construction of the generators
Construction rules of level-k+1 generators
from level k-generators:
N e2
e1
h1 h2
Rule R1 :
h1 h2 h1 h2 h1 h2 h1 h2
h3 h3 h3 h3
R1(N,h1,h2) R1(N,h1,e2) R1(N,e2,e2) R1(N,e1,e2)
27. Construction of the generators
Construction rules of level-k+1 generators
from level k-generators:
N e2
e1
h1 h2
Rule R2 :
h1 h3 h2 h3 h
h3
h2 h1 h1 2
R2(N,h1,e2) R2(N,e1,e1) R2(N,e2,e1)
28. Construction of the generators
Problem!
Some of the level-k+1 generators obtained from level-k
generators are isomorphic!
h1 h2 h1 h2
h3 h3
R1(2b,h1,e2) R1(2b,h2,e1)
29. Construction of the generators
Problem!
Some of the level-k+1 generators obtained from level-k
generators are isomorphic!
h1 h2 h1 h2
h3 h3
R1(N,h1,e2) R1(N,h2,e1)
→ difficult to count!
30. Outline
• Phylogenetic networks
• Decomposition of level-k networks
• Construction of level-k generators
• Number of level-k generators
• Simulated level-k networks
31. Upper bound
R1 and R2 can be applied at most on all pairs of sides
A level-k generator has at most 5k slides:
gk+1 < 50 k² gk
Upper bound:
gk < k!² 50k
Theoretical corollary:
There is a polynomial algorithm to build the set of level-
k+1 generators from the set of level-k generators.
Practical corollary:
g4 < 28350
→ it is possible to enumerate all level-4 generators.
32. Number of level-k generators
It is possible to enumerate all level-4 generators.
Isomorphism of graphs of bounded valence:
polynomial
(Luks, FOCS 1980)
Practical algorithm?
Simple backtracking exponential algorithm sufficient for
level 4 :
go through both graphs from their root in parallel and
identify their vertices: O(n2n-h)
→ g4 = 1993
→ g5 > 71000
http://www.lirmm.fr/~gambette/ProgramGenerators
34. Lower bound
Lower bound:
gk ≥ 2k-1
There is an exponential number of generators!
Idea:
Code every number between 0 and 2k-1-1 by a level-k
generator.
35. Lower bound
Lower bound:
gk ≥ 2k-1
There is an exponential number of generators!
Idea:
Code every number between 0 and 2k-1-1 by a level-k
generator.
0 1
k=1
0 1
36. Lower bound
Lower bound:
gk ≥ 2k-1
There is an exponential number of generators!
Idea:
Code every number between 0 and 2k-1-1 by a level-k
generator.
0 1 2 3
k=2
0 0 1 1
0 1 0 1
37. Outline
• Phylogenetic networks
• Decomposition of level-k networks
• Construction of level-k generators
• Number of level-k generators
• Simulated level-k networks
38. Simulated level-k networks
Simulate 1000 phylogenetic networks using the coalescent
model with recombination.
Arenas, Valiente, Posada 2008
Program Recodon
How many are level-1,2,3... networks?
nb of networks nb of networks
1000 1000
900 900
800 800
700 n=10,r=1 700 n=50,r=1
n=10,r=2 n=50,r=2
600 600
n=10,r=4 n=50,r=4
500 n=10,r=8 500 n=50,r=8
n=10,r=16 n=50,r=16
400 400
n=10,r=32 n=50,r=32
300 300
200 200
100 100
0 0
12
24
36
12
24
36
0
4
8
16
20
28
32
40
44
48
52
56
60
64
68
72
76
80
84
88
92
96
100
0
4
8
16
20
28
32
40
44
48
52
56
60
64
68
72
76
80
84
88
92
96
100
level level
http://www.lirmm.fr/~gambette/ProgramGenerators
39. Simulated level-k networks
Simulate 1000 phylogenetic networks using the coalescent
model with recombination.
Link between level and number of reticulations:
level
9
5
0 number of
reticulations
0 5 9
http://www.lirmm.fr/~gambette/ProgramGenerators
40. Summary on the level parameter
Advantages:
• natural structure for all explicit phylogenetic networks
• global tree-structure used algorithmically
• finite graph patterns to represent blobs: generators
Limits:
• number of generators exponential in the level
• complex structure of generators
• when recombination is not local, level doesn't help
41. Questions?
Thank you for your attention! TreeCloud T-Rex
TreeCloud
Split network and reticulogram of the 50 most frequent
words in CPM titles, hyperlex cooccurrence distance.
TreeCloud available at http://treecloud.org - Slides available at http://www.lirmm.fr/~gambette/RePresentationsENG.php