1. Analysis and visualization of large collections of trees
A case study in Chalcidoidea (Insecta: Hymenoptera)
Ana Dal Molin, Suzanne Matthews, James Munro, John Heraty, Jim Woolley
2. The tree space
The number of possible trees is given
Criteria exist to determine which ones are better hypotheses
Heuristics
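The formula on this slide did not survive in the transcript. As a sketch of why heuristics are needed, the snippet below assumes the slide refers to the standard count of unrooted, fully bifurcating trees on n labelled terminals, (2n - 5)!!. The function name is hypothetical and chosen only for illustration.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Count unrooted, fully bifurcating trees on n_taxa labelled terminals.

    Assumes the standard double-factorial result (2n - 5)!! for n >= 3.
    """
    if n_taxa < 3:
        return 1  # with fewer than three terminals there is a single topology
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    # 525 is the number of terminals in the case study.
    for n in (4, 5, 10, 525):
        digits = len(str(num_unrooted_binary_trees(n)))
        print(f"{n} terminals: a tree count with {digits} digits")
```

Even at a few dozen terminals the count is far too large to enumerate, which is why the search has to rely on heuristics.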
4. Case study
525 terminals
2992 characters from rDNA sequences (18S and 28S D2-D5)
Structural alignment + MAFFT (E-INS-i) alignment of the regions of ambiguous alignment (RAAs)
5. Secondary structure is characterized by stems (paired bases) and loops (unpaired bases): alignment
8. Problems
1. Growing data sets lead to a growing number of trees, sometimes too many to compare by eye
2. Tens of thousands of trees with hundreds of terminals = really large files
Can I even load them?
3. Inconsistencies and polytomies in consensus trees:
Do we have rogue taxa?
Has the search run long enough?
Do we have enough signal?
9. Methods
TNT, 5 seeds, unweighted parsimony
The 5 different seeds resulted in 30,000 trees each (20,061 steps, CI = 0.165, RI = 0.62)
Portability: TreeZip
Set operations: TreeZip
Comparison via matrices of RF (Robinson-Foulds) distances: MrsRF
• Heatmaps of the distance matrices plotted using R
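To make the comparison step concrete, here is a minimal sketch of what an RF distance matrix contains. It is not MrsRF's implementation: trees are written as nested tuples of terminal names, and all function names are hypothetical. The RF distance is taken as the number of nontrivial bipartitions found in one tree but not the other.

```python
def bipartitions(tree):
    """Return the nontrivial bipartitions (splits) of a nested-tuple tree."""
    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    all_taxa = leaf_set(tree)
    anchor = min(all_taxa)  # fixed terminal used to pick one side of each split
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        # Store the side that does NOT contain the anchor, so that the same
        # split is always encoded the same way.
        side = all_taxa - clade if anchor in clade else clade
        # Keep only splits with at least two terminals on both sides.
        if 1 < len(side) < len(all_taxa) - 1:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: size of the symmetric difference of splits."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def rf_matrix(trees):
    """All-against-all RF distances as a list of rows."""
    splits = [bipartitions(t) for t in trees]
    return [[len(a ^ b) for b in splits] for a in splits]


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    for row in rf_matrix([t1, t2]):
        print(row)
```

A matrix like this, written to a text file, is the kind of input a heatmap function in R can plot. For tens of thousands of trees the all-against-all loop above would be slow, which is the reason to use a dedicated tool.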
10. 1. Portability and set operations
File size comparison
• a screenshot of the file structure
• [hashing?]
• Reference for details
11. Set Operations
• All trees were unique within every set
but
• Union = 32,300 (unique) trees, not 150,000
• Intersection = 28,422 trees
• Consensus….
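The gap between the union and the raw total can be illustrated with a toy example. This sketch is not TreeZip's file format or interface. It assumes each tree is reduced to an order-independent key (here, its set of bipartitions), so that the same topology found by two different seeds compares as equal. All names and the tiny data set are hypothetical.

```python
def tree_key(splits):
    """Order-independent key for a tree, built from its bipartitions."""
    return frozenset(frozenset(side) for side in splits)


def summarize(runs):
    """Return (total trees loaded, unique trees in union, trees in all runs)."""
    total_loaded = sum(len(run) for run in runs)
    union = set().union(*runs)
    intersection = set.intersection(*runs)
    return total_loaded, len(union), len(intersection)


# Two toy "seeds", each holding two distinct trees on terminals A to E.
run_a = {
    tree_key([{"D", "E"}, {"C", "D", "E"}]),
    tree_key([{"D", "E"}, {"B", "D", "E"}]),
}
run_b = {
    tree_key([{"E", "D"}, {"D", "E", "C"}]),  # same tree as the first in run_a
    tree_key([{"D", "E"}, {"B", "C"}]),
}

if __name__ == "__main__":
    total, n_union, n_intersection = summarize([run_a, run_b])
    print(f"loaded: {total}, union: {n_union}, intersection: {n_intersection}")
```

Every tree is unique within its own run, yet the union is smaller than the total because one topology appears in both runs. The same effect, at scale, explains a union far below the raw sum of all sets.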