3. SCOP Protein Domain Classification
Classes (7)
Folds (48)
Superfamilies (1445)
Families (2598)
Domains (75930)
4. Protein Domain Architectures
Domains: A, B, C, D
Protein 1: Architecture = A,A,C
Protein 2: Architecture = D,B
Protein 3: Architecture = B
Protein 4: Architecture = C,A,A
Protein 5: Architecture = D,B,C
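As a minimal sketch, the five architectures above can be held as ordered tuples of domain labels. The reading that order matters (so that A,A,C and C,A,A are distinct architectures) is an assumption drawn from how the slide lists them, and the variable names are illustrative only.

```python
from collections import Counter

# Architectures copied from the slide, stored as ordered tuples of domain labels.
architectures = {
    "Protein 1": ("A", "A", "C"),
    "Protein 2": ("D", "B"),
    "Protein 3": ("B",),
    "Protein 4": ("C", "A", "A"),
    "Protein 5": ("D", "B", "C"),
}

# Assumption: order matters, so proteins 1 and 4 share a domain
# composition (two A, one C) but not an architecture.
same_architecture = architectures["Protein 1"] == architectures["Protein 4"]
same_composition = (
    Counter(architectures["Protein 1"]) == Counter(architectures["Protein 4"])
)

# Number of proteins in which each domain occurs at least once.
domain_occurrence = Counter(
    domain for arch in architectures.values() for domain in set(arch)
)

print(same_architecture, same_composition)
print(sorted(domain_occurrence.items()))
```

Tuples keep the order of domains, while `Counter` discards it, which makes the difference between architecture and composition explicit.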
7. Potential Use
• Data occur as presences in genomes
• Phylogenetic utility:
– Tree searches
– Synapomorphies of ancient clades
• Complexity metric:
– “The complexity of a system is some increasing function of the number of different types of parts or interactions it has” (McShea 1996)
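The uses listed above can be sketched on a toy presence/absence table. Everything here is hypothetical: the genome names, the domain labels and the helper `candidate_synapomorphies` are invented for illustration. The synapomorphy check is deliberately crude (present in every member of a proposed clade, absent elsewhere) and stands in for a proper tree-based analysis.

```python
# Hypothetical genomes and domain labels; not real data.
genomes = {
    "genome_1": {"A", "B"},
    "genome_2": {"A", "B", "C"},
    "genome_3": {"B", "C", "D"},
}

# All domain types seen in any genome, in a fixed column order.
domains = sorted(set().union(*genomes.values()))

# Presence/absence matrix: one row per genome, one 0/1 entry per domain.
matrix = {
    name: [int(domain in present) for domain in domains]
    for name, present in genomes.items()
}

# Complexity in the McShea sense used on the slide: the number of
# different types of parts, here distinct domains per genome.
complexity = {name: len(present) for name, present in genomes.items()}


def candidate_synapomorphies(clade):
    """Domains present in every genome of `clade` and in no genome outside it."""
    inside = set.intersection(*(genomes[g] for g in clade))
    outside = set().union(*(genomes[g] for g in genomes if g not in clade))
    return inside - outside


print(domains)
print(matrix)
print(complexity)
print(candidate_synapomorphies({"genome_2", "genome_3"}))
```

The same 0/1 matrix could be handed to a tree-search program as a character matrix; that step is not shown here.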
20. LUCA Genome vs. Proteome size
[Figure: LUCA, labelled 988; axis: N Superfamilies, ticks from 200 to 1000]
1. LUCA is a prokaryote
2. Prokaryote genome size scales with N superfamilies
3. LUCA genome size can be estimated using SCP (1404 kb)
4. Therefore, LUCA superfamilies = 629
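The chain of reasoning in steps 2 to 4 can be sketched as a regression followed by a prediction. The slide does not state the form of the relationship, so the straight-line fit below is an assumption, and the genome sizes and superfamily counts are invented toy values. Only the 1404 kb estimate comes from the slide, and the toy fit is not expected to reproduce the figure of 629.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# INVENTED toy values for four hypothetical prokaryote genomes.
genome_size_kb = [1000, 2000, 3000, 4000]
n_superfamilies = [500, 700, 900, 1100]

# Step 2: relate superfamily count to genome size (assumed linear here).
slope, intercept = fit_line(genome_size_kb, n_superfamilies)

# Step 3: LUCA genome size estimate taken from the slide.
luca_genome_kb = 1404

# Step 4: read the predicted superfamily count off the fitted line.
predicted_superfamilies = slope * luca_genome_kb + intercept
print(round(predicted_superfamilies, 1))
```

With real data, the fit would be made across sequenced prokaryote genomes and the prediction compared with the superfamily count reconstructed for LUCA directly.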
21. Summary
• Protein domains are ancient characters
• Phylogenetic utility still to be fully realised
• They offer a useful complexity metric
• Protein evolution switches between creating novel domains and shuffling and recombining existing ones