Understanding the role and evolution of internal symmetry in protein structure is a fundamental question in structural biology. We present here CE-Symm 2.0, a key tool to address that question, which is able to detect all types of protein internal symmetry and provides a robust and intuitive sequence-to-structure analysis of all repeats. Notable features compared to the previous version include an optimized multiple alignment between repeats, determination of the full point group, and identification of multiple symmetry axes. We expect CE-Symm to find ample use in evolutionary studies, functional annotation, and structural classification of proteins.
This poster was presented at the 3DSIG 2016 conference in Orlando, FL, on July 8-9, 2016.
See also the accompanying presentation slides: http://www.slideshare.net/sbliven/3dsig-2016-presentaion-exploring-internal-symmetry-and-structural-repeats-with-cesymm
3DSIG 2016 Poster: Exploring Internal Symmetry and Structural Repeats with CE-Symm
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Exploring Internal Symmetry and Structural Repeats with CE-Symm
Spencer Bliven (1,2,*), Aleix Lafita (1,3), Peter W. Rose (4), Guido Capitani (1,3), Philip E. Bourne (2), Andreas Prlić (4)
(1) Paul Scherrer Institute; (2) National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; (3) ETH Zürich; (4) RCSB Protein Data Bank, San Diego Supercomputer Center, University of California San Diego. (*) spencer.bliven@psi.ch
Poster first presented at 3DSIG 2016 in Orlando, Florida.
This research was supported by the Intramural Research Program of the National Center for Biotechnology
Information, National Library of Medicine, National Institutes of Health.
The RCSB PDB is supported by the National Science Foundation [NSF DBI 0829586]; National Institute of
General Medical Sciences; Office of Science, Department of Energy; National Library of Medicine; National
Cancer Institute; National Institute of Neurological Disorders and Stroke; and the National Institute of
Diabetes & Digestive & Kidney Diseases. The RCSB PDB is a member of the wwPDB.
Abstract
Understanding the role and evolution of internal symmetry in protein
structure is a fundamental question in structural biology. We present
here CE-Symm 2.0, a key tool to address that question, which is able
to detect all types of protein internal symmetry and provides a robust
and intuitive sequence-to-structure analysis of all repeats. Notable
features compared to the previous version (ref. 1) include an optimized
multiple alignment between repeats, determination of the full point
group, and identification of multiple symmetry axes. We expect
CE-Symm to find ample use in evolutionary studies, functional
annotation, and structural classification of proteins.
References
1. Myers-Turnbull, D., Bliven, S. E., Rose, P. W., Aziz, Z. K., Youkharibache, P., Bourne, P. E., & Prlić, A. Systematic Detection of Internal Symmetry in Proteins Using CE-Symm. Journal of Molecular Biology, 426(11), 2255–2268 (2014).
2. Aravind, P. et al. Biochemistry, 48(51), 12180–12190 (2009).
3. Mishra, A. et al. Progress in Biophysics and Molecular Biology, 115(1), 42–51 (2014).
4. Juo, Z. S. et al. J Mol Biol, 261, 239–254 (1996).
5. Monod, J. et al. J Mol Biol, 12, 88–118 (1965).
6. Goodsell, D. S. & Olson, A. J. Annu Rev Biophys Biomol Struct, 29, 105–153 (2000).
7. Gosavi, S. et al. J Mol Biol, 357, 986–996 (2006).
8. Fortenberry, C. et al. J Am Chem Soc, 133, 18026–18029 (2011).
9. Neuwald, A. F. Nucleic Acids Research, 33(11), 3614–3628 (2005).
10. Lee, J. & Blaber, M. PNAS, 108, 126–130 (2011).
11. Zuccola, H. J., Filman, D. J., Coen, D. M., & Hogle, J. M. Molecular Cell, 5(2), 267–278 (2000).
12. Prlić, A. et al. Bioinformatics, 28(20), 2693–2695 (2012).
13. Shindyalov, I. N. & Bourne, P. E. Protein Eng, 11, 739–747 (1998).
14. Bliven, S. E., Bourne, P. E., & Prlić, A. Bioinformatics, 31(8), 1316–1318 (2015).
15. Guda, C., Scheeff, E. D., Bourne, P. E., & Shindyalov, I. N. Pacific Symposium on Biocomputing, 275–286 (2001).
16. Kim, C. et al. BMC Bioinformatics, 11, 303 (2010).
CE-Symm Availability
Download & Source code: github.com/rcsb/symmetry (LGPL)
Levels of Symmetry
Symmetry can be analyzed at numerous levels. The most familiar is
quaternary symmetry consisting of multiple identical polypeptide
chains arranged in a symmetric fashion. Such symmetry is extremely
common in proteins, occurring in approximately 90% of unique
oligomeric structures in the Protein Data Bank (PDB).
Proteins can also have internal symmetry, when a single chain
contains two or more equivalent structural repeats. The repeats
generally will differ in the exact sequence, but have substantially
similar structures. Internal symmetry is sometimes called
pseudosymmetry to reflect that the equivalence between repeats
generally holds at the level of residues or secondary structure
elements rather than at the level of precise coordinates, as it does
in quaternary symmetry.
Types of Symmetry
Symmetry can be classified by the types of operators that align each
repeat onto the next. Closed symmetry consists of one or more pure
rotational operators that form a single axis of rotation (cyclic),
multiple perpendicular axes (dihedral), or more complex point groups.
Open symmetry includes proteins with translational components,
such as screw axes (helical), pure translation, or even superhelical
cases such as solenoid proteins.
CE-Symm is able to identify any type of symmetry with a consistent
orientation between all repeats. This is based on the principle that
not only the structure of the repeats should be conserved, but also
the interfaces between repeats.
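The classification above can be illustrated with a minimal sketch that labels a single repeat-to-repeat operator. It is not taken from CE-Symm: the function name `classify_operator`, the 5° angle tolerance, and the 0.5 Å shift tolerance are all assumptions chosen for illustration. The operator is a 3x3 rotation matrix plus a translation vector; a negligible translation along the rotation axis is treated as closed symmetry, a significant one as helical, and a pure shift as translational.

```python
import math

def classify_operator(rot, trans, angle_tol_deg=5.0, shift_tol=0.5):
    """Label a rigid-body operator (3x3 rotation, translation in angstroms).

    Returns (label, order). The tolerances are illustrative assumptions,
    not CE-Symm parameters.
    """
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_a = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.acos(cos_a)
    shift = math.sqrt(sum(t * t for t in trans))

    # No meaningful rotation: either a pure translation or the identity.
    if math.degrees(angle) < angle_tol_deg:
        return ("translational", None) if shift > shift_tol else ("identity", None)

    # Axis from the symmetric part: (R + R^T)/2 = cos(a) I + (1 - cos(a)) n n^T.
    # Unlike the antisymmetric part, this stays well defined at 180 degrees.
    k = max(range(3), key=lambda i: rot[i][i])
    col = [(rot[i][k] + rot[k][i]) / 2.0 - (cos_a if i == k else 0.0)
           for i in range(3)]
    norm = math.sqrt(sum(c * c for c in col))
    axis = [c / norm for c in col]

    # Translation along the axis (the "rise"); the perpendicular part only
    # moves the axis away from the origin and does not make a screw.
    rise = abs(sum(a * t for a, t in zip(axis, trans)))
    order = round(2.0 * math.pi / angle)  # for helical: repeats per turn
    return ("closed", order) if rise < shift_tol else ("helical", order)
```

For example, a 180° rotation about an axis that does not pass through the origin is still reported as closed with order 2, because its translation is entirely perpendicular to the axis.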
Methods
All algorithms are included in BioJava (ref. 12) version 4.2 and as a
Java executable.
1. Self-alignment. A high-scoring autorotation of the structure is
identified using the Combinatorial Extension (CE) structural
comparison method (ref. 13), with modifications similar to CE-CP
(ref. 14) to allow alignment of the first and last repeats while
disallowing the trivial alignment.
2. Order Detection. The self-alignment is analyzed for patterns
characteristic of open or closed symmetry to determine the number of
repeats.
3. Refinement. The alignment is modified to create an initial multiple
alignment between all repeats.
4. Optimization. The multiple alignment is extended and optimized
based on a Monte Carlo algorithm similar to CE-MC (ref. 15).
5. Iteration. If the optimized multiple alignment is determined to be
significant, then the repeats are recursively analyzed for additional
levels of symmetry.
6. Point Group Detection. If multiple axes were identified, these are
combined into a global point group for the whole structure.
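As a rough illustration of step 2 for closed symmetry, the sketch below estimates the number of repeats from the rotation angle of the self-alignment alone. This angle-based heuristic is an assumption made for illustration and is not necessarily how CE-Symm determines the order; the function name `estimate_order`, the maximum order of 8, and the 5° tolerance are hypothetical.

```python
def estimate_order(angle_deg, max_order=8, tol_deg=5.0):
    """Estimate the order of a closed symmetry from one rotation angle.

    Returns the smallest n in 2..max_order for which n successive
    rotations by angle_deg add up to a whole number of full turns,
    allowing tol_deg of error per rotation. Returns 1 (no closed
    symmetry found) otherwise. All thresholds are illustrative.
    """
    for n in range(2, max_order + 1):
        total = n * angle_deg
        turns = round(total / 360.0)
        if turns < 1:
            continue  # n rotations do not yet complete a full turn
        if abs(total - 360.0 * turns) / n <= tol_deg:
            return n
    return 1
```

With a rotation of about 60°, as in the Keap1 Kelch domain self-alignment, this heuristic returns 6, matching the six blades of the beta propeller.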
Self-alignment of Keap1 Kelch domain [1U6D]. (Left) Superposition
after a ~60° rotation. (Right) Dot plot showing the identified
alignment (red line) on the dynamic programming matrix (black
indicates unfavorable scores).
Symmetry & Function
Both quaternary and internal symmetry are linked to a wide range of
protein functions.
Ligand Binding
Ligands often bind near the axis of symmetry. Of symmetric domains
with ligands, 63% have the ligand within 5 Å of the axis of symmetry;
in 37% it is within 1 Å (ref. 1).
Symmetric proteins often bind symmetric ligands, such as metal ions.
DNA-binding proteins often utilize symmetry. Many transcription
factors are symmetric dimers and recognize palindromic sequences. The
TATA binding protein [1TGH] is an internally symmetric monomer
which has evolved to recognize a non-palindromic sequence (ref. 4).
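The ligand statistics above rest on the distance between a ligand and the symmetry axis. A minimal sketch of that measurement follows, assuming the axis is given as a point on the axis plus a direction vector and the ligand is reduced to a single position such as its centroid; the function name `distance_to_axis` and this representation are illustrative assumptions, not the procedure used in the cited study.

```python
import math

def distance_to_axis(point, axis_point, axis_dir):
    """Perpendicular distance from a point to an infinite line.

    point      -- (x, y, z) of the ligand position, in angstroms
    axis_point -- any point lying on the symmetry axis
    axis_dir   -- direction of the axis (need not be normalized)
    """
    v = [p - a for p, a in zip(point, axis_point)]
    norm = math.sqrt(sum(d * d for d in axis_dir))
    u = [d / norm for d in axis_dir]
    # Remove the component of v that runs along the axis.
    along = sum(vi * ui for vi, ui in zip(v, u))
    perp = [vi - along * ui for vi, ui in zip(v, u)]
    return math.sqrt(sum(c * c for c in perp))
```

A ligand would then count toward the 5 Å or 1 Å category when `distance_to_axis(...)` falls below the corresponding cutoff.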
Allosteric Regulation
Cooperativity can arise from coordinated movements in symmetric
subunits (ref. 5). This mechanism holds both for quaternary symmetry
(e.g. in hemoglobin) and for internally symmetric proteins (ref. 6).
Protein Folding
Internal symmetry can smooth the folding landscape and reduce
folding time (ref. 7).
Internal repeats can fold quasi-independently.
Misfolding of one repeat can trigger degradation of the whole
protein, unlike in quaternary symmetric complexes.
Experimental Tools
Aid the computational design of large proteins (ref. 8).
Improve the search for distant homologs (ref. 9).
TATA Binding Protein [1TGH]
Case Study: βγ-Crystallin Superfamily
The βγ-crystallin superfamily is primarily known for the important
role of several members in the eye lens, but calcium-binding functions
are also known to be widespread throughout the family (refs. 2, 3).
The core domain consists of two Greek key motifs arranged with C2
symmetry. CE-Symm is able to identify this symmetry, as well as
align the conserved calcium-binding motif.
This family is also interesting due to the presence of varied domain
architectures. Bovine γB-crystallin contains four repeats. Sequence
conservation shows that the repeats follow an ABAB pattern
indicating two duplication events, consistent with the two levels of
C2 symmetry identified by CE-Symm.
Cyclic (C8): Triose Phosphate Isomerase [1TIM]
Dihedral (D2): Glyoxalase [3B59]
Translational (R): Ankyrin Repeat [1N0R]
Helical (H3): Antifreeze Protein [1L0S]
AmtB Ammonia Channel [1U7G]: quaternary symmetry C3 (3 chains); internal symmetry C2 (2 repeats/chain); combined symmetry D3 (6 repeats).
Figure: flowchart of the six method steps (1. Structural Self-Alignment, 2. Order Detection, 3. Refinement, 4. Optimization, 5. Iterate, 6. Point Group Detection), with intermediate labels Structure, Self-Alignment, TM-Score, Order, Multiple Alignment, and Asymmetry/Symmetry.
Census
All superfamilies from SCOPe 2.06 were analyzed by CE-Symm based
on a random representative. Consistent with prior results (refs. 1,
16), about a quarter of domains were found to have internal symmetry
or repeats.
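Order            Number of Superfamilies   Percentage
Asymmetric       1051                      75.39%
Rotational        302                      21.66%
  C2              237                      78.48%
  C3               19                       6.29%
  C4               12                       3.97%
  C5                2                       0.66%
  C6                8                       2.65%
  C7               16                       5.30%
  C8                8                       2.65%
Dihedral           19                       1.36%
  D2               17                      89.47%
  D3                2                      10.53%
Helical             7                       0.50%
Translational      15                       1.08%
Percentages for the five classes are relative to all 1394 superfamilies; percentages for individual point groups (indented) are relative to their class.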
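The percentages in the table can be re-derived from the counts, which also makes the two different denominators explicit. The sketch below uses only the counts listed above; the variable and function names are illustrative.

```python
# Counts from the census table; nested entries are point groups in a class.
census = {
    "Asymmetric": 1051,
    "Rotational": {"C2": 237, "C3": 19, "C4": 12, "C5": 2,
                   "C6": 8, "C7": 16, "C8": 8},
    "Dihedral": {"D2": 17, "D3": 2},
    "Helical": 7,
    "Translational": 15,
}

def class_total(entry):
    """Number of superfamilies in a class (sum of its point groups)."""
    return sum(entry.values()) if isinstance(entry, dict) else entry

def percent(part, whole):
    return round(100.0 * part / whole, 2)

total = sum(class_total(entry) for entry in census.values())
symmetric = total - census["Asymmetric"]

# Class rows are percentages of all superfamilies.
class_percent = {name: percent(class_total(entry), total)
                 for name, entry in census.items()}

# Point-group rows are percentages within their own class.
rotational_total = class_total(census["Rotational"])
c2_within_rotational = percent(census["Rotational"]["C2"], rotational_total)

symmetric_percent = percent(symmetric, total)
```

Under these counts, the symmetric classes sum to 343 of 1394 superfamilies (24.61%), which is the "about a quarter" quoted in the text.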
Figure: chart of the census results by symmetry type (legend: R, H, D3, D2, C8, C7, C6, C5, C4, C3, C2).
Figure: moves used in the alignment optimization (Insert Gap, Expand, Shrink, and Shift, each to the left or right), with labeled values 0.3, 0.15, 0.15, and 0.4.
M-crystallin from the archaeon M. acetivorans [3HZ2]. The conserved
symmetric calcium-binding motif is highlighted in yellow.
Bovine γB-crystallin [4GCR]. A central C2 axis is identified relating
the domains, as well as C2 axes within each domain. The
calcium-binding motif (yellow) of some subunits may have been lost.
Evolution
Internal symmetry can arise from quaternary symmetry by gene
duplication or fusion. Thus, in addition to the many functional
implications of symmetry, identifying protein symmetry can provide
information about the evolutionary history of a protein. Such fission
and fusion events often preserve the overall structure and function of
the active complex (ref. 10).
Many proteins with higher-order symmetry appear to have undergone
several duplication events. For instance, DNA clamps are composed
of 12 structural repeats arranged in a ring. Pairs of these repeats form
domains with the ‘processivity fold,’ which can also be found in
non-ring conformations in some species (ref. 11). Six such domains
form a complete ring, but they are fused together into either two
(bacteria) or three (eukaryotes, archaea, and viruses) chains.
Figure panel labels: 12-mer, 6-mer, Eukaryotic Trimer, Bacterial Dimer.
Dimeric bacterial clamp: DNA polymerase III beta subunit from E. coli [1MMI].
Trimeric eukaryotic clamp: proliferating cell nuclear antigen [1VYM].
Trimeric clamp, colored to show the 12 structural repeats.
Final alignment of 1U6D showing the six blades of the beta
propeller. One residue has been deleted from the first repeat, with
four residues inserted into the second.
Download this poster!
http://www.slideshare.net/sbliven/3dsig-2016-poster-exploring-internal-symmetry-and-structural-repeats-with-cesymm