4. Introduction to protein-protein interactions
The importance of the interactions
Impact of protein interaction technologies on
other fields
The types of protein interactions
The methods of protein interactions
5. Introduction to protein-protein
interactions
Proteins control and mediate many of the biological
activities of cells
A cell is not static
Changes in shape
Division
Metabolism
All cells are not equivalent
Lymphoid
Neural
6. Why are protein-protein
interactions so important?
The binding of one signaling protein to another can have
a number of consequences:
Such binding can serve to recruit a signaling protein to
a location where it is activated and/or where it is
needed to carry out its function.
The binding of one protein to another can induce
conformational changes that affect activity or
accessibility of additional binding domains, permitting
additional protein interactions.
7. Why are protein-protein
interactions so important?
Imagine a cell in which, suddenly, the specific
interactions between proteins would disappear.
This unfortunate cell would become deaf and
blind, paralytic and finally would disintegrate,
because specific interactions are involved in
almost any physiological process.
8. Impact on other fields
Cancer Biology
The study of protein-protein interactions has provided important insights into
the functions of many of the known oncogenes, tumor suppressors, and
DNA repair proteins.
Pharmacogenetics
Pharmacogenetic research has expanded to include the study of drug
transporters, drug receptors, and drug targets.
9. The types of protein interactions
Binary protein-protein
interactions
Scaffolding proteins
http://www.udel.edu/chem/bahnson/chem667/crotty/scaffolding_proteins.html#scaffolding
10. The types of protein interactions
-another classification
Metabolic and signaling (genetic) pathways
Morphogenic pathways in which groups of
proteins participate in the same cellular function
during a developmental process
Structural complexes and molecular machines in
which numerous macromolecules are brought
together
15. Experimental methods
The first comprises an ‘atomic observation’, in which the protein interaction
is detected using, for example, X-ray crystallography. These experiments can
yield specific information on the atoms or residues involved in the
interaction.
The second is a ‘direct interaction observation’ where protein interaction
between two partners can be detected as in a two-hybrid experiment.
At a third level of observation, multi-protein complexes can be detected using
methods such as immuno-precipitation or mass spectrometry analysis. This type of
experiment does not unveil the chemical detail of the interactions or even
reveal which proteins are in direct contact but gives information as to which
proteins are found in a complex at a given time.
The fourth category comprises measurements at the cellular level, where an
‘activity bioassay’ is used to observe an interaction; for example, a cell
proliferation assay driven by a receptor-ligand interaction.
17. Introduction of BIND
Background
What is BIND
MCODE Algorithm
How to use BIND
Reference
18. Background
Recent advances in proteomics technologies such as two-hybrid, phage
display and mass spectrometry have enabled us to create a detailed map of
biomolecular interaction networks. Initial mapping efforts have already
produced a wealth of data. As the size of the interaction set increases,
databases and computational methods will be required to store, visualize and
analyze the information in order to effectively aid in knowledge discovery.
Many websites cover protein-protein interactions; only a few
are shown here.
BIND (Biomolecular Interaction Network Database)
DIP (Database of Interacting Proteins)
Protein-Protein Interaction Server
Protein-Protein Interface
19. What is BIND
The Biomolecular Interaction Network Database is a database designed to
store full descriptions of interactions, molecular complexes and pathways.
Development of the BIND 2.0 data model has led to the incorporation of
virtually all components of molecular mechanisms including interactions
between any two molecules composed of proteins, nucleic acids and small
molecules. Chemical reactions, photochemical activation and conformational
changes can also be described. Everything from small molecule biochemistry to
signal transduction is abstracted in such a way that graph theory methods may be
applied for data mining.
The database can be used to study networks of interactions, to map pathways
across taxonomic branches and to generate information for kinetic simulations.
BIND anticipates the coming large influx of interaction information from high-
throughput proteomics efforts including detailed information about post-
translational modifications from mass spectrometry.
20. What kind of data is stored in BIND?
• INTERACTION: The interaction between two molecules as well as
any chemical reactions that occur as a direct result of interaction.
• Example: protein-protein, protein-nucleic acid, protein-small molecule
(phosphorylation of a protein, methylation of DNA, hydrolysis of a sugar)
• COMPLEX: describes a molecular complex by listing the series of
interaction records that are present in the complex.
• Example: multi-subunit enzyme, actin fiber, ribosome
• PATHWAY: describes a cellular process as a sequential list of
interaction records and its associated Chemical Action data.
• Example: cell-signaling pathway, synthesis of an amino acid,
transcription and splicing of a pre-messenger RNA.
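The three record types above nest naturally: complexes and pathways are both built from interaction records. The following is a minimal sketch of that relationship, not the actual BIND schema; every class and field name is hypothetical, and the molecules A, B and C are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interaction:
    """Two molecules plus any chemical action that results from their contact."""
    iid: int
    molecule_a: str
    molecule_b: str
    chemical_actions: List[str] = field(default_factory=list)


@dataclass
class Complex:
    """A molecular complex, described by the interaction records it contains."""
    name: str
    interaction_ids: List[int]


@dataclass
class Pathway:
    """A cellular process, described by an ordered list of interaction records."""
    name: str
    interaction_ids: List[int]  # the order of the list is meaningful here


def molecules_in(record, interactions):
    """Return the distinct molecules referenced by a Complex or Pathway,
    in the order in which its interaction records mention them."""
    by_id = {i.iid: i for i in interactions}
    seen = []
    for iid in record.interaction_ids:
        for molecule in (by_id[iid].molecule_a, by_id[iid].molecule_b):
            if molecule not in seen:
                seen.append(molecule)
    return seen


# Placeholder data: two interactions shared by one complex and one pathway.
interactions = [
    Interaction(1, "A", "B", ["phosphorylation of B"]),
    Interaction(2, "B", "C"),
]
abc_complex = Complex("ABC complex", [1, 2])
signal_path = Pathway("C-to-A signal", [2, 1])
print(molecules_in(abc_complex, interactions))
print(molecules_in(signal_path, interactions))
```

Storing only interaction identifiers in the complex and pathway records keeps each interaction described once, which is the design the slide implies.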
21. Current BIND Database Statistics
Database Record Count
Interaction Database 15145
Biomolecular Pathway Database 8
Molecular Complex Database 1306
Organisms represented 14
GI Database 4961
DI Database 0
Publication Database 454
22. What BIND can and cannot do right now
The design of the BIND database structure is a robust one that has been built to
accept data from all cell systems. The interface that you see is NOT the data
structure, and it does not accurately reflect all of the potentialities of the database.
Tools are being built to implement these potentials, and changes are constantly
being made to the interface to make the database easier to use and understand.
BIND is currently able to accept records that describe protein-protein and protein-
nucleic acid interactions.
The BIND data specification is available as ASN.1 and XML DTD. ASN.1 data can
describe details underlying biochemical and genetic networks. XML versions of all
data with accompanying DTDs are supported through the use of the NCBI
programming toolkit.
23. Demonstrating the use of Binding sites and Binding Site Pairs
for a protein-protein interaction
The grey shapes represent autonomous
domains in proteins A and B that mediate a
protein-protein interaction. The black lines
in these grey shapes represent polypeptide
chains that continue outside of these
domains to make up the rest of proteins A
and B.
The protein-protein interaction between
these two domains is mediated by two
Binding Site pairs. The first pair (a salt
bridge) consists of a single amino acid on
molecule A (SLID 0) and a single amino
acid on B (SLID 0). These two amino acids
form the first Binding Site Pair. The second
pair consists of a range of amino acids on
A (SLID 1) and a range of amino acids on
B (SLID 1). These two ranges of amino
acids form the second Binding Site Pair.
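The figure description above can be sketched as a small data structure: each binding site carries a per-molecule SLID and a residue range, and a Binding Site Pair links one site on A to one site on B. This is an illustration only; the class names are hypothetical and the residue positions are invented, since the slide gives none.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingSite:
    molecule: str  # which partner the site lies on, e.g. "A" or "B"
    slid: int      # the SLID label from the slide, numbered per molecule
    start: int     # first residue of the site (hypothetical numbering)
    end: int       # last residue; equals start for a single amino acid

    def is_single_residue(self) -> bool:
        return self.start == self.end


@dataclass(frozen=True)
class BindingSitePair:
    site_a: BindingSite
    site_b: BindingSite
    note: str = ""


def describe(pair: BindingSitePair) -> str:
    """Render a pair as readable text, e.g. for a log or a report."""
    def span(site: BindingSite) -> str:
        if site.is_single_residue():
            return f"{site.molecule} SLID {site.slid} (residue {site.start})"
        return f"{site.molecule} SLID {site.slid} (residues {site.start}-{site.end})"

    text = f"{span(pair.site_a)} <-> {span(pair.site_b)}"
    return f"{text}: {pair.note}" if pair.note else text


# Residue positions below are invented purely for illustration.
salt_bridge = BindingSitePair(
    BindingSite("A", 0, 12, 12), BindingSite("B", 0, 45, 45), "salt bridge"
)
contact_patch = BindingSitePair(
    BindingSite("A", 1, 30, 38), BindingSite("B", 1, 60, 71)
)
print(describe(salt_bridge))
print(describe(contact_patch))
```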
24. The Algorithm MCODE
-An automated method for finding molecular complexes in
large protein interaction networks.
•The MCODE algorithm operates in three stages, vertex weighting,
complex prediction and optionally post-processing to filter or add
proteins in the resulting complexes by certain connectivity criteria
The electronic version of this article is the complete one and can be found online at:
http://www.biomedcentral.com/1471-2105/4/2
25. The Algorithm MCODE
-An automated method for finding molecular
complexes in large protein interaction networks.
Results
The algorithm has the advantage over other graph clustering methods of
having a directed mode that allows fine-tuning of clusters of interest without
considering the rest of the network and allows examination of cluster
interconnectivity, which is relevant for protein networks. Protein interaction
and complex information from the yeast Saccharomyces cerevisiae was used
for evaluation.
Conclusion
Dense regions of protein interaction networks can be found, based solely on
connectivity data, many of which correspond to known protein complexes.
The algorithm is not affected by a known high rate of false positives in data
from high-throughput interaction techniques. The program is available from
ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE
http://www.biomedcentral.com/1471-2105/4/2
26. How to use BIND
Pathway
The INAD Pathway in Drosophila Photoreceptors - A Tutorial
http://bind.ca/index2.phtml?site=tutor
27. How to use BIND
BIND Interaction Viewer Java Applet
BIND Interaction Viewer Java
applet showing how
molecules can be connected
in the database from
molecular complex to small
molecule.
Yellow, protein;
purple, small molecule;
white, molecular complex;
red, a square is fixed in
place and will not be moved
by the graph layout
algorithm.
This session was seeded by the
interaction between human
LAT and Grb2 proteins
involved in cell signaling in
the T-cell.
28. Reference
•Gary D Bader et al BMC Bioinformatics 2003 Jan 13;4(1):2
An automated method for finding molecular complexes in large protein
interaction networks
•Gary D. Bader Nucleic Acids Research, 2001, Vol. 29, No. 1 242-245
BIND—The Biomolecular Interaction Network Database
•http://bind.ca/
•http://nar.oupjournals.org/cgi/content/full/29/1/242
30. What is DIP?
Established in 1999 at UCLA
Primary goal
extract and integrate protein-protein interaction information and
build a user-friendly environment.
The usage of DIP
31. The usage of DIP
Study
Protein function
Protein-protein relationship
Evolution of protein-protein interaction
The network of interacting proteins
The environments of protein-protein interactions
Predict
Unknown protein-protein interaction
The best interaction conditions
32. The structure of DIP
Protein Table Method Table
Interaction Table Reference Table
33. Protein Table
DIP accession number : <DIP:nnnN>
Identification numbers from :
SWISS-Prot, GenBank, PIR
Protein Name and description
Cross references
Graph
37. The current status of DIP
Number of proteins: 6978
Number of organisms: 101
Number of interactions:18260
Number of distinct experiments describing an
interaction: 22229
Number of articles: 2203
38. Other satellite databases
DLRP (http://dip.doe-mbi.ucla.edu/dip/DLRP.cgi)
- Database of Ligand-Receptor Partners
LiveDIP(http://dip.doe-mbi.ucla.edu/ldipc/tmpl/livedip.cgi)
- data of the protein states and state transition in
protein-protein interaction.
JDIP
- a stand-alone Java application that provides a
graphical, browser- independent interface to the
DIP database.
41. BIND and DIP Comparison
Database | Data Stored | Data Format
BIND | interactions, molecular complexes, pathways | ASN.1, XML
DIP | interactions, protein information | XIN, tab-delimited
42. BIND and DIP Comparison
Size of the databases
Database | Interactions | Proteins | Organisms
BIND | 15145 | Unknown | 14
DIP | 18260 | 6978 | 101
43. BIND and DIP Comparison
Graphic tools
Data display layout
45. 1) KEGG(Kyoto Encyclopedia of
Genes and Genomes)
Representation of higher order functions in terms of the network of
interaction molecules
GENES database contains 240 943 entries from the published genomes,
including the bacteria, mouse and human.
Has 3 databases, GENES, PATHWAY and LIGAND databases.
Each entry has the form, database:entry or organism:gene
ex) EC:6.3.2.3 : enzyme
genbank:DROALPC: gene
D.melanogaster:dpp : organism specific gene
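The database:entry (or organism:gene) form above is easy to split mechanically. The helper below is a hypothetical sketch, not part of any KEGG API; it splits on the first colon only, on the assumption that the prefix never contains one.

```python
def parse_entry(identifier):
    """Split 'database:entry' or 'organism:gene' into its two parts.

    Only the first colon is treated as the separator, so any colon
    inside the entry part is preserved.
    """
    prefix, sep, entry = identifier.partition(":")
    prefix, entry = prefix.strip(), entry.strip()
    if not sep or not prefix or not entry:
        raise ValueError(f"not of the form prefix:entry: {identifier!r}")
    return prefix, entry


# The three examples from the slide.
for text in ("EC:6.3.2.3", "genbank:DROALPC", "D.melanogaster:dpp"):
    print(parse_entry(text))
```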
46. By matching genes in the genome and gene products in the
pathway, KEGG can be used to predict protein interaction
networks and associated cellular function.
The data object stored in the PATHWAY database is called the
generalized protein interaction network, which is a network of
gene products with three types of interactions or relations:
enzyme-enzyme relations between enzymes that catalyze successive
reaction steps in a metabolic pathway, direct protein-protein
interactions, and gene expression relations. Currently, only
enzyme-enzyme relations are maintained.
PATHWAY database contains 5761 entries including 201
pathway diagrams with 14,960 enzyme-enzyme relations.
48. 2) WIT database – Oak Ridge
National Laboratory
Similar to KEGG
3) EcoCyc – E. coli Encyclopedia
Covers the genome and gene products of E. coli, its metabolic
and signal transduction pathways, and its RNAs.
Contains 4391 genes, 904 metabolic reactions and 129
metabolic pathways
49. Graph theoretical algorithm
for finding the molecular complex
Small-world networks
- How to identify a set of central metabolites such as in BIND database MCODE
- Many biological networks have small-world characteristic
ex) Erdos number
Paul Erdos: a prominent Hungarian graph theorist and the center of mathematical
collaboration. Coauthors of a paper with Erdos are one step from Erdos and have
Erdos number 1. Coauthors of a paper with a mathematician with Erdos number 1
have Erdos number 2. Most mathematicians active in this century have a small Erdos
number.
ex) Kevin Bacon game
It aims at connecting an arbitrary actor with the actor Kevin Bacon by the shortest
sequence of actor pairs who have appeared together in a film. The average Bacon
number for an arbitrary actor turns out to be 2.87. (However, Kevin Bacon is not the
center of this small world of film actor collaboration. The center turns out to be
Christopher Lee, with a mean path length of 2.60.)
50. Small-world networks lie between two extremes of graphs,
the completely regular and the completely random graph.
Regular networks have long path lengths and are
clustered, while random graphs have short path lengths
but show little clustering.
Small-world networks have short path lengths but are highly
clustered.
The metabolic network of E. coli is a small-world
network. The center of the map is glutamate,
with a mean path length of 2.46, followed by pyruvate with a
value of 2.59.
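The notion of a network "center" used for Erdos, Bacon and glutamate can be sketched as the vertex with the smallest mean shortest-path length to all other vertices. The code below is a minimal illustration on an invented toy graph; it assumes an undirected, unweighted graph stored as a dict of neighbor sets and ignores vertices that cannot be reached.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path length (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_path_length(graph, source):
    """Average distance from source to the other vertices it can reach."""
    dist = bfs_distances(graph, source)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


def center(graph):
    """Vertex with the smallest mean path length."""
    return min(graph, key=lambda node: mean_path_length(graph, node))


# Toy graph, invented for illustration: b is a hub, e hangs off d.
toy = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b"},
    "d": {"b", "e"},
    "e": {"d"},
}
print(center(toy), mean_path_length(toy, "b"))
```

Running one breadth-first search per vertex is adequate for small graphs; for a network the size of a full interactome a dedicated graph library would be the more practical choice.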
52. MCODE(Molecular Complex
Detection) in BIND database
Algorithms for finding clusters – an active area of
computer science
- often based on network flow/minimum cut theory or
spectral clustering
- MCODE uses a vertex-weighting scheme based on the
clustering coefficient, C_i, which measures the ‘cliquishness’
of the neighborhood of a vertex.
- C_i = 2n / (k_i (k_i - 1)), where k_i is the number of vertices in the
neighborhood of vertex i and n is the number of edges
in the neighborhood.
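The clustering coefficient can be computed directly from an adjacency structure. The sketch below assumes the usual reading of the formula, in which the neighborhood of vertex i is its set of direct neighbors (excluding i itself) and n counts the edges among them; the graph is an invented toy example stored as a dict of neighbor sets with no self-loops.

```python
def clustering_coefficient(graph, v):
    """C_v = 2n / (k (k - 1)), with k neighbors of v and n edges among them."""
    neighbors = graph[v] - {v}
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pair of neighbors exists, so no edge is possible
    # Each edge among the neighbors is seen from both endpoints, hence // 2.
    n = sum(len(graph[u] & neighbors) for u in neighbors) // 2
    return 2.0 * n / (k * (k - 1))


# Toy graph: a triangle 1-2-3 with a pendant vertex 4 attached to 1.
toy = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
for vertex in toy:
    print(vertex, clustering_coefficient(toy, vertex))
```

Vertex 1 has three neighbors with a single edge among them, so only one of its three possible neighbor pairs is connected, while vertex 2 sits in a fully connected neighborhood.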
53. Density of a subgraph is the number of edges divided by the
maximum possible number of edges, so it ranges from 0.0 to 1.0
A k-core is a subgraph of minimal degree k, i.e., every vertex in it
has degree >= k.
So, the highest k-core of a graph is the central, most densely
connected subgraph.
We define the core-clustering coefficient of a vertex v to be the
density of the highest k-core of the immediate neighborhood of v,
including v.
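The three definitions on this slide (density, k-core, core-clustering coefficient) compose as follows. This is a minimal sketch under stated assumptions, not the published MCODE code: the graph is a dict of neighbor sets without self-loops, and the k-core is found by repeatedly pruning low-degree vertices.

```python
def density(nodes, graph):
    """Edges present among nodes divided by the maximum possible number."""
    nodes = set(nodes)
    v = len(nodes)
    if v < 2:
        return 0.0
    edges = sum(len(graph[u] & nodes) for u in nodes) / 2
    return edges / (v * (v - 1) / 2)


def k_core(nodes, graph, k):
    """Largest subset of nodes in which every vertex has degree >= k."""
    core = set(nodes)
    changed = True
    while changed:
        changed = False
        for u in list(core):
            if len(graph[u] & core) < k:
                core.discard(u)
                changed = True
    return core


def highest_k_core(nodes, graph):
    """Return (k_max, vertices of the highest non-empty k-core)."""
    k, best = 0, set(nodes)
    while True:
        candidate = k_core(best, graph, k + 1)
        if not candidate:
            return k, best
        k, best = k + 1, candidate


def core_clustering_coefficient(graph, v):
    """Density of the highest k-core of v's immediate neighborhood, including v."""
    neighborhood = graph[v] | {v}
    _, core = highest_k_core(neighborhood, graph)
    return density(core, graph)


# Toy graph: a 4-clique on 1..4 with a pendant vertex 5 attached to 1.
toy = {1: {2, 3, 4, 5}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}, 5: {1}}
print(density({1, 2, 3, 4, 5}, toy))
print(highest_k_core({1, 2, 3, 4, 5}, toy))
print(core_clustering_coefficient(toy, 1))
```

In the toy graph the plain neighborhood of vertex 1 has density 0.7, but pruning the pendant vertex leaves a 3-core that is a complete clique, so the core-clustering coefficient is 1.0. This is the amplifying effect the definition is designed to produce.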
54. The core-clustering coefficient amplifies the weighting of
heavily interconnected graph regions while removing the many
less connected vertices that are characteristic of the biomolecular
interaction network.
Then, the weight of a vertex is the product of the vertex core-
clustering coefficient and the highest k-core level, kmax, of the
immediate neighborhood of the vertex.
MCODE then seeds a complex with the highest-weighted vertex and
recursively moves outward from this vertex, including vertices
whose weight is above a given threshold relative to the seed vertex's weight. In
this way the densest regions of the network are identified.
The time complexity is O(nmh^3), where n is the number of
vertices, m is the number of edges and h is the vertex size of the
average neighborhood in the graph.
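The seed-and-extend step can be sketched as below. This is a simplified illustration, not the published implementation: vertex weights are taken as given, the threshold is interpreted as a fraction of the seed weight (a vertex joins if its weight exceeds seed weight times 1 minus the threshold), and the function name, parameter names and toy weights are all hypothetical.

```python
def predict_complexes(graph, weights, threshold=0.2):
    """Greedy seed-and-extend over a graph given as a dict of neighbor sets.

    Seeds are taken in order of decreasing weight. From each seed the search
    moves outward, adding unassigned neighbors whose weight exceeds
    seed_weight * (1 - threshold). Singletons are not reported.
    """
    assigned = set()
    complexes = []
    for seed in sorted(weights, key=weights.get, reverse=True):
        if seed in assigned:
            continue
        cutoff = weights[seed] * (1.0 - threshold)
        members, stack = {seed}, [seed]
        assigned.add(seed)
        while stack:
            u = stack.pop()
            for neighbor in graph[u]:
                if neighbor not in assigned and weights[neighbor] > cutoff:
                    assigned.add(neighbor)
                    members.add(neighbor)
                    stack.append(neighbor)
        if len(members) > 1:
            complexes.append(members)
    return complexes


# Toy network: two triangles joined through a weakly weighted vertex x.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "x"},
    "x": {"c", "d"},
    "d": {"x", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
# Invented weights standing in for (core-clustering coefficient * k_max).
toy_weights = {"a": 2.0, "b": 2.0, "c": 2.0, "x": 0.5,
               "d": 1.8, "e": 1.8, "f": 1.8}
print(predict_complexes(toy, toy_weights))
```

Because the weights are computed once and only the threshold changes between runs, this step can be repeated cheaply with different parameter values.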
55. It is slower than the fastest min-cut graph clustering algorithm,
which has O(n^2 log n) time complexity, but MCODE has a number of
advantages. Since weighting is done only once and accounts for
most of the execution time, many parameter settings can be tried cheaply. Also,
MCODE is relatively easy to implement.
57. Structure Visualization
One of the primary activities in proteomics
R&D is determining and visualizing the 3D
structure of proteins in order to find where
drugs might modulate their activity. Other
activities include identifying all of the proteins
produced by a given cell or tissue and
determining how these proteins interact.
BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
58. Structure Visualization
It’s generally understood by the molecular
biology research community that the
sequencing of the human genome, which will
likely take several more years to complete, is
relatively trivial compared to definitively
characterizing the interactions within the
proteome.
BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
59. Non-Static Structure
Visualization
Unlike a nucleotide sequence, which is a
relatively static structure, proteins are dynamic
entities that change their shape and association
with other molecules as a function of
temperature, chemical interactions, pH, and
other changes in the environment.
BIOINFORMATICS COMPUTING, p.186, Bryan Bergeron, M.D., Prentice Hall 2002
60. Primary vs. Secondary and
Tertiary Structure
In contrast to visualizing the sequence of
nucleotides on a strand of DNA, visualizing the
primary structure of a protein adds little to the
knowledge of protein function. More interesting
and relevant are the higher-order structures.
61. Why Visualize?
In each area of bioinformatics, the rationale for using
graphics instead of tables or strings of data is to shift
the user’s mental processing from reading and
mathematical, logical interpretation to faster pattern
recognition.
BIOINFORMATICS COMPUTING, p.180, Bryan Bergeron, M.D., Prentice Hall 2002
Pattern recognition is an area where humans are much
more efficient than computers.
62. Some Common Tools
Hundreds of visualization tools have been developed
in bioinformatics.
Many are specific to hardware such as
microarray devices.
Shareware utilities for PCs
PDB Viewer, WebMol, RasMol, Protein Explorer, Cn3D
VMD, MolMol, MidasPlus, Pymol, Chime, Chimera
63. Application Feature Summary
Feature | RasMol | Cn3D | PyMol | SWISS-PDBViewer | Chimera
Architecture | Stand-alone | Plug-in | Web-enabled | Web-enabled | Web-enabled
Manipulation Power | Low | High | High | High | High
Hardware Requirements | Low/Moderate | High | High | Moderate | High
Ease of Use | High; command line | Moderate | Moderate | High | Moderate; GUI + command line
Special Features | Small size; easy install | Powerful GUI | GUI; ray tracing | Powerful GUI | GUI; collaboration
Output Quality | Moderate | Very high | High | High | Very high
Documentation | Good | Good | Limited | Good | Very good
Support | Online; Users groups | Online; Users groups | Online; Users groups | Online; Users groups | Online; Users groups
Speed | High | Moderate | Moderate | Moderate | Moderate/Slow
OpenGL Support | Yes | Yes | Yes | Yes | Yes
64. Molecule Representations
Wireframe | Shows Bonds and Bond Angles
Ball and Stick | Shows Atoms, Bonds and Bond Angles
Ribbon diagrams | Shows Secondary Structure
Van der Waals surface Diagram | Shows Atomic Volumes
Backbone | Shows Overall Molecular Structure
70. Other properties that can be Visualized
MolMol supports the display of electrostatic potentials across
a protein molecule.
MidasPlus (a predecessor of Chimera) allows for the editing
of sequences visually to see the effects of point mutations.
71. HCI and Protein-Protein Interaction
Creating a suitable metaphor to transform data into a form
that means something to the user.
Large volumes of complex data require more complex
metaphors than, for example, the pie chart used in business
graphics.
Different users require different levels of complexity – and
therefore different metaphors.
The desktop, folder, trashcan metaphor could be replaced by
a chromosome, gene, protein, pathway metaphor.
72. For Protein interactions, we need a
metaphor that reveals dynamics
Haptic Joystick: Provides force feedback when the user manipulates a molecule
near another one.
3D Goggles combined with haptic gloves to feel electrostatic potentials
and see tertiary structure dynamics.
PyMol provides scripting that can produce a movie in 3D of the geometrical
relationship between multiple proteins.
Stereo view of the interaction of two proteins. Scripting allows for the
movement of individual molecules, creating a movie.
73. The field is wide open.
To definitively characterize the interactions
within the proteome, we need more tools.
We need new metaphors for managing this
complex data.
We need tools to reveal dynamic relationships.