Ppi

Protein-Protein
Interaction
L519 presentation

Group Members and Content
 Introduction
- biological aspect of protein-protein interaction.
(Zhenli Su)
 Protein-protein interaction databases
- BIND (Xin Hong)
- DIP (Xiang Zhou)
 Pathway databases and Algorithms (Paul Ma)
 Visualization Tools (James Coleman)
presented by Xiang Zhou

Biological Aspects of
Protein-Protein Interaction

Zhenlu Su

 Introduction to protein-protein interactions
 The importance of the interactions
 Impact of protein interaction technologies on
other fields
 The types of protein interactions
 The methods of protein interactions

Introduction to protein-protein
interactions
Proteins control and mediate many of the biological
activities of cells
 A cell is not static

Changes in shape
Division
Metabolism
 All cells are not equivalent

Lymphoid
Neural

Why are protein-protein
interactions so important?
The binding of one signaling protein to another can have
a number of consequences:
 Such binding can serve to recruit a signaling protein to
a location where it is activated and/or where it is
needed to carry out its function.
 The binding of one protein to another can induce
conformational changes that affect activity or
accessibility of additional binding domains, permitting
additional protein interactions.

Why are protein-protein
interactions so important?
 Imagine a cell in which, suddenly, the specific
interactions between proteins would disappear.
This unfortunate cell would become deaf and
blind, paralytic and finally would disintegrate,
because specific interactions are involved in
almost any physiological process.

Impact on other fields
 Cancer Biology
The study of protein-protein interactions has provided important insights into
the functions of many of the known oncogenes, tumor suppressors, and
DNA repair proteins.
 Pharmacogenetics
Pharmacogenetic research has expanded to include the study of drug
transporters, drug receptors, and drug targets.

The types of protein interactions
 Binary protein-protein
interactions
 Scaffolding proteins

http://www.udel.edu/chem/bahnson/chem667/crotty/scaffolding_proteins.html#scaffolding

The types of protein interactions
-another classification
 Metabolic and signaling (genetic) pathways
 Morphogenic pathways in which groups of
proteins participate in the same cellular function
during a developmental process
 Structural complexes and molecular machines in
which numerous macromolecules are brought
together

Structural complexes and molecular
machines
Chaperones: protein refolding machines
http://www-cryst.bioc.cam.ac.uk/cgi-bin/cgiwrap/hom
http://www.nature.com/nsb/web_specials/movies/sa

Experimental methods
 Tagged Fusion Proteins
 Coimmunoprecipitation
 Yeast Two-hybrid
 Biacore
 Atomic Force Microscopy (AFM)
 Fluorescence Resonance Energy Transfer (FRET)
 X-ray Diffraction

Experimental methods
Experimental observations of protein interactions fall into four categories:
• The first comprises an ‘atomic observation’, in which the protein interaction
is detected using, for example, X-ray crystallography. These experiments can
yield specific information on the atoms or residues involved in the
interaction.
• The second is a ‘direct interaction observation’, in which a protein interaction
between two partners is detected, as in a two-hybrid experiment.
• At a third level of observation, multi-protein complexes are detected using
methods such as immunoprecipitation or mass spectrometry analysis. This type of
experiment does not reveal the chemical detail of the interactions, or even
which proteins are in direct contact, but it shows which proteins are found
in a complex at a given time.
• The fourth category comprises measurements at the cellular level, where an
‘activity bioassay’ is used to observe an interaction; for example, a cell
proliferation assay that depends on a receptor-ligand interaction.

Databases
BIND
(Biomolecular Interaction Network Database)
Xin Hong

Introduction of BIND
 Background
 What is BIND
 MCODE Algorithm
 How to use BIND
 Reference

Background

 Recent advances in proteomics technologies such as two-hybrid, phage
display and mass spectrometry have enabled us to create a detailed map of
biomolecular interaction networks. Initial mapping efforts have already
produced a wealth of data. As the size of the interaction set increases,
databases and computational methods will be required to store, visualize and
analyze the information in order to effectively aid in knowledge discovery.

 Many websites offer protein-protein interaction resources; only a few
are listed here:
- BIND (Biomolecular Interaction Network Database)
- DIP (Database of Interacting Proteins)
- Protein-Protein Interaction Server
- Protein-Protein Interface

What is BIND

The Biomolecular Interaction Network Database is a database designed to
store full descriptions of interactions, molecular complexes and pathways.

Development of the BIND 2.0 data model has led to the incorporation of
virtually all components of molecular mechanisms including interactions
between any two molecules composed of proteins, nucleic acids and small
molecules. Chemical reactions, photochemical activation and conformational
changes can also be described. Everything from small molecule biochemistry to
signal transduction is abstracted in such a way that graph theory methods may be
applied for data mining.

The database can be used to study networks of interactions, to map pathways
across taxonomic branches and to generate information for kinetic simulations.
BIND anticipates the coming large influx of interaction information from high-
throughput proteomics efforts including detailed information about post-
translational modifications from mass spectrometry.

What kinds of data are stored in BIND?
• INTERACTION: the interaction between two molecules, as well as
any chemical reactions that occur as a direct result of the interaction.
• Examples: protein-protein, protein-nucleic acid and protein-small molecule
interactions (phosphorylation of a protein, methylation of DNA,
hydrolysis of a sugar)

• COMPLEX: describes a molecular complex by listing the series of
interaction records that are present in the complex.
• Examples: multi-subunit enzyme, actin fiber, ribosome

• PATHWAY: describes a cellular process as a sequential list of
interaction records and their associated chemical action data.
• Examples: cell-signaling pathway, synthesis of an amino acid,
transcription and splicing of a pre-messenger RNA.
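
The three record types can be pictured as a small data model. The sketch below is only an illustration under stated assumptions: the class names, field names and the `members` helper are hypothetical and do not reproduce the actual BIND ASN.1 specification, and `protein_X` is a placeholder molecule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interaction:
    """One interaction record between two molecules (hypothetical fields)."""
    record_id: int
    molecule_a: str
    molecule_b: str
    chemical_actions: List[str] = field(default_factory=list)


@dataclass
class Complex:
    """A molecular complex, described by the interaction records it contains."""
    name: str
    interaction_ids: List[int]


@dataclass
class Pathway:
    """A cellular process, described by an ordered list of interaction records."""
    name: str
    interaction_ids: List[int]  # order is meaningful for a pathway


def members(record, interactions):
    """Return the distinct molecules referenced by a complex or pathway, in order."""
    by_id = {i.record_id: i for i in interactions}
    seen = []
    for rid in record.interaction_ids:
        for molecule in (by_id[rid].molecule_a, by_id[rid].molecule_b):
            if molecule not in seen:
                seen.append(molecule)
    return seen


# Toy data: LAT and Grb2 appear in the slides; protein_X is a placeholder.
interactions = [
    Interaction(1, "LAT", "Grb2"),
    Interaction(2, "Grb2", "protein_X", ["placeholder chemical action"]),
]
toy_pathway = Pathway("toy signaling pathway", [1, 2])
print(members(toy_pathway, interactions))
```

Because complexes and pathways only point at interaction records, the whole database can be treated as a graph whose edges are the interaction records.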

Current BIND Database Statistics

Database                        Record Count
Interaction Database            15145
Biomolecular Pathway Database   8
Molecular Complex Database      1306
Organisms represented           14
GI Database                     4961
DI Database                     0
Publication Database            454

What BIND can and cannot do right now

 The BIND database structure is robust and has been built to accept data from
all cell systems. The interface that you see is NOT the data structure, and it
does not accurately reflect all of the capabilities of the database.
Tools are being built to realize these capabilities, and the interface is
constantly being changed to make the database easier to use and understand.

 BIND is currently able to accept records that describe protein-protein and protein-
nucleic acid interactions.

 The BIND data specification is available as ASN.1 and XML DTD. ASN.1 data can
describe details underlying biochemical and genetic networks. XML versions of all
data with accompanying DTDs are supported through the use of the NCBI
programming toolkit.

Demonstrating the use of Binding sites and Binding Site Pairs
for a protein-protein interaction
 The grey shapes represent autonomous
domains in proteins A and B that mediate a
protein-protein interaction. The black lines
in these grey shapes represent polypeptide
chains that continue outside of these
domains to make up the rest of proteins A
and B.
 The protein-protein interaction between
these two domains is mediated by two
Binding Site pairs. The first pair (a salt
bridge) consists of a single amino acid on
molecule A (SLID 0) and a single amino
acid on B (SLID 0). These two amino acids
form the first Binding Site Pair. The second
pair consists of a range of amino acids on
A (SLID 1) and a range of amino acids on
B (SLID 1). These two ranges of amino
acids form the second Binding Site Pair.
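
The Binding Site Pair description above can be sketched as a small data structure. This is only a sketch under stated assumptions: the class, the field names and all residue numbers are invented for illustration and are not taken from the BIND specification. The one point it carries over from the slide is that SLID labels are local to each molecule, so SLID 0 on A and SLID 0 on B are different sites.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BindingSite:
    slid: int   # site label, unique only within its own molecule
    first: int  # first residue of the site (invented numbering)
    last: int   # last residue; equal to `first` for a single amino acid

    def size(self) -> int:
        """Number of residues covered by the site."""
        return self.last - self.first + 1


# Sites on molecules A and B, keyed by SLID. Residue numbers are made up.
sites_a: Dict[int, BindingSite] = {0: BindingSite(0, 42, 42), 1: BindingSite(1, 100, 112)}
sites_b: Dict[int, BindingSite] = {0: BindingSite(0, 7, 7), 1: BindingSite(1, 55, 70)}

# Each Binding Site Pair links one SLID on A to one SLID on B.
pairs: List[Tuple[int, int]] = [(0, 0), (1, 1)]


def describe(pairs, sites_a, sites_b):
    """Render each Binding Site Pair as a short text line."""
    lines = []
    for slid_a, slid_b in pairs:
        a, b = sites_a[slid_a], sites_b[slid_b]
        lines.append(f"A[SLID {a.slid}] {a.size()} aa <-> B[SLID {b.slid}] {b.size()} aa")
    return lines


for line in describe(pairs, sites_a, sites_b):
    print(line)
```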

The Algorithm MCODE

-An automated method for finding molecular complexes in
large protein interaction networks.

•The MCODE algorithm operates in three stages: vertex weighting,
complex prediction and, optionally, post-processing to filter or add
proteins in the resulting complexes according to certain connectivity criteria

Background
Recent advances in proteomics technologies such as two-hybrid, phage
display and mass spectrometry have enabled us to create a detailed map
of biomolecular interaction networks.

The electronic version of this article is the complete one and can be found online at:
http://www.biomedcentral.com/1471-2105/4/2

The Algorithm MCODE

-An automated method for finding molecular
complexes in large protein interaction networks.
Results
The algorithm has the advantage over other graph clustering methods of
having a directed mode that allows fine-tuning of clusters of interest without
considering the rest of the network and allows examination of cluster
interconnectivity, which is relevant for protein networks. Protein interaction
and complex information from the yeast Saccharomyces cerevisiae was used
for evaluation.

Conclusion
Dense regions of protein interaction networks can be found, based solely on
connectivity data, many of which correspond to known protein complexes.
The algorithm is not affected by a known high rate of false positives in data
from high-throughput interaction techniques. The program is available from
ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE
http://www.biomedcentral.com/1471-2105/4/2

How to use BIND
Pathway

The INAD Pathway in Drosophila Photoreceptors - A Tutorial
http://bind.ca/index2.phtml?site=tutor

How to use BIND
BIND Interaction Viewer Java Applet

BIND Interaction Viewer Java
applet showing how
molecules can be connected
in the database from
molecular complex to small
molecule.
 Yellow, protein;
 purple, small molecule;
 white, molecular complex;
 red, a square is fixed in
place and will not be moved
by the graph layout
algorithm.
This session was seeded by the
interaction between human
LAT and Grb2 proteins
involved in cell signaling in
the T-cell.

Reference

•Gary D Bader et al BMC Bioinformatics 2003 Jan 13;4(1):2
An automated method for finding molecular complexes in large protein
interaction networks

•Gary D. Bader Nucleic Acids Research, 2001, Vol. 29, No. 1 242-245
BIND—The Biomolecular Interaction Network Database

•http://bind.ca/

•http://nar.oupjournals.org/cgi/content/full/29/1/242

Databases
DIP
(Database of Interacting Proteins)
Xiang Zhou

What is DIP?
• Established in 1999 at UCLA
• Primary goal
Extract and integrate protein-protein interaction information and
build a user-friendly environment.
• The usage of DIP

The usage of DIP
Study
 Protein function
 Protein-protein relationship
 Evolution of protein-protein interaction
 The network of interacting proteins
 The environments of protein-protein interactions

Predict
 Unknown protein-protein interaction
 The best interaction conditions

The structure of DIP

DIP is organized as four linked tables: Protein Table, Interaction Table,
Method Table and Reference Table.

Protein Table
 DIP accession number : <DIP:nnnN>
 Identification numbers from :
SWISS-Prot, GenBank, PIR
 Protein Name and description
 Cross references
 Graph

Interaction Table
 Interacting proteins
 Links to
- Methods
- Original papers
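
How the four tables link together can be sketched with an in-memory SQLite database. This is a minimal sketch, not the real DIP schema: every table name, column name and row is hypothetical, and the Reference Table is called `article` here only to keep the name clearly distinct from the SQL keyword `REFERENCES`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (dip_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE method  (method_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE article (article_id INTEGER PRIMARY KEY, citation TEXT);
CREATE TABLE interaction (
    dip_id     TEXT PRIMARY KEY,
    protein_a  TEXT REFERENCES protein(dip_id),
    protein_b  TEXT REFERENCES protein(dip_id),
    method_id  INTEGER REFERENCES method(method_id),
    article_id INTEGER REFERENCES article(article_id)
);
""")

# Placeholder rows; none of these are real DIP records.
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [("DIP:1N", "protein A"), ("DIP:2N", "protein B")])
conn.execute("INSERT INTO method VALUES (1, 'two-hybrid')")
conn.execute("INSERT INTO article VALUES (1, 'placeholder citation')")
conn.execute("INSERT INTO interaction VALUES ('DIP:1E', 'DIP:1N', 'DIP:2N', 1, 1)")

# Follow the links from an interaction to its proteins and its method.
rows = conn.execute("""
    SELECT pa.name, pb.name, m.name
    FROM interaction AS i
    JOIN protein AS pa ON pa.dip_id = i.protein_a
    JOIN protein AS pb ON pb.dip_id = i.protein_b
    JOIN method  AS m  ON m.method_id = i.method_id
""").fetchall()
print(rows)
```

The Interaction Table acts as the hub: each row names two proteins and points at the method and the paper that support the interaction.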

The current status of DIP
 Number of proteins: 6978
 Number of organisms: 101
 Number of interactions: 18260
 Number of distinct experiments describing an
interaction: 22229
 Number of articles: 2203

Other satellite databases
• DLRP (http://dip.doe-mbi.ucla.edu/dip/DLRP.cgi)
- Database of Ligand-Receptor Partners
• LiveDIP (http://dip.doe-mbi.ucla.edu/ldipc/tmpl/livedip.cgi)
- data on protein states and state transitions in
protein-protein interactions.
• JDIP
- a stand-alone Java application that provides a
graphical, browser-independent interface to the
DIP database.

Document types and annotations
 Document types
- XIN and tab-delimited formats
 Annotations
- Node: <DIP: nnnN>
- Edge: <DIP: nnnE>

Search DIP

http://dip.doe-mbi.ucla.edu/dip/Search.cgi

BIND and DIP Comparison

        Data Stored            Data Format
BIND    Interactions           ASN.1
        Molecular complexes    XML
        Pathways
DIP     Interactions           XIN
        Protein information    tab-delimited

 Size of the databases

        Interactions    Proteins    Organisms
BIND    15145           Unknown     14
DIP     18260           6978        101

 Graphic tools
 Data display layout

Pathway Databases and Algorithms

Paul Ma

1) KEGG (Kyoto Encyclopedia of
Genes and Genomes)
• Represents higher-order functions in terms of networks of
interacting molecules
• The GENES database contains 240,943 entries from published genomes,
including bacterial, mouse and human genomes.
• Comprises three databases: GENES, PATHWAY and LIGAND.

• Each entry has the form database:entry or organism:gene
ex) EC:6.3.2.3 : enzyme
genbank:DROALPC : gene
D.melanogaster:dpp : organism-specific gene
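
Splitting such an identifier into its two parts is a one-line operation. The function below is a hypothetical helper for illustration, not a KEGG API; it assumes only the prefix:entry form shown in the examples above.

```python
def parse_entry(identifier):
    """Split 'database:entry' or 'organism:gene' at the first colon."""
    prefix, sep, entry = identifier.partition(":")
    prefix, entry = prefix.strip(), entry.strip()
    if not sep or not prefix or not entry:
        raise ValueError(f"expected 'prefix:entry', got {identifier!r}")
    return prefix, entry


for example in ("EC:6.3.2.3", "genbank:DROALPC", "D.melanogaster:dpp"):
    print(parse_entry(example))
```

Splitting at the first colon only keeps entries whose own name contains dots or further punctuation, such as the EC number, intact.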

• By matching genes in the genome with gene products in the
pathway, KEGG can be used to predict protein interaction
networks and associated cellular functions.
• The data object stored in the PATHWAY database is called the
generalized protein interaction network. It is a network of
gene products with three types of interactions or relations:
enzyme-enzyme relations (between enzymes that catalyze successive
reaction steps in a metabolic pathway), direct protein-protein
interactions, and gene expression relations. Currently, only
enzyme-enzyme relations are maintained.
• The PATHWAY database contains 5,761 entries, including 201
pathway diagrams with 14,960 enzyme-enzyme relations.

An example of a pathway entry in KEGG- Glycolysis

2) WIT database – Oak Ridge
National Laboratory
 Similar to KEGG

3) EcoCyc – E. coli Encyclopedia
• Describes the genome and gene products of E. coli, its metabolic
and signal transduction pathways, and its RNAs.
• Contains 4391 genes, 904 metabolic reactions and 129
metabolic pathways

Graph theoretical algorithm
for finding the molecular complex

Small-world networks
- How to identify a set of central metabolites, such as in the BIND database → MCODE
- Many biological networks have small-world characteristics
ex) Erdos number
Paul Erdos: a prominent Hungarian graph theorist and the center of mathematical
collaboration. Coauthors of a paper with Erdos are one step from Erdos and have
Erdos number 1. Coauthors of a paper with a mathematician who has Erdos number 1
have Erdos number 2. Most mathematicians active in this century have a small Erdos
number.
ex) Kevin Bacon game
The game aims to connect an arbitrary actor with the actor Kevin Bacon by the shortest
sequence of actor pairs who have appeared together in a film. The average Bacon
number for an arbitrary actor turns out to be 2.87. (However, Kevin Bacon is not the
center of this small world of film-actor collaboration. The center turns out to be
Christopher Lee, with a mean path length of 2.60.)

• A small-world network lies between two extremes: the
completely regular graph and the completely random graph.
• Regular networks have long path lengths and are highly
clustered, while random graphs have short path lengths
but show little clustering.
• Small-world networks have short path lengths but are highly
clustered.
• The metabolic network of E. coli is a small-world
network. The center of the map is glutamate,
with a mean path length of 2.46, followed by pyruvate with a
value of 2.59.
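
Both quantities used above, mean path length and clustering, can be computed on a toy graph. The sketch below is illustrative only: the five-node graph is invented, the function names are hypothetical, and the clustering measure is the fraction of a node's neighbor pairs that are themselves connected. The "center" is taken to be the node with the smallest mean path length, as in the glutamate example.

```python
from collections import deque
from itertools import combinations


def mean_path_lengths(graph):
    """Mean shortest-path length from each node to every other reachable node."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives shortest paths in unweighted graphs
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        others = [d for n, d in dist.items() if n != source]
        result[source] = sum(others) / len(others) if others else 0.0
    return result


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbors of `node` that are directly connected."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


# Invented undirected toy graph: a triangle a-b-c with a tail a-d-e.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}

lengths = mean_path_lengths(graph)
center = min(lengths, key=lengths.get)
print(center, lengths[center], clustering_coefficient(graph, center))
```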

MCODE (Molecular Complex
Detection) in BIND database
• Algorithms for finding clusters – an active area of
computer science
- often based on network flow/minimum cut theory or
spectral clustering
- MCODE uses a vertex-weighting scheme based on the
clustering coefficient, $C_i$, which measures the ‘cliquishness’
of the neighborhood of a vertex.
- $C_i = 2n / (k_i (k_i - 1))$, where $k_i$ is the vertex size of the
neighborhood of vertex $i$ and $n$ is the number of edges
in the neighborhood.

• The density of a subgraph is its number of edges divided by the
maximum possible number of edges, so it ranges from 0.0 to 1.0.
• A k-core is a subgraph of minimal degree k, i.e., every vertex in it
has degree >= k.
The highest k-core of a graph is therefore its most densely
connected central subgraph.
• The core-clustering coefficient of a vertex v is defined as the
density of the highest k-core of the immediate neighborhood of v,
including v.

• The core-clustering coefficient amplifies the weighting of
heavily interconnected graph regions while removing the many
less connected vertices that are characteristic of biomolecular
interaction networks.
• The weight of a vertex is the product of the vertex core-clustering
coefficient and the highest k-core level, $k_{max}$, of the
immediate neighborhood of the vertex.
• MCODE then seeds a complex with the highest-weighted vertex and
recursively moves outward from this vertex, including vertices
whose weight is above a given threshold relative to the weight of
the seed vertex. In this way the densest regions of the network
are identified.
• The time complexity is $O(nmh^3)$, where n is the number of
vertices, m is the number of edges and h is the vertex size of the
average neighborhood in the graph.

• MCODE is slower than the fastest min-cut graph clustering algorithm,
which has $O(n^2 \log n)$ time complexity, but it has a number of
advantages. Weighting is done only once and accounts for most of the
execution time, so many parameters can be tried cheaply.
MCODE is also relatively easy to implement.
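
The vertex-weighting stage described above can be sketched in a few functions. This is a simplified illustration, not the published MCODE implementation: the function names and the toy graph are hypothetical, self-loops are ignored, density is taken as edges divided by n(n-1)/2, and the complex-prediction and post-processing stages are not covered.

```python
def k_core(graph, k):
    """Return the k-core of an undirected graph given as {node: set(neighbors)}."""
    core = {node: set(nbrs) for node, nbrs in graph.items()}
    changed = True
    while changed:  # keep peeling vertices whose degree fell below k
        changed = False
        for node in [n for n, nbrs in core.items() if len(nbrs) < k]:
            for nbr in core.pop(node):
                if nbr in core:
                    core[nbr].discard(node)
            changed = True
    return core


def highest_k_core(graph):
    """Return (k_max, subgraph) for the highest non-empty k-core."""
    k, best = 0, graph
    while True:
        candidate = k_core(graph, k + 1)
        if not candidate:
            return k, best
        k, best = k + 1, candidate


def density(graph):
    """Edges divided by the maximum possible number of edges (no self-loops)."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


def vertex_weight(graph, v):
    """Core-clustering coefficient of v times k_max of v's immediate neighborhood."""
    members = graph[v] | {v}
    neighborhood = {n: graph[n] & members for n in members}
    k_max, core = highest_k_core(neighborhood)
    return density(core) * k_max


# Invented toy network: a triangle a-b-c with one pendant vertex d.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
weights = {v: vertex_weight(graph, v) for v in graph}
print(weights)
```

In this toy network the triangle vertices receive a higher weight than the pendant vertex, which is the behavior the weighting scheme is meant to produce: densely connected regions stand out and loosely attached vertices are down-weighted.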

Structure Visualization Tools

Written by James Coleman
Presented by Xiang Zhou

Structure Visualization
 One of the primary activities in proteomics
R&D is determining and visualizing the 3D
structure of proteins in order to find where
drugs might modulate their activity. Other
activities include identifying all of the proteins
produced by a given cell or tissue and
determining how these proteins interact.
 BIOINFORMATICS COMPUTING, p.186, Bryon Bergeron, M.D., Prentice Hall 2002

Structure Visualization

 It’s generally understood by the molecular
biology research community that the
sequencing of the human genome, which will
likely take several more years to complete, is
relatively trivial compared to definitively
characterizing the interactions within the
proteome.

Non-Static Structure
Visualization
 Unlike a nucleotide sequence, which is a
relatively static structure, proteins are dynamic
entities that change their shape and association
with other molecules as a function of
temperature, chemical interactions, pH, and
other changes in the environment.

Primary vs. Secondary and
Tertiary Structure
 In contrast to visualizing the sequence of
nucleotides on a strand of DNA, visualizing the
primary structure of a protein adds little to the
knowledge of protein function. More interesting
and relevant are the higher-order structures.

Why Visualize?
 In each area of bioinformatics, the rationale for using
graphics instead of tables or strings of data is to shift
the user’s mental processing from reading and
mathematical, logical interpretation to faster pattern
recognition.
BIOINFORMATICS COMPUTING, p.180, Bryon Bergeron, M.D., Prentice Hall 2002

 Pattern recognition is an area where humans are much
more efficient than computers.

Some Common Tools
• Hundreds of visualization tools have been developed
in bioinformatics.
• Many are specific to hardware such as
microarray devices.
• Shareware utilities for PCs
- PDB Viewer, WebMol, RasMol, Protein Explorer, Cn3D
- VMD, MolMol, MidasPlus, PyMol, Chime, Chimera

Application Feature Summary

Feature                RasMol                    Cn3D                  PyMol                 SWISS-PDBViewer       Chimera
Architecture           Stand-alone               Plug-in               Web-enabled           Web-enabled           Web-enabled
Manipulation Power     Low                       High                  High                  High                  High
Hardware Requirements  Low/Moderate              High                  High                  Moderate              High
Ease of Use            High; command line        Moderate              Moderate              High                  Moderate; GUI + command line
Special Features       Small size; easy install  Powerful GUI          GUI; ray tracing      Powerful GUI          GUI; collaboration
Output Quality         Moderate                  Very high             High                  High                  Very high
Documentation          Good                      Good                  Limited               Good                  Very good
Support                Online; user groups       Online; user groups   Online; user groups   Online; user groups   Online; user groups
Speed                  High                      Moderate              Moderate              Moderate              Moderate/Slow
OpenGL Support         Yes                       Yes                   Yes                   Yes                   Yes

Molecule Representations

Wireframe                      Shows bonds and bond angles
Ball and Stick                 Shows atoms, bonds and bond angles
Ribbon diagrams                Shows secondary structure
Van der Waals surface diagram  Shows atomic volumes
Backbone                       Shows overall molecular structure

Wireframe used to show individual chains:

Stick view showing atoms and bonds:

Surface View showing surface fields:

Ribbon view of secondary structure:

Distinct geometrical features by color:

Other properties that can be Visualized

 MolMol supports the display of electrostatic potentials across
a protein molecule.
 MidasPlus (a predecessor of Chimera) allows for the editing
of sequences visually to see the effects of point mutations.

HCI and Protein-Protein Interaction

 Creating a suitable metaphor to transform data into a form
that means something to the user.
 Large volumes of complex data require more complex
metaphors than, for example, the pie chart used in business
graphics.
 Different users require different levels of complexity – and
therefore different metaphors.
 The desktop, folder, trashcan metaphor could be replaced by
a chromosome, gene, protein, pathway metaphor.

For Protein interactions, we need a
metaphor that reveals dynamics
• Haptic Joystick: provides force feedback when the user
manipulates a molecule near another one.
• 3D goggles combined with haptic gloves to feel
electrostatic potentials and see tertiary structure
dynamics.
• PyMol provides scripting that can produce a movie
in 3D of the geometrical relationship between
multiple proteins.
[Figure: Stereo view of the interaction of two proteins. Scripting allows for the
movement of individual molecules, creating a movie.]

The field is wide open.
 To definitively characterize the interactions
within the proteome, we need more tools.
 We need new metaphors for managing this
complex data.
 We need tools to reveal dynamic relationships.
