Web-based application to survey properties of homologous proteins


Candidate:
Diego Poggioli
Supervisor:
Prof. Rita Casadio
Co-supervisor:
Dr. Brigitte Boeckmann

• Bio-problem: visualizing and interacting with biological data to perform a comparative protein analysis
• Info-solution: web application (CGI)

The portal gives access to four web pages:
1) Function-related annotation derived from UniProtKB/Swiss-Prot;
2) Features of the protein group;
3) Conservation score;
4) Tree.

Members of a protein family normally perform
a general biochemical function in common,
but one or more subgroups may evolve a
slightly different function, such as different
substrate specificity.

By comparing groups and subgroups of proteins it is possible to identify or estimate:

• similarities and differences between the protein sequences, as well as the information available for the given protein group;

• the ranges within which functional information can be transferred from experimentally characterized proteins to their homologs from poorly studied organisms;

• errors in the annotations of proteins.

Visualizing and interacting with biological data

• Available from any PC
• System and browser independent
• PHP, CGI
• Dynamic pages
• HTML, JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…
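As an illustration of how a dynamic page is produced on the server, here is a minimal sketch. It uses Python's standard-library WSGI interface rather than the PHP/CGI stack named above. The form-field name `ids` and the page layout are hypothetical.

```python
from html import escape
from urllib.parse import parse_qs


def app(environ, start_response):
    """Minimal WSGI handler: read the hypothetical 'ids' form field from the
    query string and return an HTML page listing the submitted identifiers."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    raw = query.get("ids", [""])[0]
    # Accept identifiers separated by commas and/or whitespace
    ids = [tok for tok in raw.replace(",", " ").split() if tok]
    # Escape user input before writing it into the page
    items = "".join(f"<li>{escape(i)}</li>" for i in ids)
    body = f"<html><body><h1>Protein group</h1><ul>{items}</ul></body></html>"
    data = body.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(data)))])
    return [data]
```

You could try it locally with `wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever()`. The page content is computed per request, which is what distinguishes it from a static HTML file.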

Form filling and data type

ID   AVID_CHICK   Reviewed;   152 AA.
AC   P02701; Q91958; Q98SH4;
DT   21-JUL-1986, integrated into
DT   11-SEP-2007, sequence version
DT   10-JUN-2008, entry version 87.
DE   Avidin precursor.
GN   Name=AVD;
OS   Gallus gallus (Chicken).
OC   Eukaryota; Metazoa; Chordata
OC   Archosauria; Dinosauria
OC   Neognathae; Galliformes
OX   NCBI_TaxID=9031;
RN   [1]
RP   NUC
RX   MEDLINE=87203384; PubMed
RA   Gope M.L., Keinaenen R.A.,
RA   Zarucki-Schulz T., O'Malley B.
RT   "Molecular cloning of the chic
RL   Nucleic Acids Res. 15:3595
RN   [2]
RP   NUCLEOTIDE SEQUENCE [MR
RX   MEDLINE=90355928; PubMed
RA   Chandra G., Gray J.G.;
RT   "Cloning and expression of
RL   Methods Enzymol. 184:70
…
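Because every line of the entry above starts with a two-letter line code, the entry can be grouped by code before any further processing. The following is a minimal sketch. It assumes the standard flat-file layout: the code sits in the first two characters and the data starts at the sixth. The helper names are hypothetical.

```python
from collections import defaultdict


def parse_entry(text: str) -> dict[str, list[str]]:
    """Group the lines of one UniProtKB/Swiss-Prot flat-file entry by line code."""
    fields: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        code, data = line[:2], line[5:].strip()
        if code.strip():  # sequence lines have a blank code and are skipped
            fields[code].append(data)
    return dict(fields)


def accessions(fields: dict[str, list[str]]) -> list[str]:
    """Accession numbers from the AC lines; the first is the primary accession."""
    joined = " ".join(fields.get("AC", []))
    return [ac.strip() for ac in joined.split(";") if ac.strip()]


def lineage(fields: dict[str, list[str]]) -> list[str]:
    """Taxonomic lineage from the OC lines, root first."""
    joined = " ".join(fields.get("OC", []))
    return [t.strip().rstrip(".") for t in joined.split(";") if t.strip()]
```

Multi-line fields such as OC are joined before splitting on ";", so a lineage that wraps over several lines comes out as a single list.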

AVID_CHICK
AVR2_CHICK
AVR4_CHICK
AVR1_CHICK
AVR3_CHICK
AVR6_CHICK
AVR7_CHICK

P02701
P56732
P56734
O13153
P56733
P56735
P56736
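The form can accept both kinds of identifier listed above: entry names (such as AVID_CHICK) and accession numbers (such as P02701). The sketch below tells them apart. The accession pattern follows the format documented by UniProt. The entry-name pattern (MNEMONIC_SPECIES) is a deliberately lenient assumption, and the function names are hypothetical.

```python
import re

# UniProtKB accession format (6- or 10-character accessions)
ACCESSION = re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")
# Lenient entry-name pattern (assumption): MNEMONIC_SPECIES, e.g. AVID_CHICK
ENTRY_NAME = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify(identifier: str) -> str:
    """Return 'accession', 'entry name' or 'unknown' for one identifier."""
    token = identifier.strip().upper()
    if ACCESSION.fullmatch(token):
        return "accession"
    if ENTRY_NAME.fullmatch(token):
        return "entry name"
    return "unknown"


def classify_input(text: str) -> list[tuple[str, str]]:
    """Split form input on whitespace, commas or semicolons and classify each token."""
    tokens = re.split(r"[\s,;]+", text.strip())
    return [(t, classify(t)) for t in tokens if t]
```

Tokens reported as "unknown" could then be returned to the user before any database lookup is attempted.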

BioView
• overview of the biological information
• taxonomic descriptive statistics

A compact summary view of the biological information of a protein group is especially important for large datasets. It makes it possible to observe, compare and count all common and differing characteristics, and to analyze in detail the members that share the same feature. The summary covers:

- gene name, functional information (catalytic activity, enzyme regulation, pathway…) and general descriptive information;
- organism classification (OC) and organism species (OS);
- non-experimental qualifiers (By similarity, putative or probable).

Pipeline of the BioView page

ID, AC, DE, CC: 'FUNCTION', 'PATHWAY', 'CATALYTIC ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT', 'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION', 'TISSUE SPECIFICITY'
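Collecting the CC topics listed above into a non-redundant annotation could look roughly like the sketch below. It assumes the CC lines have already had their line code removed. Each topic block starts with "-!- TOPIC:", and the copyright block starts with a row of dashes. The topic set is shortened for brevity, and all names are hypothetical.

```python
import re
from collections import defaultdict

# Shortened subset of the topics listed above (assumption, for brevity)
TOPICS = {"FUNCTION", "CATALYTIC ACTIVITY", "PATHWAY", "SUBUNIT", "SIMILARITY"}
TOPIC_RE = re.compile(r"-!- ([A-Z][A-Z ]*?):\s*(.*)")


def cc_topics(cc_lines: list[str]) -> list[tuple[str, str]]:
    """Split CC data lines into (topic, text) pairs, joining continuation lines."""
    blocks: list[list[str]] = []
    for line in cc_lines:
        line = line.strip()
        if line.startswith("---"):  # the copyright block follows
            break
        m = TOPIC_RE.match(line)
        if m:
            blocks.append([m.group(1), m.group(2)])
        elif blocks:
            blocks[-1][1] = (blocks[-1][1] + " " + line).strip()
    return [(topic, text) for topic, text in blocks]


def non_redundant(entries: dict[str, list[tuple[str, str]]]) -> dict:
    """Map topic -> annotation text -> sorted entry names sharing that exact text."""
    table = defaultdict(lambda: defaultdict(set))
    for name, pairs in entries.items():
        for topic, text in pairs:
            if topic in TOPICS:
                table[topic][text].add(name)
    return {t: {txt: sorted(n) for txt, n in d.items()} for t, d in table.items()}
```

Identical statements are merged this way, and the list of entry names attached to each statement is what a click on that statement would reveal.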

OS, OC

Taxon           Parent taxon
Eukaryota       -
Viridiplantae   Eukaryota
Streptophyta    Viridiplantae
Embryophyta     Streptophyta
Tracheophyta    Embryophyta
...             ...
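The taxon/parent table above can be built directly from the root-first lineages in the OC lines, counting entries at every node. In the minimal sketch below, each node is identified by its full path from the root, so identical names under different parents stay distinct. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def taxonomy_counts(lineages: dict[str, list[str]]):
    """Count entries under every taxon.

    lineages maps entry name -> root-first lineage (from the OC lines).
    Returns (counts per node, entry names per node); a node is its root path."""
    counts: Counter = Counter()
    members: dict[tuple, set] = defaultdict(set)
    for name, lineage in lineages.items():
        for depth in range(len(lineage)):
            node = tuple(lineage[:depth + 1])
            counts[node] += 1
            members[node].add(name)
    return counts, members


def render(counts: Counter) -> list[str]:
    """Indented text view; sorting root paths yields a depth-first order."""
    return ["  " * (len(node) - 1) + f"{node[-1]} ({counts[node]})"
            for node in sorted(counts)]
```

The `members` mapping holds the entry names that would be listed when a node is clicked.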

Number of entries
Non-redundant annotation

Number of entries with a non-experimental qualifier

Number of entries with an experimental qualifier

On mouse-click, the relevant entry names are listed

Expand the whole hierarchy
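The two qualifier counts above could be derived along the lines of the following sketch. It is a plain keyword search for the non-experimental qualifiers named earlier ("By similarity", putative, probable), which is an assumption about how the qualifiers appear in the text. An entry with any qualified statement is counted as non-experimental. The function name is hypothetical.

```python
import re

# Non-experimental qualifiers named above; simple keyword search (assumption)
NON_EXPERIMENTAL = re.compile(r"by similarity|putative|probable", re.IGNORECASE)


def qualifier_counts(annotations: dict[str, list[str]]) -> tuple[set, set]:
    """annotations maps entry name -> annotation texts for one topic.

    Returns (entries with a non-experimental qualifier, remaining annotated entries).
    Entries without any annotation are left out of both sets."""
    flagged, experimental = set(), set()
    for name, texts in annotations.items():
        if any(NON_EXPERIMENTAL.search(t) for t in texts):
            flagged.add(name)
        elif texts:
            experimental.add(name)
    return flagged, experimental
```

The sizes of the two returned sets give the two numbers, and the sets themselves give the entry names to list on click.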

FeatureView

• Interactive interface for visualizing
function-related features on the protein
sequence and 3D structure

• This page should allow the user to analyze sequence and structure together on a broad set of data, showing as much of the available information as possible in a clear and intuitive way.

Function-related features derived from the FT lines of UniProtKB (active sites, binding sites, domains, transmembrane regions, DNA-binding domains…) are mapped onto the alignment and highlighted, giving a clear and compact presentation of the relevant information. The same features are mapped onto the structure in the same way, which helps to identify conserved regions and sites.
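Mapping a feature onto the alignment means translating its FT positions, which are residue numbers in the ungapped sequence, into alignment columns. The minimal sketch below assumes that '-' and '.' are the only gap characters. The function names are hypothetical.

```python
def seq_to_aln_map(aligned: str) -> dict[int, int]:
    """Map 1-based residue positions of the ungapped sequence to 1-based alignment columns."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned, start=1):
        if char not in "-.":  # gap characters do not advance the residue counter
            residue += 1
            mapping[residue] = column
    return mapping


def feature_columns(aligned: str, start: int, end: int) -> list[int]:
    """Alignment columns covered by a feature spanning residues start..end (FT positions)."""
    m = seq_to_aln_map(aligned)
    return [m[p] for p in range(start, end + 1) if p in m]
```

For example, a feature at residues 2 to 4 of the aligned sequence "M-KV--L" covers columns 3, 4 and 7.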

Sequence FT Structure

FeatureView
• Choose the best structure

• Alignment

• Mapping the features onto the alignment and the structure

Choose the best structure

*
...
'91 ' => '91',
'25 ' => '25',
'92 ' => '92',
'81 ' => '82',
'71 ' => '71',
'21 ' => '23',
'-' => 'x',
'61 ' => '61',
'37 ' => '37',
'68 ' => '68',
'50 ' => '50',
'18 ' => '15',
...
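A residue mapping like the one above can also drive the choice of structure. The criterion in this sketch is only an illustrative assumption: prefer the structure whose mapping covers the most feature positions, and break ties by overall mapped residues. It assumes a mapping from UniProt positions to PDB residue numbers in which unmapped positions are None, '-' or 'x'. Structure identifiers and function names are hypothetical.

```python
def coverage(mapping: dict, positions) -> int:
    """Number of positions that have a PDB residue in this mapping."""
    return sum(1 for p in positions if mapping.get(p) not in (None, "-", "x"))


def best_structure(candidates: dict[str, dict], feature_positions: list[int]) -> str:
    """candidates maps structure id -> {UniProt position: PDB residue number}.

    Hypothetical criterion: most feature positions covered first,
    then most residues mapped overall."""
    return max(candidates,
               key=lambda sid: (coverage(candidates[sid], feature_positions),
                                coverage(candidates[sid], candidates[sid].keys())))
```

Other criteria, such as resolution or experimental method, could be added to the sort key in the same way.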

F.P.A. David and Y.L. Yip. SSMap*: a new UniProt-PDB mapping resource for the curation of structural-related
information in the UniProt/Swiss-Prot Knowledgebase. Submitted

Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/

Alignment
Input file

Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and
high throughput, Nucleic Acids Research 32(5), 1792-97.
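MUSCLE writes the alignment as aligned FASTA (with MUSCLE 3.x, for instance, `muscle -in seqs.fasta -out aligned.afa`; newer versions use different options). Reading that output back could look like the minimal sketch below. It keys each record by the first word of its header line. The function names are hypothetical.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Read (aligned) FASTA text into {identifier: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the identifier
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def check_alignment(records: dict[str, str]) -> int:
    """Return the alignment length, or raise if the rows differ in length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return lengths.pop()
```

Checking that all rows have the same length is a cheap guard before features or scores are mapped onto columns.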


Alignment

Input file

FT (Feature Table) lines
Group I: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK');

Group II: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED');

Display of the FT (Feature Table) lines:
- Group I features are shown in a distinct font colour, with a tooltip box containing the description of the feature (entry name, feature key, sequence position, description).
- Group II features are shown with a different background colour and a tooltip box with the same content.
- Overlapping group I features are listed together in the tooltip box.
- Overlapping group II features are shown with a different background colour.
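The display rules above can be expressed as a small rendering function for one aligned residue. This is a minimal sketch. It emits an HTML `<span>` whose `title` attribute acts as the tooltip. The CSS class names are hypothetical, and the colours themselves would live in a stylesheet.

```python
from html import escape

GROUP_I = {"CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
           "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK"}
GROUP_II = {"PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
            "DNA_BIND", "REGION", "COILED"}


def render_residue(residue: str, features: list[tuple[str, str, str, str]]) -> str:
    """Wrap one residue in a <span>; features are (entry name, key, position, description)."""
    keys = [f[1] for f in features]
    classes = []
    if any(k in GROUP_I for k in keys):
        classes.append("ft-site")  # group I: distinct font colour
    n_regions = sum(k in GROUP_II for k in keys)
    if n_regions == 1:
        classes.append("ft-region")  # group II: background colour
    elif n_regions > 1:
        classes.append("ft-region-overlap")  # overlapping group II: other background
    if not classes:
        return escape(residue)
    # Every feature covering this residue is listed in the tooltip
    tip = escape("\n".join(" ".join(f) for f in features))
    cls = " ".join(classes)
    return f'<span class="{cls}" title="{tip}">{escape(residue)}</span>'
```

Residues without features are emitted as plain text, which keeps the page small for long alignments.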

ATOM 1817 N MET B 3 -31.380 87.126 39.296 1.0 100.00
ATOM 1818 CA MET B 3 -30.684 88.400 39.176 1.0 100.00
ATOM 1819 C MET B 3 -30.858 88.967 37.771 1.0 100.00
ATOM 1820 O MET B 3 -30.195 88.514 36.832 1.0 100.00
ATOM 1821 CB MET B 3 -29.190 88.285 39.498 1.0 100.00
ATOM 1822 CG MET B 3 -28.465 89.628 39.501 1.0 100.00
ATOM 1823 SD MET B 3 -26.671 89.415 39.661 1.0 100.00
ATOM 1824 CE MET B 3 -26.312 90.705 40.863 1.0 100.00
ATOM 1825 N GLU B 4 -31.750 89.938 37.638 1.0 50.00
ATOM 1826 CA GLU B 4 -31.927 90.498 36.300 1.0 50.00
… … … … … … … … … … …

[Figure: values 100.00, 50.00 and 00.00 shown along the alignment positions]
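The ATOM records above carry the mapped values (100.00, 50.00) in the B-factor column, which a viewer such as Jmol can then use for colouring (for example with its `color temperature` command). This minimal sketch rewrites that column. It assumes the fixed-column PDB layout (chain identifier in column 22, residue number in columns 23-26, temperature factor in columns 61-66) and ignores insertion codes. The function name is hypothetical.

```python
def set_bfactor(pdb_lines: list[str], values: dict[int, float], chain: str) -> list[str]:
    """Write values (PDB residue number -> score) into the B-factor column of one chain.

    Residues of that chain without a value get 0.0; other records are left unchanged."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")) and line[21] == chain:
            resnum = int(line[22:26])  # residue number, columns 23-26
            score = values.get(resnum, 0.0)
            line = line.ljust(66)  # make sure columns 61-66 exist
            line = line[:60] + f"{score:6.2f}" + line[66:]
        out.append(line)
    return out
```

Only columns 61-66 change, so coordinates and the rest of each record stay identical.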

On mouse-click, run blastp on the UniProt web page

On mouse-click, start the Jalview applet

Conservation
• Interactive interface for visualizing the
structural conservation of protein groups
on the protein sequence and 3D structure
• Highlight positions and regions conserved
in the group of proteins
• Conservation scores are mapped onto the multiple sequence alignment (MSA) and the 3D structure

Scoring residue conservation

Input file

Scoring methods
Method name     Type of score                              Description
basicmdm        Sum-of-Pairs (SP), matrix score            Simplest SP score possible
entropynorm7    Entropic                                   Normalized Shannon entropy with 7 symbol types
entropynorm21   Entropic                                   Normalized Shannon entropy with 21 symbol types
trident         Entropic, matrix score, sequence weighted  Mixed model score
valdar01        SP, matrix score, sequence weighted        Score used in Valdar & Thornton 2001
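The entropic scores in the table rest on normalized Shannon entropy: H = -Σ p_i ln p_i, divided by ln 21 when there are 21 symbol types, so the value lies between 0 and 1. The minimal sketch below computes it for one column. It treats the gap as the 21st symbol and reports 1 - H as a conservation score. Both choices are assumptions, and the gap handling and sign convention of the scoring program used here may differ. The function names are hypothetical.

```python
import math
from collections import Counter

SYMBOLS_21 = set("ACDEFGHIKLMNPQRSTVWY-")  # 20 amino acids plus the gap (assumption)


def entropy_norm21(column: str) -> float:
    """Shannon entropy of one alignment column over 21 symbol types, scaled to [0, 1]."""
    symbols = [c for c in column.upper() if c in SYMBOLS_21]
    if not symbols:
        return 0.0
    n = len(symbols)
    h = -sum((k / n) * math.log(k / n) for k in Counter(symbols).values())
    return h / math.log(21)


def conservation(column: str) -> float:
    """1.0 for an invariant column, close to 0.0 when all 21 symbols are equally frequent."""
    return 1.0 - entropy_norm21(column)
```

The 7-symbol variant would first map residues onto 7 physico-chemical classes and divide by ln 7 instead. The class definitions are not given here.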

0.000 # ---S--------
0.000 # ---T--------
0.000 # ---S--------
0.000 # ---T--------
0.000 # ---S--------
0.024 # ---TM-M-----
0.320 # MMMSV-VVMM--
0.278 # VVVDHMHHGGG-
0.500 # LLLYLLWWLLL-
0.603 # SSSSTTTSSSS-
0.391 # PAAAPAAEDDD-
0.424 # AAAAEEEVGGQT
0.809 # DDDDEEEEEEEE
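Each output line above pairs a score with the alignment column it was computed from ("score # column"). The minimal sketch below reads such output and selects positions at or above a threshold. The threshold value of 0.5 and the function names are hypothetical.

```python
def parse_scores(text: str) -> list[tuple[float, str]]:
    """Parse 'score # column' lines into (score, column) pairs, one per alignment position."""
    rows = []
    for line in text.splitlines():
        if "#" not in line:
            continue
        score, column = line.split("#", 1)
        rows.append((float(score), column.strip()))
    return rows


def conserved_positions(rows: list[tuple[float, str]], threshold: float = 0.5) -> list[int]:
    """1-based alignment positions whose score reaches the (hypothetical) threshold."""
    return [i for i, (score, _) in enumerate(rows, start=1) if score >= threshold]
```

The selected positions could then be highlighted on the alignment or written into the B-factor column of the structure.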

At the moment, this page is a framework for developing the visualization of information such as annotations, and of sites that differ in conservation between protein subgroups.

• develop a method to compare two or more protein subgroups
• profile
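The method for comparing subgroups is still to be developed. Purely as an illustration of one simple starting point, and not the planned method, the sketch below flags alignment columns that are conserved within each of two subgroups but carry different consensus residues. The frequency threshold and function names are hypothetical.

```python
from collections import Counter


def consensus(column: str) -> tuple[str, float]:
    """Most frequent residue in a column and its frequency (gaps count in the denominator)."""
    residues = [c for c in column if c not in "-."]
    if not residues:
        return "-", 0.0
    residue, k = Counter(residues).most_common(1)[0]
    return residue, k / len(column)


def differing_sites(group_a: list[str], group_b: list[str], min_freq: float = 0.8) -> list[int]:
    """group_a, group_b: rows of the same alignment. Returns 1-based columns that are
    conserved in both subgroups (consensus frequency >= min_freq) with different residues."""
    sites = []
    for i, (col_a, col_b) in enumerate(zip(zip(*group_a), zip(*group_b)), start=1):
        res_a, freq_a = consensus("".join(col_a))
        res_b, freq_b = consensus("".join(col_b))
        if freq_a >= min_freq and freq_b >= min_freq and res_a != res_b:
            sites.append(i)
    return sites
```

Such columns are candidates for subgroup-specific function, for example differences in substrate specificity.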

Input file

Tree
The phylogenetic tree of the protein group will be shown on this page.

Software for phylogenetic tree visualization and manipulation
http://bioinfo.unice.fr/biodiv/Tree_editors.html

- Treedyn: works on a local machine but not on the server side (a graphical applet is needed)
- Phylodendron: problems with the CGI script
- phyfi: private program that cannot be installed on one's own server; it could possibly be used through URL requests
- nexplorer: requires the NEXUS format and cannot be installed on one's own server
- dnd2svg.pl: strict sequence number; output only in SVG format
- TreeFam: private program only
ATV 1.92

Input file

Gascuel O. (1997). BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14:685-695.

Tree in Newick format

((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);
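In Newick format, a leaf is a label that directly follows "(" or "," and carries its branch length after ":". Internal branch lengths follow ")". The minimal sketch below extracts the leaves with a regular expression, for example to check that every entry of the protein group appears in the tree. It assumes labelled leaves with branch lengths and unlabelled internal nodes, as in the tree above. It is not a full Newick parser, and the function name is hypothetical.

```python
import re

# A leaf: label right after '(' or ',' followed by ':branch_length'
LEAF = re.compile(r"[(,]\s*([^():,;\s]+)\s*:\s*([0-9.eE+-]+)")


def leaves(newick: str) -> list[tuple[str, float]]:
    """Return (leaf label, branch length) pairs in the order they appear."""
    return [(name, float(length)) for name, length in LEAF.findall(newick)]
```

Applied to the tree above, it yields 11 leaves, from ACADM_HUMAN (0.000925) to ACADM_BOVIN (0.023577).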

http://www.phylosoft.org/atv/
Zmasek C.M. and Eddy S.R. (2001) ATV: display
and manipulation of annotated phylogenetic trees.
Bioinformatics, 17, 383-384.

http://www.jalview.org/
Clamp, M., Cuff, J., Searle, S. M. and
Barton, G. J. (2004). The Jalview Java
Alignment Editor. Bioinformatics, 20, 426-7

Future plans
• Make the HTML pages conform to the W3C standards
• Improve the use of CSS
• Test the application on different web browsers
• Write the application in a server-side language
• Integrate the application with other databases
• Ensure access by multiple users and keep a history of the analyses
• Develop a phylogenetic tree view that shows additional information and allows interaction with it
• Hierarchical phylogeny-based classification in UniProtKB

Following the hierarchical
phylogeny-based classification in
UniProtKB

Acknowledgements

• Brigitte Boeckmann & Rita Casadio
• Swiss-Prot lab, Biocomputing group
• Fabrice David & Marco Vassura
• All my friends and Fra
• Dolores e Davide

And now?

practical examples


Acetylglutamate kinase family
