Web-based application to survey properties of homologous proteins

Candidate:
Diego Poggioli
Supervisor:
Prof. Rita Casadio
Co-supervisor:
Dr. Brigitte Boeckmann
• Bio-problem: visualizing and interacting with biological
  data and performing a comparative protein analysis
• Info-solution: Web application – CGI

The portal gives access to four web pages:
1) Function-related annotation derived from UniProtKB/Swiss-Prot;
2) Features of the protein group;
3) Conservation score;
4) Tree.
Members of a protein family normally perform a general
biochemical function in common, but one or more subgroups
may evolve a slightly different function, such as a
different substrate specificity.
By comparing groups and subgroups of proteins it is possible
to identify or estimate:

• similarities and differences between the protein sequences,
as well as in the information available for the given protein
group;

• the range within which functional information can be
transferred from experimentally characterized proteins
to their homologs in poorly studied organisms;

• errors in the annotation of proteins.
Visualization of and interaction with biological data

• Available from any PC
• System and browser independent
• Dynamic pages (PHP, CGI)

HTML, JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…
Form filling and data type

ID   AVID_CHICK Reviewed; 152 AA.
AC   P02701; Q91958; Q98SH4;
DT   21-JUL-1986, integrated into
DT   11-SEP-2007, sequence version
DT   10-JUN-2008, entry version 87.
DE   Avidin precursor.
GN   Name=AVD;
OS   Gallus gallus (Chicken).
OC   Eukaryota; Metazoa; Chordata
OC   Archosauria; Dinosauria
OC   Neognathae; Galliformes
OX   NCBI_TaxID=9031;
RN   [1]
RP   NUC
RX   MEDLINE=87203384; PubMed
RA   Gope M.L., Keinaenen R.A.,
RA   Zarucki-Schulz T., O'Malley B.
RT   "Molecular cloning of the chic
RL   Nucleic Acids Res. 15:3595
RN   [2]
RP   NUCLEOTIDE SEQUENCE [MR
RX   MEDLINE=90355928; PubMed
RA   Chandra G., Gray J.G.;
RT   "Cloning and expression of
RL   Methods Enzymol. 184:70
…

AVID_CHICK
AVR2_CHICK
AVR4_CHICK
AVR1_CHICK
AVR3_CHICK
AVR6_CHICK
AVR7_CHICK

P02701
P56732
P56734
O13153
P56733
P56735
P56736
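
As an illustration of how such an entry could be read, the following is a minimal sketch that groups the lines of a flat-file entry by their two-letter line code and extracts the entry name and accession numbers. The function names are hypothetical and the portal's own parser is not shown in the slides; the sample text is limited to lines that appear complete in the entry above.

```python
from collections import defaultdict

def parse_flatfile_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 2 or line.startswith("//"):
            continue  # skip blank lines and the entry terminator
        code, value = line[:2], line[2:].strip()
        if code.strip() and value:  # sequence lines start with blanks and are skipped
            fields[code].append(value)
    return dict(fields)

def accessions(fields):
    """Return every accession number on the AC lines; the first is the primary one."""
    result = []
    for value in fields.get("AC", []):
        result.extend(acc.strip() for acc in value.split(";") if acc.strip())
    return result

entry = """ID   AVID_CHICK Reviewed; 152 AA.
AC   P02701; Q91958; Q98SH4;
DE   Avidin precursor.
OS   Gallus gallus (Chicken).
"""

fields = parse_flatfile_entry(entry)
entry_name = fields["ID"][0].split()[0]
primary_ac = accessions(fields)[0]
print(entry_name, primary_ac)
```

Either identifier type (entry name or accession number) could then be used as a key for the submitted protein group.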
BioView
   • overview of biological information
   • descriptive taxonomic statistics

       A compact summary view of the biological information of
       a protein group is important, especially for a large
       dataset. It makes it possible to observe, compare and
       count all common and dissimilar characteristics, and to
       examine in detail the entries that share the same
       feature.

- gene name, functional (catalytic activity, enzyme regulation, pathway…) and general
descriptive information;
- organism classification (OC) and organism species (OS);
- non-experimental qualifiers (by similarity, putative or probable).
Pipeline BioView page

ID, AC, DE, CC: 'FUNCTION', 'PATHWAY', 'CATALYTIC
ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT',
'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE',
'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION',
'TISSUE SPECIFICITY'

OS, OC

   Taxon           Parent
   Eukaryota       -
   Viridiplantae   Eukaryota
   Streptophyta    Viridiplantae
   Embryophyta     Streptophyta
   Tracheophyta    Embryophyta
   ...             ...
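
The taxon and parent pairs listed above can be derived from the OC lines of each entry. The following is a minimal sketch under that assumption; the function names are hypothetical, and the sample lineage stops at Tracheophyta only because the slide does.

```python
from collections import Counter

def lineage_from_oc(oc_lines):
    """Join the OC lines of one entry into an ordered list of taxa (root first)."""
    taxa = []
    for line in oc_lines:
        text = line[5:] if line.startswith("OC   ") else line
        for taxon in text.strip().rstrip(".").split(";"):
            if taxon.strip():
                taxa.append(taxon.strip())
    return taxa

def child_parent_pairs(lineage):
    """Pair every taxon with its parent; the root gets '-' as in the table above."""
    return list(zip(lineage, ["-"] + lineage[:-1]))

def entries_per_taxon(lineages):
    """Count how many entries fall under each taxon of the hierarchy."""
    counts = Counter()
    for lineage in lineages:
        counts.update(set(lineage))
    return counts

# Shortened example lineage, cut after Tracheophyta for illustration.
oc = ["OC   Eukaryota; Viridiplantae; Streptophyta;",
      "OC   Embryophyta; Tracheophyta."]
lineage = lineage_from_oc(oc)
pairs = child_parent_pairs(lineage)
for taxon, parent in pairs:
    print(f"{taxon:<15} {parent}")
```

Counting entries per taxon over all lineages of a group gives the kind of per-node statistics a collapsible hierarchy view needs.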
Number of entries
Non-redundant annotation
Number of entries with non-experimental qualifier
Number of entries with annotated experimental qualifier
On mouse-click the relevant entry names are listed
Expand the whole hierarchy
FeatureView

• Interactive interface for visualizing
  function-related features on the protein
  sequence and 3D structure

• This page should let the user analyze
  sequence and structure together on a broad
  set of data, showing as much of the
  available information as possible in a
  clear and intuitive way.
Function-related features derived from the FT lines of
UniProtKB (active sites, binding sites, domains,
transmembrane regions, DNA-binding domains…) are mapped
onto the alignment and highlighted to give a clear and
compact presentation of the relevant information. The
features are mapped onto the structure in the same way,
making it possible to identify conserved regions and
sites.

            Sequence      FT    Structure
FeatureView
• Choose the best structure

• Alignment

• Mapping the feature on the alignment and
  on the structure
Choose the best structure




     *
    ...
    '91   ' => '91',
    '25   ' => '25',
    '92   ' => '92',
    '81   ' => '82',
    '71   ' => '71',
    '21   ' => '23',
    '-'     => 'x',
    '61   ' => '61',
    '37   ' => '37',
    '68   ' => '68',
    '50   ' => '50',
    '18   ' => '15',
    ...


F.P.A. David and Y.L. Yip. SSMap*: a new UniProt-PDB mapping resource for the curation of structural-related
information in the UniProt/Swiss-Prot Knowledgebase. Submitted
Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
Alignment
                                    Input file




Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and
high throughput, Nucleic Acids Research 32(5), 1792-97.
FT (Feature Table) lines

I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL',
'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD',
'DISULFID', 'CROSSLINK');
shown in a distinct font colour, with a toolbox containing the
description of the feature (entry name, feature key, sequence
position, description).

II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN',
'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED');
shown with a different background colour, with a toolbox whose
content is as described above.

- Overlapping features of the first group are listed in the toolbox.
- Overlapping features of the second group are shown with a different background colour.
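
The two display rules can be expressed as a small lookup. The following is a minimal sketch: the feature keys are those listed on the slide, while the style labels, the function names, the toolbox text layout and the example values are hypothetical and only illustrate the idea.

```python
# Feature keys as listed on the slide.
GROUP_I = {"CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
           "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK"}
GROUP_II = {"PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
            "DNA_BIND", "REGION", "COILED"}

def display_style(feature_key):
    """Return how a feature is highlighted on the alignment (labels are hypothetical)."""
    if feature_key in GROUP_I:
        return "font-colour"        # group I: distinct font colour
    if feature_key in GROUP_II:
        return "background-colour"  # group II: different background colour
    return None                     # any other key is not highlighted in this sketch

def toolbox_text(entry_name, feature_key, start, end, description):
    """Build the toolbox content: entry name, feature key, position, description."""
    position = str(start) if start == end else f"{start}-{end}"
    return f"{entry_name} | {feature_key} | {position} | {description}"

# Hypothetical example values, not taken from a real entry.
print(display_style("ACT_SITE"))
print(toolbox_text("EXAMPLE_ENTRY", "DOMAIN", 10, 120, "example description"))
```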
ATOM   1817   N     MET   B   3   -31.380   87.126   39.296   1.0   100.00
ATOM   1818   CA    MET   B   3   -30.684   88.400   39.176   1.0   100.00
ATOM   1819   C     MET   B   3   -30.858   88.967   37.771   1.0   100.00
ATOM   1820   O     MET   B   3   -30.195   88.514   36.832   1.0   100.00
ATOM   1821   CB    MET   B   3   -29.190   88.285   39.498   1.0   100.00
ATOM   1822   CG    MET   B   3   -28.465   89.628   39.501   1.0   100.00
ATOM   1823   SD    MET   B   3   -26.671   89.415   39.661   1.0   100.00
ATOM   1824   CE    MET   B   3   -26.312   90.705   40.863   1.0   100.00
ATOM   1825   N     GLU   B   4   -31.750   89.938   37.638   1.0   50.00
ATOM   1826   CA    GLU   B   4   -31.927   90.498   36.300   1.0   50.00
…      …      …     …     …   …   …         …        …        …     …




[Figure: 3D structure coloured by the last column of the ATOM records (100.00, 00.00, 50.00); label: Alignment position]
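
One way to carry a per-residue value into a structure viewer is to write it into the temperature-factor field of each ATOM record. The following is a minimal sketch, assuming standard fixed-column PDB records (chain identifier in column 22, residue number in columns 23-26, temperature factor in columns 61-66). The function names, the starting value 0.00 and the assigned value 100.0 are illustrative; the slides do not state what each value stands for.

```python
def set_b_factor(atom_line, value):
    """Return the record with columns 61-66 (temperature factor) replaced by value."""
    padded = atom_line.rstrip("\n").ljust(66)
    return padded[:60] + f"{value:6.2f}" + padded[66:]

def residue_key(atom_line):
    """Chain identifier (column 22) and residue number (columns 23-26)."""
    return atom_line[21], int(atom_line[22:26])

def annotate(pdb_lines, values, default=0.0):
    """Write a per-residue value into every ATOM/HETATM record; copy other lines."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            line = set_b_factor(line, values.get(residue_key(line), default))
        out.append(line)
    return out

# Example record rebuilt in fixed-column layout from the first row shown above,
# with a placeholder temperature factor of 0.00.
example = (f"ATOM  {1817:5d}  N   MET B{3:4d}    "
           f"{-31.380:8.3f}{87.126:8.3f}{39.296:8.3f}{1.0:6.2f}{0.0:6.2f}")

annotated = annotate([example], {("B", 3): 100.0})
print(annotated[0])
```

A viewer that colours atoms by temperature factor would then display the assigned categories directly on the structure.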
On mouse-click, run blastp on the UniProt web page
On mouse-click, start the Jalview applet
Conservation
• Interactive interface for visualizing the
  structural conservation of protein groups
  on the protein sequence and 3D structure
• Highlights positions and regions conserved
  in the group of proteins
• Conservation scores are mapped onto the
  multiple sequence alignment (MSA) and
  onto the 3D structure
Scoring residue conservation




Input file
Scoring methods
Method name     Type of score                               Description
basicmdm        Sum-of-Pairs (SP), matrix score             Simplest SP score possible
entropynorm7    Entropic                                    Normalized Shannon entropy with 7 symbol types
entropynorm21   Entropic                                    Normalized Shannon entropy with 21 symbol types
trident         Entropic, matrix score, sequence weighted   Mixed model score
valdar01        SP, matrix score, sequence weighted         Score used in Valdar & Thornton 2001


                   0.000           #               ---S--------
                   0.000           #               ---T--------
                   0.000           #               ---S--------
                   0.000           #               ---T--------
                   0.000           #               ---S--------
                   0.024           #               ---TM-M-----
                   0.320           #               MMMSV-VVMM--
                   0.278           #               VVVDHMHHGGG-
                   0.500           #               LLLYLLWWLLL-
                   0.603           #               SSSSTTTSSSS-
                   0.391           #               PAAAPAAEDDD-
                   0.424           #               AAAAEEEVGGQT
                   0.809           #               DDDDEEEEEEEE
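
To make the entropic score type concrete, the following is a minimal sketch of one common form of a normalized Shannon-entropy conservation score for a single alignment column. It is not the implementation of any method in the table and is not expected to reproduce the scores listed above; treating the gap character as an ordinary symbol and normalizing by log(min(N, K)) are assumptions of this sketch.

```python
import math
from collections import Counter

def entropy_conservation(column, alphabet_size=21):
    """Return 1 - H/Hmax for one alignment column (1.0 means fully conserved).

    H is the Shannon entropy of the symbol frequencies in the column and
    Hmax = log(min(N, K)), with N sequences and K symbol types.
    """
    n = len(column)
    if n <= 1:
        return 1.0  # a single symbol cannot vary
    counts = Counter(column)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(min(n, alphabet_size))

for col in ("DDDD", "AACC", "ACDE"):
    print(col, round(entropy_conservation(col), 3))
```

A column with a single residue type scores 1, and a column in which every sequence carries a different symbol scores 0.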
At the moment this page is a framework for developing the
visualization of information such as annotation, and of sites
that differ in conservation between protein subgroups.

• develop a method to compare two or more protein subgroups
• profile




    Input file
Tree
The phylogenetic tree of the protein group
 is shown on this page.
Software for phylogenetic tree visualization and manipulations
                  http://bioinfo.unice.fr/biodiv/Tree_editors.html




- TreeDyn: works on a local machine but not on the server side (graphical applet needed)
- Phylodendron: trouble with the CGI script
- phyfi: private program, cannot be installed on one's own server; possibly usable via URL request
- nexplorer: needs NEXUS format and cannot be installed on one's own server
- dnd2svg.pl: strict sequence number; output only in SVG format
- TreeFam: private program only
   ATV 1.92
Input file


 Gascuel O.1997. BIONJ: an improved version of the NJ algorithm based on a
 simple model of sequence data. Molecular Biology and Evolution, 14:685-695.

                                       Tree in Newick format

((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);
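
As an illustration of the Newick format, the following is a minimal sketch that lists the leaves of a tree with their branch lengths. It relies on the observation that a leaf label always follows "(" or ",", while an internal branch length follows ")". The function name is hypothetical, and the sample tree is a shortened one assembled from values in the tree above.

```python
import re

# A leaf is a label directly after "(" or ",", followed by ":" and its branch length.
LEAF_PATTERN = re.compile(r"[(,]([^(),:;]+):([0-9.eE+-]+)")

def newick_leaves(newick):
    """Return (leaf name, branch length) pairs in the order they appear in the tree."""
    text = "".join(newick.split())  # drop line breaks and spaces inside the string
    return [(name, float(length)) for name, length in LEAF_PATTERN.findall(text)]

# Shortened tree for illustration only.
tree = "((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,ACADM_PIG:0.057735);"
leaves = newick_leaves(tree)
for name, length in leaves:
    print(name, length)
```

The leaf names recovered this way are the entry names of the protein group, which could link tree nodes back to the other pages.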




http://www.phylosoft.org/atv/
Zmasek C.M. and Eddy S.R. (2001) ATV: display
and manipulation of annotated phylogenetic trees.
Bioinformatics, 17, 383-384.


http://www.jalview.org/
Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004).
The Jalview Java Alignment Editor. Bioinformatics, 20, 426-7.
Future plans
• Make the HTML pages conform to W3C standards
• Improve the use of CSS
• Test the application on different web browsers
• Write the application in a server-side language
• Integrate the application with other databases
• Ensure multi-user access to the application and an analysis
  history
• Develop a phylogenetic tree view that shows, and lets the
  user interact with, additional information
• Hierarchical phylogeny-based classification in UniProtKB
Following the hierarchical
phylogeny-based classification in
UniProtKB
Acknowledgements


• Brigitte Boeckmann & Rita Casadio
• Swiss-Prot lab, Biocomputing group
• Fabrice David & Marco Vassura
• All my friends and Fra
• Dolores and Davide




And now?
practical examples

- identify similarities and differences between the protein
sequences, as well as in the information available for the
given protein group;
- estimate the range within which functional information
can be transferred from experimentally characterized
proteins to their homologs in poorly studied organisms;
- identify errors in the annotation of proteins.
A compact summary view of the biological information of a protein group is important,
especially for a large dataset. It makes it possible to observe, compare and count all
common and dissimilar characteristics, and to examine in detail the entries that share
the same feature.

Acetylglutamate kinase family
Acyl-CoA dehydrogenase family
gatB/gatE family
IPP transferase family
Web-based application to survey properties of homologous proteins

Contenu connexe

Similaire à Web-based application to survey properties of homologous proteins

Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryJan Aerts
 
ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsJan Aerts
 
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
Predikin and PredikinDB:  tools to predict protein kinase peptide specificityPredikin and PredikinDB:  tools to predict protein kinase peptide specificity
Predikin and PredikinDB: tools to predict protein kinase peptide specificityNeil Saunders
 
Ben Rothke Getting A Handle On Wireless Security For Pci Dss Compliance
Ben Rothke   Getting A Handle On Wireless Security For Pci Dss ComplianceBen Rothke   Getting A Handle On Wireless Security For Pci Dss Compliance
Ben Rothke Getting A Handle On Wireless Security For Pci Dss ComplianceBen Rothke
 
Km Zonalito Feb 2010
Km Zonalito Feb 2010Km Zonalito Feb 2010
Km Zonalito Feb 2010guest69ba7b
 
Webinar - Getting a handle on wireless security for PCI DSS Compliance
Webinar - Getting a handle on wireless security for PCI DSS ComplianceWebinar - Getting a handle on wireless security for PCI DSS Compliance
Webinar - Getting a handle on wireless security for PCI DSS ComplianceBen Rothke
 
Gonzalo Miranda ENEI Valpo 2010
Gonzalo Miranda ENEI Valpo 2010Gonzalo Miranda ENEI Valpo 2010
Gonzalo Miranda ENEI Valpo 2010SOFOFAInnova
 
chapter 2 Java at rupp cambodia
chapter 2 Java at rupp cambodiachapter 2 Java at rupp cambodia
chapter 2 Java at rupp cambodiaSami Mut
 
LLVM Register Allocation (2nd Version)
LLVM Register Allocation (2nd Version)LLVM Register Allocation (2nd Version)
LLVM Register Allocation (2nd Version)Wang Hsiangkai
 
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...Xavier Llorà
 
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013Prof. Wim Van Criekinge
 
ThinkPad® T400 M R400
ThinkPad® T400 M R400ThinkPad® T400 M R400
ThinkPad® T400 M R400zcejzr
 
Optimizing Data Extracts from Oracle Clinical SAS Views
Optimizing Data Extracts from Oracle Clinical SAS ViewsOptimizing Data Extracts from Oracle Clinical SAS Views
Optimizing Data Extracts from Oracle Clinical SAS Viewsrajopadhye
 

Similaire à Web-based application to survey properties of homologous proteins (20)

Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discovery
 
ECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPsECCB10 talk - Nextgen sequencing and SNPs
ECCB10 talk - Nextgen sequencing and SNPs
 
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
Predikin and PredikinDB:  tools to predict protein kinase peptide specificityPredikin and PredikinDB:  tools to predict protein kinase peptide specificity
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
 
Ben Rothke Getting A Handle On Wireless Security For Pci Dss Compliance
Ben Rothke   Getting A Handle On Wireless Security For Pci Dss ComplianceBen Rothke   Getting A Handle On Wireless Security For Pci Dss Compliance
Ben Rothke Getting A Handle On Wireless Security For Pci Dss Compliance
 
Km Zonalito Feb 2010
Km Zonalito Feb 2010Km Zonalito Feb 2010
Km Zonalito Feb 2010
 
Molecular markers
Molecular markersMolecular markers
Molecular markers
 
Webinar - Getting a handle on wireless security for PCI DSS Compliance
Webinar - Getting a handle on wireless security for PCI DSS ComplianceWebinar - Getting a handle on wireless security for PCI DSS Compliance
Webinar - Getting a handle on wireless security for PCI DSS Compliance
 
Munish Virang Rp
Munish Virang RpMunish Virang Rp
Munish Virang Rp
 
W-Curve & Perl
W-Curve & PerlW-Curve & Perl
W-Curve & Perl
 
Gonzalo Miranda ENEI Valpo 2010
Gonzalo Miranda ENEI Valpo 2010Gonzalo Miranda ENEI Valpo 2010
Gonzalo Miranda ENEI Valpo 2010
 
DTEx 02042009
DTEx 02042009DTEx 02042009
DTEx 02042009
 
chapter 2 Java at rupp cambodia
chapter 2 Java at rupp cambodiachapter 2 Java at rupp cambodia
chapter 2 Java at rupp cambodia
 
RENIN1.ppt
RENIN1.pptRENIN1.ppt
RENIN1.ppt
 
LLVM Register Allocation (2nd Version)
LLVM Register Allocation (2nd Version)LLVM Register Allocation (2nd Version)
LLVM Register Allocation (2nd Version)
 
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...Pittsburgh Learning Classifier Systems for Protein  Structure Prediction: Sca...
Pittsburgh Learning Classifier Systems for Protein Structure Prediction: Sca...
 
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
 
ThinkPad® T400 M R400
ThinkPad® T400 M R400ThinkPad® T400 M R400
ThinkPad® T400 M R400
 
Comande oss
Comande ossComande oss
Comande oss
 
Barcelona sabatica
Barcelona sabaticaBarcelona sabatica
Barcelona sabatica
 
Optimizing Data Extracts from Oracle Clinical SAS Views
Optimizing Data Extracts from Oracle Clinical SAS ViewsOptimizing Data Extracts from Oracle Clinical SAS Views
Optimizing Data Extracts from Oracle Clinical SAS Views
 

Dernier

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 

Dernier (20)

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 

Web-based application to survey properties of homologous proteins

  • 1. Web-based application to survey properties of homologous proteins. proteins Candidato: Diego Poggioli Relatore: Prof. Rita Casadio Correlatore: Dr. Brigitte Boeckmann
  • 2. • Bio-problem: Visualization and interaction with biological data and performing a comparative protein analysis • Info-solution: Web application – CGI The portal gives access to four web pages: 1) Function-related annotation derived from UniProtKB/Swiss-Prot; 2) Feature of the protein group; 3) Conservation score; 4) Tree.
  • 3. Members of a protein family normally perform a general biochemical function in common, but one or more subgroups may evolve a slightly different function, such as different substrate specificity.
  • 4. By comparing groups and subgroups of proteins it is possible to identify or estimate: • similarity and differences between the proteins sequences as well as the information available for the given protein group; • the ranges, within which functional information on proteins can be transferred from experimentally characterized proteins to their homologs from poorly studied organism; • errors in the annotations of proteins;
  • 5. Visualization and interact with biological data
  • 6. Available from any PC System and browser independent php C GI Dinamic page HTML JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…
  • 7. Form filling and data type. Example identifiers: entry names AVID_CHICK, AVR2_CHICK, AVR4_CHICK, AVR1_CHICK, AVR3_CHICK, AVR6_CHICK, AVR7_CHICK; accession numbers P02701, P56732, P56734, O13153, P56733, P56735, P56736. Example UniProtKB/Swiss-Prot entry (lines cut off at the slide edge):
      ID AVID_CHICK Reviewed; 152 AA.
      AC P02701; Q91958; Q98SH4;
      DT 21-JUL-1986, integrated into
      DT 11-SEP-2007, sequence version
      DT 10-JUN-2008, entry version 87.
      DE Avidin precursor.
      GN Name=AVD;
      OS Gallus gallus (Chicken).
      OC Eukaryota; Metazoa; Chordata
      OC Archosauria; Dinosauria
      OC Neognathae; Galliformes
      OX NCBI_TaxID=9031;
      RN [1]
      RP NUC
      RX MEDLINE=87203384; PubMed
      RA Gope M.L., Keinaenen R.A.,
      RA Zarucki-Schulz T., O'Malley B.
      RT "Molecular cloning of the chic
      RL Nucleic Acids Res. 15:3595
      RN [2]
      RP NUCLEOTIDE SEQUENCE [MR
      RX MEDLINE=90355928; PubMed
      RA Chandra G., Gray J.G.;
      RT "Cloning and expression of
      RL Methods Enzymol. 184:70
      …
  • 8.
  • 9. BioView • Overview of biological information • Taxonomic descriptive statistics. A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to examine in detail the entries that share the same feature. The summary covers: - gene name, functional (catalytic activity, enzyme regulation, pathway…) and general descriptive information; - organism classification (OC) and organism species (OS); - non-experimental qualifiers (by similarity, putative or probable).
  • 10. Pipeline of the BioView page. Lines read from each entry: ID, AC, DE; CC topics 'FUNCTION', 'PATHWAY', 'CATALYTIC ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT', 'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE', 'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION', 'TISSUE SPECIFICITY'; OS, OC. Example OC hierarchy (figure): Eukaryota - Viridiplantae - Streptophyta - Embryophyta - Tracheophyta - ...
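The extraction step listed on this slide can be sketched as follows. This is a minimal illustration, not the application's actual code: the function name `parse_entry`, the returned dictionary layout and the sample `CC` text are assumptions made for the example, and the parser relies on the standard flat-file layout in which the two-letter line code is followed by three spaces.

```python
# Comment (CC) topics named on the slide.
TOPICS = {
    "FUNCTION", "PATHWAY", "CATALYTIC ACTIVITY", "ENZYME REGULATION",
    "SUBUNIT", "SIMILARITY", "COFACTOR", "DEVELOPMENTAL STAGE",
    "INDUCTION", "PTM", "SUBCELLULAR LOCALIZATION", "TISSUE SPECIFICITY",
}


def parse_entry(text):
    """Collect ID, AC, DE, OS, OC and the selected CC topics of one entry."""
    entry = {"ID": "", "AC": [], "DE": "", "OS": "", "OC": [], "CC": {}}
    topic = None  # CC topic currently being read, if it is one we keep
    for line in text.splitlines():
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            entry["ID"] = value.split()[0]
        elif code == "AC":
            entry["AC"] += [a.strip() for a in value.split(";") if a.strip()]
        elif code in ("DE", "OS"):
            entry[code] = (entry[code] + " " + value).strip()
        elif code == "OC":
            taxa = value.rstrip(".").split(";")
            entry["OC"] += [t.strip() for t in taxa if t.strip()]
        elif code == "CC":
            if value.startswith("-!-"):
                name, _, rest = value[3:].partition(":")
                name = name.strip()
                topic = name if name in TOPICS else None
                if topic:
                    entry["CC"][topic] = rest.strip()
            elif value.startswith("---"):
                topic = None  # separator of the copyright block
            elif topic:
                entry["CC"][topic] += " " + value
    return entry


# Shortened sample; the CC text is invented for illustration only.
SAMPLE = """ID   AVID_CHICK              Reviewed;         152 AA.
AC   P02701; Q91958; Q98SH4;
DE   Avidin precursor.
OS   Gallus gallus (Chicken).
OC   Eukaryota; Metazoa; Chordata;
OC   Archosauria; Dinosauria.
CC   -!- FUNCTION: Hypothetical function text
CC       continued here.
"""

entry = parse_entry(SAMPLE)
print(entry["ID"], entry["AC"], entry["OC"])
```

Keeping the OC lineage as an ordered list makes it straightforward to build the expandable taxonomic hierarchy afterwards.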
  • 11. Number of entries. Non-redundant annotation. Number of entries with a non-experimental qualifier. Number of entries with an annotated experimental qualifier.
  • 12. On mouse-click, the relevant entry names are listed. Expand the whole hierarchy.
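The four counts named on this slide could be computed along these lines. The sketch rests on two labeled assumptions: non-experimental qualifiers appear in parentheses in the annotation text, and any entry without such a qualifier is counted as experimental. The function name `summarize` and the sample annotations are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: qualifiers are written in parentheses, e.g. "(By similarity)".
QUALIFIER = re.compile(
    r"\((by similarity|probable|potential|putative)\)", re.IGNORECASE
)


def summarize(annotations):
    """Summarize one annotation topic.

    annotations maps an entry name to its annotation text for that topic.
    """
    groups = defaultdict(list)  # normalized annotation -> entry names
    for name, text in annotations.items():
        key = QUALIFIER.sub("", text).strip().rstrip(".").strip().lower()
        groups[key].append(name)
    flagged = [n for n, t in annotations.items() if QUALIFIER.search(t)]
    return {
        "entries": len(annotations),
        "non_redundant": len(groups),
        "non_experimental": len(flagged),
        # Assumption: everything without a qualifier counts as experimental.
        "experimental": len(annotations) - len(flagged),
        "groups": dict(groups),
    }


# Invented annotations for three hypothetical entries.
example = {
    "ENTRY_A": "Binds biotin.",
    "ENTRY_B": "Binds biotin (By similarity).",
    "ENTRY_C": "Unknown role (Probable).",
}
print(summarize(example))
```

The `groups` mapping is what a mouse-click on a row would need in order to list the relevant entry names.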
  • 13.
  • 14. FeatureView • Interactive interface for visualizing function-related features on the protein sequence and 3D structure • This page should allow the user to analyze sequence and structure together over a broad set of data, showing as much of the available information as possible in a clear and intuitive way.
  • 15. Function-related features derived from the FT lines of UniProtKB (active sites, binding sites, domains, transmembrane regions, DNA-binding domains…) are mapped onto the alignment and highlighted to give a clear and compact presentation of the relevant information. The features are mapped onto the structure in the same way, making it possible to identify conserved regions and sites. (Diagram: Sequence, FT, Structure.)
  • 16. FeatureView • Choose the best structure • Alignment • Mapping the feature on the alignment and on the structure
  • 17. Choose the best structure. Excerpt of the residue mapping from SSMap*: ... '91 ' => '91', '25 ' => '25', '92 ' => '92', '81 ' => '82', '71 ' => '71', '21 ' => '23', '-' => 'x', '61 ' => '61', '37 ' => '37', '68 ' => '68', '50 ' => '50', '18 ' => '15', ... * F.P.A. David and Y.L. Yip. SSMap: a new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase. Submitted.
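A residue mapping like the excerpt on this slide can be used to translate feature positions from one numbering scheme to the other. The sketch below copies the numeric pairs from the slide with key padding stripped; which side is the UniProt numbering and which the PDB numbering is not stated on the slide, so the direction is an assumption, and `translate` is a hypothetical helper name.

```python
# Pairs taken from the slide excerpt. The direction (source numbering ->
# target numbering) is an assumption; the '-' => 'x' pair is left out.
MAPPING = {
    "91": "91", "25": "25", "92": "92", "81": "82", "71": "71", "21": "23",
    "61": "61", "37": "37", "68": "68", "50": "50", "18": "15",
}


def translate(positions, mapping=MAPPING):
    """Translate residue numbers; unmapped positions are reported as None."""
    result = {}
    for pos in positions:
        target = mapping.get(str(pos))
        result[pos] = int(target) if target is not None else None
    return result


print(translate([81, 21, 99]))
```

Returning `None` for unmapped residues keeps positions that have no counterpart in the structure visible to the caller instead of silently dropping them.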
  • 18. Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
  • 19.
  • 20.
  • 21.
  • 22. FeatureView • Choose the best structure • Alignment • Mapping the feature on the alignment and on the structure
  • 23. Alignment of the input file with MUSCLE. Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research 32(5), 1792-97.
  • 24. FeatureView • Choose the best structure • Alignment • Mapping the feature on the alignment and on the structure
  • 25. Alignment, input file and FT (Feature Table) lines. Group I: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'); Group II: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED').
  • 26. FT (Feature Table) lines. Group I ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL', 'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD', 'DISULFID', 'CROSSLINK'): distinct font color and a toolbox containing the description of the feature (entry name, feature key, sequence position, description). Group II ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN', 'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED'): different background colour and a toolbox with the content described above. - Overlaps within the first group are represented in the toolbox. - Overlaps within the second group get a different background color.
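Mapping features onto the alignment requires converting sequence positions into alignment columns, because gaps shift the coordinates. The following sketch shows one way to do this and to decide, per column, whether the font or the background should be highlighted according to the two feature groups above. The function names, the `(key, start, end)` tuple layout and the toy aligned sequence are assumptions for illustration.

```python
GROUP_I = {
    "CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
    "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK",
}
GROUP_II = {
    "PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
    "DNA_BIND", "REGION", "COILED",
}


def column_map(aligned):
    """Map 1-based sequence positions to 0-based alignment columns."""
    columns = {}
    position = 0
    for column, symbol in enumerate(aligned):
        if symbol != "-":  # gaps do not consume a sequence position
            position += 1
            columns[position] = column
    return columns


def style_columns(aligned, features):
    """Return {column: {"font": bool, "background": bool}}.

    features is a list of (key, start, end) tuples with 1-based,
    inclusive sequence positions, as in FT lines.
    """
    columns = column_map(aligned)
    styles = {}
    for key, start, end in features:
        if key in GROUP_I:
            kind = "font"
        elif key in GROUP_II:
            kind = "background"
        else:
            continue  # feature keys outside both groups are ignored
        for position in range(start, end + 1):
            if position in columns:
                cell = styles.setdefault(
                    columns[position], {"font": False, "background": False}
                )
                cell[kind] = True
    return styles


# Toy aligned sequence with two gaps and two overlapping features.
print(style_columns("M-KT-A", [("ACT_SITE", 2, 2), ("DOMAIN", 2, 4)]))
```

A column can carry both flags at once, which corresponds to a Group I site lying inside a Group II region.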
  • 27. PDB ATOM records with a per-residue value in the last column:
      ATOM 1817 N MET B 3 -31.380 87.126 39.296 1.0 100.00
      ATOM 1818 CA MET B 3 -30.684 88.400 39.176 1.0 100.00
      ATOM 1819 C MET B 3 -30.858 88.967 37.771 1.0 100.00
      ATOM 1820 O MET B 3 -30.195 88.514 36.832 1.0 100.00
      ATOM 1821 CB MET B 3 -29.190 88.285 39.498 1.0 100.00
      ATOM 1822 CG MET B 3 -28.465 89.628 39.501 1.0 100.00
      ATOM 1823 SD MET B 3 -26.671 89.415 39.661 1.0 100.00
      ATOM 1824 CE MET B 3 -26.312 90.705 40.863 1.0 100.00
      ATOM 1825 N GLU B 4 -31.750 89.938 37.638 1.0 50.00
      ATOM 1826 CA GLU B 4 -31.927 90.498 36.300 1.0 50.00
      …
      (Figure labels: Alignment position; values 100.00, 50.00, 00.00.)
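The records on this slide appear to carry a per-residue value (100.00, 50.00) in the last column, which a viewer such as Jmol can use for coloring. A minimal sketch of writing such values follows, assuming that the column in question is the temperature-factor field of the standard fixed-width PDB format (columns 61-66) and that the input lines follow that layout; `set_bfactor` is a hypothetical helper name.

```python
def set_bfactor(pdb_lines, scores, chain):
    """Write per-residue values into the temperature-factor field.

    pdb_lines: fixed-width PDB lines without trailing newlines.
    scores: maps a residue number to a value that fits the 6.2f field.
    chain: one-letter chain identifier to modify.
    """
    out = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and len(line) >= 26 and line[21] == chain:
            resnum = int(line[22:26])  # residue number, columns 23-26
            if resnum in scores:
                # Columns 61-66 (0-based slice 60:66) hold the B-factor.
                line = line[:60].ljust(60) + f"{scores[resnum]:6.2f}" + line[66:]
        out.append(line)
    return out
```

For example, `set_bfactor(lines, {3: 100.0, 4: 50.0}, "B")` would mark residues 3 and 4 of chain B; all other lines are passed through unchanged.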
  • 28. On mouse-click run blastp on UniProt web page
  • 29. On mouse-click start Jalview applet
  • 30. Conservation • Interactive interface for visualizing the structural conservation of protein groups on the protein sequence and 3D structure • Highlights positions and regions conserved in the group of proteins • Conservation scores are mapped onto the multiple sequence alignment (MSA) and onto the 3D structure
  • 32. Scoring methods (method name: type of score; description):
      basicmdm: Sum-of-Pairs (SP), matrix score; simplest SP score possible.
      entropynorm7: entropic; normalized Shannon entropy with 7 symbol types.
      entropynorm21: entropic; normalized Shannon entropy with 21 symbol types.
      trident: entropic, matrix score, sequence weighted; mixed model score.
      valdar01: SP, matrix score, sequence weighted; score used in Valdar & Thornton 2001.
    Example output (score # alignment column):
      0.000 # ---S--------
      0.000 # ---T--------
      0.000 # ---S--------
      0.000 # ---T--------
      0.000 # ---S--------
      0.024 # ---TM-M-----
      0.320 # MMMSV-VVMM--
      0.278 # VVVDHMHHGGG-
      0.500 # LLLYLLWWLLL-
      0.603 # SSSSTTTSSSS-
      0.391 # PAAAPAAEDDD-
      0.424 # AAAAEEEVGGQT
      0.809 # DDDDEEEEEEEE
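As an illustration of the entropic scores in this list, the sketch below computes a normalized Shannon entropy per alignment column with 21 symbol types (20 amino acids plus the gap). It is a generic textbook formulation, not the implementation of the scoring program used in the application, so its values are not expected to reproduce the numbers shown on the slide; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_conservation(column, n_symbols=21):
    """Return 1 - H/log(K) for one column; 1.0 means fully conserved.

    Assumption: the gap character counts as one of the K symbol types.
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )
    return 1.0 - entropy / math.log(n_symbols)


def score_alignment(sequences):
    """Score every column of an alignment given as equal-length strings."""
    return [entropy_conservation("".join(col)) for col in zip(*sequences)]


print(score_alignment(["MKT", "MRT", "M-T"]))
```

Dividing by the logarithm of the alphabet size bounds the score between 0 and 1, which makes it convenient to map onto a color scale.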
  • 33.
  • 34.
  • 35. At the moment this page is a framework for developing the visualization of information such as annotation, and of sites that differ in conservation between protein subgroups. • Develop a method to compare two or more protein subgroups • Profile. Input file.
  • 36. Tree. The phylogenetic tree of the protein group will be shown on this page.
  • 37. Software for phylogenetic tree visualization and manipulation (http://bioinfo.unice.fr/biodiv/Tree_editors.html): - Treedyn: works on a local machine but not on the server side (graphical applet needed); - Phylodendron: trouble with the CGI script; - phyfi: private program that cannot be installed on one's own server; possibly usable via URL request; - nexplorer: NEXUS format needed, and it cannot be installed on one's own server; - dnd2svg.pl: strict sequence number; output only in SVG format; - TreeFam: private program only. ATV 1.92.
  • 38. Input file. Gascuel O. 1997. BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14:685-695. Tree in Newick format:
      ((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);
    http://www.phylosoft.org/atv/ Zmasek C.M. and Eddy S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384. http://www.jalview.org/ Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004). The Jalview Java Alignment Editor. Bioinformatics, 20, 426-7.
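To link tree leaves back to the entries of the protein group, the leaf names and their branch lengths can be pulled out of the Newick string. The sketch below uses a regular expression rather than a full Newick parser, which is an assumption that holds for simple trees like this one (unquoted names, no internal node labels); `leaf_lengths` is a hypothetical helper name.

```python
import re

# Tree from the slide, with the whitespace introduced by line wrapping removed.
TREE = (
    "((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,"
    "ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,"
    "ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,"
    "((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,"
    "ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,"
    "ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);"
)

# A leaf follows "(" or ","; internal branch lengths follow ")" and are skipped.
LEAF = re.compile(r"[(,]\s*([^():,;\s]+)\s*:\s*([0-9.eE+-]+)")


def leaf_lengths(newick):
    """Return {leaf name: branch length} for an unquoted Newick string."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


leaves = leaf_lengths(TREE)
print(sorted(leaves))
```

For trees with quoted labels or comments, a dedicated Newick parser would be the safer choice.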
  • 39.
  • 40. Future plans • Normalize the HTML pages according to the W3C standard • Improve the use of CSS • Test the application on different web browsers • Write the application in a server-side language • Integrate the application with other databases • Ensure multiple access to the application and an analysis history • Develop a view of the phylogenetic tree to show and interact with additional information • Hierarchical phylogeny-based classification in UniProtKB
  • 41. Following the hierarchical phylogeny-based classification in UniProtKB
  • 42.
  • 43. Acknowledgements • Brigitte Boeckmann & Rita Casadio • Swiss-Prot lab, Biocomputing group • Fabrice David & Marco Vassura • All my friends and Fra • Dolores and Davide. And now?
  • 44. Practical examples - identifying similarities and differences between the protein sequences, as well as the information available for the given protein group; - estimating the range within which functional information can be transferred from experimentally characterized proteins to their homologs in poorly studied organisms; - identifying errors in the annotation of proteins.
  • 45. A compact summary view of the biological information of a protein group is important, especially for a large dataset. It makes it possible to observe, compare and count all common and dissimilar characteristics, and to examine in detail the entries that share the same feature. Example: Acetylglutamate kinase family.
  • 47.
  • 48.
  • 49.
  • 50.