The annotation of Plant Proteins in
           UniProtKB
                     Michel Schneider

     Plant protein annotation program, Swiss-Prot group
               Swiss Institute of Bioinformatics
                     Geneva, Switzerland
                 Michel.Schneider@isb-sib.ch
1. The UniProt consortium and its products

2. Content of an entry in UniProtKB and manual curation

3. Complete proteomes and reference proteomes

4. Synchronization between UniProtKB and TAIR

5. Some statistics




        “Pioneers at the Heart of Science” 1998 – 2008
                         PAG XX, San Diego, January 15, 2012
The UniProt consortium




The missions of the UniProt consortium
Provide the scientific community with a resource of protein
sequence and functional annotation which has to be …

• comprehensive

• high quality

• and freely accessible


Four components to fulfill specific demands

UniProtKB - Protein Knowledgebase
   UniProtKB/Swiss-Prot: Reviewed (533’657 entries), manual curation
   UniProtKB/TrEMBL: Unreviewed (19 million entries), automated annotation

UniRef - Sequence clusters
   UniRef100, UniRef90, UniRef50

UniMES - Metagenomic and environmental sample sequences

UniParc - Sequence archive, contains current and obsolete sequences
   (29.6 million sequences)

UniProtKB, the expertly curated
component of UniProt


• The high-quality curated protein knowledge database

     where data becomes structured knowledge




Protein sequence: One gene - One species

• Protein and gene names
• Taxonomic information
• General annotation: Function, Subcellular location, Catalytic activity,
  Tissue specificity, Disruption phenotype…
• Sequence annotation: PTMs, alternative splicing products, mutagenesis,
  transmembrane domains, signal peptide…
• References
• Keywords - Gene Ontology
• Cross-references (~ 130 databases)

© 2009 SIB
Origin of the sequences in UniProtKB


• International Nucleotide Sequence Database Collaboration
  (INSDC)
• Ensembl or Ensembl Genomes
• RefSeq
• Direct submissions (protein sequences)
• Literature
• Protein Data Bank


The process of manual sequence curation
    1. Select entry/gene (priorities)

    2. Identify entries from same gene and homologs
       using BLAST against UniProtKB

    3. Merge entries from the same gene and same
       species into a single record

    4. Select a canonical sequence
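
Steps 3 and 4 can be sketched in Python as follows. This is only an illustration, not the Swiss-Prot curation pipeline: the `Entry` record, the `merge_by_gene` helper and the placeholder accessions are hypothetical, and picking the longest sequence as canonical is an assumption made for the example (in practice a curator makes that choice).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified stand-in for a database entry."""
    accession: str
    gene: str
    species: str
    sequence: str


def merge_by_gene(entries):
    """Group entries by (gene, species) and build one record per group."""
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry.gene, entry.species)].append(entry)

    records = []
    for (gene, species), members in groups.items():
        # Assumption: take the longest sequence as canonical; ties are
        # broken by accession so the result is deterministic.
        canonical = max(members, key=lambda e: (len(e.sequence), e.accession))
        records.append({
            "gene": gene,
            "species": species,
            "accessions": sorted(e.accession for e in members),
            "canonical": canonical.sequence,
            # Other distinct sequences are kept, not discarded.
            "other_sequences": sorted(
                {e.sequence for e in members} - {canonical.sequence}
            ),
        })
    return records
```

Entries from the same gene in different species stay in separate records, in line with "One gene - One species".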


Critical analysis and report of sequence discrepancies
QPCT_ARATH (Q84WV9) Glutaminyl-peptide cyclotransferase (At4g25720)




Literature-based curation
• Identify relevant papers by searching literature
  databases

• Read the full text of papers, then extract and summarize
  relevant information




Controlled vocabularies
• Keywords provide a summary of the entry content
• We annotate using the Gene Ontology (GO)




UniProtKB, complete proteome
sequence sets
  • Genome completely sequenced

  • Proteins mapped to the genome

  2’902 complete proteomes

  Fully manually reviewed (e.g. S. cerevisiae)
  Partially manually reviewed (e.g. A. thaliana)
  Unreviewed (e.g. Chlorella variabilis)
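
The three categories can be expressed as a small rule. This is an illustrative sketch only: the function `review_status` and its cut-offs (all, some or none of the entries reviewed) are assumptions, not a UniProt definition. The example figures are the A. thaliana counts quoted later in this deck.

```python
def review_status(reviewed, total):
    """Classify a proteome by how many of its entries are reviewed."""
    if total <= 0 or not 0 <= reviewed <= total:
        raise ValueError("expected 0 <= reviewed <= total and total > 0")
    if reviewed == total:
        return "fully manually reviewed"
    if reviewed == 0:
        return "unreviewed"
    return "partially manually reviewed"


# A. thaliana, release 2011_12: 10'865 reviewed entries in a complete
# proteome of 32'521 entries.
print(review_status(10_865, 32_521))
```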
UniProtKB, reference proteome
sequence sets
A reference proteome is the complete proteome of a
representative, well-studied model organism or an organism
of interest for biomedical research.

509 reference proteomes




Arabidopsis thaliana



The building of the complete proteome sequence set:

• Based on the re-annotation of the complete genome by TAIR:

  27’416 protein coding genes



UniProtKB – TAIR synchronization

Release 2011_03 - Mar 08, 2011:
   cDNAs, ESTs, genomic sequences -> Nucleic acid databases
   UniProtKB/TrEMBL, Unreviewed (40’574 entries)
   UniProtKB/Swiss-Prot, Reviewed (10’340 entries)

Synchronization steps:
   • Genome re-annotation: 35’386 gene products
   • Temporary TrEMBL set: 33’341 entries
   • 11’508 sequences
   • Compare translations from the same gene, merge if 100 % identical,
     report sequence discrepancies, align with orthologs and paralogs
   • Correct gene models or add new isoforms: 283 corrections
   • Feedback to TAIR: 90 gene models
   • Cleaned set of new TrEMBL entries (21’656 entries)

Release 2011_12 - Dec 14, 2011:
   UniProtKB/TrEMBL, Unreviewed (44’628 entries)
   UniProtKB/Swiss-Prot, Reviewed (10’875 entries)
   Cleaned set of new TrEMBL entries (21’656 entries)
   + UniProtKB/Swiss-Prot, Reviewed (10’865 entries)
   = Arabidopsis thaliana, cv. Columbia, complete proteome: 32’521 entries
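
The rule "compare translations from the same gene, merge if 100 % identical, report sequence discrepancies" can be sketched as below. This is a minimal illustration under stated assumptions: the helper `compare_translations` is hypothetical, and it compares residues position by position without an alignment, so insertions and deletions are only flagged as a length difference.

```python
def compare_translations(existing, incoming):
    """Return ("merge", []) for identical sequences, else ("report", diffs)."""
    if existing == incoming:
        return "merge", []
    diffs = []
    # Assumption: position-by-position comparison, no alignment. Real
    # discrepancies such as indels would need a proper alignment.
    for pos, (a, b) in enumerate(zip(existing, incoming), start=1):
        if a != b:
            diffs.append(f"{a}{pos}{b}")
    if len(existing) != len(incoming):
        diffs.append(f"length {len(existing)} vs {len(incoming)}")
    return "report", diffs


# Consistency check on the figures from the slides: the cleaned TrEMBL set
# plus the reviewed entries give the size of the complete proteome.
cleaned_trembl = 21_656
reviewed_swiss_prot = 10_865
complete_proteome = cleaned_trembl + reviewed_swiss_prot
```

With the slide figures, `complete_proteome` comes to 32’521 entries, matching the total given for Arabidopsis thaliana, cv. Columbia.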
1001 Arabidopsis genomes

• Deposited to INSDC?

• Fully annotated? With CDS?

• Should we still merge all the identical sequences together?

• If they are not merged but kept separate, how do we get
  relevant BLAST results?


Some UniProtKB/Swiss-Prot Statistics
concerning plant entries
(UniProt release 2011_12 - Dec 14, 2011)


• 31’959 entries of Viridiplantae
• from 1’924 species
• 10’875 entries from Arabidopsis thaliana (with 1’219 isoforms)
• 2’823 entries from Oryza sativa subsp. japonica
• 11’897 plant entries with an EC number
• 966 different complete EC numbers
• 5’744 putative transporters or proteins involved in transport
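
To put these counts in proportion, the shares of the Viridiplantae total can be worked out directly. The snippet below is a small worked calculation using only the figures on this slide; the helper name `share` is arbitrary.

```python
# Figures from the slide (UniProt release 2011_12).
viridiplantae_entries = 31_959
arabidopsis_entries = 10_875
rice_entries = 2_823
entries_with_ec = 11_897


def share(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


arabidopsis_share = share(arabidopsis_entries, viridiplantae_entries)
rice_share = share(rice_entries, viridiplantae_entries)
ec_share = share(entries_with_ec, viridiplantae_entries)
print(arabidopsis_share, rice_share, ec_share)
```

Arabidopsis thaliana accounts for roughly a third of the reviewed plant entries, and a bit more than a third of the plant entries carry an EC number.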
Summary
UniProtKB/Swiss-Prot, the manually curated knowledgebase:

• Protein sequence database covering all kingdoms of life (533’657
  sequence entries; 12’664 species)
• Manually annotated
• Non-redundant: all products of one gene in one species in a single entry
• Highly cross-referenced (links to ~130 databases).

Plant protein annotation:

• Complete proteome for Arabidopsis thaliana

• Synchronization with TAIR

We need your feedback and your collaboration!

                   help@uniprot.org




Acknowledgements
SIB
Ioannis Xenarios, Lydie Bougueleret, Andrea Auchincloss, Kristian Axelsen, Delphine Baratin, Marie-Claude Blatter,
Brigitte Boeckmann, Jerven Bolleman, Laurent Bollondi, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Edouard de
Castro, Lorenzo Cerutti, Elisabeth Coudert, Béatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne
Estreicher, Livia Famiglietti, Marc Feuermann, Sebastien Gehant, Elisabeth Gasteiger, Vivienne Gerritsen, Arnaud Gos,
Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Janet James, Florence Jungo, Guillaume Keller,
Vicente Lara, Philippe Lemercier, Damien Lieberherr, Xavier Martin, Patrick Masson, Anne Morgat, Salvo Paesano, Ivo
Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Bernd
Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Eleanor Stanley, André Stutz, Shyamala
Sundaram, Michael Tognolli, Laure Verbregue and Anne-Lise Veuthey

EBI
Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes,
Benoit Bely, Mark Bingley, David Binns, Lawrence Bower, Wei Mun Chan, Emily Dimmer, Francesco Fazzini, Alexander
Fedotov, John Garavelli, Leyla Garcia Castro, Rachael Huntley, Julius Jacobsen, Michael Kleen, Duncan Legge, Wudong
Liu, Jie Luo, Sandra Orchard, Samuel Patient, Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Steven Rosanoff, Tony
Sawford, Harminder Sehra, Edward Turner, Matt Corbett, Mike Donnelly and Pieter van Rensburg

PIR
Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Winona C. Barker, Chuming Chen, Yongxing Chen, Pratibha Dubey,
Hongzhan Huang, Kati Laiho, Raja Mazumder, Peter McGarvey, Darren A. Natale, Thanemozhi G. Natarajan, Jules
Nchoutmboube, Natalia V. Roberts, Baris E. Suzek, Uzoamaka Ugochukwu, C. R. Vinayaka, Qinghua Wang, Yuqi Wang,
Lai-Su Yeh and Jian Zhang




                                      www.uniprot.org
UniProt is mainly supported by the National Institutes of
Health (NIH) grant 1 U41 HG006104-01. Additional support for
the EBI's involvement in UniProt comes from the NIH grant
2P41 HG02273-07. Swiss-Prot activities at the SIB are
supported by the Swiss Federal Government through the
Federal Office of Education and Science and the European
Commission contracts SLING (226073), Gen2Phen (200754)
and MICROME (222886). PIR activities are also supported by
the NIH grants 5R01GM080646-04, 3R01GM080646-04S2,
1G08LM010720-01, and 3P20RR016472-09S2, and NSF grant
DBI-0850319.




Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 

Dernier (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 

The annotation of plant proteins in UniProtKB

  • 1. The annotation of Plant Proteins in UniProtKB Michel Schneider Plant protein annotation program, Swiss-Prot group Swiss Institute of Bioinformatics Geneva, Switzerland Michel.Schneider@isb-sib.ch
  • 2. (1) The UniProt consortium and its products; (2) Content of an entry in UniProtKB and manual curation; (3) Complete proteomes and reference proteomes; (4) Synchronization between UniProtKB and TAIR; (5) Some statistics. “Pioneers at the Heart of Science” 1998 – 2008. PAG XX, San Diego, January 15, 2012
  • 3. The UniProt consortium
  • 4. The missions of the UniProt consortium: provide the scientific community with a resource of protein sequence and functional annotation which has to be comprehensive, high quality, and freely accessible.
  • 5. Four components to fulfill specific demands. UniProtKB (Protein Knowledgebase): UniProtKB/Swiss-Prot, reviewed, manual curation (533’657 entries); UniProtKB/TrEMBL, unreviewed, automated annotation (19 million entries). UniRef (sequence clusters): UniRef100, UniRef90, UniRef50. UniMES: metagenomic and environmental sample sequences. UniParc: sequence archive, contains current and obsolete sequences (29.6 million sequences).
  • 6. UniProtKB, the expertly curated component of UniProt. The high-quality curated protein knowledge database where data becomes structured knowledge.
  • 7. UniProtKB, the expertly curated component of UniProt. Shigeo Fukuda.
  • 8. Protein sequence. One gene - One species. © 2009 SIB
  • 9. Protein and gene names. Taxonomic information. Protein sequence. One gene - One species.
  • 10. Protein and gene names. Taxonomic information. Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide… Protein sequence. One gene - One species.
  • 11. Protein and gene names. Taxonomic information. General annotation: function, subcellular location, catalytic activity, tissue specificity, disruption phenotype… Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide… Protein sequence. One gene - One species.
  • 12. Protein and gene names. Taxonomic information. General annotation: function, subcellular location, catalytic activity, tissue specificity, disruption phenotype… Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide… References. Protein sequence. One gene - One species.
  • 13. Protein and gene names. Taxonomic information. General annotation: function, subcellular location, catalytic activity, tissue specificity, disruption phenotype… Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide… References. Keywords - Gene Ontology. Protein sequence. One gene - One species.
  • 14. Protein and gene names. Taxonomic information. General annotation: function, subcellular location, catalytic activity, tissue specificity, disruption phenotype… Sequence annotation: PTMs, alternative splicing products, mutagenesis, transmembrane domains, signal peptide… References. Keywords - Gene Ontology. Cross-references (~130 databases). Protein sequence. One gene - One species.
  • 15. Origin of the sequences in UniProtKB: International Nucleotide Sequence Database Collaboration (INSDC); Ensembl or EnsemblGenomes; RefSeq; direct submissions (protein sequences); literature; Protein Data Bank.
  • 16. The process of manual sequence curation: (1) Select entry/gene (priorities); (2) Identify entries from the same gene and homologs using BLAST against UniProtKB; (3) Merge entries from the same gene and same species into a single record; (4) Select a canonical sequence.
  • 17. Critical analysis and report of sequence discrepancies: QPCT_ARATH (Q84WV9), Glutaminyl-peptide cyclotransferase (At4g25720).
  • 18. Critical analysis and report of sequence discrepancies: QPCT_ARATH (Q84WV9), Glutaminyl-peptide cyclotransferase (At4g25720).
  • 19. (no slide text)
  • 20. Literature-based curation: identify relevant papers by searching literature databases; read the full text of the papers, then extract and summarize the relevant information.
  • 21. Literature-based curation
  • 22. Literature-based curation
  • 23. Literature-based curation
  • 24. Controlled vocabularies: keywords provide a summary of the entry content; we annotate using the Gene Ontology (GO).
  • 25. UniProtKB, complete proteome sequence sets: genome completely sequenced; proteins mapped to the genome. 2’902 complete proteomes: fully manually reviewed (e.g. S. cerevisiae), partially manually reviewed (e.g. A. thaliana), unreviewed (e.g. Chlorella variabilis).
  • 26. UniProtKB, reference proteome sequence sets: a reference proteome is the complete proteome of a representative, well-studied model organism or an organism of interest for biomedical research. 509 reference proteomes.
  • 27. UniProtKB, complete proteome sequence sets
  • 28. Arabidopsis thaliana. The building of the complete proteome sequence set: based on the re-annotation of the complete genome by TAIR: 27’416 protein coding genes.
  • 29. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). UniProtKB/TrEMBL, unreviewed (40’574 entries). UniProtKB/Swiss-Prot, reviewed (10’340 entries). Release 2011_03 - Mar 08, 2011.
  • 30. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation: 35’386 gene products. Temporary TrEMBL set: 33’341 entries. UniProtKB/TrEMBL, unreviewed (40’574 entries). UniProtKB/Swiss-Prot, reviewed (10’340 entries).
  • 31. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation: 35’386 gene products. Temporary TrEMBL set: 33’341 entries. UniProtKB/TrEMBL, unreviewed (40’574 entries). 11’508 sequences. UniProtKB/Swiss-Prot, reviewed (10’340 entries). Compare translations from the same gene, merge if 100 % identical, report sequence discrepancies, align with orthologs and paralogs.
  • 32. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation. Temporary TrEMBL set. UniProtKB/TrEMBL, unreviewed. UniProtKB/Swiss-Prot, reviewed. Compare translations from the same gene, merge if 100 % identical, report sequence discrepancies, align with orthologs and paralogs. Feedback to TAIR: correct gene models or add new isoforms. 90 gene models. 283 corrections.
  • 33. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation. Temporary TrEMBL set. UniProtKB/TrEMBL, unreviewed. Cleaned set of new TrEMBL entries (21’656 entries). UniProtKB/Swiss-Prot, reviewed.
  • 34. UniProtKB – TAIR synchronization. Nucleic acid databases (cDNAs, ESTs, genomic sequences). Genome re-annotation. Temporary TrEMBL set. UniProtKB/TrEMBL, unreviewed (44’628 entries). UniProtKB/Swiss-Prot, reviewed (10’875 entries). Cleaned set of new TrEMBL entries (21’656 entries) + UniProtKB/Swiss-Prot, reviewed (10’865 entries). Release 2011_12 - Dec 14, 2011. Arabidopsis thaliana, cv. Columbia. Complete proteome: 32’521 entries.
  • 35. 1001 Arabidopsis genomes: Deposited to INSDC? Fully annotated? With CDS? Should we still merge all the identical sequences together? If they are not merged but kept separate, how to get relevant BLAST results?
  • 36. Some UniProtKB/Swiss-Prot statistics concerning plant entries (UniProt release 2011_12 - Dec 14, 2011): 31,959 entries of Viridiplantae; from 1,924 species; 10’875 entries from Arabidopsis thaliana (with 1,219 isoforms); 2,823 entries from Oryza sativa subsp. japonica; 11,897 plant entries with an EC number; 966 different complete EC numbers; 5,744 putative transporters or proteins involved in transport.
  • 37. Summary. UniProtKB/Swiss-Prot, the manually curated knowledgebase: protein sequence database covering all kingdoms of life (533’657 sequence entries; 12’664 species); manually annotated; non-redundant: all products of one gene in one species in a single entry; highly cross-referenced (links to ~130 databases). Plant protein annotation: complete proteome for Arabidopsis thaliana; synchronization with TAIR.
  • 38. We need your feedback and your collaboration! help@uniprot.org
  • 39. Acknowledgements SIB Ioannis Xenarios, Lydie Bougueleret, Andrea Auchincloss, Kristian Axelsen, Delphine Baratin, Marie-Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Laurent Bollondi, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Edouard de Castro, Lorenzo Cerutti, Elisabeth Coudert, Béatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Sebastien Gehant, Elisabeth Gasteiger, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Janet James, Florence Jungo, Guillaume Keller, Vicente Lara, Philippe Lemercier, Damien Lieberherr, Xavier Martin, Patrick Masson, Anne Morgat, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Eleanor Stanley, André Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue and Anne-Lise Veuthey EBI Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes, Benoit Bely, Mark Bingley, David Binns, Lawrence Bower, Wei Mun Chan, Emily Dimmer, Francesco Fazzini, Alexander Fedotov, John Garavelli, Leyla Garcia Castro, Rachael Huntley, Julius Jacobsen, Michael Kleen, Duncan Legge, Wudong Liu, Jie Luo, Sandra Orchard, Samuel Patient, Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Steven Rosanoff, Tony Sawford, Harminder Sehra, Edward Turner, Matt Corbett, Mike Donnelly and Pieter van Rensburg PIR Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Winona C. Barker, Chuming Chen, Yongxing Chen, Pratibha Dubey, Hongzhan Huang, Kati Laiho, Raja Mazumder, Peter McGarvey, Darren A. Natale, Thanemozhi G. Natarajan, Jules Nchoutmboube, Natalia V. Roberts, Baris E. Suzek, Uzoamaka Ugochukwu, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Lai-Su Yeh and Jian Zhang www.uniprot.org
  • 40. UniProt is mainly supported by the National Institutes of Health (NIH) grant 1 U41 HG006104-01. Additional support for the EBI's involvement in UniProt comes from the NIH grant 2P41 HG02273-07. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science and the European Commission contracts SLING (226073), Gen2Phen (200754) and MICROME (222886). PIR activities are also supported by the NIH grants 5R01GM080646-04, 3R01GM080646-04S2, 1G08LM010720-01, and 3P20RR016472-09S2, and NSF grant DBI-0850319.
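
The entry counts quoted on slide 34 appear to be related by simple addition: the cleaned set of new TrEMBL entries plus the reviewed Swiss-Prot entries would give the complete proteome for Arabidopsis thaliana, cv. Columbia. That reading of the diagram is an assumption, since the transcript does not state the relationship explicitly. The short Python check below only confirms that the quoted numbers are arithmetically consistent with it.

```python
# Entry counts as quoted on slide 34 (UniProt release 2011_12).
cleaned_trembl = 21_656      # cleaned set of new TrEMBL entries
reviewed_swissprot = 10_865  # UniProtKB/Swiss-Prot reviewed entries
complete_proteome = 32_521   # complete proteome, A. thaliana cv. Columbia

# Assumption: the complete proteome is the sum of the two sets above.
total = cleaned_trembl + reviewed_swissprot
print(total, total == complete_proteome)
```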

Editor's notes

  1. Alignment of sequences deduced from 2 genomic DNAs, one cDNA and one EST. Annotation of erroneous gene model predictions.
  2. Annotation of isoforms.
  3. Information about how to reconstruct all isoforms. Access to the sequences of all isoforms. Can apply various tools.
  4. The sequencing of 1001 Arabidopsis genomes raises several questions, and we have to find new solutions. If the sequences are not merged, one solution for BLAST is to use UniRef, but this is valid only for functional annotation and not for finding whether a homologous protein is already known in a given species.
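
The merge rule from slide 31 (compare translations from the same gene, merge if 100 % identical, report sequence discrepancies) can be illustrated with a minimal Python sketch. This is not the UniProt curation pipeline: the `Translation` record, its field names, the function `merge_by_gene`, and the toy accessions and sequences are all hypothetical, chosen only to show the grouping logic. A gene that ends up with more than one distinct sequence is merely flagged, because the difference may be a genuine isoform or a gene model error, and deciding which requires a curator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """One protein translation; all field names are illustrative."""
    accession: str  # hypothetical record identifier
    gene: str       # gene locus, e.g. an AGI code such as At4g25720
    sequence: str   # amino-acid sequence


def merge_by_gene(translations):
    """Group translations by gene and merge those with identical sequences.

    Returns (merged, discrepancies):
      merged        maps gene -> {sequence: sorted list of accessions}
      discrepancies lists genes with more than one distinct sequence,
                    which would need manual inspection.
    """
    by_gene = defaultdict(lambda: defaultdict(list))
    for t in translations:
        by_gene[t.gene][t.sequence].append(t.accession)

    merged = {
        gene: {seq: sorted(accs) for seq, accs in seqs.items()}
        for gene, seqs in by_gene.items()
    }
    discrepancies = sorted(g for g, seqs in merged.items() if len(seqs) > 1)
    return merged, discrepancies


# Toy input: accessions and sequences are made up for illustration.
records = [
    Translation("REC1", "At4g25720", "MKTA"),
    Translation("REC2", "At4g25720", "MKTA"),  # identical to REC1: merged
    Translation("REC3", "At4g25720", "MKSA"),  # differs: flagged
    Translation("REC4", "GENE2", "MLLV"),
]
merged, discrepancies = merge_by_gene(records)
print(merged["At4g25720"])
print(discrepancies)
```

Exact string identity is the simplest possible criterion. It says nothing about near-identical sequences from different accessions, which is the open question note 4 raises for the 1001 genomes.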