Marie-Claude.Blatter@isb-sib.ch
Swiss-Prot group, Geneva
SIB Swiss Institute of Bioinformatics
The UniProt knowledgebase
www.uniprot.org
a hub of integrated protein data
http://education.expasy.org/cours/Prague2011/
Science cover, February 2011
protein sequence functional information
data knowledge
UniProt consortium
EBI : European Bioinformatics Institute (UK)
SIB : Swiss Institute of Bioinformatics (CH)
PIR : Protein information resource (US)
UniProt databases
UniProtKB: protein sequence knowledgebase with 2 sections, UniProtKB/Swiss-Prot and UniProtKB/TrEMBL (query, Blast, download) (~15 million entries)
UniParc: protein sequence archive (the ENA equivalent at the protein level). Each entry contains a protein sequence with cross-links to the other databases where the sequence is found (active or not). Not annotated (query, Blast, download) (~25 million entries)
UniRef: 3 sets of protein sequence clusters at 100, 90 and 50 % identity; useful to speed up sequence similarity searches (BLAST) (query, Blast, download) (UniRef100: 10 million entries; UniRef90: 7 million entries; UniRef50: 3.3 million entries)
UniMES: protein sequences derived from metagenomic projects (mostly Global Ocean Sampling (GOS)) (download) (8 million entries, included in UniParc)
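The identity levels used for UniRef can be illustrated with a minimal sketch. It only computes the percent identity of two sequences that are assumed to be already aligned and of equal length, then lists which of the three levels (100, 90, 50 %) the pair reaches. The real clustering procedure is more involved and is not described in these slides; the function names and the second toy sequence are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions carrying the same residue in two
    equal-length, pre-aligned sequences (no gap handling)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def identity_levels_reached(identity, levels=(100, 90, 50)):
    """Return the identity levels (in %) that a pair reaches."""
    return [level for level in levels if identity >= level]


# Toy pair: the second sequence differs at the last position only.
identity = percent_identity("MSKEKFERTK", "MSKEKFERTR")
levels = identity_levels_reached(identity)
print(identity, levels)
```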
The central piece
UniProtKB
an encyclopedia on proteins
composed of 2 sections
UniProtKB/TrEMBL and UniProtKB/Swiss-Prot
unreviewed and reviewed
automatically annotated and manually annotated
released every 4 weeks
UniProtKB
Origin of protein sequences
UniProtKB protein sequences are mainly derived from
- INSDC (translated submitted coding sequences - CDS)
- Ensembl (gene prediction ) and RefSeq sequences
- Sequences of PDB structures
- Direct submission or sequences scanned from literature
Notes: - UniProt is not doing any gene prediction
- Most non-germline immunoglobulins, T-cell receptors , most patent sequences,
highly over-represented data (e.g. viral antigens), pseudogenes sequences are
excluded from UniProtKB, - but stored in UniParc
- Data from the PIR database have been integrated in UniProtKB since 2003.
15 %
85 %
Swiss-Prot
TrEMBL
EMBL
Automated extraction
of protein sequence
(translated CDS), gene
name and references.
Automated annotation
Manual annotation of
the sequence and
associated biological
information
UniProtKB/TrEMBL
unreviewed
Automatic annotation
released every 4 weeks
One protein sequence
One species
Automated annotation
Keywords
and
Gene Ontology
Automated annotation
Function, Subcellular location,
Catalytic activity,
Sequence similarities…
Automated annotation
transmembrane domains,
signal peptide…
Cross-references
to over 125 databases
References
Protein and gene names
Taxonomic information
UniProtKB/TrEMBL
UniProtKB/TrEMBL
Automatic annotation
Protein sequence
- The quality of the protein sequences depends on the information
provided by the submitter of the original nucleotide entry (CDS) or by the
gene prediction pipeline (e.g. Ensembl).
- 100% identical sequences (same length, same organism) are merged
automatically.
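The merging rule above can be sketched as follows. This is an illustration under stated assumptions, not the production pipeline: records are plain dictionaries, and the field names and source identifiers (`cds_1` …) are hypothetical. Two records are merged only when both the organism and the full sequence are identical.

```python
from collections import defaultdict


def merge_identical(records):
    """Group records that share the same organism and exactly the same
    sequence (identical strings, hence identical length)."""
    merged = defaultdict(list)
    for rec in records:
        key = (rec["organism"], rec["sequence"])
        merged[key].append(rec["source_id"])
    return [
        {"organism": org, "sequence": seq, "source_ids": ids}
        for (org, seq), ids in merged.items()
    ]


# Hypothetical input: four translated CDS from two organisms.
records = [
    {"source_id": "cds_1", "organism": "Homo sapiens", "sequence": "MSKEKF"},
    {"source_id": "cds_2", "organism": "Homo sapiens", "sequence": "MSKEKF"},
    {"source_id": "cds_3", "organism": "Mus musculus", "sequence": "MSKEKF"},
    {"source_id": "cds_4", "organism": "Homo sapiens", "sequence": "MSKEKFE"},
]
entries = merge_identical(records)
for entry in entries:
    print(entry["organism"], entry["sequence"], entry["source_ids"])
```

Only `cds_1` and `cds_2` end up in the same entry: `cds_3` comes from another organism and `cds_4` has a different length.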
Biological information
Sources of annotation
- Provided by the submitter (EMBL, PDB, TAIR…)
- From automated annotation (automatically generated annotation rules (e.g.
SAAS) and/or manually generated annotation rules (e.g. UniRule))
Example of fully automatic annotation: SAAS
• Rules are derived from the UniProtKB/Swiss-Prot manual annotation.
• Fully automated rule generation based on C4.5 decision tree algorithm.
• One annotation, one rule.
• High stringency – require 99% or greater estimated precision to
generate annotation (test on UniProtKB/Swiss-Prot)
• Rules are produced, updated and validated at each release.
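The stringency criterion can be illustrated with a small sketch. It does not implement C4.5 or the SAAS rule generation itself; it only shows the acceptance step, assuming each candidate rule has been applied to a reviewed test set and we know which entries it annotates. The rule names, entry names and function names are hypothetical.

```python
def precision(predicted, reference):
    """Fraction of the entries annotated by a rule whose annotation is
    confirmed by the reference (manually annotated) set."""
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


def keep_precise_rules(rules, reference, threshold=0.99):
    """Keep only the rules whose estimated precision reaches the threshold."""
    return {
        name: hits
        for name, hits in rules.items()
        if precision(hits, reference) >= threshold
    }


# Hypothetical test set: 200 reviewed entries carrying the annotation.
reference = {f"entry{i}" for i in range(200)}
rules = {
    "rule_a": {f"entry{i}" for i in range(100)},       # 100/100 confirmed
    "rule_b": {f"entry{i}" for i in range(102, 202)},  # 98/100 confirmed
}
kept = keep_precise_rules(rules, reference)
print(sorted(kept))
```

With a 99% threshold, `rule_b` (estimated precision 98%) is rejected and produces no annotation.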
UniProtKB/TrEMBL
UniProtKB/Swiss-Prot
reviewed
manually annotated
released every 4 weeks
MSKEKFERTKPHVNVGTIGHVDHGKTTLTAAITTVLAKTYGGAAR
AFDQIDNAPEEKARGITINTSHVEYDTPTRHYAHVDCPGHADYVK
NMITGAAQMDGAILVVAATDGPMPQTREHILLGRQVGVPYIIVFL
NKCDMVDDEELLELVEMEVRELLSQYDFPGDDTPIVRGSALKALE
GDAEWEAKILELAGFLDSYIPEPERAIDKPFLLPIEDVFSISGRG
TVVTGRVERGIIKVGEEVEIVGIKETQKSTCTGVEMFRKLLDEGR
AGENVGVLLRGIKREEIERGQVLAKPGTIKPHTKFESEVYILSKD
EGGRHTPFFKGYRPQFYFRTTDVTGTIELPEGVEMVMPGDNIKMV
VTLIHPIAMDDGLRFAIREGGRTVGAGVVAKVLG
One protein sequence
One gene
One species
Manual annotation
Keywords
and
Gene Ontology
Manual annotation
Function, Subcellular location,
Catalytic activity, Disease,
Tissue specificity, Pathway…
Manual annotation
Post-translational modifications,
variants, transmembrane domains,
signal peptide…
Cross-references
to over 125 databases
References
Protein and gene names
Taxonomic information
Alternative products:
protein sequences produced by
alternative splicing,
alternative promoter usage,
alternative initiation…
UniProtKB/Swiss-Prot
UniProtKB/Swiss-Prot
Manual annotation
1. Protein sequence (merge available CDS, annotate
sequence discrepancies, report sequencing mistakes…)
2. Biological information (sequence analysis, extract
literature information, ortholog data propagation, …)
UniProtKB/Swiss-Prot
1- Protein sequence curation
The displayed protein sequence:
…canonical, representative, consensus…
+
alternative sequences (described within the entry)
1 entry <-> 1 gene (1 species)
UniProtKB/Swiss-Prot
a gene-centric view of the protein space
What is the current status?
• At least 20% of Swiss-Prot entries required a minimal
amount of curation effort so as to obtain the “correct”
sequence.
• Typical problems
– unsolved conflicts
– uncorrected initiation sites
– frameshifts
– wrong gene prediction
– other ‘problems’
UCSC genome browser
examples of CDS annotation submitted to INSDC…
UniProtKB/Swiss-Prot
2- Biological data curation
UniProtKB/Swiss-Prot gathers data from multiple sources:
- publications (literature/Pubmed)
- prediction programs (Prosite, TMHMM, …)
- contacts with experts
- other databases
- nomenclature committees
An evidence attribution system makes it easy to trace the
source of each annotation
Extract literature information
and protein sequence analysis
maximum usage of controlled vocabulary
Protein and gene names
…enable researchers to
obtain a summary of
what is known about a
protein…
General annotation
(Comments)
Human protein manual annotation:
some statistics (June 2011)
Sequence annotation
(Features)
…enable researchers to
obtain a summary of
what is known about a
protein…
Non-experimental qualifiers
UniProtKB/Swiss-Prot considers both experimental and predicted
data and makes a clear distinction between the two

Type of evidence                                  Qualifier
Strong experimental evidence                      None or Ref.X
Light experimental evidence                       Probable
Inferred by similarity with homologous protein    By similarity
Inferred by prediction                            Potential
Find all the proteins localized in
the cytoplasm (experimentally
proven) which are phosphorylated
on a serine (experimentally proven)
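A query of this kind can be sketched on toy data. This is a minimal illustration, assuming that an annotation counts as experimentally proven when it carries none of the three non-experimental qualifiers listed above. The entry identifiers, the data layout and the `matches` helper are hypothetical and do not reflect the UniProtKB entry format.

```python
# Qualifiers that flag an annotation as not experimentally proven.
NON_EXPERIMENTAL = {"Probable", "By similarity", "Potential"}

# Hypothetical entries: each annotation is a (value, qualifier) pair,
# with qualifier None when the annotation carries no qualifier.
entries = [
    {"id": "toy1", "location": ("Cytoplasm", None),
     "modifications": [("Phosphoserine", None)]},
    {"id": "toy2", "location": ("Cytoplasm", "By similarity"),
     "modifications": [("Phosphoserine", None)]},
    {"id": "toy3", "location": ("Cytoplasm", None),
     "modifications": [("Phosphoserine", "Potential")]},
    {"id": "toy4", "location": ("Nucleus", None),
     "modifications": [("Phosphoserine", None)]},
]


def matches(entry):
    """True if the entry is cytoplasmic and phosphorylated on a serine,
    both without a non-experimental qualifier."""
    location, location_qualifier = entry["location"]
    if location != "Cytoplasm" or location_qualifier in NON_EXPERIMENTAL:
        return False
    return any(
        name == "Phosphoserine" and qualifier not in NON_EXPERIMENTAL
        for name, qualifier in entry["modifications"]
    )


hits = [entry["id"] for entry in entries if matches(entry)]
print(hits)
```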
• The ‘Protein existence’ tag indicates the type of evidence
for the existence of a given protein.
• Different qualifiers:
– 1. Evidence at protein level (~18%): MS, western blot (tissue specificity),
immuno (subcellular location), …
– 2. Evidence at transcript level (~19%)
– 3. Inferred from homology (~58%)
– 4. Predicted (~5%)
– 5. Uncertain (mainly in TrEMBL)
‘Protein existence’ tag
http://www.uniprot.org/docs/pe_criteria
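The five levels form an ordered scale, which can be modelled as follows. This is only a sketch: the class, the helper and the toy entries are hypothetical, and the numbering simply follows the list above (1 is the strongest evidence).

```python
from enum import IntEnum


class ProteinExistence(IntEnum):
    """The five 'Protein existence' levels, numbered as on the slide."""
    EVIDENCE_AT_PROTEIN_LEVEL = 1
    EVIDENCE_AT_TRANSCRIPT_LEVEL = 2
    INFERRED_FROM_HOMOLOGY = 3
    PREDICTED = 4
    UNCERTAIN = 5


def at_least_as_strong(level, threshold):
    """A lower number means stronger evidence."""
    return level <= threshold


# Hypothetical entries with their protein existence level.
toy_entries = {
    "toyA": ProteinExistence(1),
    "toyB": ProteinExistence(3),
    "toyC": ProteinExistence(2),
}
supported = sorted(
    name
    for name, level in toy_entries.items()
    if at_least_as_strong(level, ProteinExistence.EVIDENCE_AT_TRANSCRIPT_LEVEL)
)
print(supported)
```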
UniProtKB
Additional information
can be found in the cross-references
(to more than 140 databases)
2D gel
2DBase-Ecoli
ANU-2DPAGE
Aarhus/Ghent-2DPAGE (no server)
COMPLUYEAST-2DPAGE
Cornea-2DPAGE
DOSAC-COBS-2DPAGE
ECO2DBASE (no server)
OGP
PHCI-2DPAGE
PMMA-2DPAGE
Rat-heart-2DPAGE
REPRODUCTION-2DPAGE
Siena-2DPAGE
SWISS-2DPAGE
UCD-2DPAGE
World-2DPAGE
Family and domain
Gene3D
HAMAP
InterPro
PANTHER
Pfam
PIRSF
PRINTS
ProDom
PROSITE
SMART
SUPFAM
TIGRFAMs
Organism-specific
AGD
ArachnoServer
CGD
ConoServer
CTD
CYGD
dictyBase
EchoBASE
EcoGene
euHCVdb
EuPathDB
FlyBase
GeneCards
GeneDB_Spombe
GeneFarm
GenoList
Gramene
H-InvDB
HGNC
HPA
LegioList
Leproma
MaizeGDB
MGI
MIM
neXtProt
Orphanet
PharmGKB
PseudoCAP
RGD
SGD
TAIR
TubercuList
WormBase
Xenbase
ZFIN
Protein family/group
Allergome
CAZy
MEROPS
PeroxiBase
PptaseDB
REBASE
TCDB
Genome annotation
Ensembl
EnsemblBacteria
EnsemblFungi
EnsemblMetazoa
EnsemblPlants
EnsemblProtists
GeneID
GenomeReviews
KEGG
NMPDR
TIGR
UCSC
VectorBase
Enzyme and pathway
BioCyc
BRENDA
Pathway_Interaction_DB
Reactome
Other
BindingDB
DrugBank
NextBio
PMAP-CutDB
Sequence
EMBL
IPI
PIR
RefSeq
UniGene
3D structure
DisProt
HSSP
PDB
PDBsum
ProteinModelPortal
SMR
PTM
GlycoSuiteDB
PhosphoSite
PhosSite
UniProtKB/Swiss-Prot:
129 explicit links
and 14 implicit links!
Proteomic
PeptideAtlas
PRIDE
ProMEX
PPI
DIP
IntAct
MINT
STRING
Phylogenomic dbs
eggNOG
GeneTree
HOGENOM
HOVERGEN
InParanoid
OMA
OrthoDB
PhylomeDB
ProtClustDB
Polymorphism
dbSNP
Gene expression
ArrayExpress
Bgee
CleanEx
Genevestigator
GermOnline
Ontologies
GO
The UniProt web site
www.uniprot.org
• Powerful search engine, Google-like and easy to use, that also
supports very targeted field searches
• Scoring mechanism presenting relevant matches first
• Entry views, search result views and downloads are customizable
• The URL of a result page reflects the query; all pages and queries
are bookmarkable, supporting programmatic access
• Search, Blast, Align, Retrieve, ID mapping
Search
A very powerful text search tool with
autocompletion and refinement options
for finding UniProt entries and
documentation by biological information
Find all human proteins
located in the nucleus
The search interface
guides users with helpful
suggestions and hints
Advanced Search
A very powerful search tool
To be used when you know in which
entry section the information is stored
Find all the proteins localized in the
cytoplasm (experimentally proven)
which are phosphorylated on a
serine (experimentally proven)
Result pages: highly customizable
Result pages: downloadable
The URL can be bookmarked
and manually modified.
Blast
A sequence similarity search tool with the standard
options, for searching sequences
in different UniProt databases and
data sets
Blast: customize the result display
Blast: local alignment
sequence annotation highlighting option
Align
A ClustalW multiple alignment tool
with
sequence annotation highlighting option
Align
sequence annotation highlighting option
Retrieve
A UniProt-specific tool for retrieving a list of
entries from identifiers in several standard formats.
You can then query your ‘personal database’ with the
UniProt search tool.
Query your own dataset
ID Mapping
Maps the identifiers of a given protein between
different databases
These identifiers all point to a TP53 (p53) protein sequence!
● P04637, NP_000537, NP_001119584.1, NP_001119585.1,
● NP_001119584.1, NP_001119584.1, NP_001119584.1,
● NP_001119584.1, ENSG00000141510, CCDS11118,
● UPI000002ED67, IPI00025087, etc.
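Before mapping, it can help to recognise which database an identifier comes from. The sketch below classifies the identifiers listed above by their shape. The UniProtKB accession pattern follows the published accession format; the other patterns are simplified assumptions based on the usual prefixes (NP_ for RefSeq proteins, ENSG for Ensembl genes, CCDS, UPI for UniParc, IPI), and the function name is hypothetical.

```python
import re

# Ordered (label, pattern) pairs; every pattern must match the whole identifier.
ID_PATTERNS = [
    ("UniParc", re.compile(r"UPI[0-9A-F]{10}")),
    ("RefSeq protein", re.compile(r"NP_\d+(\.\d+)?")),
    ("Ensembl gene", re.compile(r"ENSG\d{11}")),
    ("CCDS", re.compile(r"CCDS\d+")),
    ("IPI", re.compile(r"IPI\d{8}")),
    ("UniProtKB accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
]


def classify_identifier(identifier):
    """Return the label of the first pattern matching the identifier,
    or 'unknown' when none matches."""
    for label, pattern in ID_PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "unknown"


for identifier in ["P04637", "NP_000537", "NP_001119584.1",
                   "ENSG00000141510", "CCDS11118",
                   "UPI000002ED67", "IPI00025087"]:
    print(identifier, "->", classify_identifier(identifier))
```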
Download
Download UniProt
http://www.uniprot.org/downloads
Canonical and isoform sequences (fasta format)
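Downloaded FASTA files can be read with a few lines of code. The sketch below assumes the usual UniProtKB header layout `>db|accession|entry name description`, with `sp` for Swiss-Prot and `tr` for TrEMBL, and isoform accessions carrying a `-N` suffix. The residues in the example are placeholders, not the real TP53 sequence, and the function name is hypothetical.

```python
import re

# Assumed header layout: >sp|ACCESSION|ENTRY_NAME free-text description
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")


def parse_fasta(text):
    """Parse UniProtKB-style FASTA text into a list of record dictionaries."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unexpected header: {line}")
            db, accession, name, description = match.groups()
            records.append({
                "reviewed": db == "sp",       # sp = Swiss-Prot, tr = TrEMBL
                "accession": accession,
                "isoform": "-" in accession,  # e.g. P04637-2
                "name": name,
                "description": description,
                "sequence": "",
            })
        elif records:
            records[-1]["sequence"] += line
    return records


# Placeholder residues, for illustration only.
example = """\
>sp|P04637|P53_HUMAN Cellular tumor antigen p53 OS=Homo sapiens GN=TP53
ACDEFGHIKL
MNPQRSTVWY
>sp|P04637-2|P53_HUMAN Isoform 2 of Cellular tumor antigen p53 OS=Homo sapiens GN=TP53
ACDEFGHIKL
"""
records = parse_fasta(example)
for record in records:
    print(record["accession"], record["isoform"], len(record["sequence"]))
```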
A few words on the UniProt
‘complete proteome’
sequence sets…
2’747 complete proteomes
• Genome completely sequenced
• Proteins mapped to the genome
• Entries tagged with the KW ‘Complete proteome’
• UniProtKB/Swiss-Prot isoform sequences are available
in FASTA format only
Fully manually reviewed (e.g. S. cerevisiae)
Partially manually reviewed (e.g. Homo sapiens)
Unreviewed (e.g. Acinetobacter baumannii (strain 1656-2))
UniProtKB - complete proteomes
Can be downloaded:
• From our complete proteome page
www.uniprot.org/taxonomy/complete-proteomes
• From the ‘ftp download’ page
• By querying UniProtKB + download
Query: organism:93062 AND keyword:"complete proteome"
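Because the URL of a result page reflects the query, a query like the one above can be turned into a bookmarkable URL and then modified. The sketch below uses only the Python standard library and makes no network request. The base path and the `format` parameter are assumptions about the URL layout, not confirmed by these slides; check the website documentation before relying on them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumed base path of the UniProtKB search page.
BASE_URL = "http://www.uniprot.org/uniprot/"


def search_url(query, **params):
    """Build a URL carrying the query text and optional extra parameters."""
    return BASE_URL + "?" + urlencode({"query": query, **params})


def with_param(url, key, value):
    """Return a copy of the URL with one parameter added or replaced."""
    parts = urlsplit(url)
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    params[key] = value
    return urlunsplit(parts._replace(query=urlencode(params)))


query = 'organism:93062 AND keyword:"complete proteome"'
url = search_url(query)
fasta_url = with_param(url, "format", "fasta")  # 'format' is an assumed parameter
print(url)
print(fasta_url)
```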
UniProtKB - complete proteomes
Additional information: www.uniprot.org/faq/15
Query UniProtKB + download
Human proteome ~ 20’200 genes
Query for ‘homo sapiens’ (August 2011)
• UniProtKB: 110’056 entries + alt sequences (~ 15’435) = 125’491
• UniProtKB/Swiss-Prot: 20’244 entries + alt sequences (~ 15’435) = 35’679
• UniProtKB/TrEMBL: 89’834 entries
• RefSeq: 32’898 sequences
• Ensembl: 90’720 sequences
Query for ‘homo sapiens’ + Complete proteome (KW-181)
• UniProtKB: 56’392 + alt sequences (15’435) = 71’827
• UniProtKB/Swiss-Prot: 20’238 + alt sequences (15’435) = 35’673
• UniProtKB/TrEMBL: 36’154
92% of human entries are linked with at least one RefSeq entry…
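The totals above can be rechecked with a few lines of arithmetic. The figures are copied from the slide (August 2011); the variable names are hypothetical. The last line compares the sum of the Swiss-Prot and TrEMBL counts with the UniProtKB count for the plain ‘homo sapiens’ query; the slide does not explain the small difference.

```python
ALT_SEQUENCES = 15_435  # alternative (isoform) sequences

# (entries, total including alternative sequences), as given on the slide.
rows = {
    "all human, UniProtKB": (110_056, 125_491),
    "all human, Swiss-Prot": (20_244, 35_679),
    "complete proteome, UniProtKB": (56_392, 71_827),
    "complete proteome, Swiss-Prot": (20_238, 35_673),
}
totals_ok = {
    label: entries + ALT_SEQUENCES == total
    for label, (entries, total) in rows.items()
}

# Swiss-Prot + TrEMBL for each query.
complete_proteome_sum = 20_238 + 36_154
all_human_sum = 20_244 + 89_834

print(totals_ok)
print(complete_proteome_sum, all_human_sum - 110_056)
```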
Summary
Do not hesitate to contact us !
help@uniprot.org
The UniProt Consortium
SIB
Ioannis Xenarios, Lydie Bougueleret, Andrea Auchincloss, Kristian Axelsen, Delphine Baratin, Marie-
Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Laurent Bollondi, Emmanuel Boutet, Lionel
Breuza, Alan Bridge, Edouard de Castro, Lorenzo Cerutti, Elisabeth Coudert, Béatrice Cuche, Mikael
Doche, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann,
Sebastien Gehant, Elisabeth Gasteiger, Alain Gateau, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-
Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Janet James, Florence Jungo, Guillaume Keller,
Vicente Lara, Philippe Lemercier, Damien Lieberherr, Xavier Martin, Patrick Masson, Anne Morgat,
Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole
Redaschi, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson,
Sylvie Staehli, Eleanor Stanley, André Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue,
Anne-Lise Veuthey
EBI
Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo
Antunes, Benoit Bely, Mark Bingley, David Binns, Lawrence Bower, Wei Mun Chan, Emily Dimmer,
Francesco Fazzini, Alexander Fedotov, John Garavelli, Leyla Garcia Castro, Rachael Huntley, Julius
Jacobsen, Michael Kleen, Duncan Legge, Wudong Liu, Jie Luo, Sandra Orchard, Samuel Patient,
Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Steven Rosanoff, Tony Sawford, Harminder Sehra,
Edward Turner, Matt Corbett, Mike Donnelly and Pieter van Rensburg
PIR
Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Winona C. Barker, Chuming Chen, Yongxing Chen,
Pratibha Dubey, Hongzhan Huang, Kati Laiho, Raja Mazumder, Peter McGarvey, Darren A. Natale,
Thanemozhi G. Natarajan, Jules Nchoutmboube, Natalia V. Roberts, Baris E. Suzek, Uzoamaka
Ugochukwu, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Lai-Su Yeh and Jian Zhang
UniProt is mainly supported by the National
Institutes of Health (NIH) grant 1 U41 HG006104-
01. Additional support for the EBI's involvement in
UniProt comes from the NIH grant 2P41 HG02273-07.
Swiss-Prot activities at the SIB are supported by the
Swiss Federal Government through the Federal
Office of Education and Science and the European
Commission contracts SLING (226073), Gen2Phen
(200754) and MICROME (222886). PIR activities are
also supported by the NIH grants 5R01GM080646-04,
3R01GM080646-04S2, 1G08LM010720-01, and
3P20RR016472-09S2, and NSF grant DBI-0850319.
www.isb-sib.ch
Thank you for your attention
http://education.expasy.org/cours/Prague2011/

The uni prot knowledgebase

  • 1. Marie-Claude.Blatter@isb-sib.ch Swiss-Prot group, Geneva SIB Swiss Institute of Bioinformatics The UniProt knowledgebase www.uniprot.org a hub of integrated protein data http://education.expasy.org/cours/Prague2011/
  • 3. protein sequence functional information data knowledge
  • 4. UniProt consortium EBI : European Bioinformatics Institute (UK) SIB : Swiss Institute of Bioinformatics (CH) PIR : Protein information resource (US)
  • 7. UniProtKB: protein sequence knowledgebase, 2 sections UniProtKB/Swiss-Prot and UniProtKB/TrEMBL (query, Blast, download) (~15 mo entries) UniParc: protein sequence archive (ENA equivalent at the protein level). Each entry contains a protein sequence with cross- links to other databases where you find the sequence (active or not). Not annotated (query, Blast, download) (~25 mo entries) UniRef: 3 clusters of protein sequences with 100, 90 and 50 % identity; useful to speed up sequence similarity search (BLAST) (query, Blast, download) (UniRef100 10 mo entries; UniRef90 7 mo entries; UniRef50 3.3 mo entries) UniMES: protein sequences derived from metagenomic projects (mostly Global Ocean Sampling (GOS)) (download) (8 mo entries, included in UniParc)
  • 9. UniProtKB an encyclopedia on proteins composed of 2 sections UniProtKB/TrEMBL and UniProtKB/Swiss-Prot unreviewed and reviewed automatically annotated and manually annotated released every 4 weeks
  • 10. UniProtKB: origin of protein sequences. UniProtKB protein sequences are mainly derived from:
      - INSDC (translated submitted coding sequences, CDS)
      - Ensembl (gene prediction) and RefSeq sequences
      - Sequences of PDB structures
      - Direct submission or sequences scanned from literature
    Notes:
      - UniProt does not do any gene prediction.
      - Most non-germline immunoglobulins, T-cell receptors, most patent sequences, highly over-represented data (e.g. viral antigens) and pseudogene sequences are excluded from UniProtKB, but stored in UniParc.
      - Data from the PIR database have been integrated in UniProtKB since 2003.
    Chart labels: 15 %, 85 %.
  • 11. From EMBL to TrEMBL: automated extraction of protein sequence (translated CDS), gene name and references, followed by automated annotation. In Swiss-Prot: manual annotation of the sequence and associated biological information.
  • 13. UniProtKB/TrEMBL entry (www.uniprot.org): one protein sequence, one species.
      - Protein and gene names
      - Taxonomic information
      - References
      - Cross-references to over 125 databases
      - Automated annotation: keywords and Gene Ontology
      - Automated annotation: function, subcellular location, catalytic activity, sequence similarities…
      - Automated annotation: transmembrane domains, signal peptide…
  • 14. UniProtKB/TrEMBL: automatic annotation.
    Protein sequence:
      - The quality of the protein sequences depends on the information provided by the submitter of the original nucleotide entry (CDS) or by the gene prediction pipeline (e.g. Ensembl).
      - 100% identical sequences (same length, same organism) are merged automatically.
    Biological information, sources of annotation:
      - Provided by the submitter (EMBL, PDB, TAIR…)
      - From automated annotation: automatically generated annotation rules (e.g. SAAS) and/or manually generated annotation rules (e.g. UniRule)
  • 17. UniProtKB/TrEMBL, example of fully automatic annotation: SAAS
      - Rules are derived from the UniProtKB/Swiss-Prot manual annotation.
      - Fully automated rule generation based on the C4.5 decision tree algorithm.
      - One annotation, one rule.
      - High stringency: an estimated precision of 99% or greater (tested on UniProtKB/Swiss-Prot) is required to generate an annotation.
      - Rules are produced, updated and validated at each release.
  • 19. UniProtKB/Swiss-Prot entry (www.uniprot.org): one protein sequence, one gene, one species.
    MSKEKFERTKPHVNVGTIGHVDHGKTTLTAAITTVLAKTYGGAAR
    AFDQIDNAPEEKARGITINTSHVEYDTPTRHYAHVDCPGHADYVK
    NMITGAAQMDGAILVVAATDGPMPQTREHILLGRQVGVPYIIVFL
    NKCDMVDDEELLELVEMEVRELLSQYDFPGDDTPIVRGSALKALE
    GDAEWEAKILELAGFLDSYIPEPERAIDKPFLLPIEDVFSISGRG
    TVVTGRVERGIIKVGEEVEIVGIKETQKSTCTGVEMFRKLLDEGR
    AGENVGVLLRGIKREEIERGQVLAKPGTIKPHTKFESEVYILSKD
    EGGRHTPFFKGYRPQFYFRTTDVTGTIELPEGVEMVMPGDNIKMV
    VTLIHPIAMDDGLRFAIREGGRTVGAGVVAKVLG
      - Protein and gene names
      - Taxonomic information
      - References
      - Cross-references to over 125 databases
      - Manual annotation: keywords and Gene Ontology
      - Manual annotation: function, subcellular location, catalytic activity, disease, tissue specificity, pathway…
      - Manual annotation: post-translational modifications, variants, transmembrane domains, signal peptide…
      - Alternative products: protein sequences produced by alternative splicing, alternative promoter usage, alternative initiation…
  • 20. UniProtKB/Swiss-Prot manual annotation: 1. Protein sequence (merge available CDS, annotate sequence discrepancies, report sequencing mistakes…). 2. Biological information (sequence analysis, extract literature information, ortholog data propagation, …).
  • 22. UniProtKB/Swiss-Prot, a gene-centric view of the protein space: 1 entry <-> 1 gene (1 species). The displayed protein sequence is canonical, representative, consensus…, plus alternative sequences (described within the entry).
  • 23. What is the current status?
      - At least 20% of Swiss-Prot entries required a minimal amount of curation effort to obtain the “correct” sequence.
      - Typical problems: unsolved conflicts, uncorrected initiation sites, frameshifts, wrong gene prediction, other ‘problems’.
  • 24. UCSC genome browser examples of CDS annotation submitted to INSDC…
  • 26. Extract literature information and protein sequence analysis. UniProtKB/Swiss-Prot gathers data from multiple sources:
      - publications (literature/PubMed)
      - prediction programs (Prosite, TMHMM, …)
      - contacts with experts
      - other databases
      - nomenclature committees
    An evidence attribution system makes it easy to trace the source of each annotation. Maximum usage of controlled vocabulary.
  • 28. …enable researchers to obtain a summary of what is known about a protein… General annotation (Comments) www.uniprot.org
  • 29. Human protein manual annotation: some statistics (June 2011)
  • 30. Sequence annotation (Features) …enable researchers to obtain a summary of what is known about a protein… www.uniprot.org
  • 31. Non-experimental qualifiers: UniProtKB/Swiss-Prot considers both experimental and predicted data and makes a clear distinction between the two.
      Type of evidence | Qualifier
      Strong experimental evidence | None or Ref.X
      Light experimental evidence | Probable
      Inferred by similarity with homologous protein | By similarity
      Inferred by prediction | Potential
  • 32. Find all the proteins localized in the cytoplasm (experimentally proven) which are phosphorylated on a serine (experimentally proven)
  • 33. ‘Protein existence’ tag (http://www.uniprot.org/docs/pe_criteria): indicates the evidence for the existence of a given protein. Different qualifiers:
      1. Evidence at protein level (~18%): MS, western blot (tissue specificity), immuno (subcellular location), …
      2. Evidence at transcript level (~19%)
      3. Inferred from homology (~58%)
      4. Predicted (~5%)
      5. Uncertain (mainly in TrEMBL)
  • 35. UniProtKB Additional information can be found in the cross-references (to more than 140 databases)
  • 36. UniProtKB/Swiss-Prot: 129 explicit links and 14 implicit links!
      - 2D gel: 2DBase-Ecoli, ANU-2DPAGE, Aarhus/Ghent-2DPAGE (no server), COMPLUYEAST-2DPAGE, Cornea-2DPAGE, DOSAC-COBS-2DPAGE, ECO2DBASE (no server), OGP, PHCI-2DPAGE, PMMA-2DPAGE, Rat-heart-2DPAGE, REPRODUCTION-2DPAGE, Siena-2DPAGE, SWISS-2DPAGE, UCD-2DPAGE, World-2DPAGE
      - Family and domain: Gene3D, HAMAP, InterPro, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPFAM, TIGRFAMs
      - Organism-specific: AGD, ArachnoServer, CGD, ConoServer, CTD, CYGD, dictyBase, EchoBASE, EcoGene, euHCVdb, EuPathDB, FlyBase, GeneCards, GeneDB_Spombe, GeneFarm, GenoList, Gramene, H-InvDB, HGNC, HPA, LegioList, Leproma, MaizeGDB, MGI, MIM, neXtProt, Orphanet, PharmGKB, PseudoCAP, RGD, SGD, TAIR, TubercuList, WormBase, Xenbase, ZFIN
      - Protein family/group: Allergome, CAZy, MEROPS, PeroxiBase, PptaseDB, REBASE, TCDB
      - Genome annotation: Ensembl, EnsemblBacteria, EnsemblFungi, EnsemblMetazoa, EnsemblPlants, EnsemblProtists, GeneID, GenomeReviews, KEGG, NMPDR, TIGR, UCSC, VectorBase
      - Enzyme and pathway: BioCyc, BRENDA, Pathway_Interaction_DB, Reactome
      - Sequence: EMBL, IPI, PIR, RefSeq, UniGene
      - 3D structure: DisProt, HSSP, PDB, PDBsum, ProteinModelPortal, SMR
      - PTM: GlycoSuiteDB, PhosphoSite, PhosSite
      - Proteomic: PeptideAtlas, PRIDE, ProMEX
      - PPI: DIP, IntAct, MINT, STRING
      - Phylogenomic dbs: eggNOG, GeneTree, HOGENOM, HOVERGEN, InParanoid, OMA, OrthoDB, PhylomeDB, ProtClustDB
      - Polymorphism: dbSNP
      - Gene expression: ArrayExpress, Bgee, CleanEx, Genevestigator, GermOnline
      - Ontologies: GO
      - Other: BindingDB, DrugBank, NextBio, PMAP-CutDB
  • 37. The UniProt web site (www.uniprot.org):
      - Powerful search engine, Google-like and easy to use, which also supports very directed field searches
      - Scoring mechanism presenting relevant matches first
      - Entry views, search result views and downloads are customizable
      - The URL of a result page reflects the query; all pages and queries are bookmarkable, supporting programmatic access
      - Search, Blast, Align, Retrieve, ID mapping
  • 38. Search: a very powerful text search tool, with autocompletion and refinement options, for finding UniProt entries and documentation by biological information
  • 39. Find all human proteins located in the nucleus
  • 40. The search interface guides users with helpful suggestions and hints
  • 42. Advanced Search: a very powerful search tool, to be used when you know in which entry section the information is stored
  • 43. Find all the proteins localized in the cytoplasm (experimentally proven) which are phosphorylated on a serine (experimentally proven)
  • 44. Result pages: highly customizable
  • 47. The URL can be bookmarked and manually modified.
  • 48. Blast: a tool, with the standard options, for searching sequences in the different UniProt databases and data sets
  • 49. Blast: customize the result display
  • 50. Blast: local alignment sequence annotation highlighting option
  • 51. Align: a ClustalW multiple alignment tool with a sequence annotation highlighting option
  • 53. Retrieve: a UniProt-specific tool for retrieving a list of entries from identifiers in several standard formats. You can then query your ‘personal database’ with the UniProt search tool.
  • 54. Query your own dataset
  • 55. ID Mapping: maps the identifiers of a given protein between different databases
  • 56. These identifiers all point to a TP53 (p53) protein sequence: P04637, NP_000537, NP_001119584.1, NP_001119585.1, ENSG00000141510, CCDS11118, UPI000002ED67, IPI00025087, etc.
  • 60. Canonical and isoform sequences (fasta format)
  • 61. A few words on the UniProt ‘complete proteome’ sequence sets…
  • 62. UniProtKB complete proteomes: 2’747 complete proteomes
      - Genome completely sequenced
      - Proteins mapped to the genome
      - Entries tagged with the keyword (KW) ‘Complete proteome’
      - UniProtKB/Swiss-Prot isoform sequences are available in FASTA format only
    Review status: fully manually reviewed (e.g. S. cerevisiae), partially manually reviewed (e.g. Homo sapiens) or unreviewed (e.g. Acinetobacter baumannii (strain 1656-2)).
  • 63. UniProtKB complete proteomes can be downloaded:
      - From our complete proteome page: www.uniprot.org/taxonomy/complete-proteomes
      - From the ‘ftp download’ page
      - By querying UniProtKB and downloading the results. Query: organism:93062 AND keyword:"complete proteome"
    Additional information: www.uniprot.org/faq/15
  • 64. Query UniProtKB + download
  • 66. Human proteome: ~20’200 genes
    Query for ‘homo sapiens’ (August 2011):
      - UniProtKB: 110’056 entries + alt sequences (~15’435) = 125’491
      - UniProtKB/Swiss-Prot: 20’244 entries + alt sequences (~15’435) = 35’679
      - UniProtKB/TrEMBL: 89’834 entries
      - RefSeq: 32’898 sequences
      - Ensembl: 90’720 sequences
    Query for ‘homo sapiens’ + Complete proteome (KW-181):
      - UniProtKB: 56’392 + alt sequences (15’435) = 71’827
      - UniProtKB/Swiss-Prot: 20’238 + alt sequences (15’435) = 35’673
      - UniProtKB/TrEMBL: 36’154
    92% of human entries are linked with at least one RefSeq entry…
  • 69. Do not hesitate to contact us: help@uniprot.org
  • 70. The UniProt Consortium (www.uniprot.org)
      - SIB: Ioannis Xenarios, Lydie Bougueleret, Andrea Auchincloss, Kristian Axelsen, Delphine Baratin, Marie-Claude Blatter, Brigitte Boeckmann, Jerven Bolleman, Laurent Bollondi, Emmanuel Boutet, Lionel Breuza, Alan Bridge, Edouard de Castro, Lorenzo Cerutti, Elisabeth Coudert, Béatrice Cuche, Mikael Doche, Dolnide Dornevil, Severine Duvaud, Anne Estreicher, Livia Famiglietti, Marc Feuermann, Sebastien Gehant, Elisabeth Gasteiger, Alain Gateau, Vivienne Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski, Ursula Hinz, Chantal Hulo, Nicolas Hulo, Janet James, Florence Jungo, Guillaume Keller, Vicente Lara, Philippe Lemercier, Damien Lieberherr, Xavier Martin, Patrick Masson, Anne Morgat, Salvo Paesano, Ivo Pedruzzi, Sandrine Pilbout, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Bernd Roechert, Michel Schneider, Christian Sigrist, Karin Sonesson, Sylvie Staehli, Eleanor Stanley, André Stutz, Shyamala Sundaram, Michael Tognolli, Laure Verbregue, Anne-Lise Veuthey
      - EBI: Rolf Apweiler, Maria Jesus Martin, Claire O'Donovan, Michele Magrane, Yasmin Alam-Faruque, Ricardo Antunes, Benoit Bely, Mark Bingley, David Binns, Lawrence Bower, Wei Mun Chan, Emily Dimmer, Francesco Fazzini, Alexander Fedotov, John Garavelli, Leyla Garcia Castro, Rachael Huntley, Julius Jacobsen, Michael Kleen, Duncan Legge, Wudong Liu, Jie Luo, Sandra Orchard, Samuel Patient, Klemens Pichler, Diego Poggioli, Nikolas Pontikos, Steven Rosanoff, Tony Sawford, Harminder Sehra, Edward Turner, Matt Corbett, Mike Donnelly and Pieter van Rensburg
      - PIR: Cathy H. Wu, Cecilia N. Arighi, Leslie Arminski, Winona C. Barker, Chuming Chen, Yongxing Chen, Pratibha Dubey, Hongzhan Huang, Kati Laiho, Raja Mazumder, Peter McGarvey, Darren A. Natale, Thanemozhi G. Natarajan, Jules Nchoutmboube, Natalia V. Roberts, Baris E. Suzek, Uzoamaka Ugochukwu, C. R. Vinayaka, Qinghua Wang, Yuqi Wang, Lai-Su Yeh and Jian Zhang
  • 71. UniProt is mainly supported by the National Institutes of Health (NIH) grant 1 U41 HG006104-01. Additional support for the EBI's involvement in UniProt comes from the NIH grant 2P41 HG02273-07. Swiss-Prot activities at the SIB are supported by the Swiss Federal Government through the Federal Office of Education and Science and the European Commission contracts SLING (226073), Gen2Phen (200754) and MICROME (222886). PIR activities are also supported by the NIH grants 5R01GM080646-04, 3R01GM080646-04S2, 1G08LM010720-01, and 3P20RR016472-09S2, and NSF grant DBI-0850319.
  • 73. Thank you for your attention http://education.expasy.org/cours/Prague2011/
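
Slides 47 and 63 note that a result URL reflects the query, can be bookmarked and edited by hand, and that a complete proteome can be fetched with a query such as organism:93062 AND keyword:"complete proteome". A minimal Python sketch of composing and editing such a URL follows. The base URL, the `query` and `format` parameter names, and all helper names are assumptions made for illustration; the slides give only www.uniprot.org and the query text, so check the current UniProt documentation before relying on any of them.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumed endpoint, for illustration only (not given in the slides).
BASE_URL = "https://www.uniprot.org/uniprot/"


def build_query(taxon_id: int, keyword: str) -> str:
    """Compose the query text shown on slide 63."""
    return f'organism:{taxon_id} AND keyword:"{keyword}"'


def build_query_url(query: str, fmt: str = "fasta") -> str:
    """Return a bookmarkable URL; the parameter names are hypothetical."""
    return BASE_URL + "?" + urlencode({"query": query, "format": fmt})


def change_format(url: str, fmt: str) -> str:
    """Mimic editing a bookmarked URL by hand: swap only the format value."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    params["format"] = [fmt]
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


query = build_query(93062, "complete proteome")
url = build_query_url(query)
tab_url = change_format(url, "tab")
print(query)
print(url)
print(tab_url)
```

Keeping the query text separate from the URL encoding means the same query can be pasted into the search box or sent programmatically.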

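Slide 60 refers to canonical and isoform sequences in FASTA format. As a rough illustration, the Python sketch below reads FASTA text and separates canonical from isoform records. It assumes headers of the form `>db|ACCESSION|ENTRY_NAME description` and that isoform accessions carry a suffix such as `P04637-2`; the helper names and the toy residues are made up for the example and are not real p53 data.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession(header: str) -> str:
    """Extract the accession, assuming 'db|ACCESSION|ENTRY_NAME ...'."""
    return header.split("|")[1]


def split_isoforms(text: str) -> Dict[str, List[str]]:
    """Group accessions: a '-' in the accession is taken to mark an isoform."""
    groups: Dict[str, List[str]] = {"canonical": [], "isoform": []}
    for header, _sequence in parse_fasta(text):
        acc = accession(header)
        groups["isoform" if "-" in acc else "canonical"].append(acc)
    return groups


# Toy records: the residues are placeholders, not real sequences.
TOY_FASTA = """\
>sp|P04637|P53_HUMAN toy canonical record
MAAAKK
LLL
>sp|P04637-2|P53_HUMAN toy isoform record
MAAAKK
"""

records = list(parse_fasta(TOY_FASTA))
groups = split_isoforms(TOY_FASTA)
print(groups)
```
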
Editor's notes

  1. This Science cover clearly shows the well-known discrepancy between the amount of data and the amount of knowledge available. This is a first challenge, but there is a second one: how to link the two together?
  2. The mission of UniProt is to link the protein sequences (data) with the biological knowledge (functional information).
  3. The UniProt databases and web site are maintained by the UniProt consortium, which is composed of:
  4. Screen shot of the web page
  5. UniProt provides 4 databases, the central one being UniProtKB.
  7. Computer prediction: if there is no other evidence from this protein or a similar protein, the keyword is not assigned.
  8. dbSNP is NOT in DR lines!!! => not included in the release notes statistics. Note: Replaces BuruList, ListiList, MypuList, PhotoList, SagaList and SubtiList
  9. 3 groups working together. Encyclopedia of protein function in biology and life science. Considered by the life science community as the GOLD standard in annotation practices. Over 600’000 users per month originating from 149 countries. Is it UniProt or Swiss-Prot? Used by life scientists (biologists, MDs), but also by chemists, engineers in nanotechnologies and bioinformaticians; used by the pharma and biotechnology industry.
  11. Take home message
  12. a bit of this, a bit of that…