Introduction to Proteomics History and Bioinformatics
1. Proteomics: History and introduction to
the course
Dr. Juan Antonio Vizcaíno
Proteomics Team Leader
EMBL-EBI
Hinxton, Cambridge, UK
2. Juan A. Vizcaíno
juan@ebi.ac.uk
WT Proteomics Bioinformatics Course 2016
Hinxton, 4 December 2016
Data resources at EMBL-EBI
• Genes, genomes & variation: Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive
• Gene & protein expression: ArrayExpress, Expression Atlas, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Chemical biology: ChEMBL, ChEBI
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
3. Juan A. Vizcaíno
• Useful definitions and concepts to start
• A little bit of history… and curiosities
• Importance of bioinformatics
Overview
4. Juan A. Vizcaíno
Proteomics is the large-scale study of proteins, particularly
their structures and functions
The proteome is the entire complement of proteins
including the modifications made to a particular set of
proteins, produced by an organism or system. This will vary
with time and distinct requirements, or stresses, that a cell
or organism undergoes
proteome = ‘protein’ + ‘genome’ (M. Wilkins, 1994)
Definitions
5. Juan A. Vizcaíno
Genomics
Transcriptomics
Proteomics
From the genome to the proteome
6. Juan A. Vizcaíno
Genome vs. proteome
• Genome
• Essentially static over time
• Not location-specific
• Human genome mapped (initially in 2000)
• ~20,000 genes
• PCR is available to amplify DNA
• Proteome
• Dynamic over time
• Location-specific
• Human proteome not yet mapped:
• How many proteins?
• No equivalent of PCR for proteins
7. Juan A. Vizcaíno
• Large increase in protein diversity due to:
• Alternative splicing of pre-mRNA (introns and exons)
• Post-translational modifications of proteins
• Cell age and health/disease state
Genome -> Proteome
8. Juan A. Vizcaíno
20 naturally occurring amino acids
Chirality: L-amino acids (L-aa)
Amino acids
9. Juan A. Vizcaíno
From: Molecular Biology of the Cell (4th Ed)
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=mboc4&part=A388&rendertype=figure&id=A391
Individual amino acids
polypeptide
Peptide bond
Protein backbone
10. Juan A. Vizcaíno
• Useful definitions and concepts to start
• A little bit of history… and curiosities
• Importance of bioinformatics
Overview
11. Juan A. Vizcaíno
Sanger's principal conclusion was that the two polypeptide chains of the
protein insulin had precise amino acid sequences and, by extension, that
every protein had a unique sequence.
Nobel Prize in Chemistry in 1958
F. Sanger
Protein sequencing: the pioneers
12. Juan A. Vizcaíno
F. Sanger
By 1977, he had developed the “dideoxy”
method for sequencing DNA molecules,
also known as the Sanger method. He
sequenced the first complete DNA genome:
bacteriophage ΦX174
Nobel Prize in Chemistry in 1980
Not only protein sequencing…
13. Juan A. Vizcaíno
MS is an analytical technique that measures the mass-to-charge (m/z)
ratio of charged particles. It is used for determining masses of particles,
for the determination of the elemental composition of a sample or
molecule, and for elucidating the chemical structures of molecules, such as
peptides and other chemical compounds.
Many applications…
one of them is proteomics
Mass spectrometry (MS)
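As a rough illustration of the m/z concept applied to peptides, the sketch below computes the monoisotopic mass of an unmodified peptide and its m/z at a given charge state, assuming protons are the only charge carriers. The function names `peptide_mass` and `peptide_mz` and the example sequence are hypothetical and not part of the course material; the residue masses are standard monoisotopic values rounded to five decimals.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565   # one H2O per peptide (free N- and C-termini)
PROTON_MASS = 1.007276   # mass of the charge carrier


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def peptide_mz(sequence: str, charge: int) -> float:
    """m/z of the peptide carrying `charge` protons (positive mode)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # Hypothetical example peptide, shown at charge states 1+ to 3+.
    for z in (1, 2, 3):
        print(z, round(peptide_mz("PEPTIDE", z), 4))
```

The same peptide appears at a different m/z for each charge state, which is why the instrument reports a ratio and not a mass directly.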
14. Juan A. Vizcaíno
P. V. Edman
In 1950, he developed the Edman degradation
method.
A major drawback of this technique is that the peptides
being sequenced cannot be longer than around 30
residues
Protein sequencing: the pioneers
15. Juan A. Vizcaíno
Wolfgang Paul / Hans G. Dehmelt developed the ion trap
technique (1950s and 1960s).
Nobel Prize in Physics (1989)
A commercial quadrupole ion trap
(Finnigan MAT) was introduced in 1983.
The ion trap quickly became the primary
instrument for conducting proteomics
because of its ability to conduct tandem
MS (MS/MS) analysis of complex mixtures
of peptides, generated by enzymatic
digestion of proteome samples such as cell
lysates.
History of Mass spectrometry
16. Juan A. Vizcaíno
John B. Fenn (Yale University) and co-workers used
electrospray ionization (ESI) to ionize biomolecules
(high-molecular-weight proteins).
Koichi Tanaka (Shimadzu Corp) used the “ultra fine metal
plus liquid matrix method” to ionize intact proteins (Soft
Laser Desorption): “With the proper combination of laser
wavelength and matrix, a protein can be ionized”.
Fenn and Tanaka: Nobel Prize in Chemistry (2002)
Earlier ionization methods were too energetic to be used with biological molecules
F. Hillenkamp & M. Karas developed the MALDI technique:
use of organic matrices to obtain MS of large proteins
Mass spectrometry: Soft ionization methods
17. Juan A. Vizcaíno
Patrick H. O’Farrell
J. Klose
1D SDS gel: separation by molecular weight (MW)
2D SDS gel: separation by isoelectric point (pI) and MW
2D gel image from: http://www.fixingproteomics.org/
Gel electrophoresis
18. Juan A. Vizcaíno
The rapid
development of
genomics allowed the
development of
proteomics
Shotgun proteomics:
a method for
identifying proteins
in complex mixtures
HPLC
MS
[Figure: two mass spectra, relative intensity (%) versus m/z, 100 to 2100]
There are only 20 amino acids.
The physico-chemical
properties of the peptides are
more homogeneous and
‘manageable’ than the ones
from the proteins
From protein centric to peptide centric
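The move from proteins to peptides can be sketched as an in-silico digestion. The example below assumes trypsin as the enzyme (the slides do not name one) and uses the common simplified rule of cleaving after K or R unless the next residue is P. The function name `tryptic_digest`, the `missed_cleavages` parameter, and the example sequence are hypothetical.

```python
def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption: cleavage occurs C-terminal to K or R, except before P.
    `missed_cleavages` is the maximum number of skipped cleavage sites
    allowed within one peptide.
    """
    if not protein:
        return []

    # Collect cut positions, including both ends of the sequence.
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))

    # Join neighbouring fragments to model missed cleavages.
    peptides = []
    for start in range(len(sites) - 1):
        for skipped in range(missed_cleavages + 1):
            end = start + 1 + skipped
            if end < len(sites):
                peptides.append(protein[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    # Hypothetical example sequence; the R followed by P is not cleaved.
    for peptide in tryptic_digest("MKWVTFISLLRPAAK"):
        print(peptide)
```

Search engines apply this kind of rule to every protein in a sequence database to build the list of candidate peptides that are compared against the measured spectra.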
19. Juan A. Vizcaíno
Mass Spectrometry (MS)-based proteomics
• Many different workflows.
• Discovery mode:
• Bottom-up proteomics
• Data dependent acquisition (DDA)
• Data independent acquisition (DIA)
• Top-down proteomics (intact proteins)
• Targeted mode:
• SRM/MRM (Selected Reaction
Monitoring/Multiple Reaction Monitoring)
20. Juan A. Vizcaíno
Not only identify, but also quantify the
amount of each protein in the sample
The current methods rely mainly on MS:
Vaudel et al., Proteomics 2010 Feb;10(4):650-670
Proteomics becomes quantitative
21. Juan A. Vizcaíno
The yeast two-hybrid method was developed by S. Fields in 1989.
Many more methods have been developed since then:
- Affinity electrophoresis
- Co-immunoprecipitation
- Tandem affinity purification (TAP)
Protein-protein interactions: yeast two-hybrid
22. Juan A. Vizcaíno
Proteomics in a clinical environment
• Biomarker discovery is a very active field of research.
• MS technology is slowly being incorporated into clinical practice.
• Used to identify microorganisms by
MALDI MS profiling.
• Approved in Europe. In August
2013 it became the first MS
diagnostic tool approved in the US.
J Rohn (2013) Nat Biotechnol, 31, 862
23. Juan A. Vizcaíno
http://thehpp.org/
The Human Proteome Project (HPP)
24. Juan A. Vizcaíno
Draft Human proteome papers published in 2014
Wilhelm et al., Nature, 2014 Kim et al., Nature, 2014
• Two independent groups claimed to have produced the
first complete draft of the human proteome by MS.
• Some of their findings are controversial and need further
validation, but they generated a lot of discussion and put
proteomics in the spotlight.
• They used many different tissues.
Nature cover 29 May 2014
25. Juan A. Vizcaíno
Proteomics for structural biology
• Increased focus
in recent years (a
lot more to
come).
• MS/MS cross-
linking
approaches
• Hydrogen-deuterium
exchange (HDX) mass
spectrometry
Lössl et al., EMBO J, 2016
26. Juan A. Vizcaíno
• Useful definitions and concepts to start
• A little bit of history… and curiosities
• Importance of bioinformatics
Overview
27. Juan A. Vizcaíno
Need for bioinformatics
Biology is changing:
• High-throughput
• More data produced
• New types of data
• Emphasis on systems biology
Bioinformatics enables new
applications:
• molecular medicine
• agriculture
• food
• environmental sciences
28. Juan A. Vizcaíno
On 21 July 1986, SWISS-PROT was created by
A. Bairoch (it contained around 3,900 protein
sequences)
In 1979, the first software was developed for 2DE image analysis (ELSIE)
Bioinformatics is very much needed in proteomics
30. Juan A. Vizcaíno
Mallick & Kuster, Nat. Biotechnol. 2010 Jul;28(7):695-709
Proteomics is a complex discipline
31. Juan A. Vizcaíno
MS-based proteomics
Hein et al., Handbook of Systems Biology, 2012
32. Juan A. Vizcaíno
Genomics
Transcriptomics
Proteomics
More multi-omics studies…
Metabolomics