Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems
1. Peptide Informatics
Bridging the gap between small-molecule and large-molecule systems
Lisa Sach-Peltason
Data Science, pRED Informatics, Roche Basel
2. Peptide Therapeutics – An Emerging Modality
US FDA approved drugs (2009-2011)
• Small molecule: 34
• Protein: 9
• Monoclonal antibody: 8
• Peptide: 8
• Natural product: 6
• Amino acid: 5
• Steroid: 2
• Nucleoside: 1
• Enzyme: 1
• Macrocycle: 1
• Other: 1
Adapted from Albericio & Kruger; Future Med. Chem. (2012), 4(12), 1527-1531.
3. Peptide Therapeutics – An Emerging Modality
Therapeutic categories of peptide candidates entering clinical trials (1980-2007)
Saladin et al.; IDrugs (2009), 12(12), 779-784.
4. Peptide Therapeutics – Opportunities
Modality | Selectivity | Generation | Intracellular access | Delivery | Action | Oral delivery
Small molecules | Low to high | synthetic | High | all routes | Antagonist / Agonist | Yes
Peptides | High | synthetic or recombinant | Possible | i.v. / s.c.; non-parenteral delivery feasible | Agonist / Antagonist | Potential
Biologics | High | recombinant | Low | i.v. / s.c. | Antagonist / Agonist | No
Proven Advantages of Peptides
• Efficacy at extracellular targets, especially for polar or shallow binding pockets
• Rapid optimization
• Low off-target pharmacology
• High target selectivity
* Reflects current status; future potential for peptide antagonists, e.g., PPIs
5. Peptides at Roche
Growing asset of internal and external peptide compounds
• Global Roche compound DB: >25,000 compounds registered with PEPTIDE flag (of 3.9M total)
• Increasing demand for informatics infrastructure and support for peptide projects
[Figure: Peptides in IRCI 2003-2013. Combination chart by quarter, 2003 to 2013, showing new registrations (axis 0 to 993) and total no. of peptides (axis 15000 to 26200).]
6. Peptide Therapeutics – Informatics Challenges
Cheminformatics (molecule graphs)
• Similarity searching
• SAR analysis, visualization
• Property prediction
• Small-molecule registration
Bioinformatics (sequences)
• Sequence searching
• Alignment
• Sequence analysis
Peptide informatics
• Size, complexity
• Non-standard residues
• Chemical modifications
• No format standards
Figure adapted from J.H.Jensen, ChemAxon European UGM, 2012
7. Data Capture Challenges
Peptide sequence format
IUPAC-IUB Nomenclature and Symbolism for Amino Acids and Peptides (“3AA”, 1983)
• 3-letter code for standard and common non-standard amino acids
• Symbolism for representing amino acid sequences
Example: H-Asp-Arg-Val-DTyr-Ile-His-Pro-Phe-OH
• N-terminal specification: H, Ac, Boc, …
• Separator / peptide bond: -
• Residue: three-letter symbol, e.g. Asp
• Stereoconfiguration: D prefix, e.g. DTyr
• C-terminal specification: OH, NH2, H, …
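The anatomy of the example sequence on this slide can be illustrated with a small parser. This is a minimal sketch, not the format definition itself: the names `Residue`, `Peptide` and `parse_linear_peptide` are hypothetical, and it assumes plain three-letter residue symbols with an optional D prefix. Symbols that contain a hyphen themselves, such as Phe(4-Bz), would need a real tokenizer.

```python
import re
from typing import List, NamedTuple


class Residue(NamedTuple):
    symbol: str   # three-letter code, e.g. "Tyr"
    is_d: bool    # True when the token carries the D prefix


class Peptide(NamedTuple):
    n_term: str
    residues: List[Residue]
    c_term: str


def parse_linear_peptide(text: str) -> Peptide:
    """Split an 'H-Asp-...-Phe-OH' style string into termini and residues."""
    parts = [part.strip() for part in text.split("-")]
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected N-term, residues and C-term: {text!r}")
    n_term, *body, c_term = parts
    residues = []
    for token in body:
        # Optional 'D' prefix followed by a capitalised three-letter symbol.
        match = re.fullmatch(r"(D?)([A-Z][a-z]{2})", token)
        if match is None:
            raise ValueError(f"unrecognised residue token: {token!r}")
        residues.append(Residue(match.group(2), match.group(1) == "D"))
    return Peptide(n_term, residues, c_term)


peptide = parse_linear_peptide("H-Asp-Arg-Val-DTyr-Ile-His-Pro-Phe-OH")
```

Here `peptide.residues[3]` is the only residue flagged as D-configured, and the termini are returned separately from the residue list.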
8. Data Capture Challenges
How to capture non-standard sequence elements?
Residue symbols for modified amino acids
• One residue, three candidate names and symbols:
– L-Norvaline: Nva (discouraged by IUPAC but commonly used)
– L-2-Aminovaleric acid? Avl (IUPAC)
– L-2-Aminopentanoic acid? Ape (IUPAC)
• L-4-Benzoylphenylalanine:
– 4Bpa
– Phe(4-Bz) (systematic; avoids combinatorial explosion)
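One way to handle competing residue symbols is a synonym table plus a splitter for systematic names. The sketch below is illustrative only: choosing Nva as the canonical symbol is an assumption made here, and `normalise_symbol` and `split_systematic` are hypothetical helper names.

```python
import re

# Synonyms from the slide mapped to one canonical symbol.
# Picking "Nva" as canonical is an assumption for illustration.
CANONICAL_SYMBOL = {"Nva": "Nva", "Avl": "Nva", "Ape": "Nva"}


def normalise_symbol(symbol: str) -> str:
    """Map a known synonym to its canonical symbol; pass others through."""
    return CANONICAL_SYMBOL.get(symbol, symbol)


def split_systematic(symbol: str):
    """Split 'Phe(4-Bz)' into parent residue and substituent description."""
    match = re.fullmatch(r"([A-Za-z0-9]+)\((.+)\)", symbol)
    if match is None:
        return symbol, None   # trivial symbol such as '4Bpa'
    return match.group(1), match.group(2)
```

With a systematic name, only the parent residues and the substituents need dictionary entries, rather than one entry per combination.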
9. Data Capture Challenges
How to capture non-standard sequence elements?
Cyclic peptides
Cross-links (disulfide bridges within or across chains, isopeptide bonds, …)
• Structure drawing of the cyclic peptide (IUPAC recommendation, depiction rather than text)
• cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn] (IUPAC)
• H-Cys(1)-Tyr-Ile-Gln-Asn-Cys(1)-Pro-Leu-Gly-NH2 (SMILES-like notation; see also Biochemfusion’s PLN)
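The two text notations on this slide can be read mechanically. The following is a rough sketch under stated assumptions: `find_cross_links` is a hypothetical name, a label such as (1) is assumed to occur on exactly two residues, and residue symbols are assumed to contain no hyphens.

```python
import re
from collections import defaultdict


def find_cross_links(sequence: str):
    """Return (is_cyclic, residues, links); links hold 1-based positions."""
    is_cyclic = sequence.startswith("cyclo[") and sequence.endswith("]")
    if is_cyclic:
        tokens = sequence[len("cyclo["):-1].split("-")
    else:
        tokens = sequence.split("-")[1:-1]   # drop N- and C-terminal parts
    residues = []
    labelled = defaultdict(list)
    for position, token in enumerate(tokens, start=1):
        # Residue symbol with an optional numeric cross-link label.
        match = re.fullmatch(r"(\w+)(?:\((\d+)\))?", token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        residues.append(match.group(1))
        if match.group(2) is not None:
            labelled[match.group(2)].append(position)
    links = []
    for label, positions in sorted(labelled.items()):
        if len(positions) != 2:
            raise ValueError(f"label {label} must occur exactly twice")
        links.append(tuple(positions))
    return is_cyclic, residues, links


cyclic, residues, links = find_cross_links(
    "H-Cys(1)-Tyr-Ile-Gln-Asn-Cys(1)-Pro-Leu-Gly-NH2"
)
```

For the labelled example, the two Cys(1) residues are paired into a single link between positions 1 and 6; the cyclo[...] example is reported as cyclic with no labelled links.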
10. Peptide Data Inventory
Digest Roche peptides with NextMove’s Sugar&Splice
[Figure: Top 50 monomer frequencies of 23k Roche peptides. Bar chart, frequency axis 0 to 26000.]
Standard AA (without Gly and Pro): 93%
Top 50 monomers: 98%
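A monomer inventory of this kind reduces to counting. The sketch below shows one possible calculation, assuming that a figure such as "Top 50 monomers: 98%" denotes the share of all monomer occurrences covered by the most frequent monomers. The function names and the toy data are hypothetical and are not Roche data.

```python
from collections import Counter


def monomer_frequencies(peptides):
    """Count how often each monomer occurs across all peptides."""
    counts = Counter()
    for residues in peptides:
        counts.update(residues)
    return counts


def top_n_coverage(counts, n):
    """Fraction of all monomer occurrences covered by the n most frequent."""
    total = sum(counts.values())
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total if total else 0.0


# Toy input: each peptide is a list of monomer symbols (illustrative only).
toy_peptides = [
    ["Leu", "DPhe", "Pro", "Val", "Orn"],
    ["Ala", "Leu", "Leu", "Gly"],
    ["Leu", "Nva", "Pro"],
]
counts = monomer_frequencies(toy_peptides)
coverage = top_n_coverage(counts, 2)
```

In a real inventory the monomer lists would come from a digestion step such as the one described on this slide rather than from hand-written lists.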
11. Peptide Data Inventory
Monomer library
Roche Peptide Building Blocks
• ~200 manually curated templates
• Up to 600 monomers extracted from Roche peptides
• Direct cartridge with normalization & uniqueness check
Structure | ID | Short Name | Chemical Name | Category | CAS | Roche Number
(drawing) | Ala | A | L-Alanine | L-AA | 56-41-7 | ROxyz
(drawing) | Fmoc | Fmoc | 9-Fluorenylmethoxy-carbonyl | SAG | |
Used for: sequence registration, peptide drawing
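A monomer library with a uniqueness check can be sketched as a keyed lookup table. This is only an in-memory illustration: the slide's library relies on a chemistry cartridge for structure normalization, which is not reproduced here. The classes `Monomer` and `MonomerLibrary` are hypothetical, and the check below compares IDs only (case-insensitively, a choice made for this sketch), not chemical structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Monomer:
    monomer_id: str      # e.g. "Ala"
    short_name: str      # e.g. "A"
    chemical_name: str   # e.g. "L-Alanine"
    category: str        # e.g. "L-AA"
    cas: str = ""        # CAS number, if known


class MonomerLibrary:
    """Lookup table keyed by a normalised monomer ID."""

    def __init__(self):
        self._by_id = {}

    @staticmethod
    def _key(monomer_id: str) -> str:
        # Normalisation here is only whitespace and case folding.
        return monomer_id.strip().lower()

    def register(self, monomer: Monomer) -> None:
        key = self._key(monomer.monomer_id)
        if key in self._by_id:
            raise ValueError(f"duplicate monomer ID: {monomer.monomer_id!r}")
        self._by_id[key] = monomer

    def lookup(self, monomer_id: str):
        return self._by_id.get(self._key(monomer_id))


library = MonomerLibrary()
library.register(Monomer("Ala", "A", "L-Alanine", "L-AA", "56-41-7"))
```

Registering a second entry with the ID "Ala" raises a `ValueError`, which is the behaviour a uniqueness check is meant to provide.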
12. Peptide Sequence Information
Harmonizing peptide registration
Compound registration system
• Structure: draw from local monomer templates
• Sequence: enter manually (LINEAR STRUCTURE DESCRIPTION field, PEPTIDE comment), e.g. H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH
• No format standards or validation
13. Peptide Sequence Information
Harmonizing peptide registration
• Synchronize drawing templates with monomer library
• Automatic sequence generation & validation
• Consistent structure and sequence information
Atoms and bonds
• Chemical identification
• Novelty check
• (Sub-)Structure searches
Sequence
• Depiction
• Visual comparison
• Sequence searches
Tools for data analysis
Building block library
Compound registration system
• LINEAR STRUCTURE DESCRIPTION field: H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH
• PEPTIDE comment
14. Peptide Drawing
Central template management in Accelrys Draw
Roche Peptide Building Blocks
• ~200 manually curated templates
• Categories: L-AA, D-AA, nS-AA, Linkers, Attachments, Resins
Accelrys Draw Add-In
• Download templates to Draw
• Regular check for updates
• Register new templates via Sequence Template Manager
• Validate new templates
15. Peptide Sequence Information
Sequence generation with NextMove’s Sugar&Splice
Computational perception of peptide sequence from chemical structure
• Output of sequence in standard format
• Lookup of non-standard names in building block library
Pipeline Pilot wrapper with easy-to-use web interface for registration
Maintenance procedure for batch registration and validation
• Check for peptides with empty/outdated sequences and update
• Process legacy peptides and complete sequence information
Example: structure drawing → Sugar & Splice (with building block library) → cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]
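The maintenance step on this slide (check for peptides with empty or outdated sequences and update them) can be sketched as a simple comparison loop. Everything named here is hypothetical: `perceive_sequence` stands in for whatever tool derives a sequence from a structure, and the record layout is invented for the illustration. No real Sugar & Splice or Pipeline Pilot call is shown.

```python
def find_stale_sequences(records, perceive_sequence):
    """Return {compound_id: fresh_sequence} for empty or outdated entries."""
    updates = {}
    for compound_id, record in records.items():
        fresh = perceive_sequence(record["structure"])
        # An empty or differing stored sequence is scheduled for update.
        if record.get("sequence") != fresh:
            updates[compound_id] = fresh
    return updates


def toy_perceiver(structure):
    """Stand-in for sequence perception; a lookup on placeholder keys."""
    return {"mol-a": "H-Ala-Gly-OH", "mol-b": "H-Leu-Pro-OH"}[structure]


records = {
    "ID-1": {"structure": "mol-a", "sequence": "H-Ala-Gly-OH"},  # up to date
    "ID-2": {"structure": "mol-b", "sequence": ""},              # empty
    "ID-3": {"structure": "mol-b", "sequence": "H-Leu-Ala-OH"},  # outdated
}
updates = find_stale_sequences(records, toy_perceiver)
```

Only ID-2 and ID-3 end up in `updates`; the entry that already matches the perceived sequence is left alone.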
16. Peptide Sequence Information
Interface to biologics landscape
Sequence-based analysis tools
• Sequence alignment, BLAST database search, …
• Conversion to standard FASTA via Sugar & Splice:
– Remove cycles and cross-links
– Replace non-standard residues by X or the closest natural analog
– Convert D-amino acids to L form
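The three conversion rules above can be sketched in a few lines. This is an illustration of the rules, not of how Sugar & Splice implements them: `to_fasta_letters` is a hypothetical name, non-standard residues are always replaced by X (the "closest natural analog" option is not attempted), and residue symbols are assumed to contain no hyphens.

```python
import re

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_fasta_letters(sequence: str) -> str:
    """Flatten a peptide string to one-letter codes for sequence tools."""
    if sequence.startswith("cyclo[") and sequence.endswith("]"):
        tokens = sequence[len("cyclo["):-1].split("-")   # remove the cycle
    else:
        tokens = sequence.split("-")[1:-1]               # drop the termini
    letters = []
    for token in tokens:
        token = re.sub(r"\(\d+\)$", "", token)           # remove cross-link labels
        if token not in THREE_TO_ONE and token.startswith("D"):
            if token[1:] in THREE_TO_ONE:
                token = token[1:]                        # D form to L form
        letters.append(THREE_TO_ONE.get(token, "X"))     # non-standard to X
    return "".join(letters)


flat = to_fasta_letters("cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]")
```

For the cyclic example used in these slides, the result is LFPVXLFPVX. The conversion is lossy by design: the cycle, the D configuration and the identity of Orn cannot be recovered from the output.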
Data exchange with biologics research
• HELM format for macromolecule representation
• Shared dictionary for peptide building blocks
• Conversion to HELM via Sugar & Splice
Sequence: cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]
HELM: PEPTIDE1{L.[dF].P.V.[Orn].L.[dF].P.V.[Orn]}$PEPTIDE1,PEPTIDE1,10:R2-1:R1$$$
FASTA: LFPVXLFPVX
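The HELM string on this slide can be reproduced for simple cases with a short conversion routine. This is a sketch under stated assumptions, not the Sugar & Splice conversion: `helm_monomer` and `to_helm` are hypothetical names, D-residues are written as [dX] and other non-standard residues as [Name] following the example above, and only head-to-tail cyclic peptides or linear peptides with free termini (H-...-OH) are handled.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def helm_monomer(token: str) -> str:
    """Translate one residue symbol into a HELM monomer symbol."""
    if token in THREE_TO_ONE:
        return THREE_TO_ONE[token]
    if token.startswith("D") and token[1:] in THREE_TO_ONE:
        return f"[d{THREE_TO_ONE[token[1:]]}]"   # D-amino acid
    return f"[{token}]"                          # other non-standard residue


def to_helm(sequence: str) -> str:
    """Build a single-chain HELM string for the supported cases."""
    cyclic = sequence.startswith("cyclo[") and sequence.endswith("]")
    if cyclic:
        tokens = sequence[len("cyclo["):-1].split("-")
    else:
        n_term, *tokens, c_term = sequence.split("-")
        if (n_term, c_term) != ("H", "OH"):
            raise ValueError("only free termini (H-...-OH) are handled here")
    polymer = "PEPTIDE1{" + ".".join(helm_monomer(t) for t in tokens) + "}"
    # Head-to-tail cycle: last residue R2 connects to first residue R1.
    connection = f"PEPTIDE1,PEPTIDE1,{len(tokens)}:R2-1:R1" if cyclic else ""
    return f"{polymer}${connection}$$$"


helm = to_helm("cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]")
```

Unlike the FASTA conversion, the HELM string keeps the cycle and the non-standard residues, which is what makes it usable for data exchange rather than only for sequence searches.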
17. Summary & Benefits
Re-use and adapt small-molecule tools and systems
Ensure consistent structure and sequence information
Interface to large-molecule world
Benefits
• Maximized data value & quality through harmonized sequence information
• Enable automated sequence searches & analysis for synthetic peptides
• Time savings for peptide drawing, registration and analysis
• Future prospect: store sequence information within the molecular structure
Compound registration system
H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH
18. Acknowledgments
Discovery Chemistry
Konrad Bleicher
Eric Kitas
Kersten Klar
Betty Hennequin
Katja Ostmann
Adrian Schäublin
Patrick Studer-Schriber
pRED Informatics
Fausto Agnetti
Gerd Blanke
Gunther Dörnen
Sébastien Fournier
Werner Gotzeina
Peter Hilty
Ralf Horstmöller
Dieter Imark
Frederic Klein
Stefan Klostermann
Francesca Milletti
Denis Ribaud
Jörg Schmiedle
Daniel Stoffler
Klaus Weymann
Steering Committee
Alexander Alanine
Margret Assfalg
Ralph Haffner
Harald Mauser
Martin Stahl
Accelrys
François Culot
Jonas Danielsson
James Jack
Georgios Rafeletos
NextMove Software
Roger Sayle