Peptide Informatics
Bridging the gap between small-molecule and large-molecule systems
Lisa Sach-Peltason
Data Science, pRED Informatics, Roche Basel
Peptide Therapeutics – An Emerging Modality
US FDA approved drugs (2009-2011)
• Small molecule: 34
• Protein: 9
• Monoclonal antibody: 8
• Peptide: 8
• Natural product: 6
• Amino acid: 5
• Steroid: 2
• Nucleoside: 1
• Enzyme: 1
• Macrocycle: 1
• Other: 1
Adapted from Albericio & Kruger; Future Med. Chem. (2012), 4(12), 1527-1531.
Peptide Therapeutics – An Emerging Modality
Saladin et al.; IDrugs (2009), 12(12), 779-784.
Therapeutic categories of peptide candidates
entering clinical trials (1980-2007)
Peptide Therapeutics – Opportunities
Modality | Selectivity | Generation | Intracellular access | Delivery | Action | Oral delivery
Small molecules | Low to high | Synthetic | High | All routes | Antagonist / agonist | Yes
Peptides | High | Synthetic or recombinant | Possible | i.v. / s.c.; non-parenteral delivery feasible | Agonist / antagonist | Potential
Biologics | High | Recombinant | Low | i.v. / s.c. | Antagonist / agonist | No
Proven Advantages of Peptides
• Efficacy at extracellular targets, especially for polar or shallow binding pockets
• Rapid optimization
• Low off-target pharmacology
• High target selectivity
* Reflects current status; future potential for peptide antagonists, e.g., PPIs
Peptides at Roche
Growing asset of internal and external peptide compounds
• Global Roche compound DB: >25,000 compounds registered with PEPTIDE flag (of 3.9M total)
• Increasing demand for informatics infrastructure and support for peptide projects
[Figure: combination chart, "Peptides in IRCI 2003-2013". Quarterly data from 2003 to 2013; series: new registrations and total no. of peptides.]
Peptide Therapeutics – Informatics Challenges
Cheminformatics (molecule graphs)
• Similarity searching
• SAR analysis, visualization
• Property prediction
• Small-molecule registration
Bioinformatics (sequences)
• Sequence searching
• Alignment
• Sequence analysis
Peptide informatics
• Size, complexity
• Non-standard residues
• Chemical modifications
• No format standards
Figure adapted from J.H.Jensen, ChemAxon European UGM, 2012
Data Capture Challenges
Peptide sequence format
IUPAC-IUB Nomenclature and Symbolism for Amino Acids and Peptides (“3AA”, 1983)
• 3-letter code for standard and common non-standard amino acids
• Symbolism for representing amino acid sequences
H-Asp-Arg-Val-DTyr-Ile-His-Pro-Phe-OH
• N-terminal specification: H, Ac, Boc, …
• Residue: 3-letter symbol (e.g. Asp)
• Stereoconfiguration: prefix on the residue symbol (e.g. DTyr)
• Separator / peptide bond: "-"
• C-terminal specification: OH, NH2, H, …
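The symbolism above can be split mechanically into its parts. The following is a minimal sketch, assuming hyphen-separated tokens, one terminal group at each end, and a leading "D" as the stereo prefix. The function name `parse_linear_sequence` and the 20-residue lookup set are illustrative assumptions; they are not part of the IUPAC-IUB recommendation or of any tool named in this talk.

```python
from typing import List, NamedTuple

# Illustrative subset: the 20 standard 3-letter codes only (assumption).
KNOWN_RESIDUES = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


class Residue(NamedTuple):
    code: str    # 3-letter symbol, e.g. "Tyr"
    config: str  # "D" if prefixed with D, otherwise "L" (default, also used for achiral Gly)


class Peptide(NamedTuple):
    n_term: str
    residues: List[Residue]
    c_term: str


def parse_linear_sequence(text: str) -> Peptide:
    """Split 'H-Asp-...-OH' into N-terminus, residues, and C-terminus."""
    tokens = [t.strip() for t in text.split("-")]
    if len(tokens) < 3:
        raise ValueError("expected N-terminus, at least one residue, C-terminus")
    n_term, *middle, c_term = tokens
    residues = []
    for tok in middle:
        if tok in KNOWN_RESIDUES:
            residues.append(Residue(tok, "L"))
        elif tok.startswith("D") and tok[1:] in KNOWN_RESIDUES:
            residues.append(Residue(tok[1:], "D"))
        else:
            # Non-standard symbols would need a monomer library lookup.
            raise ValueError(f"unknown residue symbol: {tok!r}")
    return Peptide(n_term, residues, c_term)


pep = parse_linear_sequence("H-Asp-Arg-Val-DTyr-Ile-His-Pro-Phe-OH")
print(pep.n_term, [r.code for r in pep.residues], pep.c_term)
```

A plain split on "-" is only adequate for simple symbols; a side-chain modification written with an internal hyphen would need a real tokenizer.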
Data Capture Challenges
How to capture non-standard sequence elements?
Residue symbols
Modified amino acids
• L-Norvaline: Nva (discouraged by IUPAC but commonly used)
• L-2-Aminovaleric acid? Avl (IUPAC)
• L-2-Aminopentanoic acid? Ape (IUPAC)
• L-4-Benzoylphenylalanine: 4Bpa, or Phe(4-Bz) (systematic; avoids combinatorial explosion)
Data Capture Challenges
How to capture non-standard sequence elements?
Cyclic peptides
Cross-links (disulfide bridges within or across chains, isopeptide bonds, …)
[Figure: structure drawing of the cyclic peptide]
(IUPAC recommendation, depiction rather than text)
cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]
H-Cys(1)-Tyr-Ile-Gln-Asn-Cys(1)-Pro-Leu-Gly-NH2
(IUPAC)
SMILES-like notation; see also Biochemfusion’s PLN
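In the labelled notation H-Cys(1)-Tyr-Ile-Gln-Asn-Cys(1)-Pro-Leu-Gly-NH2, residues sharing the same number in parentheses are bonded to each other, much like ring-closure digits in SMILES. A minimal sketch of recovering the linked positions follows, assuming numeric labels only and exactly two residues per label. The function name `find_cross_links` is hypothetical.

```python
import re
from collections import defaultdict
from typing import List, Tuple

# Matches a residue symbol followed by a numeric label, e.g. "Cys(1)".
LABELLED = re.compile(r"^(?P<code>[A-Za-z0-9]+)\((?P<label>\d+)\)$")


def find_cross_links(sequence: str) -> List[Tuple[int, int]]:
    """Return 1-based residue position pairs that share a cross-link label."""
    tokens = sequence.split("-")
    residues = tokens[1:-1]  # drop the N- and C-terminal groups
    by_label = defaultdict(list)
    for position, token in enumerate(residues, start=1):
        match = LABELLED.match(token)
        if match:
            by_label[match.group("label")].append(position)
    links = []
    for label, positions in sorted(by_label.items()):
        if len(positions) != 2:
            raise ValueError(f"label {label} must occur exactly twice")
        links.append((positions[0], positions[1]))
    return links


links = find_cross_links("H-Cys(1)-Tyr-Ile-Gln-Asn-Cys(1)-Pro-Leu-Gly-NH2")
print(links)
```

For the sequence on the slide this pairs the two cysteines at positions 1 and 6.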
Peptide Data Inventory
Digest Roche peptides with NextMove’s Sugar&Splice
[Figure: bar chart, "Top 50 monomer frequencies of 23k Roche peptides"; frequency axis from 0 to 26,000.]
Standard AA (without Gly and Pro): 93%
Top 50 monomers: 98%
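An inventory like this comes down to counting monomer occurrences across all perceived sequences and asking what share the most frequent ones cover. The sketch below assumes each peptide is already a list of monomer symbols and that the percentages are shares of all monomer occurrences. The function name `monomer_coverage` and the toy data are made up for illustration and are not Roche data.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def monomer_coverage(
    peptides: Iterable[Sequence[str]], top_n: int
) -> Tuple[List[Tuple[str, int]], float]:
    """Return the top_n monomers with counts and their share of all occurrences."""
    counts = Counter(monomer for peptide in peptides for monomer in peptide)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no monomers to count")
    top = counts.most_common(top_n)
    covered = sum(count for _, count in top)
    return top, covered / total


# Toy input (hypothetical): three short peptides as monomer lists.
toy_peptides = [
    ["Ala", "Gly", "Orn"],
    ["Ala", "Leu", "Ala"],
    ["Gly", "Nva"],
]
top, fraction = monomer_coverage(toy_peptides, top_n=2)
print(top, f"{fraction:.1%}")
```

On real data the same call with `top_n=50` would give the kind of coverage figure quoted on the slide.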
Peptide Data Inventory
Monomer library
Roche Peptide Building Blocks
• ~200 manually curated templates
• Up to 600 monomers extracted from Roche peptides
• Direct cartridge with normalization & uniqueness check
Structure | ID | Short Name | Chemical Name | Category | CAS | Roche Number
(drawing) | Ala | A | L-Alanine | L-AA | 56-41-7 | ROxyz
(drawing) | Fmoc | Fmoc | 9-Fluorenylmethoxycarbonyl | SAG | |
Peptide Sequence Information
Harmonizing peptide registration
Sequence registration
Peptide drawing
LINEAR STRUCTURE DESCRIPTION field
Draw structure from local monomer templates
H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH
Enter sequence manually
No format standards or validation
PEPTIDE comment
Compound registration system
Peptide Sequence Information
Harmonizing peptide registration
Synchronize drawing templates with monomer library
Automatic sequence generation & validation
Consistent structure and sequence information
Atoms and bonds
• Chemical identification
• Novelty check
• (Sub-)Structure searches
Sequence
• Depiction
• Visual comparison
• Sequence searches
Tools for data analysis
Building block library
H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH
LINEAR STRUCTURE DESCRIPTION field
PEPTIDE comment
Compound registration system
Peptide Drawing
Central template management in Accelrys Draw
Roche Peptide Building Blocks
• ~200 manually curated templates
• Categories: L-AA, D-AA, nS-AA, Linkers, Attachments, Resins
Accelrys Draw Add-In
• Download templates to Draw
• Regular check for updates
• Register new templates via Sequence Template Manager
• Validate new templates
Peptide Sequence Information
Sequence generation with NextMove’s Sugar&Splice
Computational perception of peptide sequence from chemical structure
• Output of sequence in standard format
• Lookup of non-standard names in building block library
Pipeline Pilot wrapper with easy-to-use web interface for registration
Maintenance procedure for batch registration and validation
• Check for peptides with empty/outdated sequences and update
• Process legacy peptides and complete sequence information
[Figure: structure drawing processed by Sugar & Splice, with lookup in the building block library, to give the sequence below]
cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]
Peptide Sequence Information
Interface to biologics landscape
Sequence-based analysis tools
• Sequence alignment, BLAST database search, …
• Conversion to standard FASTA via Sugar & Splice:
– Remove cycles and cross-links
– Replace non-standard residues by X or the closest natural analog
– Convert D-amino acids to L form
Data exchange with biologics research
• HELM format for macromolecule representation
• Shared dictionary for peptide building blocks
• Conversion to HELM via Sugar & Splice
cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]
PEPTIDE1{L.[dF].P.V.[Orn].L.[dF].P.V.[Orn]}$PEPTIDE1,PEPTIDE1,10:R2-1:R1$$$
LFPVXLFPVX
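The three lines above show the same cyclic peptide as a sequence, as HELM, and as plain FASTA. The sketch below reproduces those two conversions for this one example, following the rules listed on the slide (drop the cycle, write non-standard residues as X, map D-residues to the L letter). It is an illustration only and not how Sugar & Splice works internally. The function names and the 20-entry lookup table are assumptions, and only head-to-tail cyclization of a single chain is handled.

```python
from typing import List, Tuple

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_cyclo(text: str) -> List[Tuple[str, bool]]:
    """Parse 'cyclo[A-B-...]' into (symbol, is_d) pairs."""
    if not (text.startswith("cyclo[") and text.endswith("]")):
        raise ValueError("expected cyclo[...] notation")
    residues = []
    for token in text[len("cyclo["):-1].split("-"):
        if token.startswith("D") and token[1:] in THREE_TO_ONE:
            residues.append((token[1:], True))   # D form of a standard residue
        else:
            residues.append((token, False))      # standard L or non-standard
    return residues


def to_fasta(residues: List[Tuple[str, bool]]) -> str:
    """Linearize: D becomes L, unknown symbols become X."""
    return "".join(THREE_TO_ONE.get(code, "X") for code, _ in residues)


def to_helm(residues: List[Tuple[str, bool]], cyclic: bool = True) -> str:
    """Build a single-chain HELM string with an optional head-to-tail bond."""
    parts = []
    for code, is_d in residues:
        letter = THREE_TO_ONE.get(code)
        if letter is None:
            parts.append(f"[{code}]")        # non-standard monomer by name
        elif is_d:
            parts.append(f"[d{letter}]")     # D-amino acid
        else:
            parts.append(letter)
    connection = f"PEPTIDE1,PEPTIDE1,{len(residues)}:R2-1:R1" if cyclic else ""
    return "PEPTIDE1{" + ".".join(parts) + "}$" + connection + "$$$"


residues = parse_cyclo("cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]")
print(to_helm(residues))
print(to_fasta(residues))
```

The monomer names in brackets only carry meaning if both sides share the same building block dictionary, which is the point of the shared dictionary bullet above.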
Summary & Benefits
Re-use and adapt small-molecule tools and systems
Ensure consistent structure and sequence information
Interface to large-molecule world
Benefits
• Maximized data value & quality through harmonized sequence information
• Enable automated sequence searches & analysis for synthetic peptides
• Time savings for peptide drawing, registration and analysis
• Future prospect: store sequence information within the molecular structure
Compound registration system
H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly- ... -OH
Acknowledgments
Discovery Chemistry
Konrad Bleicher
Eric Kitas
Kersten Klar
Betty Hennequin
Katja Ostmann
Adrian Schäublin
Patrick Studer-Schriber
pRED Informatics
Fausto Agnetti
Gerd Blanke
Gunther Dörnen
Sébastien Fournier
Werner Gotzeina
Peter Hilty
Ralf Horstmöller
Dieter Imark
Frederic Klein
Stefan Klostermann
Francesca Milletti
Denis Ribaud
Jörg Schmiedle
Daniel Stoffler
Klaus Weymann
Steering Committee
Alexander Alanine
Margret Assfalg
Ralph Haffner
Harald Mauser
Martin Stahl
Accelrys
François Culot
Jonas Danielsson
James Jack
Georgios Rafeletos
NextMove Software
Roger Sayle
Doing now what patients need next

Contenu connexe

En vedette

Using Matched Series to decide what compound to make next
Using Matched Series to decide what compound to make nextUsing Matched Series to decide what compound to make next
Using Matched Series to decide what compound to make nextNextMove Software
 
Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...NextMove Software
 
Chemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsChemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsNextMove Software
 
Carbon-14 labelled ADCs and Peptides by Sean L Kitson
Carbon-14 labelled ADCs and Peptides by Sean L KitsonCarbon-14 labelled ADCs and Peptides by Sean L Kitson
Carbon-14 labelled ADCs and Peptides by Sean L Kitsonseankitson
 
Solution And Solid Phase Synthesis Publication
Solution  And Solid Phase Synthesis PublicationSolution  And Solid Phase Synthesis Publication
Solution And Solid Phase Synthesis Publicationadotse
 
tboc fmoc protocol in solid phase peptide synthesis
tboc fmoc protocol in solid phase peptide synthesistboc fmoc protocol in solid phase peptide synthesis
tboc fmoc protocol in solid phase peptide synthesisSANTOSH KUMAR SAHOO
 
Marketing of Proteins and Peptide Pharmaceuticals
Marketing of Proteins and Peptide PharmaceuticalsMarketing of Proteins and Peptide Pharmaceuticals
Marketing of Proteins and Peptide Pharmaceuticalsguest6c594976
 
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...NextMove Software
 
Peptide line notations for biologics registration and patent filings
Peptide line notations for biologics registration and patent filingsPeptide line notations for biologics registration and patent filings
Peptide line notations for biologics registration and patent filingsNextMove Software
 
Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...NextMove Software
 
Evidence-based medicinal chemistry using matched molecular series
Evidence-based medicinal chemistry using matched molecular seriesEvidence-based medicinal chemistry using matched molecular series
Evidence-based medicinal chemistry using matched molecular seriesNextMove Software
 
T boc fmoc protocols in peptide synthesis
T boc fmoc protocols in peptide synthesisT boc fmoc protocols in peptide synthesis
T boc fmoc protocols in peptide synthesisSANTOSH KUMAR SAHOO
 
Chemistry of amino acids&proteins
Chemistry of amino acids&proteinsChemistry of amino acids&proteins
Chemistry of amino acids&proteinsDr.Amr Abouzied
 
Revising the Topliss Decision Tree
Revising the Topliss Decision TreeRevising the Topliss Decision Tree
Revising the Topliss Decision TreeNextMove Software
 
Chapter 3(part1) - Amino acids, peptides, and proteins
Chapter 3(part1) - Amino acids, peptides, and proteinsChapter 3(part1) - Amino acids, peptides, and proteins
Chapter 3(part1) - Amino acids, peptides, and proteinsAmmedicine Medicine
 
Pharmacology of Peptides and Proteins
Pharmacology of Peptides and ProteinsPharmacology of Peptides and Proteins
Pharmacology of Peptides and ProteinsRohan Kolla
 

En vedette (20)

Using Matched Series to decide what compound to make next
Using Matched Series to decide what compound to make nextUsing Matched Series to decide what compound to make next
Using Matched Series to decide what compound to make next
 
Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...Representation and display of non-standard peptides using semi-systematic ami...
Representation and display of non-standard peptides using semi-systematic ami...
 
Chemistry and reactions from non-US patents
Chemistry and reactions from non-US patentsChemistry and reactions from non-US patents
Chemistry and reactions from non-US patents
 
InChI for Large Molecules
InChI for Large MoleculesInChI for Large Molecules
InChI for Large Molecules
 
Ifpma - Points to consider - biotherapeutics vs small molecule medicines - Wo...
Ifpma - Points to consider - biotherapeutics vs small molecule medicines - Wo...Ifpma - Points to consider - biotherapeutics vs small molecule medicines - Wo...
Ifpma - Points to consider - biotherapeutics vs small molecule medicines - Wo...
 
Peptide 2
Peptide 2Peptide 2
Peptide 2
 
Carbon-14 labelled ADCs and Peptides by Sean L Kitson
Carbon-14 labelled ADCs and Peptides by Sean L KitsonCarbon-14 labelled ADCs and Peptides by Sean L Kitson
Carbon-14 labelled ADCs and Peptides by Sean L Kitson
 
amino acids and metabolism
amino acids and metabolismamino acids and metabolism
amino acids and metabolism
 
Solution And Solid Phase Synthesis Publication
Solution  And Solid Phase Synthesis PublicationSolution  And Solid Phase Synthesis Publication
Solution And Solid Phase Synthesis Publication
 
tboc fmoc protocol in solid phase peptide synthesis
tboc fmoc protocol in solid phase peptide synthesistboc fmoc protocol in solid phase peptide synthesis
tboc fmoc protocol in solid phase peptide synthesis
 
Marketing of Proteins and Peptide Pharmaceuticals
Marketing of Proteins and Peptide PharmaceuticalsMarketing of Proteins and Peptide Pharmaceuticals
Marketing of Proteins and Peptide Pharmaceuticals
 
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
Using Matched Molecular Series as a Predictive Tool To Optimize Biological Ac...
 
Peptide line notations for biologics registration and patent filings
Peptide line notations for biologics registration and patent filingsPeptide line notations for biologics registration and patent filings
Peptide line notations for biologics registration and patent filings
 
Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...Standardized Representations of ELN Reactions for Categorization and Duplicat...
Standardized Representations of ELN Reactions for Categorization and Duplicat...
 
Evidence-based medicinal chemistry using matched molecular series
Evidence-based medicinal chemistry using matched molecular seriesEvidence-based medicinal chemistry using matched molecular series
Evidence-based medicinal chemistry using matched molecular series
 
T boc fmoc protocols in peptide synthesis
T boc fmoc protocols in peptide synthesisT boc fmoc protocols in peptide synthesis
T boc fmoc protocols in peptide synthesis
 
Chemistry of amino acids&proteins
Chemistry of amino acids&proteinsChemistry of amino acids&proteins
Chemistry of amino acids&proteins
 
Revising the Topliss Decision Tree
Revising the Topliss Decision TreeRevising the Topliss Decision Tree
Revising the Topliss Decision Tree
 
Chapter 3(part1) - Amino acids, peptides, and proteins
Chapter 3(part1) - Amino acids, peptides, and proteinsChapter 3(part1) - Amino acids, peptides, and proteins
Chapter 3(part1) - Amino acids, peptides, and proteins
 
Pharmacology of Peptides and Proteins
Pharmacology of Peptides and ProteinsPharmacology of Peptides and Proteins
Pharmacology of Peptides and Proteins
 

Similaire à Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics DatabaseNextMove Software
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbChris Southan
 
Peptide Tribulations
Peptide TribulationsPeptide Tribulations
Peptide TribulationsChris Southan
 
Host Cell Protein Analysis by Mass Spectrometry | KBI Biopharma
Host Cell Protein Analysis by Mass Spectrometry | KBI BiopharmaHost Cell Protein Analysis by Mass Spectrometry | KBI Biopharma
Host Cell Protein Analysis by Mass Spectrometry | KBI BiopharmaKBI Biopharma
 
Proteins – Basics you need to know for Proteomics
Proteins – Basics you need to know for ProteomicsProteins – Basics you need to know for Proteomics
Proteins – Basics you need to know for ProteomicsLionel Wolberger
 
Mapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesMapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesYasset Perez-Riverol
 
SureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSSureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSGeorge Papadatos
 
OPSIN: Taming the Jungle of IUPAC Chemical Nomenclature
OPSIN: Taming the Jungle of IUPAC Chemical NomenclatureOPSIN: Taming the Jungle of IUPAC Chemical Nomenclature
OPSIN: Taming the Jungle of IUPAC Chemical Nomenclaturedan2097
 
Best practices and challenges for robust Quantitative proteomics of DMEs
Best practices and challenges for robust Quantitative proteomics of DMEsBest practices and challenges for robust Quantitative proteomics of DMEs
Best practices and challenges for robust Quantitative proteomics of DMEsDeepak Kumar Bhatt
 
Session 1 part 2
Session 1 part 2Session 1 part 2
Session 1 part 2plmiami
 
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...Poster: Functional analysis of essential hypothetical proteins of Staphylococ...
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...Pranavathiyani G
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekingeProf. Wim Van Criekinge
 
Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Prof. Wim Van Criekinge
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekingeProf. Wim Van Criekinge
 
Protein Engineering Strategies
Protein Engineering StrategiesProtein Engineering Strategies
Protein Engineering StrategiesSOURIKDEY1
 

Similaire à Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems (20)

PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdb
 
Peptide Tribulations
Peptide TribulationsPeptide Tribulations
Peptide Tribulations
 
Host Cell Protein Analysis by Mass Spectrometry | KBI Biopharma
Host Cell Protein Analysis by Mass Spectrometry | KBI BiopharmaHost Cell Protein Analysis by Mass Spectrometry | KBI Biopharma
Host Cell Protein Analysis by Mass Spectrometry | KBI Biopharma
 
Proteins – Basics you need to know for Proteomics
Proteins – Basics you need to know for ProteomicsProteins – Basics you need to know for Proteomics
Proteins – Basics you need to know for Proteomics
 
Mapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome CoordinatesMapping millions of peptidoforms to Genome Coordinates
Mapping millions of peptidoforms to Genome Coordinates
 
Biopra activities
Biopra activitiesBiopra activities
Biopra activities
 
Overview of SureChEMBL
Overview of SureChEMBLOverview of SureChEMBL
Overview of SureChEMBL
 
SureChEMBL and Open PHACTS
SureChEMBL and Open PHACTSSureChEMBL and Open PHACTS
SureChEMBL and Open PHACTS
 
Peptide mapping
Peptide mappingPeptide mapping
Peptide mapping
 
OPSIN: Taming the Jungle of IUPAC Chemical Nomenclature
OPSIN: Taming the Jungle of IUPAC Chemical NomenclatureOPSIN: Taming the Jungle of IUPAC Chemical Nomenclature
OPSIN: Taming the Jungle of IUPAC Chemical Nomenclature
 
Best practices and challenges for robust Quantitative proteomics of DMEs
Best practices and challenges for robust Quantitative proteomics of DMEsBest practices and challenges for robust Quantitative proteomics of DMEs
Best practices and challenges for robust Quantitative proteomics of DMEs
 
Session 1 part 2
Session 1 part 2Session 1 part 2
Session 1 part 2
 
Rosa_PhD
Rosa_PhDRosa_PhD
Rosa_PhD
 
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...Poster: Functional analysis of essential hypothetical proteins of Staphylococ...
Poster: Functional analysis of essential hypothetical proteins of Staphylococ...
 
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
2016 bioinformatics i_bio_cheminformatics_wimvancriekinge
 
Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014
 
Brian D Strahl, Faculty Director of the UNC Peptide Synthesis Core Facility
Brian D Strahl, Faculty Director of the UNC Peptide Synthesis Core FacilityBrian D Strahl, Faculty Director of the UNC Peptide Synthesis Core Facility
Brian D Strahl, Faculty Director of the UNC Peptide Synthesis Core Facility
 
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge2015 bioinformatics bio_cheminformatics_wim_vancriekinge
2015 bioinformatics bio_cheminformatics_wim_vancriekinge
 
Protein Engineering Strategies
Protein Engineering StrategiesProtein Engineering Strategies
Protein Engineering Strategies
 

Plus de NextMove Software

CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...NextMove Software
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedNextMove Software
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESNextMove Software
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionNextMove Software
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...NextMove Software
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsNextMove Software
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...NextMove Software
 
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...
Chemical similarity using multi-terabyte graph databases: 68 billion nodes an...NextMove Software
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKitNextMove Software
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...NextMove Software
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical RepresentationsNextMove Software
 
Challenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsChallenges and successes in machine interpretation of Markush descriptions
Challenges and successes in machine interpretation of Markush descriptionsNextMove Software
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...NextMove Software
 
CINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesCINF 13: Pistachio - Search and Faceting of Large Reaction Databases
CINF 13: Pistachio - Search and Faceting of Large Reaction DatabasesNextMove Software
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesNextMove Software
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...NextMove Software
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)NextMove Software
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeNextMove Software
 
Automatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsAutomatic extraction of bioactivity data from patents
Automatic extraction of bioactivity data from patentsNextMove Software
 

Plus de NextMove Software (20)

DeepSMILES
DeepSMILESDeepSMILES
DeepSMILES
 
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...CINF 170: Regioselectivity: An application of expert systems and ontologies t...
CINF 170: Regioselectivity: An application of expert systems and ontologies t...
 
CINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speedCINF 35: Structure searching for patent information: The need for speed
CINF 35: Structure searching for patent information: The need for speed
 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...

Peptide Informatics - Bridging the gap between small-molecule and large-molecule systems

  • 1. Peptide Informatics: Bridging the gap between small-molecule and large-molecule systems. Lisa Sach-Peltason, Data Science, pRED Informatics, Roche Basel.
  • 2. Peptide Therapeutics – An Emerging Modality. US FDA approved drugs (2009-2011): small molecule 34; protein 9; monoclonal antibody 8; peptide 8; natural product 6; amino acid 5; steroid 2; nucleoside 1; enzyme 1; macrocycle 1; other 1. Adapted from Albericio & Kruger; Future Med. Chem. (2012), 4(12), 1527-1531.
  • 3. Peptide Therapeutics – An Emerging Modality. Figure: therapeutic categories of peptide candidates entering clinical trials (1980-2007). Source: Saladin et al.; IDrugs (2009), 12(12), 779-784.
  • 4. Peptide Therapeutics – Opportunities. Comparison by selectivity, generation, intracellular access, delivery, action and oral delivery. Small molecules: low to high; synthetic; high; all routes; antagonist/agonist; yes. Peptides: high; synthetic or recombinant; possible; i.v./s.c., non-parenteral delivery feasible; agonist/antagonist; potential. Biologics: high; recombinant; low; i.v./s.c.; antagonist/agonist; no. Proven advantages of peptides: efficacy at extracellular targets, especially for polar or shallow binding pockets; rapid optimization; low off-target pharmacology; high target selectivity. Slide footnote (*): reflects current status; future potential for peptide antagonists, e.g., PPIs.
  • 5. Peptides at Roche. Growing asset of internal and external peptide compounds. Global Roche compound DB: >25,000 compounds registered with PEPTIDE flag (of 3.9M total). Increasing demand for informatics infrastructure and support for peptide projects. Chart: peptides in IRCI 2003-2013, new registrations per quarter and total number of peptides.
  • 6. Peptide Therapeutics – Informatics Challenges. Cheminformatics (molecule graphs): similarity searching; SAR analysis, visualization; property prediction; small-molecule registration. Bioinformatics (sequences): sequence searching; alignment; sequence analysis. Peptide informatics sits between the two and faces: size, complexity; non-standard residues; chemical modifications; no format standards. Figure adapted from J.H. Jensen, ChemAxon European UGM, 2012.
  • 7. Data Capture Challenges: peptide sequence format. IUPAC-IUB Nomenclature and Symbolism for Amino Acids and Peptides ("3AA", 1983): 3-letter code for standard and common non-standard amino acids; symbolism for representing amino acid sequences. Annotated example: H-Asp-Arg-Val-DTyr-Ile-His-Pro-Phe-OH, showing the N-terminal specification (H, Ac, Boc, …), the residue, the stereoconfiguration, the separator / peptide bond, and the C-terminal specification (OH, NH2, H, …).
  • 8. Data Capture Challenges: how to capture non-standard sequence elements? Residue symbols for modified amino acids. L-Norvaline: Nva (discouraged by IUPAC but commonly used); L-2-Aminovaleric acid? Avl (IUPAC); L-2-Aminopentanoic acid? Ape (IUPAC). L-4-Benzoylphenylalanine: 4Bpa, or Phe(4-Bz) (systematic; avoid combinatorial explosion).
  • 9. Data Capture Challenges: how to capture non-standard sequence elements? Cyclic peptides and cross-links (disulfide bridges within or across chains, isopeptide bonds, …). Examples: cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]; H-Cys(1)-Tyr-Ile-Gln-Asn-Cys(1)-Pro-Leu-Gly-NH2. Slide annotations: (IUPAC recommendation, depiction rather than text); (IUPAC); SMILES-like notation; see also Biochemfusion's PLN.
  • 10. Peptide Data Inventory. Digest Roche peptides with NextMove's Sugar&Splice. Chart: top 50 monomer frequencies of 23k Roche peptides. Standard AA (without Gly and Pro): 93%. Top 50 monomers: 98%.
  • 11. Peptide Data Inventory: monomer library. Roche Peptide Building Blocks: ~200 manually curated templates; up to 600 monomers extracted from Roche peptides; direct cartridge with normalization & uniqueness check. Library fields: Structure, ID, Short Name, Chemical Name, Category, CAS, Roche Number. Example entries: Ala, A, L-Alanine, L-AA, 56-41-7, ROxyz; Fmoc, Fmoc, 9-Fluorenylmethoxycarbonyl, SAG. The library serves both sequence registration and peptide drawing.
  • 12. Peptide Sequence Information: harmonizing peptide registration. Starting point: draw structure from local monomer templates; enter sequence manually (e.g. H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly-...-OH) into the LINEAR STRUCTURE DESCRIPTION field of the compound registration system, alongside the PEPTIDE comment. No format standards or validation.
  • 13. Peptide Sequence Information: harmonizing peptide registration. Synchronize drawing templates with monomer library. Automatic sequence generation & validation. Consistent structure and sequence information. Atoms and bonds: chemical identification; novelty check; (sub-)structure searches. Sequence: depiction; visual comparison; sequence searches. Diagram labels: tools for data analysis; building block library; compound registration system with LINEAR STRUCTURE DESCRIPTION field (H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly-...-OH) and PEPTIDE comment.
  • 14. Peptide Drawing: central template management in Accelrys Draw. Roche Peptide Building Blocks: ~200 manually curated templates; categories: L-AA, D-AA, nS-AA, Linkers, Attachments, Resins. Accelrys Draw Add-In: download templates to Draw; regular check for updates; register new templates via Sequence Template Manager; validate new templates.
  • 15. Peptide Sequence Information: sequence generation with NextMove's Sugar&Splice. Computational perception of peptide sequence from chemical structure: output of sequence in standard format; lookup of non-standard names in building block library. Pipeline Pilot wrapper with easy-to-use web interface for registration. Maintenance procedure for batch registration and validation: check for peptides with empty/outdated sequences and update; process legacy peptides and complete sequence information. Example: chemical structure, via Sugar & Splice and the building block library, to cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn].
  • 16. Peptide Sequence Information: interface to biologics landscape. Sequence-based analysis tools: sequence alignment, BLAST database search, …. Conversion to standard FASTA via Sugar & Splice: remove cycles and cross-links; replace non-standard residues by X or the closest natural analog; convert D-amino acids to L form. Data exchange with biologics research: HELM format for macromolecule representation; shared dictionary for peptide building blocks; conversion to HELM via Sugar & Splice. Example: cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]; HELM: PEPTIDE1{L.[dF].P.V.[Orn].L.[dF].P.V.[Orn]}$PEPTIDE1,PEPTIDE1,10:R2-1:R1$$$; FASTA: LFPVXLFPVX.
  • 17. Summary & Benefits. Re-use and adapt small-molecule tools and systems. Ensure consistent structure and sequence information. Interface to large-molecule world. Benefits: maximized data value & quality through harmonized sequence information; enable automated sequence searches & analysis for synthetic peptides; time savings for peptide drawing, registration and analysis; future prospect: store sequence information within the molecular structure. Diagram label: compound registration system, H-His-Asp-Glu-Phe-Glu-Arg-His-Ala-Glu-Gly-...-OH.
  • 18. Acknowledgments. Discovery Chemistry: Konrad Bleicher, Eric Kitas, Kersten Klar, Betty Hennequin, Katja Ostmann, Adrian Schäublin, Patrick Studer-Schriber. pRED Informatics: Fausto Agnetti, Gerd Blanke, Gunther Dörnen, Sébastien Fournier, Werner Gotzeina, Peter Hilty, Ralf Horstmöller, Dieter Imark, Frederic Klein, Stefan Klostermann, Francesca Milletti, Denis Ribaud, Jörg Schmiedle, Daniel Stoffler, Klaus Weymann. Steering Committee: Alexander Alanine, Margret Assfalg, Ralph Haffner, Harald Mauser, Martin Stahl. Accelrys: François Culot, Jonas Danielsson, James Jack, Georgios Rafeletos. NextMove Software: Roger Sayle.
  • 19. Doing now what patients need next
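
The FASTA and HELM conversions listed on slide 16 can be illustrated with a short Python sketch. This is not the Sugar & Splice implementation, which perceives sequences from chemical structures; the sketch only rewrites the three-letter text notation from slides 7 and 9. The function names (parse_residues, to_fasta, to_helm) are hypothetical. As simplifying assumptions, only the 20 standard residues are recognized, a leading D counts as a stereo marker only in front of a standard residue, every other symbol becomes X in FASTA (no "closest natural analog" lookup), and terminal groups and cross-links are dropped.

```python
import re

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_residues(sequence):
    """Return (is_cyclic, [(symbol, is_d), ...]) for a three-letter sequence.

    Assumption: residues are separated by '-' and contain no hyphen
    themselves, so symbols such as Phe(4-Bz) are not supported here.
    """
    cyclic = sequence.startswith("cyclo[") and sequence.endswith("]")
    if cyclic:
        tokens = sequence[len("cyclo["):-1].split("-")
    else:
        # Linear form: the first and last tokens are the terminal groups.
        tokens = sequence.split("-")[1:-1]
    residues = []
    for token in tokens:
        token = re.sub(r"\(\d+\)$", "", token)  # drop cross-link labels, e.g. (1)
        is_d = token.startswith("D") and token[1:] in THREE_TO_ONE
        residues.append((token[1:] if is_d else token, is_d))
    return cyclic, residues


def to_fasta(sequence):
    """One-letter string: cycles opened, D treated as L, non-standard as X."""
    _, residues = parse_residues(sequence)
    return "".join(THREE_TO_ONE.get(symbol, "X") for symbol, _ in residues)


def to_helm(sequence):
    """HELM-style string for a single chain, with head-to-tail cyclization."""
    cyclic, residues = parse_residues(sequence)
    monomers = []
    for symbol, is_d in residues:
        one = THREE_TO_ONE.get(symbol)
        if one is None:
            monomers.append("[" + symbol + "]")   # non-standard monomer
        elif is_d:
            monomers.append("[d" + one + "]")     # D-amino acid
        else:
            monomers.append(one)
    connection = ""
    if cyclic:
        # Bond from the last residue (R2) back to the first residue (R1).
        connection = "PEPTIDE1,PEPTIDE1,%d:R2-1:R1" % len(monomers)
    return "PEPTIDE1{" + ".".join(monomers) + "}$" + connection + "$$$"


cyclic_example = "cyclo[Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-Val-Orn]"
print(to_fasta(cyclic_example))  # LFPVXLFPVX
print(to_helm(cyclic_example))
# PEPTIDE1{L.[dF].P.V.[Orn].L.[dF].P.V.[Orn]}$PEPTIDE1,PEPTIDE1,10:R2-1:R1$$$
```

For the cyclic example, both outputs match the strings shown on slide 16. A production converter would additionally need the shared building block dictionary to name non-standard monomers consistently and to carry cross-links such as Cys(1) into the HELM connection section.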