Trials and tribulations of curating and
searching bioactive peptides in
databases
Christopher Southan
University of Copenhagen, Feb 2020
Host: David Gloriam
Abstract
The theme will be presented from the perspective of both past
involvement in peptide curation in the Guide to Pharmacology
(GtoPdb) and in current searching for bioactive peptides in the wider
ecosystem that includes ChEMBL and PubChem. The core problem
is that peptides hang in limbo land between bioinformatics (BLAST)
and cheminformatics (Tanimoto), neither of which provides optimal
searching. Curating peptides in GtoPdb presents many challenges,
including mapping endogenous peptides to Swiss-Prot cleavage
annotations. For synthetic peptides, equivocal specification of
modifications and exact positions of radiolabels are also problematic.
However, target-mapped, citation-supported quantitative binding
parameters are curated where possible. For those peptides falling
below the PubChem CID SMILES limit of approximately 70 residues,
GtoPdb has been using Sugar and Splice from NextMove Software to
convert them into CIDs. Specific problems associated with finding
bioactive peptides in databases will be outlined.
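To make the BLAST-versus-Tanimoto contrast concrete, here is a minimal sketch of a Tanimoto coefficient over feature sets. As a hypothetical stand-in for the structural fingerprints that cheminformatics tools actually derive from a SMILES, it uses overlapping residue k-mers of the sequence as features; the function names and the Leu6Trp variant are illustrative assumptions, not GtoPdb, ChEMBL or PubChem methods.

```python
def kmer_features(seq: str, k: int = 2) -> set:
    """Return the set of overlapping k-residue substrings of a peptide.

    Hypothetical stand-in for fingerprint bits: real tools compute bits
    from the chemical structure (e.g. a SMILES), not from the sequence.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two feature sets."""
    union = a | b
    if not union:
        return 1.0  # two empty feature sets are treated as identical
    return len(a & b) / len(union)


if __name__ == "__main__":
    et1 = "CSCSSLMDKECVYFCHLDIIW"      # endothelin-1, one-letter code
    variant = "CSCSSWMDKECVYFCHLDIIW"  # hypothetical Leu6Trp variant
    print(round(tanimoto(kmer_features(et1), kmer_features(variant)), 3))
```

A single substitution perturbs up to k features, so the score depends heavily on how features are defined, which is one reason neither the sequence view nor the fingerprint view is a natural fit for peptides.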
Outline
• Peptide tribulations
• Introducing GtoPdb
• GtoPdb peptide content and stats
• PubChem peptidic pros and cons
• Getting more peptides > SMILES
Bad news: neither GtoPdb nor ChEMBL nor PubChem
search-index their peptides
Tribulations with peptides
• Difficult to define structurally
• Endogenous peptide activities can be complex many-to-many systems
• Author specifications often insufficient for complete molecular definition
• Structural equivocalities slip through the editor/referee net
• Correct use of IUPAC peptide nomenclature for modifications is rare
• Exact location of radiolabel often not specified
• No verification of purity and/or of in vivo stability against proteolytic clipping
• Noisy peptide name-to-structure (n2s) mappings
• SMILES only adequate for ~70 residues
• Image rendering not standardised
• Searching patents for peptide prior art more difficult than for small molecules
• Literature extraction > databases proportionally lower than for small molecules
• Author database submissions for bioactive peptides non-existent
• Species “zoo” for venom peptides and their names
• Conjugates (e.g. peptide + linker + protein) even more difficult
• The PIR RESID Database of Protein Modifications is no longer maintained
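Several of these tribulations show up as soon as a sequence is turned into SMILES. The sketch below is a naive converter for linear peptides built from the 20 standard residues: one backbone fragment per residue, concatenated, with a free C-terminal acid. It is not Sugar & Splice and not how PubChem standardises structures; stereochemistry, disulfides, PTMs, cyclisation and terminal capping are all deliberately left out, and the function names are hypothetical.

```python
# Side-chain SMILES fragments for the standard residues (Gly and Pro are
# handled separately). Stereochemistry is deliberately omitted, so the
# output does NOT distinguish L- from D-residues.
SIDE_CHAINS = {
    "A": "C", "R": "CCCNC(=N)N", "N": "CC(N)=O", "D": "CC(=O)O",
    "C": "CS", "E": "CCC(=O)O", "Q": "CCC(N)=O", "H": "Cc1c[nH]cn1",
    "I": "C(C)CC", "L": "CC(C)C", "K": "CCCCN", "M": "CCSC",
    "F": "Cc1ccccc1", "S": "CO", "T": "C(C)O", "W": "Cc1c[nH]c2ccccc12",
    "Y": "Cc1ccc(O)cc1", "V": "C(C)C",
}


def residue_fragment(aa: str) -> str:
    """Backbone fragment N-CA-C(=O) for one residue, without stereochemistry."""
    if aa == "G":
        return "NCC(=O)"
    if aa == "P":
        # Proline's side chain closes a ring back onto its own backbone N.
        return "N1C(CCC1)C(=O)"
    if aa not in SIDE_CHAINS:
        raise ValueError(f"unsupported residue: {aa!r}")
    return f"NC({SIDE_CHAINS[aa]})C(=O)"


def naive_peptide_smiles(seq: str) -> str:
    """Linear, unmodified peptide -> SMILES ending in a free C-terminal acid."""
    if not seq:
        raise ValueError("empty sequence")
    return "".join(residue_fragment(aa) for aa in seq.upper()) + "O"
```

Applied to endothelin-1 (CSCSSLMDKECVYFCHLDIIW), this returns a valid-looking SMILES that nonetheless lacks the Cys1-Cys15 and Cys3-Cys11 disulfides and every stereocentre, i.e. a structurally equivocal record of the kind listed above. Because each residue contributes its own fragment, the string also grows roughly linearly with sequence length.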
GtoPdb > NCBI Entrez PubMed <> PubChem
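The same Entrez indices that GtoPdb links into can also be queried programmatically through NCBI E-utilities. The sketch below only builds an esearch URL (no request is sent) for the PubChem Compound Entrez database, pccompound, reusing the CompleteSynonym query that appears on the endothelin slide; the helper name and the retmax default are assumptions, and fetching, parsing and NCBI rate-limit etiquette are left out.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (this sketch never contacts it).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 100) -> str:
    """Build an esearch URL for an Entrez database such as 'pccompound'."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH}?{query}"


if __name__ == "__main__":
    print(esearch_url("pccompound", '"endothelin 1"[CompleteSynonym]'))
```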
Introducing the IUPHAR/BPS Guide to
PHARMACOLOGY (GtoPdb)
• IUPHAR = International Union of Basic and Clinical Pharmacology, BPS = British
Pharmacological Society
• Molecular mechanism of action (mmoa) mapping primary & secondary targets
• Release cycle time (with PubChem refreshes) ~ 2 months
• Seven NAR Annual Database issues, latest as PMID: 31691834 (2020)
• Every 2 years distilled into the British Journal of Pharmacology “Concise Guide
to PHARMACOLOGY” as a nine-paper series (see PMID 29055037) with outlinks
• Curates selected quality compounds for pharmacology research in silico, in
vitro, in cellulo, in vivo, in clinico
• An ELIXIR UK Node resource since 2016
Expert-curated, citation-provenanced,
quantitative binding data
Document > assay > result > compound > location > protein target
D-A-R-C-L-P
Where “C” is not a small molecule, GtoPdb has ~2000 peptides included in
the ~9000 substances we submit to PubChem
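The D-A-R-C-L-P chain can be pictured as one flat record per curated binding result. The sketch below is a hypothetical data structure, not the GtoPdb schema: the class name, field names and TSV serialisation are illustrative assumptions, and the meaning of the "location" field is left unspecified because the slide only names it.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class DARCLPRecord:
    """One hypothetical Document > Assay > Result > Compound > Location >
    Protein target row. All fields are free text in this sketch."""
    document: str        # citation, e.g. a PMID
    assay: str           # assay description
    result: str          # quantitative value as reported, e.g. "pKi 9.0"
    compound: str        # ligand identifier; may be a peptide
    location: str        # "L" as named on the slide; semantics not specified here
    protein_target: str  # target identifier, e.g. a UniProt accession

    def to_tsv(self) -> str:
        """Serialise the record as one tab-separated line."""
        return "\t".join(astuple(self))
```

Keeping the compound slot generic is what lets a peptide sit in the "C" position alongside small molecules; the structural problems start only when that slot has to resolve to a searchable structure.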
Endogenous peptides (786)
http://www.guidetopharmacology.org/GRAC/LigandListForward?type=Endogenous-peptide&database=all
Non-endogenous peptides (1310)
http://www.guidetopharmacology.org/GRAC/LigandListForward?type=Peptide&database=all
GtoPdb peptide stats (release 2019.4)
• Peptide ligands / all ligands = 22%
• Ligands with quantitative binding data / all ligands = 75%
• Peptides with quantitative binding data / all peptides = 63%
• Peptides with a CID and quantitative binding data / all peptides with a CID = 89%
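The two ligand-list pages above give 786 endogenous + 1310 non-endogenous = 2096 peptides, consistent with the "~2000" figure. The ratios themselves can be sketched as below, assuming a hypothetical minimal ligand record (the Ligand class and its field names are illustrative, not GtoPdb's data model).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Ligand:
    """Hypothetical minimal ligand record; field names are illustrative."""
    name: str
    is_peptide: bool
    has_quant_binding: bool
    pubchem_cid: Optional[int] = None


def pct(part: int, whole: int) -> float:
    """Percentage, returning 0.0 for an empty denominator."""
    return 100.0 * part / whole if whole else 0.0


def peptide_stats(ligands: Iterable[Ligand]) -> dict:
    """Compute the four coverage ratios listed on the stats slide."""
    ligs = list(ligands)
    peps = [lig for lig in ligs if lig.is_peptide]
    cid_peps = [lig for lig in peps if lig.pubchem_cid is not None]
    return {
        "peptides/all ligands": pct(len(peps), len(ligs)),
        "quant/all ligands": pct(sum(lig.has_quant_binding for lig in ligs), len(ligs)),
        "quant peptides/all peptides": pct(sum(lig.has_quant_binding for lig in peps), len(peps)),
        "quant CID peptides/all CID peptides": pct(
            sum(lig.has_quant_binding for lig in cid_peps), len(cid_peps)
        ),
    }
```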
Endothelin-1 in GtoPdb (before the SMILES back-fill)
GtoPdb Entrez linkage (after the 2019 back-fill)
The peptidic triple-whammy
Endothelin-1, CID 91928636, 1470 “Similar Compounds” and top-100 BLAST hits
1. Too big to search or cluster by SMILES
2. Too small to BLAST cleanly (and sans PTMs)
3. Too many species splits for precursors
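Point 2 is partly a matter of granularity. The sketch below is not BLAST (no scoring matrix, gaps or E-values); it is a plain ungapped percent identity, shown only to illustrate that on a 21-residue peptide each position is worth 100/21, about 4.8 percentage points, so there is little sequence from which to build a significant score. The Leu6Trp variant is hypothetical.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences aligned without gaps."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    et1 = "CSCSSLMDKECVYFCHLDIIW"      # endothelin-1
    variant = "CSCSSWMDKECVYFCHLDIIW"  # hypothetical single substitution
    print(f"{ungapped_identity(et1, variant):.1f}% identity")
```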
Swiss-Prot precursor annotation
• Evidence support for endogenous processing curated from the primary literature
• PTMs are indicated but text-only
• Very little mass-spec verification of existence in vivo
• No standardised accession identifiers
• Difficult to query across (mixed feature keys)
• No secondary bioactivity annotation (e.g. from most of PubMed)
• No cross-pointers (e.g. to PubChem or RefSeq)
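Mapping an endogenous peptide to a Swiss-Prot entry amounts to slicing the precursor at annotated feature positions. UniProtKB records use feature keys such as SIGNAL, PROPEP, PEPTIDE and CHAIN with 1-based, inclusive coordinates, while modifications such as C-terminal amidation sit in MOD_RES descriptions as text, so a slice alone does not capture them. The precursor, features and helper names below are made up for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Feature(NamedTuple):
    """A sequence feature in UniProt style: key plus 1-based inclusive range."""
    key: str    # e.g. "SIGNAL", "PROPEP", "PEPTIDE", "CHAIN"
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    note: str = ""


def excise(precursor: str, features: Iterable[Feature],
           keys: tuple = ("PEPTIDE", "CHAIN")) -> Dict[str, str]:
    """Return {label: subsequence} for features whose key is in `keys`."""
    out = {}
    for f in features:
        if f.key in keys:
            label = f.note or f"{f.key}_{f.start}-{f.end}"
            out[label] = precursor[f.start - 1:f.end]  # 1-based inclusive -> slice
    return out


if __name__ == "__main__":
    toy_precursor = "MASKLLAGRKDPFWGAKRYGGFL"  # made-up precursor
    toy_features = [
        Feature("SIGNAL", 1, 8),
        Feature("PEPTIDE", 11, 16, "toy peptide 1"),
        Feature("PEPTIDE", 19, 23, "toy peptide 2"),
    ]
    print(excise(toy_precursor, toy_features))
```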
Will the real Endothelin please stand up?
• Submissions mixed between SMILES (CIDs) and sequence strings (SIDs)
• "endothelin 1"[CompleteSynonym] > 6 CIDs > 36 SIDs (10 SID-only)
• “MW 2491.9140 NOT endothelin 1” > 16 CIDs > 23 SIDs (some unnamed)
• BioAssay splitting is problematic
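The MW of 2491.9140 in the query above corresponds to endothelin-1 with its two disulfide bonds. As a minimal sketch, the average mass can be recomputed from commonly tabulated average residue masses (the values below are the standard ones used by tools such as ExPASy, but check them before relying on them): sum the residues, add one water, and subtract two hydrogens per disulfide. The result lands close to that figure, yet a mass match alone cannot separate sequence isomers from differently named copies of the same structure.

```python
# Average (not monoisotopic) residue masses in daltons for the standard residues.
RESIDUE_AVG_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528
HYDROGEN_AVG = 1.00794


def average_mw(seq: str, disulfides: int = 0) -> float:
    """Average MW of a linear peptide; each disulfide bond removes two H atoms."""
    residues = sum(RESIDUE_AVG_MASS[aa] for aa in seq.upper())
    return residues + WATER_AVG - disulfides * 2 * HYDROGEN_AVG


if __name__ == "__main__":
    print(round(average_mw("CSCSSLMDKECVYFCHLDIIW", disulfides=2), 2))
```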
Hierarchical Editing Language for Macromolecules (HELM)
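HELM sidesteps the SMILES size problem by describing a peptide as a chain of monomers plus explicit connections. The sketch below emits a HELM 1.0-style string, as we understand the layout (polymer section, connection section, then empty sections), for a single chain of natural amino acids, with disulfides written as R3-R3 cysteine side-chain connections. Non-natural monomers, capping and other HELM features are not handled; the helper name is hypothetical, and output should be validated with a HELM toolkit before use.

```python
from typing import Iterable, Tuple


def helm_peptide(seq: str, disulfides: Iterable[Tuple[int, int]] = ()) -> str:
    """Build a HELM-style string for one natural-amino-acid peptide chain.

    Disulfides are 1-based residue positions, written as R3-R3 connections
    (R3 being the cysteine side-chain attachment point).
    """
    monomers = ".".join(seq.upper())
    connections = "|".join(
        f"PEPTIDE1,PEPTIDE1,{i}:R3-{j}:R3" for i, j in disulfides
    )
    return f"PEPTIDE1{{{monomers}}}${connections}$$$"


if __name__ == "__main__":
    # Endothelin-1 with its Cys1-Cys15 and Cys3-Cys11 disulfides.
    print(helm_peptide("CSCSSLMDKECVYFCHLDIIW", [(1, 15), (3, 11)]))
```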
GtoPdb push:
Peptides > S&S > SMILES > SIDs > CIDs
http://www.guidetopharmacology.org/GRAC/LigandDisplayForward?ligandId=3854
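A hypothetical triage step in front of this push could split peptides by length: sequences at or below the ~70-residue SMILES limit go to Sugar & Splice conversion and on to SIDs and CIDs, while longer ones would presumably be deposited as sequence-only SIDs. The route names, the inclusive cut-off and the function itself are assumptions; the conversion step is not shown.

```python
from typing import Dict, List

# Approximate residue ceiling for the SMILES > CID route, as quoted in this
# talk; treat it as a soft, assumed cut-off (inclusive here by choice).
SMILES_RESIDUE_LIMIT = 70


def route_peptides(peptides: Dict[str, str],
                   limit: int = SMILES_RESIDUE_LIMIT) -> Dict[str, List[str]]:
    """Split {name: sequence} into a SMILES-conversion route and a sequence-only route."""
    routes: Dict[str, List[str]] = {"smiles_to_cid": [], "sequence_only_sid": []}
    for name, seq in peptides.items():
        key = "smiles_to_cid" if len(seq) <= limit else "sequence_only_sid"
        routes[key].append(name)
    return routes
```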
The NextMove move (Noel O'Boyle)
https://www.nextmovesoftware.com/talks/OBoyle_PubChemBiologics_ACS_201708.pdf
NextMove Biologics: 8699 SIDs > 4969 CIDs
Low bioactivity annotation (e.g. 259 in ChEMBL
from 1.9 million CIDs, 36 in GtoPdb from 7674)
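The SID-to-CID collapse is a many-to-one grouping: if every one of the 8699 SIDs standardised to a CID, the 4969 CIDs would average about 1.75 SIDs each (8699 / 4969). A minimal sketch of summarising such a mapping, with made-up identifiers and a hypothetical function name, where None marks an SID that received no CID:

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def collapse_sids(sid_to_cid: Dict[int, Optional[int]]) -> Tuple[int, int, Counter]:
    """Summarise SID -> CID standardisation.

    `None` marks an SID with no standardised structure (no CID).
    Returns (number of SIDs, number of distinct CIDs, SIDs-per-CID counter).
    """
    counts = Counter(cid for cid in sid_to_cid.values() if cid is not None)
    return len(sid_to_cid), len(counts), counts


if __name__ == "__main__":
    toy_mapping = {1: 10, 2: 10, 3: 11, 4: None}  # made-up SID -> CID pairs
    n_sids, n_cids, per_cid = collapse_sids(toy_mapping)
    print(n_sids, n_cids, per_cid.most_common(1))
```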
Acknowledgments and info
• Past and present GtoPdb curators working on peptide entries
• The NextMove team for Sugar & Splice support and their peptide processing in PubChem
• Lin Yikai, M.Sc. project: “Developing bio/cheminformatics methods for converting
bioactive peptide structures into machine-readable formats”
• Anna Gaulton for ChEMBL FASTA sequences
• Paul Thiessen (PubChem) for peptide CIDs
