Desperately seeking curated D-A-R-C-P:
Assessing the past to predict the future
Introduction
Bioscientists reading papers or patents on bioactive chemistry strive to discern the
key relationships reported within a document “D” (e.g. with a PubMed ID) where a
bioactivity “A” with a quantitative result “R” (e.g. an IC50) is reported for a chemical
structure “C” that modulates (e.g. inhibits) a protein target “P” (e.g. a UniProt ID).
D – A – R – C – P
While it cannot encompass all mechanistic cases, a useful shorthand for this
connectivity thus becomes DARCP. Biocuration for extraction and structured capture
of this relationship chain in databases has high value that can be explored both
manually and computationally, viz.:
• “D”: clustering by relatedness, entity content, citation networks, connections
via authors and institutions
• “A”: classified by various assay ontologies
• “R”: log transformations (e.g. pIC50 or pKi) for potency ranking and SAR, and
sorting by molecular mechanism of action (mmoa) (e.g. where A-R indicates
C to be a potent inhibitor of P)
• “C”: the full range of cheminformatic analysis including 2D/3D clustering,
property prediction, substructures, analogue searching and chemical ontologies
• “P”: a full range of bioinformatic analysis including target classes, Gene
Ontology (GO) assignments, pathway annotation, structural homology, disease
associations and genetic variation (e.g. for target validation).
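As a minimal sketch of how one D-A-R-C-P chain and the log transformation of “R” could be held in code, the Python below defines a hypothetical record type and ranks compounds by pIC50. The class name, field names and all identifiers are illustrative placeholders, not taken from any of the databases discussed, and results are assumed to be stored in nanomolar.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DarcpRecord:
    """One document-activity-result-compound-protein chain (hypothetical schema)."""
    document: str     # "D", e.g. a PubMed ID
    activity: str     # "A", the result type, e.g. "IC50"
    result_nm: float  # "R", assumed here to be in nanomolar
    compound: str     # "C", e.g. a PubChem CID
    protein: str      # "P", e.g. a UniProt accession

    @property
    def p_value(self) -> float:
        """Negative log10 of the result in molar units (pIC50, pKi, ...)."""
        return -math.log10(self.result_nm * 1e-9)


# Placeholder identifiers only; these are not real database entries.
records = [
    DarcpRecord("PMID:0000001", "IC50", 10.0, "CID:0000001", "P00000"),
    DarcpRecord("PMID:0000001", "IC50", 2500.0, "CID:0000002", "P00000"),
    DarcpRecord("PMID:0000002", "IC50", 0.5, "CID:0000003", "P00000"),
]

# Rank by potency: a higher pIC50 means a more potent compound.
ranked = sorted(records, key=lambda r: r.p_value, reverse=True)
for r in ranked:
    print(f"{r.compound}  p{r.activity} = {r.p_value:.2f}")
```

Keeping all five entities in one record is what preserves the connectivity: once they are stored separately, the ranking above can no longer be traced back to a document or a target.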
The problem the community faces is that we have spent millions burying DARCP in
paywalled PDFs (a.k.a. “Hamburgerisation”) over many decades but must now
spend millions more trying to get it back out.
Assessing the past
The table below shows the statistics of DARCP entity accumulation from three
manually curated resources over approximately the last decade. In the table these are
compared with PubChem wherein these four are integrated as submitting sources
(GtoPdb = IUPHAR/BPS Guide to Pharmacology, PMID 31691834).
Statistical comparisons between databases can be confounded by differences in their
data models, publication selectivity, curatorial practice and activity thresholds.
Nonetheless, discrete entity counts can be informative for assessing the relative
extraction capture of documents, structures and proteins. The DCP counts are shown
below for the three sources.
[Figure: DCP counts for the three sources, in three panels: PubMed IDs, PubChem CIDs, Swiss-Prot human IDs]
Christopher Southan, TW2Informatics, Göteborg, Sweden
Interpreting entity count differences
The capture of PMIDs shows a pattern of intersects and differences that is to some
extent also reflected in chemistry and protein targets. Each source has some unique
capture but ChEMBL and BindingDB overlap for ~25K papers (partially due to
collaborative mirroring between them). The total from all four is ~75K PMIDs.
The chemistry (as PubChem identifiers) shows a similar disproportion with
ChEMBL, as expected, dominating with unique content of ~1.2 million. While
this is skewed by their BioAssay subsumption of ~0.5 million, most has been
extracted from ~35K unique papers. In BindingDB, unique structures are mainly
from SAR curation of US patents. In terms of interpreting differences we should
also note that GtoPdb extracts on average ~1 lead compound per paper, ChEMBL
~14 per paper and BindingDB ~40 per patent.
For the differences in target coverage (i.e. as “P” in DARCP) further work is needed
to know what selectivity causes this divergence (e.g. journal choice) but some
BindingDB unique proteins are patent-only. While exploring further causes of target
divergence is outside the scope of this work, the total of 3745 human proteins
(with A-R-C modulating chemistry) covered by these three represents ~18% of the
UniProt proteome of 20,365.
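The coverage figure can be checked with simple arithmetic, and the same set logic shows how a combined total and source-unique proteins could be derived from per-source accession lists. In the sketch below only the two counts (3745 and 20,365) come from the text; the three small sets are toy data with made-up accessions, and the variable names are assumptions for illustration.

```python
def coverage(covered: int, total: int) -> float:
    """Fraction of a reference proteome covered by curated targets."""
    return covered / total


# Counts quoted in the text above.
human_fraction = coverage(3745, 20365)
print(f"Proteome coverage: {human_fraction:.1%}")

# Toy accession sets (hypothetical) standing in for the three sources.
chembl = {"P1", "P2", "P3"}
bindingdb = {"P2", "P3", "P4"}
gtopdb = {"P3", "P5"}

# The union gives the combined target count across sources.
combined = chembl | bindingdb | gtopdb
# Set difference gives targets captured by one source only.
unique_to_bindingdb = bindingdb - chembl - gtopdb
# The intersection gives targets captured by all three.
shared_by_all = chembl & bindingdb & gtopdb

print(len(combined), sorted(unique_to_bindingdb), sorted(shared_by_all))
```

With real data the sets would be filled from each source's Swiss-Prot human accessions, so that the same protein is counted once however many sources record it.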
So how much could be captured?
While an upper limit is difficult to assess, commercial DARCP extraction sources
such as Excelra GOSTAR and Reaxys Medicinal Chemistry declare curated entity
counts in the range of 6-8 million activity-mapped compounds from ~200-350,000
papers plus ~70-130,000 patents. They also count over 10,000 targets (but not all as
protein identifiers). While there are caveats with comparisons (i.e. not counting the
entities in exactly the same way and no disclosure of entities-in-common), the
indication is that these two sources have captured (very roughly) 4-fold more
DARCP than public efforts, largely due to the larger number of curators employed
or contracted. However, in terms of upper limits for public capture, we must not
overlook issues of data reproducibility arising from the increasingly patchy quality
of PubMed (i.e. many papers from which DARCP should perhaps not be extracted).
Predicting the future
The future flow of DARCP into databases is constrained by the following factors:
• The three resources that continue to capture the majority of open DARCP are to
be congratulated and we hope their funding will be sustained. However, their
capacity is limited by the number of biocurators in the face of increasing
bioactivity publications (an increase that cheminformatics AI may accelerate).
• Progress in entity recognition via Natural Language Processing now means that
the extraction of discrete D, A, R, C and P per se can be automated with
reasonable specificity as well as indexed by resource look-ups in Europe
PubMed Central (EPMC). However, this has not been achieved for the D-A-R-C-P
relationships that biocurators can discern and extract from documents in minutes.
• The good news on the journal front is that we have J. Med. Chem. supplementary
SMILES listings (occasionally even with activities), Nat. Chem. Biol. pointing to
PubChem entries and Brit. J. Pharmacol. incorporating GtoPdb out-links and
(via those) links to PubChem. The bad news is that we will move into 2020 without
even a single journal (from 1000s across the domains of medicinal chemistry,
drug discovery, pharmacology and chemical biology) facilitating author-specified
explicit DARCP automatically piped to databases (e.g. PubChem BioAssay).
• The FAIR initiative (Findable, Accessible, Interoperable, Reusable) is gaining
momentum and should lead to at least discrete D, A, R, C, P annotations flowing
into various repositories. However, the proportion of fully connected D-A-R-C-P
may be low and it is unclear technically how this might flow through to major
databases. For example, there is currently neither push nor pull for DARCP to
flow from Figshare into PubChem BioAssay.
• While Open Access and Plan S are also gaining momentum, paywalls still
seriously impede extraction. The legacy problem is that only 14% of the ~62K
papers extracted by ChEMBL (as indexed in EPMC) are free full text.
• For the future, we need publications to facilitate FAIR data extraction.
Non-document surfacing (e.g. open electronic notebooks and Wikidata) also needs
encouraging as an alternative to journals. Both trends should increase DARCP
flow into open databases to enable big data mining and knowledge distillation.
N.B. Additional details from this work are given in a ChemRxiv preprint
(10.6084/m9.figshare.11295323) that is under consideration by a journal.
https://sites.google.com/view/tw2informatics/home

Contenu connexe

Tendances

Metabolic Set Enrichment Analysis - chemrich - 2019
Metabolic Set Enrichment Analysis - chemrich - 2019Metabolic Set Enrichment Analysis - chemrich - 2019
Metabolic Set Enrichment Analysis - chemrich - 2019
Dinesh Barupal
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
Valery Tkachenko
 
Presentation from Code Camp 2017
Presentation from Code Camp 2017Presentation from Code Camp 2017
Presentation from Code Camp 2017
Mitch Miller
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
Valery Tkachenko
 
ReVeaLD: A User-driven Domain Specific Interactive Search Platform for Biomed...
ReVeaLD: A User-driven Domain Specific Interactive Search Platform for Biomed...ReVeaLD: A User-driven Domain Specific Interactive Search Platform for Biomed...
ReVeaLD: A User-driven Domain Specific Interactive Search Platform for Biomed...
Maulik Kamdar
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
Dr. Haxel Consult
 
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpiderIdentification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Searching for chemical information using PubChem
Searching for chemical information using PubChemSearching for chemical information using PubChem
Searching for chemical information using PubChem
Sunghwan Kim
 
Chemspider For Mass Spectrometrists Public Version
Chemspider For Mass Spectrometrists Public VersionChemspider For Mass Spectrometrists Public Version
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
open_phacts
 
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Data Journalism - Cleaning Data
Data Journalism - Cleaning DataData Journalism - Cleaning Data
Data Journalism - Cleaning Data
Bahareh Heravi
 
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Tendances (20)

Metabolic Set Enrichment Analysis - chemrich - 2019
Metabolic Set Enrichment Analysis - chemrich - 2019Metabolic Set Enrichment Analysis - chemrich - 2019
Metabolic Set Enrichment Analysis - chemrich - 2019
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 
Presentation from Code Camp 2017
Presentation from Code Camp 2017Presentation from Code Camp 2017
Presentation from Code Camp 2017
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
 
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
 
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
 
ReVeaLD: A User-driven Domain Specific Interactive Search Platform for Biomed...
ReVeaLD: A User-driven Domain Specific Interactive Search Platform for Biomed...ReVeaLD: A User-driven Domain Specific Interactive Search Platform for Biomed...
ReVeaLD: A User-driven Domain Specific Interactive Search Platform for Biomed...
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
 
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpiderIdentification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider
 
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
 
Searching for chemical information using PubChem
Searching for chemical information using PubChemSearching for chemical information using PubChem
Searching for chemical information using PubChem
 
Chemspider For Mass Spectrometrists Public Version
Chemspider For Mass Spectrometrists Public VersionChemspider For Mass Spectrometrists Public Version
Chemspider For Mass Spectrometrists Public Version
 
2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG2011-11-28 Open PHACTS at RSC CICAG
2011-11-28 Open PHACTS at RSC CICAG
 
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
 
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...ChemSpider  and How The Wisdom Of The  Crowds  Can  Improve The  Quality Of  ...
ChemSpider and How The Wisdom Of The Crowds Can Improve The Quality Of ...
 
Data Journalism - Cleaning Data
Data Journalism - Cleaning DataData Journalism - Cleaning Data
Data Journalism - Cleaning Data
 
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
 

Similaire à Desperately seeking DARCP

FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCP
Chris Southan
 
Looking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRLooking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIR
Chris Southan
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
Dr. Haxel Consult
 
Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivity
Chris Southan
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...
Greg Landrum
 
Precompetitive preclinical ADME/tox data and set it free on the web to facili...
Precompetitive preclinical ADME/tox data and set it free on the web to facili...Precompetitive preclinical ADME/tox data and set it free on the web to facili...
Precompetitive preclinical ADME/tox data and set it free on the web to facili...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdb
Chris Southan
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
open_phacts
 
Revolution in the Connectivity Between Medicinal Chemistry and Biology
Revolution in the Connectivity Between Medicinal Chemistry and BiologyRevolution in the Connectivity Between Medicinal Chemistry and Biology
Revolution in the Connectivity Between Medicinal Chemistry and Biology
Chris Southan
 
Towards semantic systems chemical biology
Towards semantic systems chemical biology Towards semantic systems chemical biology
Towards semantic systems chemical biology
Bin Chen
 
Pub Med to PubChem Connectivity
Pub Med to PubChem ConnectivityPub Med to PubChem Connectivity
Pub Med to PubChem Connectivity
Chris Southan
 
The big data join in pharmacology
The big data join in pharmacologyThe big data join in pharmacology
The big data join in pharmacology
Chris Southan
 
Cadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.PharmCadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.Pharm
Shikha Popali
 
Assessing GtoPdb ligand content in PubChem
Assessing GtoPdb ligand content in PubChemAssessing GtoPdb ligand content in PubChem
Assessing GtoPdb ligand content in PubChem
Chris Southan
 
Exploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsExploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural products
Sunghwan Kim
 
PubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyPubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biology
Chris Southan
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Greg Landrum
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug Development
Chris Southan
 
GtoPdb ELIXIR-All Hands 2018
GtoPdb ELIXIR-All Hands 2018GtoPdb ELIXIR-All Hands 2018
GtoPdb ELIXIR-All Hands 2018
Guide to PHARMACOLOGY
 
The Patent Chemistry “Big Bang” In Pubchem
The Patent Chemistry “Big Bang” In PubchemThe Patent Chemistry “Big Bang” In Pubchem
The Patent Chemistry “Big Bang” In Pubchem
Chris Southan
 

Similaire à Desperately seeking DARCP (20)

FAIR connectivity for DARCP
FAIR  connectivity for DARCPFAIR  connectivity for DARCP
FAIR connectivity for DARCP
 
Looking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIRLooking at chemistry - protein - papers connectivity in ELIXIR
Looking at chemistry - protein - papers connectivity in ELIXIR
 
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
ICIC 2017: Tutorial - Digging bioactive chemistry out of patents using open r...
 
Connectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivityConnectivity > documents > structures > bioactivity
Connectivity > documents > structures > bioactivity
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...
 
Precompetitive preclinical ADME/tox data and set it free on the web to facili...
Precompetitive preclinical ADME/tox data and set it free on the web to facili...Precompetitive preclinical ADME/tox data and set it free on the web to facili...
Precompetitive preclinical ADME/tox data and set it free on the web to facili...
 
Peptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdbPeptide Tribulations in GtoPdb
Peptide Tribulations in GtoPdb
 
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ...
 
Revolution in the Connectivity Between Medicinal Chemistry and Biology
Revolution in the Connectivity Between Medicinal Chemistry and BiologyRevolution in the Connectivity Between Medicinal Chemistry and Biology
Revolution in the Connectivity Between Medicinal Chemistry and Biology
 
Towards semantic systems chemical biology
Towards semantic systems chemical biology Towards semantic systems chemical biology
Towards semantic systems chemical biology
 
Pub Med to PubChem Connectivity
Pub Med to PubChem ConnectivityPub Med to PubChem Connectivity
Pub Med to PubChem Connectivity
 
The big data join in pharmacology
The big data join in pharmacologyThe big data join in pharmacology
The big data join in pharmacology
 
Cadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.PharmCadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.Pharm
 
Assessing GtoPdb ligand content in PubChem
Assessing GtoPdb ligand content in PubChemAssessing GtoPdb ligand content in PubChem
Assessing GtoPdb ligand content in PubChem
 
Exploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural productsExploiting PubChem for drug discovery based on natural products
Exploiting PubChem for drug discovery based on natural products
 
PubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biologyPubChem for drug discovery and chemical biology
PubChem for drug discovery and chemical biology
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 
In silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug DevelopmentIn silico 360 Analysis for Drug Development
In silico 360 Analysis for Drug Development
 
GtoPdb ELIXIR-All Hands 2018
GtoPdb ELIXIR-All Hands 2018GtoPdb ELIXIR-All Hands 2018
GtoPdb ELIXIR-All Hands 2018
 
The Patent Chemistry “Big Bang” In Pubchem
The Patent Chemistry “Big Bang” In PubchemThe Patent Chemistry “Big Bang” In Pubchem
The Patent Chemistry “Big Bang” In Pubchem
 

Plus de Chris Southan

Peptide tribulations
Peptide tribulationsPeptide tribulations
Peptide tribulations
Chris Southan
 
Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2 Vicissitudes of target validation for BACE1 and BACE2
Vicissitudes of target validation for BACE1 and BACE2
Chris Southan
 
Guide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updaeGuide to Pharmacology database: ELIXIR updae
Guide to Pharmacology database: ELIXIR updae
Chris Southan
 
Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?Will the correct BACE ORFs please stand up?
Will the correct BACE ORFs please stand up?
Chris Southan
 
Seeking glimmers of light in Pharos “Tdark” proteins
Seeking glimmers of light in  Pharos “Tdark” proteinsSeeking glimmers of light in  Pharos “Tdark” proteins
Seeking glimmers of light in Pharos “Tdark” proteins
Chris Southan
 
5HT2A modulators update for SAFER
5HT2A modulators update for SAFER5HT2A modulators update for SAFER
5HT2A modulators update for SAFER
Chris Southan
 
Quality and noise in big chemistry databases
Quality and noise in big chemistry databasesQuality and noise in big chemistry databases
Quality and noise in big chemistry databases
Chris Southan
 
GtoPdb June 2019 poster
GtoPdb June 2019 posterGtoPdb June 2019 poster
GtoPdb June 2019 poster
Chris Southan
 
PubChem as a source of systems biology perturbagens
PubChem as a source of  systems biology perturbagensPubChem as a source of  systems biology perturbagens
PubChem as a source of systems biology perturbagens
Chris Southan
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand up
Chris Southan
 
Peptide Tribulations
Peptide TribulationsPeptide Tribulations
Peptide Tribulations
Chris Southan
 
Guide to Immunopharmacology update
Guide to Immunopharmacology updateGuide to Immunopharmacology update
Guide to Immunopharmacology update
Chris Southan
 
Druggable Proteome sources in UniProt
Druggable Proteome sources in UniProtDruggable Proteome sources in UniProt
Druggable Proteome sources in UniProt
Chris Southan
 
Patents in PubChem
Patents in PubChemPatents in PubChem
Patents in PubChem
Chris Southan
 
The IUPHAR/MMV Guide to Malaria Pharmacology
The  IUPHAR/MMV Guide to Malaria Pharmacology  The  IUPHAR/MMV Guide to Malaria Pharmacology
The IUPHAR/MMV Guide to Malaria Pharmacology
Chris Southan
 
Linking GtoP <> PubChem <> PubMed
Linking GtoP <> PubChem <> PubMed Linking GtoP <> PubChem <> PubMed
Linking GtoP <> PubChem <> PubMed
Chris Southan
 
Druggable genome in GtoPdb and other dbs
Druggable genome in GtoPdb and other dbsDruggable genome in GtoPdb and other dbs
Druggable genome in GtoPdb and other dbs
Chris Southan
 
5HT2A modulators in GtoPdb and other databses
5HT2A modulators in GtoPdb and other databses5HT2A modulators in GtoPdb and other databses
5HT2A modulators in GtoPdb and other databses
Chris Southan
 
Pros and cons of patent-extracted structures in PubChem
Pros and cons of patent-extracted structures in PubChemPros and cons of patent-extracted structures in PubChem
Pros and cons of patent-extracted structures in PubChem
Chris Southan
 
GtoPdb: A resource for cell-based perturbogens
GtoPdb:  A resource for cell-based perturbogensGtoPdb:  A resource for cell-based perturbogens
GtoPdb: A resource for cell-based perturbogens
Chris Southan
 

Desperately seeking DARCP

Desperately seeking curated D-A-R-C-P: Assessing the past to predict the future

Introduction
Bioscientists reading papers or patents on bioactive chemistry strive to discern the key relationships reported within a document “D” (e.g. with a PubMed ID) where a bioactivity “A” with a quantitative result “R” (e.g. an IC50) is reported for chemical structure “C” that modulates (e.g. inhibits) a protein target “P” (e.g. a UniProt ID).

D – A – R – C – P

While it cannot encompass all mechanistic cases, a useful shorthand for this connectivity thus becomes DARCP. Biocuration for extraction and structured capture of this relationship chain in databases has high value that can be explored both manually and computationally, viz.:
• “D”: clustering by relatedness, entity content, citation networks, connections via authors and institutions
• “A”: classified by various assay ontologies
• “R”: log transformations (e.g. pIC50 or pKi) for potency ranking and SAR, sorting by molecular mechanism of action (mmoa), (e.g. where A-R indicates C to be a potent inhibitor of P)
• “C”: the full range of cheminformatic analysis including 2D/3D clustering, property prediction, substructures, analogue searching and chemical ontologies
• “P”: a full range of bioinformatic analysis including target classes, Gene Ontology (GO) assignments, pathway annotation, structural homology, disease associations and genetic variation (e.g. for target validation).

The problem the community faces is that we have spent millions burying DARCP in paywalled PDFs (a.k.a. “Hamburgerisation”) over many decades but must now spend millions more trying to get it back out.

Assessing the past
The table below shows the statistics of DARCP entity accumulation from three manually curated resources over approximately the last decade. In the table these are compared with PubChem wherein these four are integrated as submitting sources (GtoPdb = IUPHAR/BPS Guide to Pharmacology, PMID 31691834).
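The D-A-R-C-P chain and the log transformation mentioned for “R” in the introduction can be illustrated with a minimal Python sketch. The class name, field names and all identifiers below are hypothetical placeholders chosen for illustration and are not taken from any of the databases discussed; the result is assumed to be stored in nanomolar units.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DarcpRecord:
    """One curated D-A-R-C-P relationship (hypothetical schema)."""
    document: str     # D: e.g. a PubMed ID
    activity: str     # A: e.g. the measurement type, such as "IC50"
    result_nm: float  # R: quantitative result, assumed to be in nanomolar
    compound: str     # C: e.g. a PubChem CID
    protein: str      # P: e.g. a UniProt accession

    def p_value(self) -> float:
        """Return -log10 of the result in molar units (e.g. pIC50 or pKi)."""
        if self.result_nm <= 0:
            raise ValueError("result must be a positive concentration")
        # 1 nM = 1e-9 M, so -log10(x * 1e-9) = 9 - log10(x)
        return 9.0 - math.log10(self.result_nm)


# Placeholder records; none of these identifiers are real.
records = [
    DarcpRecord("PMID:EXAMPLE-1", "IC50", 250.0, "CID:EXAMPLE-A", "UNIPROT:EXAMPLE-1"),
    DarcpRecord("PMID:EXAMPLE-1", "IC50", 10.0, "CID:EXAMPLE-B", "UNIPROT:EXAMPLE-1"),
    DarcpRecord("PMID:EXAMPLE-2", "Ki", 1000.0, "CID:EXAMPLE-C", "UNIPROT:EXAMPLE-1"),
]

# Potency ranking: a higher p-value means a more potent compound.
ranked = sorted(records, key=lambda r: r.p_value(), reverse=True)
for rec in ranked:
    print(f"{rec.compound}\t{rec.activity}\tp = {rec.p_value():.2f}")
```

Keeping all five entities in one record is what distinguishes a connected D-A-R-C-P relationship from five separately indexed entities.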
Statistical comparisons between databases can be confounded by differences in their data models, publication selectivity, curatorial practice and activity thresholds. Nonetheless, discrete entity counts can be informative for assessing relative extraction capture of documents, structures and proteins. The DCP counts are shown below for the three sources.

[Figure: DCP counts per source, with panels for PubMed IDs, PubChem CIDs and Swiss-Prot human IDs]

Christopher Southan, TW2Informatics, Göteborg, Sweden

Interpreting entity count differences
The capture of PMIDs shows a pattern of intersects and differences that is to some extent also reflected in chemistry and protein targets. Each source has some unique capture but ChEMBL and BindingDB overlap for ~25K papers (partially due to collaborative mirroring between them). The total from all four is ~75K PMIDs. The chemistry (as PubChem identifiers) shows a similar disproportion with ChEMBL, as expected, dominating with unique content of ~1.2 million. While this is skewed by their subsumption of ~0.5 million from BioAssay, most has been extracted from ~35K unique papers. In BindingDB unique structures are mainly from SAR curation of US Patents. In terms of interpreting differences we should also note that GtoPdb extracts on average ~1 lead compound per paper, ChEMBL ~14 per paper and BindingDB ~40 per patent. For the differences in target coverage (i.e. as “P” in DARCP) further work is needed to know what selectivity causes this divergence (e.g. journal choice) but some BindingDB unique proteins are patent-only. While exploring further causes of target divergence is outside the scope of this work, the total of 3745 human proteins (with A-R-C modulating chemistry) covered by these three represents ~18% of the UniProt proteome of 20,365.

So how much could be captured?
While an upper limit is difficult to assess, commercial DARCP extraction sources such as Excelra GOSTAR and Reaxys Medicinal Chemistry declare curated entity counts in the range of 6-8 million activity-mapped compounds from ~200-350,000 papers plus ~70-130,000 patents. They also count over 10,000 targets (but not all as protein identifiers). While there are caveats with comparisons (i.e. not counting the entities in exactly the same way and no disclosure of entities-in-common) the indication is that these two sources have captured (very roughly) 4-fold more DARCP than public efforts, largely due to the larger number of curators employed or contracted. However, in terms of upper limits for public capture, we must not overlook issues of data reproducibility arising from the increasingly patchy quality of PubMed (i.e. many papers from which DARCP should perhaps not be extracted).

Predicting the future
The future flow of DARCP into databases is constrained by the following factors:
• The three resources that continue to capture the majority of open DARCP are to be congratulated and we hope their funding will be sustained. However, their capacity is limited by the number of biocurators in the face of increasing bioactivity publications (and which cheminformatics AI may accelerate).
• Progress in entity recognition via Natural Language Processing now means that the extraction of discrete D, A, R, C and P per se can be automated with reasonable specificity as well as indexed by resource look-ups in European PubMed Central (EPMC). However, this has not been achieved for the D-A-R-C-P relationships that biocurators can discern and extract from documents in minutes.
• The good news on the journal front is that we have J. Med. Chem. supplementary SMILES listings (occasionally even with activities), Nat. Chem. Biol. pointing to PubChem entries and Brit. J. Pharmacol. incorporating GtoPdb out-links and (via those) links to PubChem. The bad news is that we will move into 2020 without even a single journal (from 1000s across the domains of medicinal chemistry, drug discovery, pharmacology and chemical biology) facilitating author-specified explicit DARCP automatically piped to databases (e.g. PubChem BioAssay).
• The FAIR initiative (Findable, Accessible, Interoperable, Reusable) is gaining momentum and should lead to at least discrete D, A, R, C, P annotations flowing into various repositories. However, the proportion of fully connected D-A-R-C-P may be low and it is unclear technically how this might flow through to major databases. For example, there is currently neither push nor pull for DARCP to flow from Figshare into PubChem BioAssay.
• While Open Access and Plan S are also gaining momentum, paywalls still seriously impede extraction. The legacy problem is that only 14% of the ~62K papers extracted by ChEMBL (as indexed in EPMC) are free full text.
• For the future we need publications to facilitate FAIR data extraction. Non-document surfacing (e.g. open Electronic Notebooks and Wikidata) also needs encouraging as an alternative to journals. Both trends should increase DARCP flow into open databases to enable big data mining and knowledge distillation.

N.B. additional details from this work are given in a ChemRxiv preprint (10.6084/m9.figshare.11295323) that is under consideration by a journal.

https://sites.google.com/view/tw2informatics/home
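The entity-count comparison described under “Interpreting entity count differences” amounts to set arithmetic over identifier lists, plus a simple percentage for proteome coverage. The following Python sketch shows one way this could be done. The function names and the toy identifier sets are assumptions for illustration only; the two figures passed to the coverage calculation (3745 and 20,365) are the ones quoted in the text.

```python
def overlap_summary(sources: dict[str, set[str]]) -> dict[str, int]:
    """Count the union of all identifiers and those unique to each source."""
    union = set().union(*sources.values())
    summary = {"union": len(union)}
    for name, ids in sources.items():
        # Everything captured by any source other than this one.
        others = set().union(*(v for k, v in sources.items() if k != name))
        summary[f"unique_{name}"] = len(ids - others)
    return summary


def coverage_percent(covered: int, total: int) -> float:
    """Percentage of a reference set (e.g. a proteome) that is covered."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * covered / total


# Toy identifier sets standing in for per-source PMID lists (not real data).
toy_sources = {
    "source_a": {"1", "2", "3"},
    "source_b": {"2", "3", "4"},
    "source_c": {"3", "5"},
}
print(overlap_summary(toy_sources))

# Human proteins with A-R-C modulating chemistry versus the UniProt proteome.
print(f"{coverage_percent(3745, 20365):.1f}% of the proteome")
```

The same functions would apply unchanged to PubChem CIDs or Swiss-Prot accessions, since only identifier membership is compared.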