Bringing bioassay protocols to the world of informatics, using semantic annotations
Alex M. Clark
SAR
◈ The activity column:
▷ TCruzi (chagas)
▷ IC50
▷ nM
◈ Underdefined in a typical SDfile or table
◈ Only alternative is a verbose publication
Assay Informatics
◈ Measurement of an activity has many details:
▷ target, organism, cell line, measurement type, units, experimental design, instrumentation, etc.
◈ Usually only available as text...
◈ ... but it needs to be available to informatics.
◈ Searching for experiments, comparing datasets, spotting trends, evaluating completeness
Cell-Free Homogeneous Primary HTS to Identify Inhibitors of GSK3beta Activity
(1) Dispense 1 uL/well of CABPE, 0.5 uL of ATP, and 1 uL of positive control GW8510 or AB in respective wells according to plate design to 1536-well assay ready plates (Aurora 29847) that contain 2.5 nL/well of 10 mM compound using BioRAPTR (Beckman) to start the reaction. Incubate at room temperature for 60 minutes.
(2) Add 2.5 uL/well of ADP-glo (Promega, V9103) with BioRAPTR, incubate at room temperature for 40 minutes
(3) Add 5 uL/well of ADP-glo (Promega, V9103) with Combi nL (Thermo), incubate at room temperature for 30 minutes
http://www.bioassayontology.org/bao#BAO_0002909
http://www.drugtargetontology.org/dto/DTO_03300101
http://purl.obolibrary.org/obo/NCBITaxon_9606
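The three URIs above show what an annotation looks like once it is machine readable. As a minimal sketch (the helper name `compact_id` is hypothetical and not part of any ontology toolkit), each URI can be reduced to an ontology prefix and a local identifier using only the Python standard library:

```python
from urllib.parse import urlparse

def compact_id(uri):
    """Split an ontology term URI into (prefix, local number).

    Hypothetical helper: assumes the term ID is either the URI fragment
    or the last path segment, and has the form PREFIX_NUMBER.
    """
    parsed = urlparse(uri)
    local = parsed.fragment or parsed.path.rsplit("/", 1)[-1]
    prefix, _, number = local.partition("_")
    return prefix, number

uris = [
    "http://www.bioassayontology.org/bao#BAO_0002909",
    "http://www.drugtargetontology.org/dto/DTO_03300101",
    "http://purl.obolibrary.org/obo/NCBITaxon_9606",
]
for uri in uris:
    print(compact_id(uri))
```

Because the identifiers are plain strings with a stable shape, they can be indexed, compared, and searched in a way that the protocol text cannot.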
The Mission
◈ To make the world's bioassay protocol data machine readable
◈ Much progress toward making this goal an actual reality...
◈ With the help of ontology markup:
▷ BAO, DTO, CLO, GO, UO, UniProt...
[Diagram labels]
◈ Ontologies: BAO, DTO, CLO, UO, UniProt
◈ Legacy Data: PubMed, PubChem, MLPCN, Tox21, ChEMBL
◈ Common Assay Template
◈ Fully Annotated Assays
◈ New Data: ELN
◈ ML
Common Assay Template
◈ A recipe for how to use ontologies
◈ Capture most important information for most assays
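One way to picture such a recipe is as a list of named fields that every assay should fill in with ontology terms. The sketch below is illustrative only: the field names are borrowed from the details listed on the Assay Informatics slide, while `TEMPLATE_FIELDS` and `missing_fields` are hypothetical and do not reflect the actual template format.

```python
# Hypothetical template: field names borrowed from the slide text,
# not from the real Common Assay Template definition.
TEMPLATE_FIELDS = ["target", "organism", "cell line", "measurement type", "units"]

def missing_fields(annotations, fields=TEMPLATE_FIELDS):
    """Return the template fields that have no annotation yet."""
    return [f for f in fields if not annotations.get(f)]

# Partially annotated assay: each field maps to a list of term URIs.
assay = {
    "organism": ["http://purl.obolibrary.org/obo/NCBITaxon_9606"],
    "units": [],
}
print(missing_fields(assay))
```

A check like this is one simple reading of "evaluating completeness": once every assay shares the same set of fields, gaps become countable.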
Legacy Curation
◈ Text-centric data: millions of assays...
▷ private collections (ELNs)
▷ published literature (PubMed)
▷ assay databases (PubChem)
◈ Most accessible sources: MLPCN, Tox21
◈ Data comes in as plain text + several fields + SAR
Hybrid machine learning
1. Text extraction
2. Probabilistic:
▷ natural language models
▷ correlation models
3. Axiomatic rules
◈ Each of these accelerates the human curator
Text extraction
◈ Optimised for low false positives
qHTS Assay for Small Molecule Inhibitors of the
Human hERG Channel Activity
NIH Chemical Genomics Center [NCGC]
National Institutes of Environmental Health Sciences
[NIEHS]
National Toxicology Program [NTP]
NCGC Assay Overview:
We have developed a 1536-well cell-based assay for
quantitative high throughput screening (qHTS) to
determine in vitro hERG channel blockage as a
measure of cardio toxicity with small molecules. This
particular assay uses the U2OS cell line which is
derived from human osteosarcoma.
...
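A minimal sketch of dictionary-based extraction tuned to avoid false positives: terms are matched case-sensitively and only as whole tokens. The lexicon, the field names, and the function name are assumptions made for illustration; the slides do not describe the actual extraction rules.

```python
import re

# Hypothetical lexicon: surface form -> template field it would annotate.
LEXICON = {
    "U2OS": "cell line",
    "hERG": "target",
    "qHTS": "assay method",
}

def extract_terms(text, lexicon=LEXICON):
    """Return (term, field) pairs for whole-token, case-sensitive matches."""
    hits = []
    for term, field in lexicon.items():
        # Reject matches that are glued to other letters or digits.
        pattern = r"(?<![A-Za-z0-9])" + re.escape(term) + r"(?![A-Za-z0-9])"
        if re.search(pattern, text):
            hits.append((term, field))
    return hits

text = ("This particular assay uses the U2OS cell line which is "
        "derived from human osteosarcoma.")
print(extract_terms(text))
```

Strict matching misses some true mentions (for example a lower-case spelling), which is the trade-off the slide describes: fewer suggestions, but each one is more likely to be right.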
Natural language models
◈ Use NLP to create fingerprints for each text document
◈ One Bayesian model for each possible annotation term
[Diagram: Text → NLP → fingerprints]
▷ NLP fragments: "(VBN performed)", "(NP (JJ 20-nL))", "(NP (DT this) (NN screening))", "(NN DMSO)", "(NN confirmation)", "(NP (DT a) (NN solution))", "(NNP NADP)", ...
▷ fingerprints: 01101010000..., 10001010100..., 01010101010..., 10101110010..., 11001010111..., ...
▷ assay format → organism-based format (bao:BAO_0000218)
▷ target → transcription factor (bao:BAO_0000282)
...
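The idea of one fingerprint per document and one Bayesian model per term can be sketched as follows. This is a toy stand-in under stated assumptions: the CRC32 hashing, the 1024-bit length, the add-one smoothing, and the class name `TermModel` are all choices made for the example, not the method used in the talk.

```python
import math
import zlib

def fingerprint(fragments, nbits=1024):
    """Hash each parse fragment to a bit position; return the set of on-bits."""
    return {zlib.crc32(f.encode("utf-8")) % nbits for f in fragments}

class TermModel:
    """Naive Bayes scorer for a single annotation term (illustrative only)."""

    def __init__(self):
        self.with_term = {}     # bit -> number of documents that carry the term
        self.without_term = {}  # bit -> number of documents that do not
        self.n_with = 0
        self.n_without = 0

    def train(self, bits, has_term):
        counts = self.with_term if has_term else self.without_term
        for b in bits:
            counts[b] = counts.get(b, 0) + 1
        if has_term:
            self.n_with += 1
        else:
            self.n_without += 1

    def score(self, bits):
        """Sum of smoothed log-likelihood ratios over the on-bits."""
        total = 0.0
        for b in bits:
            p = (self.with_term.get(b, 0) + 1) / (self.n_with + 2)
            q = (self.without_term.get(b, 0) + 1) / (self.n_without + 2)
            total += math.log(p / q)
        return total

# Toy training set: fragments from the slide, labels invented for the example.
model = TermModel()
doc_a = fingerprint(["(NN DMSO)", "(VBN performed)"])
doc_b = fingerprint(["(NNP NADP)", "(NN confirmation)"])
model.train(doc_a, has_term=True)
model.train(doc_b, has_term=False)
print(model.score(doc_a) > model.score(doc_b))
```

In this picture, annotating a new document means computing its fingerprint once and then asking every term's model for a score.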
Correlation models
◈ Similar principle: existing annotations are fingerprints
▷ one model per term
◈ Combine natural language + correlation = ranking
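A rough sketch of how terms already assigned to an assay could inform the ranking of the remaining candidates. The co-occurrence estimate, the additive combination with the NLP score, and all function names are assumptions for illustration; the slides do not say how the two scores are actually combined.

```python
from collections import Counter
from itertools import permutations

def count_cooccurrence(curated_assays):
    """Count single terms and ordered term pairs across curated assays."""
    single, pair = Counter(), Counter()
    for terms in curated_assays:
        single.update(terms)
        pair.update(permutations(sorted(terms), 2))
    return single, pair

def correlation_score(candidate, existing, single, pair):
    """Mean smoothed estimate of P(candidate | already-assigned term)."""
    if not existing:
        return 0.0
    probs = [(pair[(e, candidate)] + 1) / (single[e] + 2) for e in existing]
    return sum(probs) / len(probs)

def rank(candidates, nlp_scores, existing, single, pair):
    """Order candidates by NLP score plus correlation score (assumed additive)."""
    def combined(term):
        return nlp_scores.get(term, 0.0) + correlation_score(
            term, existing, single, pair)
    return sorted(candidates, key=combined, reverse=True)

# Toy curated set with invented labels.
curated = [
    {"cell-based format", "U2OS"},
    {"cell-based format", "U2OS"},
    {"biochemical format", "kinase"},
]
single, pair = count_cooccurrence(curated)
print(rank(["kinase", "U2OS"], {}, {"cell-based format"}, single, pair))
```

The effect grows as the curator works: each confirmed annotation adds evidence that reorders the suggestions for the fields still open.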
Metrics
◈ NLP only: 96.9% ± 4.4
◈ + correlation: 97.5% ± 3.8
◈ + text mining: 97.6% ± 3.8
Axioms
◈ Ontologies like BAO & DTO have axioms
◈ Convert these into rules:
▷ each rule has a subject and an impact (limit / exclude)
◈ Use to constrain predictions at each step
◈ Like correlation models, except absolute
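A sketch of how limit and exclude rules could act as hard filters on candidate terms. The `Rule` structure, the field names, and the example rules are hypothetical; they are not taken from the BAO or DTO axioms.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    subject: str       # term that triggers the rule once it is assigned
    field: str         # template field whose candidates are affected
    kind: str          # "limit" or "exclude"
    impact: frozenset  # terms allowed (limit) or forbidden (exclude)

def apply_rules(assigned, field, candidates, rules):
    """Filter candidates for one field using rules triggered by assigned terms."""
    result = list(candidates)
    for rule in rules:
        if rule.subject not in assigned or rule.field != field:
            continue
        if rule.kind == "limit":
            result = [c for c in result if c in rule.impact]
        elif rule.kind == "exclude":
            result = [c for c in result if c not in rule.impact]
    return result

# Invented example rules, for illustration only.
rules = [
    Rule("Homo sapiens", "cell line", "limit", frozenset({"U2OS", "HEK293"})),
    Rule("biochemical format", "cell line", "exclude", frozenset({"U2OS"})),
]
print(apply_rules({"Homo sapiens"}, "cell line",
                  ["U2OS", "CHO-K1", "HEK293"], rules))
```

Unlike the probabilistic models, a rule never reorders candidates: a term either survives the filter or is removed outright.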
New data: ELN
◈ Paste text or drag a document...
◈ Or re-use...
Autogenerated text
◈ Annotations-to-text is
much easier
◈ Transliteration
template:
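The template shown on the original slide is not recoverable here. As a hedged illustration of why the annotations-to-text direction is the easy one, the sketch below fills a sentence template from term labels using the standard library's `string.Template`; the template wording and the field names are invented for the example.

```python
from string import Template

# Hypothetical template; the real transliteration syntax is not shown here.
TEMPLATE = Template(
    "This is a $format assay measuring $endpoint against $target in $organism."
)

def annotations_to_text(annotations, template=TEMPLATE):
    """Fill the template with term labels; unknown fields stay as placeholders."""
    return template.safe_substitute(annotations)

labels = {
    "format": "cell-based",
    "endpoint": "IC50",
    "target": "hERG",
    "organism": "Homo sapiens",
}
print(annotations_to_text(labels))
```

Using `safe_substitute` leaves any unfilled placeholder visible in the output, which makes missing annotations easy to spot in the generated text.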
Extending ontologies
◈ Science is never complete: diction evolves
◈ Curators can request a provisional term
Upgrading ontologies
◈ Next step: submit provisional terms to OntoloBridge
◈ Joint collaboration between CDD and:
▷ Stanford (BioPortal, CEDAR)
▷ University of Miami (BAO, DTO)
◈ Ontology owner receives new additions via API
◈ Experts vet all new terms, outcome made available
◈ Users of BAE are unimpeded...
Detailed templates
◈ The Common Assay Template is a summary
◈ Can insert extra branch groups
◈ Ultimate goal:
▷ capture whole experiment
▷ replace text entirely
◈ Syncing with ontology maintainers...
★measurement
• measurement details
★assay components
• buffer concentration
• carrier protein concentration
• chelator concentration
• detergent concentration
• metal salt concentration
• reducing agent concentration
• solvent concentration
★control details
★organism details
• cell line details
• microbe/virus details
★protocol details
• antibody details
• assay molecule
• biochemical assay
• coupled substrate incubation
• enzyme reaction
• ligand incubation
• perturbagen incubation
• substrate incubation
★screened entity details
Content expansion
◈ Public data originally from PubChem subset:
▷ MLPCN: thousands of screening assays
▷ Tox21: hundreds of assays from EPA
◈ Detailed text, and compounds + measurements (SAR)
◈ Done most of them: where's the next goldmine?
PubMed
◈ PubMed is massive (1.7M open access), but:
▷ only some have relevant assays
▷ no compounds or other useful metadata
◈ Current project:
▷ rank assays based on likelihood of assay description
▷ pick out relevant text
◈ Hundreds of thousands of candidate assays for BAE
◈ Overlap with ChEMBL: ~600 (1.7M ∩ 70K) - metadata, SAR
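The ranking step can be pictured with a deliberately crude stand-in: score each paragraph by the share of its tokens that are assay-related cue words, then sort. The cue list and function names are hypothetical, and a real ranker would presumably be a trained model rather than a hand-written word list.

```python
import re

# Hypothetical cue words; a real ranker would be trained, not hand-written.
CUE_WORDS = {"assay", "incubated", "ic50", "wells", "inhibition", "plates"}

def assay_likelihood(paragraph, cues=CUE_WORDS):
    """Fraction of tokens that are cue words (a crude stand-in for a model)."""
    tokens = re.findall(r"[a-z0-9]+", paragraph.lower())
    if not tokens:
        return 0.0
    return sum(t in cues for t in tokens) / len(tokens)

def rank_paragraphs(paragraphs):
    """Return paragraphs sorted from most to least assay-like."""
    return sorted(paragraphs, key=assay_likelihood, reverse=True)

paragraphs = [
    "The authors thank the funding agency.",
    "Cells were incubated in 384-well plates and IC50 values were "
    "measured by the assay.",
]
print(rank_paragraphs(paragraphs)[0])
```

The same score serves both goals on the slide: sorting whole articles by their best paragraph, and picking out the relevant text within an article.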
Crowdsourcing
◈ Anyone can authenticate on beta.bioassayexpress.com
Integration
◈ Basic functionality integrated into CDD Vault
◈ ELN users can insert an assay
◈ Annotations are semantic, like standalone BAE
Acknowledgments
◈ BioAssay Express Team
⇨ Peter Gedeck
⇨ Hande Küçük
⇨ Barry Bunin
⇨ Jason Harris
⇨ Charlie Read
⇨ Samantha Jeschonek
◈ Collaborators
⇨ René Witte & Bahar Sateli (Knowlet)
⇨ Julia Davies (UCSF)
⇨ Mark Musen & John Graybeal (Stanford)
⇨ Stephan Schürer (U. of Miami)
◈ More information
http://www.bioassayexpress.com
http://collaborativedrug.com
Funding
NIH SBIR
