SlideShare une entreprise Scribd logo
1  sur  1
Télécharger pour lire hors ligne
OPERA, AN OPEN-SOURCE AND OPEN-DATA SUITE OF QSAR MODELS
K. Mansouri1, X. Chang2, D. Allen2, R. Judson3, A.J. Williams3, W. Casey1, N. Kleinstreuer1
1NIH/NIEHS/DNTP/NICEATM, RTP, NC, USA; 2ILS, RTP, NC, USA; 3CCTE/EPA, RTP, NC, USA
References
Introduction Consensus of international collaborations
Existing, recently updated and future models
OPERA application
Recent model updates:
PK parameters: FU and CLint
CERAPP: Collaborative Estrogen Receptor Activity Prediction Project
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CATMoS: Collaborative Acute Toxicity Modeling Suite
General approach:
Availability:
Interfaces:
pKa: acid dissociation constant
LogP: octanol-water partition coefficient
LogD: distribution coefficient
Predictions:
- EPA CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard)
- NTP’s Integrated Chemical Environment (https://ice.ntp.niehs.nih.gov/)
Standalone desktop application (current version 2.7):
- Github: https://github.com/NIEHS/OPERA
• Windows and Linux packaged installers with dependencies.
• Additional wrappers and libraries: Java, Python, C/C++
- NTP KNIME server: knime.niehs.nih.gov/knime/
More info:
- https://ntp.niehs.nih.gov/go/opera
Command line Graphical user interface
• OECD 5 principles for QSAR validation are employed during modeling
• Only high-quality curated data are used to build the models
• Chemical structures are processed using the QSAR-ready standardization
workflow prior to modeling
• The QSAR-ready workflow is also implemented in the app for user input
processing structures prior to prediction
• Works with different input and output formats
• Provides applicability domain and prediction accuracy assessment
• Provides experimental values when available
• Provides information about the nearest neighbors
• Provides molecular descriptor values for transparency
• OECD-compliant QSAR model reporting format (QMRF) reports available
• The OPERA logP model was initially built using a
curated dataset from the PHYSPROP database.
• The overall statistics of the model reached an R2 of
0.86 and an RMSE of 0.78 for the test set.
• The logP model as well as other OPERA models (water
solubility, and vapor pressure) have been updated to
account for highly investigated groups of chemicals
such as polyfluorinated substances (PFAS).
• The OPERA pKa model was built on a curated
version of the DataWarrior dataset.
• The acidic (3260 chemicals) and basic (3680
chemicals) datasets were modeled separately
• First, a weighted-kNN classification model predicts
whether a chemical is acidic, basic or both. Then a
SVM model predicts the strongest acidic and basic
pKa values
• The acidic and basic pKa models reached an R2 of
0.72 and 0.78 and RMSE of 1.80 and 1.53,
respectively.
• LogD is the distribution coefficient that takes into account pH-dependence and is used to estimate the different relative
concentrations of the ionized and non-ionized forms of a chemical at a given pH.
• OPERA uses both pKa and logP predictions to provide logD estimates for ionizable chemicals at pH 5.5 and pH 7.4.
• LogD is estimated using the following formula: 𝑙𝑜𝑔𝐷(𝑝𝐻) = 𝑙𝑜𝑔𝑃 − log(1 + 10 𝑝𝐻−𝑝𝐾𝑎
)
Binding Agonist Antagonist
Training Validation Training Validation Training Validation
Sn 0.93 0.58 0.85 0.94 0.67 0.18
Sp 0.97 0.92 0.98 0.94 0.94 0.90
BA 0.95 0.75 0.92 0.94 0.80 0.54
Binding Agonist Antagonist
Training Validation Training Validation Training Validation
Sn 0.99 0.69 0.95 0.74 1.00 0.61
Sp 0.91 0.87 0.98 0.97 0.95 0.87
BA 0.95 0.78 0.97 0.86 0.97 0.74
Very-Toxic Non-Toxic
Training Evaluation Train Evaluation
BA 0.93 0.84 0.92 0.78
Sn 0.87 0.70 0.88 0.67
Sp 0.99 0.97 0.97 0.90
EPA categories
Training Evaluation
Cat 1 Cat 2 Cat 3 Cat 4 Cat 1 Cat 2 Cat 3 Cat 4
BA 0.87 0.74
Sn 0.87 0.83 0.91 0.63 0.70 0.56 0.81 0.40
Sp 0.99 0.95 0.75 0.98 0.97 0.88 0.62 0.97
GHS categories
Training Evaluation
Cat 1 Cat 2 Cat 3 Cat 4 Cat 5 Cat 1 Cat 2 Cat 3 Cat 4 Cat 5
BA 0.88 0.74
Sn 0.73 0.75 0.84 0.80 0.88 0.50 0.53 0.56 0.66 0.67
Sp 0.99 0.99 0.92 0.89 0.96 0.99 0.97 0.89 0.74 0.90
LD50
Training Evaluation
R2 0.85 0.65
RMSE 0.30 0.49
• Both CLint and FU OPERA models were built using datasets
combined from different sources.
• Most of the data entries are also available in the EPA’s high-
throughput toxicokinetic (httk) R package.
• After several rounds of automated and manual curation to
reduce errors, variability and outliers, the CLint and FU datasets
consisted of 1056 and 1873 chemicals, respectively.
• The CLint dataset was modeled in two steps:
• First a classification model to separate the cleared from
non-cleared chemicals
• Then, a regression model is applied to predict the CLint
value for the cleared chemicals.
• The toxicity endpoints included in
OPERA are the estrogen and
androgen pathway activities and
the acute oral toxicity
• The models were the result of three
international collaborations
including over a hundred scientists
from a total of 35 research groups
covering governmental institutions,
industry and academia
• Multiple models were combined
into a unique consensus as show in
the diagram.
• CATMoS consisted of five different
endpoints and the final consensus
model was a combination of all
predictions using a weight of
evidence approach.
• CATMoS is currently
being evaluated for
regulatory use
by the US EPA.
Cross-
validation
Test
R2 0.63 0.65
RMSE 0.20 0.19
Cross-
validation
Test
BA 0.70 0.57
R2 0.40 0.39
RMSE 0.73 0.79
• OPERA is a free and open-source/open-data suite of QSAR models
providing predictions for toxicity endpoints and physicochemical,
environmental fate, and ADME properties.
• In addition to predictions, OPERA provides accuracy estimates, applicability
domain assessment and experimental data when available.
• Recent additions to OPERA include models for estrogenic activity,
androgenic activity, and acute oral systemic toxicity developed through
international collaborative modeling projects, and updates to models
predicting plasma protein binding and intrinsic hepatic clearance.
• OPERA predictions for ADME parameters (CLint and FU) as well as
physicochemical parameters (logP, pKa, and logD) are used as inputs for the
in vitro to in vivo extrapolation (IVIVE) workflow on the NTP’s Integrated
Chemical Environment (ICE: https://ice.ntp.niehs.nih.gov/).
• OPERA predictions are also available both via the user interface and for
download from the EPA’s CompTox Chemicals Dashboard.
(https://comptox.epa.gov/dashboard).
[1] Mansouri K. et al. J Cheminform (2018) https://doi.org/10.1186/s13321-018-0263-1
[2] Mansouri, K. et al. SAR & QSAR in Env. Res. (2016)
https://doi.org/10.1080/1062936X.2016.1253611
[3] Williams A. J. et al. J Cheminform (2017) https://doi.org/10.1186/s13321-017-0247-6
[4] JRC QSAR Model Database https://qsardb.jrc.ec.europa.eu/qmrf/endpoint
[5] Mansouri, K. et al. EHP (2016) https://doi.org/10.1289/ehp.1510267
[6] Mansouri, K. et al. J Cheminform (2019) https://doi.org/10.1186/s13321-019-0384-1
[7] Mansouri, K. et al. EHP (2020) https://doi.org/10.1289/EHP5580
[8] Kleinstreuer et al. Comp Tox (2018) https://doi.org/10.1016/j.comtox.2018.08.002
[9] Mansouri, K et al. EHP “CATMoS manuscript” (2021) In Press
This poster does not necessarily reflect policies of EPA or any federal agency. Mention of trade
names or commercial products does not constitute endorsement or recommendation for use.
All models:
Physchem properties
BP Boiling Point
HL Henry's Law Constant
KOA Octanol/Air Partition Coefficient
LogP Octanol-water Partition Coefficient
MP Melting Point
KOC Soil Adsorption Coefficient
VP Vapor Pressure
WS Water Solubility
RT HPLC Retention Time
pKa Acid Dissociation Constant
logD Distribution Coefficient
Environmental fate
AOH Atmospheric Hydroxylation Rate
BCF Bioconcentration Factor
BioHL Biodegradation Half-life
RB Ready Biodegradability
KM Fish Biotransformation Half-life
KOC Soil Adsorption Coefficient
ADME properties
FUB Atmospheric Hydroxylation Rate
Clint Bioconcentration Factor
Toxicity endpoints
ER Estrogen Receptor Activity
AR Androgen Receptor Activity
AcuteTox Acute Oral Systemic Toxicity
Future models
CACO2 Caco-2 permeability
Inhalation Acute Inhalation Systemic Toxicity
SixPack Acute Toxicity Six-Pack Endpoints
UGT
Glucuronidation: substrate
selectivity
SULT Sulfation: substrate selectivity
Abstract ID: 1004

Contenu connexe

Tendances

Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Kamel Mansouri
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in RRajarshi Guha
 
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...Kamel Mansouri
 

Tendances (20)

Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...Automated workflows for data curation and standardization of chemical structu...
Automated workflows for data curation and standardization of chemical structu...
 
Crunching Molecules and Numbers in R
Crunching Molecules and Numbers in RCrunching Molecules and Numbers in R
Crunching Molecules and Numbers in R
 
Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...Delivering The Benefits of Chemical-Biological Integration in Computational T...
Delivering The Benefits of Chemical-Biological Integration in Computational T...
 
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
Open PHACTS Chemistry Platform Update and Learnings
Open PHACTS Chemistry Platform Update and Learnings Open PHACTS Chemistry Platform Update and Learnings
Open PHACTS Chemistry Platform Update and Learnings
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...Chemical identification of unknowns in high resolution mass spectrometry usin...
Chemical identification of unknowns in high resolution mass spectrometry usin...
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted AnalysisThe US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
The US-EPA CompTox Chemicals Dashboard to support Non-Targeted Analysis
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
US EPA CompTox Chemistry Dashboard as a source of data to fill data gaps for ...
 
The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...The needs for chemistry standards, database tools and data curation at the ch...
The needs for chemistry standards, database tools and data curation at the ch...
 
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
CERAPP - Collaborative Estrogen Receptor Activity Prediction Project. Computa...
 

Similaire à OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS

Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Kamel Mansouri
 
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityCoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityKamel Mansouri
 
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...Kamel Mansouri
 
International Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsInternational Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsKamel Mansouri
 
Update on Phase 2 of the C4SL Project
Update on Phase 2 of the C4SL ProjectUpdate on Phase 2 of the C4SL Project
Update on Phase 2 of the C4SL ProjectIES / IAQM
 
Lecture 11 developing qsar, evaluation of qsar model and virtual screening
Lecture 11  developing qsar, evaluation of qsar model and virtual screeningLecture 11  developing qsar, evaluation of qsar model and virtual screening
Lecture 11 developing qsar, evaluation of qsar model and virtual screeningRAJAN ROLTA
 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology Sean Ekins
 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Sunghwan Kim
 
Multi-residue pesticide analysis of food samples using acetonitrile extractio...
Multi-residue pesticide analysis of food samples using acetonitrile extractio...Multi-residue pesticide analysis of food samples using acetonitrile extractio...
Multi-residue pesticide analysis of food samples using acetonitrile extractio...Kate?ina Svobodov
 
Data drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryData drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryAnn-Marie Roche
 
A comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsA comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsAndrew McEachran
 
Ovanes Mekenyan - Lush Prize Conference 2014
Ovanes Mekenyan - Lush Prize Conference 2014Ovanes Mekenyan - Lush Prize Conference 2014
Ovanes Mekenyan - Lush Prize Conference 2014LushPrize
 
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...Kamel Mansouri
 
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...PhinC Development
 

Similaire à OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS (20)

Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...Virtual screening of chemicals for endocrine disrupting activity: Case studie...
Virtual screening of chemicals for endocrine disrupting activity: Case studie...
 
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor ActivityCoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity
 
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
EDSP Prioritization: Collaborative Estrogen Receptor Activity Prediction Proj...
 
International Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology ProblemsInternational Computational Collaborations to Solve Toxicology Problems
International Computational Collaborations to Solve Toxicology Problems
 
Update on Phase 2 of the C4SL Project
Update on Phase 2 of the C4SL ProjectUpdate on Phase 2 of the C4SL Project
Update on Phase 2 of the C4SL Project
 
Exploiting enhanced non-testing approaches to meet the needs for sustainable ...
Exploiting enhanced non-testing approaches to meet the needs for sustainable ...Exploiting enhanced non-testing approaches to meet the needs for sustainable ...
Exploiting enhanced non-testing approaches to meet the needs for sustainable ...
 
Lecture 11 developing qsar, evaluation of qsar model and virtual screening
Lecture 11  developing qsar, evaluation of qsar model and virtual screeningLecture 11  developing qsar, evaluation of qsar model and virtual screening
Lecture 11 developing qsar, evaluation of qsar model and virtual screening
 
SOT short course on computational toxicology
SOT short course on computational toxicology SOT short course on computational toxicology
SOT short course on computational toxicology
 
Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...Development of machine learning-based prediction models for chemical modulato...
Development of machine learning-based prediction models for chemical modulato...
 
QSAR Modeling.pdf
QSAR Modeling.pdfQSAR Modeling.pdf
QSAR Modeling.pdf
 
Multi-residue pesticide analysis of food samples using acetonitrile extractio...
Multi-residue pesticide analysis of food samples using acetonitrile extractio...Multi-residue pesticide analysis of food samples using acetonitrile extractio...
Multi-residue pesticide analysis of food samples using acetonitrile extractio...
 
Data drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryData drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistry
 
Robert T. Dunn, II, Ph.D., DABT, SLAS ADMET Special Interest Group Meeting p...
 Robert T. Dunn, II, Ph.D., DABT, SLAS ADMET Special Interest Group Meeting p... Robert T. Dunn, II, Ph.D., DABT, SLAS ADMET Special Interest Group Meeting p...
Robert T. Dunn, II, Ph.D., DABT, SLAS ADMET Special Interest Group Meeting p...
 
A comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsA comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction models
 
Ovanes Mekenyan - Lush Prize Conference 2014
Ovanes Mekenyan - Lush Prize Conference 2014Ovanes Mekenyan - Lush Prize Conference 2014
Ovanes Mekenyan - Lush Prize Conference 2014
 
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
Consensus Models to Predict Endocrine Disruption for All Human-Exposure Chemi...
 
Prediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source toolsPrediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source tools
 
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
Discovery PBPK: Efficiently using machine learning & PBPK modeling to drive l...
 
Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...Web-based access to experimental and predicted data for environmental fate, t...
Web-based access to experimental and predicted data for environmental fate, t...
 
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
 

Plus de Kamel Mansouri

Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Kamel Mansouri
 
Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Kamel Mansouri
 
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...Kamel Mansouri
 
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...Kamel Mansouri
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...Kamel Mansouri
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...Kamel Mansouri
 

Plus de Kamel Mansouri (6)

Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
Chemical prioritization using in silico modeling. SOT 2018 (San Antonio, USA)
 
Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...Scoring and ranking of metabolic trees to computationally prioritize chemical...
Scoring and ranking of metabolic trees to computationally prioritize chemical...
 
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
QSAR STUDY ON READY BIODEGRADABILITY OF CHEMICALS. Presented at the 3rd Chemo...
 
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
In-silico study of ToxCast GPCR assays by quantitative structure-activity rel...
 
An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...An examination of data quality on QSAR Modeling in regards to the environment...
An examination of data quality on QSAR Modeling in regards to the environment...
 
In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...In-silico structure activity relationship study of toxicity endpoints by QSAR...
In-silico structure activity relationship study of toxicity endpoints by QSAR...
 

Dernier

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationShrmpro
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durbanmasabamasaba
 

Dernier (20)

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 

OPERA, AN OPEN SOURCE AND OPEN DATA SUITE OF QSAR MODELS

  • 1. OPERA, AN OPEN-SOURCE AND OPEN-DATA SUITE OF QSAR MODELS K. Mansouri1, X. Chang2, D. Allen2, R. Judson3, A.J. Williams3, W. Casey1, N. Kleinstreuer1 1NIH/NIEHS/DNTP/NICEATM, RTP, NC, USA; 2ILS, RTP, NC, USA; 3CCTE/EPA, RTP, NC, USA References Introduction Consensus of international collaborations Existing, recently updated and future models OPERA application Recent model updates: PK parameters: FU and CLint CERAPP: Collaborative Estrogen Receptor Activity Prediction Project CoMPARA: Collaborative Modeling Project for Androgen Receptor Activity CATMoS: Collaborative Acute Toxicity Modeling Suite General approach: Availability: Interfaces: pKa: acid dissociation constant LogP: octanol-water partition coefficient LogD: distribution coefficient Predictions: - EPA CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) - NTP’s Integrated Chemical Environment (https://ice.ntp.niehs.nih.gov/) Standalone desktop application (current version 2.7): - Github: https://github.com/NIEHS/OPERA • Windows and Linux packaged installers with dependencies. • Additional wrappers and libraries: Java, Python, C/C++ - NTP KNIME server: knime.niehs.nih.gov/knime/ More info: - https://ntp.niehs.nih.gov/go/opera Command line Graphical user interface • OECD 5 principles for QSAR validation are employed during modeling • Only high-quality curated data are used to build the models • Chemical structures are processed using the QSAR-ready standardization workflow prior to modeling • The QSAR-ready workflow is also implemented in the app for user input processing structures prior to prediction • Works with different input and output formats • Provides applicability domain and prediction accuracy assessment • Provides experimental values when available • Provides information about the nearest neighbors • Provides molecular descriptor values for transparency • OECD-compliant QSAR model reporting format (QMRF) reports available • The OPERA logP model was initially built using a curated dataset from the PHYSPROP database. • The overall statistics of the model reached an R2 of 0.86 and an RMSE of 0.78 for the test set. • The logP model as well as other OPERA models (water solubility, and vapor pressure) have been updated to account for highly investigated groups of chemicals such as polyfluorinated substances (PFAS). • The OPERA pKa model was built on a curated version of the DataWarrior dataset. • The acidic (3260 chemicals) and basic (3680 chemicals) datasets were modeled separately • First, a weighted-kNN classification model predicts whether a chemical is acidic, basic or both. Then a SVM model predicts the strongest acidic and basic pKa values • The acidic and basic pKa models reached an R2 of 0.72 and 0.78 and RMSE of 1.80 and 1.53, respectively. • LogD is the distribution coefficient that takes into account pH-dependence and is used to estimate the different relative concentrations of the ionized and non-ionized forms of a chemical at a given pH. • OPERA uses both pKa and logP predictions to provide logD estimates for ionizable chemicals at pH 5.5 and pH 7.4. • LogD is estimated using the following formula: 𝑙𝑜𝑔𝐷(𝑝𝐻) = 𝑙𝑜𝑔𝑃 − log(1 + 10 𝑝𝐻−𝑝𝐾𝑎 ) Binding Agonist Antagonist Training Validation Training Validation Training Validation Sn 0.93 0.58 0.85 0.94 0.67 0.18 Sp 0.97 0.92 0.98 0.94 0.94 0.90 BA 0.95 0.75 0.92 0.94 0.80 0.54 Binding Agonist Antagonist Training Validation Training Validation Training Validation Sn 0.99 0.69 0.95 0.74 1.00 0.61 Sp 0.91 0.87 0.98 0.97 0.95 0.87 BA 0.95 0.78 0.97 0.86 0.97 0.74 Very-Toxic Non-Toxic Training Evaluation Train Evaluation BA 0.93 0.84 0.92 0.78 Sn 0.87 0.70 0.88 0.67 Sp 0.99 0.97 0.97 0.90 EPA categories Training Evaluation Cat 1 Cat 2 Cat 3 Cat 4 Cat 1 Cat 2 Cat 3 Cat 4 BA 0.87 0.74 Sn 0.87 0.83 0.91 0.63 0.70 0.56 0.81 0.40 Sp 0.99 0.95 0.75 0.98 0.97 0.88 0.62 0.97 GHS categories Training Evaluation Cat 1 Cat 2 Cat 3 Cat 4 Cat 5 Cat 1 Cat 2 Cat 3 Cat 4 Cat 5 BA 0.88 0.74 Sn 0.73 0.75 0.84 0.80 0.88 0.50 0.53 0.56 0.66 0.67 Sp 0.99 0.99 0.92 0.89 0.96 0.99 0.97 0.89 0.74 0.90 LD50 Training Evaluation R2 0.85 0.65 RMSE 0.30 0.49 • Both CLint and FU OPERA models were built using datasets combined from different sources. • Most of the data entries are also available in the EPA’s high- throughput toxicokinetic (httk) R package. • After several rounds of automated and manual curation to reduce errors, variability and outliers, the CLint and FU datasets consisted of 1056 and 1873 chemicals, respectively. • The CLint dataset was modeled in two steps: • First a classification model to separate the cleared from non-cleared chemicals • Then, a regression model is applied to predict the CLint value for the cleared chemicals. • The toxicity endpoints included in OPERA are the estrogen and androgen pathway activities and the acute oral toxicity • The models were the result of three international collaborations including over a hundred scientists from a total of 35 research groups covering governmental institutions, industry and academia • Multiple models were combined into a unique consensus as show in the diagram. • CATMoS consisted of five different endpoints and the final consensus model was a combination of all predictions using a weight of evidence approach. • CATMoS is currently being evaluated for regulatory use by the US EPA. Cross- validation Test R2 0.63 0.65 RMSE 0.20 0.19 Cross- validation Test BA 0.70 0.57 R2 0.40 0.39 RMSE 0.73 0.79 • OPERA is a free and open-source/open-data suite of QSAR models providing predictions for toxicity endpoints and physicochemical, environmental fate, and ADME properties. • In addition to predictions, OPERA provides accuracy estimates, applicability domain assessment and experimental data when available. • Recent additions to OPERA include models for estrogenic activity, androgenic activity, and acute oral systemic toxicity developed through international collaborative modeling projects, and updates to models predicting plasma protein binding and intrinsic hepatic clearance. • OPERA predictions for ADME parameters (CLint and FU) as well as physicochemical parameters (logP, pKa, and logD) are used as inputs for the in vitro to in vivo extrapolation (IVIVE) workflow on the NTP’s Integrated Chemical Environment (ICE: https://ice.ntp.niehs.nih.gov/). • OPERA predictions are also available both via the user interface and for download from the EPA’s CompTox Chemicals Dashboard. (https://comptox.epa.gov/dashboard). [1] Mansouri K. et al. J Cheminform (2018) https://doi.org/10.1186/s13321-018-0263-1 [2] Mansouri, K. et al. SAR & QSAR in Env. Res. (2016) https://doi.org/10.1080/1062936X.2016.1253611 [3] Williams A. J. et al. J Cheminform (2017) https://doi.org/10.1186/s13321-017-0247-6 [4] JRC QSAR Model Database https://qsardb.jrc.ec.europa.eu/qmrf/endpoint [5] Mansouri, K. et al. EHP (2016) https://doi.org/10.1289/ehp.1510267 [6] Mansouri, K. et al. J Cheminform (2019) https://doi.org/10.1186/s13321-019-0384-1 [7] Mansouri, K. et al. EHP (2020) https://doi.org/10.1289/EHP5580 [8] Kleinstreuer et al. Comp Tox (2018) https://doi.org/10.1016/j.comtox.2018.08.002 [9] Mansouri, K et al. EHP “CATMoS manuscript” (2021) In Press This poster does not necessarily reflect policies of EPA or any federal agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. All models: Physchem properties BP Boiling Point HL Henry's Law Constant KOA Octanol/Air Partition Coefficient LogP Octanol-water Partition Coefficient MP Melting Point KOC Soil Adsorption Coefficient VP Vapor Pressure WS Water Solubility RT HPLC Retention Time pKa Acid Dissociation Constant logD Distribution Coefficient Environmental fate AOH Atmospheric Hydroxylation Rate BCF Bioconcentration Factor BioHL Biodegradation Half-life RB Ready Biodegradability KM Fish Biotransformation Half-life KOC Soil Adsorption Coefficient ADME properties FUB Atmospheric Hydroxylation Rate Clint Bioconcentration Factor Toxicity endpoints ER Estrogen Receptor Activity AR Androgen Receptor Activity AcuteTox Acute Oral Systemic Toxicity Future models CACO2 Caco-2 permeability Inhalation Acute Inhalation Systemic Toxicity SixPack Acute Toxicity Six-Pack Endpoints UGT Glucuronidation: substrate selectivity SULT Sulfation: substrate selectivity Abstract ID: 1004