Integrating an Analytical Methods and Mass Spectral Database with Cheminformatics Capabilities

US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
Innovative Research for a Sustainable Future
www.epa.gov/research
Gregory Janesch1, Erik Carr1, Vicente Samano2, Brian Meyer2 and Antony Williams3
1. ORAU Student Services Contractor to Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
2. Senior Environmental Employment Program, US Environmental Protection Agency, Research Triangle Park, USA
3. Center for Computational Toxicology & Exposure, Office of Research & Development, U.S. Environmental Protection Agency, Research Triangle Park, NC, USA
ACS West
San Francisco, CA
August 13-17, 2023
Data
There are three kinds of data in the database:
- Fact sheets are results-oriented documents with data on one or more substances, ranging from basic descriptions of health effects to monographs with NMR, Raman, and IR spectra.
- Methods document an end-to-end analytical procedure for one or more substances, sometimes hundreds of chemicals. The documents are curated to extract the chemical compounds and are then annotated with information such as matrix and methodologies.
- Spectra are stored as lists of m/z-intensity pairs together with their parameters.
In addition to the above, each record has metadata stored in the database, such as experimental conditions, authors, and a synopsis of the method or fact sheet. The exact fields depend on the type of record.
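The record types described above could be modelled roughly as follows. This is a minimal sketch rather than the actual AMOS schema: the class names, field names, and peak values are all hypothetical and chosen only to illustrate a shared record structure with a spectrum stored as m/z-intensity pairs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    """Fields shared by every record type (hypothetical names)."""
    record_id: str
    record_type: str  # "fact sheet", "method", or "spectrum"
    source: str       # e.g. a vendor, research group, or agency
    substances: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Spectrum(Record):
    """A mass spectrum stored as (m/z, intensity) pairs."""
    peaks: List[Tuple[float, float]] = field(default_factory=list)

    def base_peak(self) -> Tuple[float, float]:
        """Return the most intense (m/z, intensity) pair."""
        return max(self.peaks, key=lambda peak: peak[1])


# Illustrative values only; this is not a real measured spectrum.
example = Spectrum(
    record_id="demo-1",
    record_type="spectrum",
    source="hypothetical source",
    substances=["substance-placeholder"],
    metadata={"methodology": "LC-MS"},
    peaks=[(79.96, 35.0), (98.96, 100.0), (498.93, 60.0)],
)
print(example.base_peak())
```

Keeping the free-form metadata in a dictionary reflects the point that the available fields vary with the kind of record.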
Data are open access and are derived from a variety of sources, including online spectral databases, vendor methods, research groups, EPA databases, and other government agencies.
At the time of writing, the database contains approximately:
- 165,000 spectra (plus 600,000 externally linked spectra)
- >700 fact sheets
- >3,300 methods
Description
A large variety of sources for spectra, documented analytical procedures and methods, and other associated documentation exist and are, in theory, easily available through an ordinary web search. In practice, these sources are largely isolated from each other, are hard to find via general searches because of inconsistencies in chemical names and identifiers, and vary widely in format.
To address these challenges, the Analytical Methods and Open Spectra (AMOS) web application has been developed. AMOS is a database and associated web-based application containing several types of records that are searchable by common identifiers known to chemists (i.e., CASRNs, InChIKeys, and chemical names).
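As a rough illustration of how a search term might be routed to one of those identifier types, the sketch below checks whether the input looks like a CASRN (using the standard check-digit rule) or a standard InChIKey (14 letters, 10 letters, 1 letter, separated by hyphens), and otherwise treats it as a name. The function names are hypothetical, and this is an assumption about one possible approach, not a description of how AMOS parses queries.

```python
import re

# CASRN: 2-7 digits, 2 digits, 1 check digit, separated by hyphens.
CASRN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
# Standard InChIKey: 14 + 10 + 1 uppercase letters.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_valid_casrn(text: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number."""
    match = CASRN_PATTERN.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


def classify_identifier(text: str) -> str:
    """Return 'casrn', 'inchikey', or 'name' for a search term."""
    text = text.strip()
    if is_valid_casrn(text):
        return "casrn"
    if INCHIKEY_PATTERN.match(text):
        return "inchikey"
    return "name"


print(classify_identifier("1763-23-1"))                    # CASRN of PFOS
print(classify_identifier("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # InChIKey of water
print(classify_identifier("perfluorooctanesulfonic acid"))
```

Falling back to a name search for anything that fails both checks means a mistyped CASRN is treated as a name instead of raising an error.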
Acknowledgements
The authors thank the data curation team for their rigorous work in annotating and identifying information in the records. Chemical data extraction, curation, and annotation are an essential part of this work.
General Searching
The primary search looks across all records for a single chemical substance. The results page (Fig. 1) shows the searched compound (assuming a match) together with a table of the records containing that substance, listing the data source, the associated methodology, and a short description of each record.
Selecting a row in that table displays the contents of the record in more detail, whether that means opening an analytical method or displaying a spectrum.
Spectrum Search
For spectral data, an additional search option is available. If a mass range, a methodology, and a spectrum (as x,y pairs) are supplied, AMOS returns the spectra that match the mass range and methodology, ranked by their similarity to the user-supplied spectrum (see Fig. 2).
The top table lists the substance associated with each spectrum found (with its DTXSID), the similarity of that spectrum to the query, and a description of the spectrum. Below the table is an interactive plot overlaying the two spectra.
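The poster does not state which similarity measure is used for ranking. As one common possibility, the sketch below bins each spectrum by m/z and computes the cosine similarity between the binned intensity vectors. The function names, the 1.0 bin width, and the example peaks are assumptions made for illustration, and the candidates are assumed to have already been filtered by mass range and methodology.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Peaks = List[Tuple[float, float]]  # (m/z, intensity) pairs


def bin_peaks(peaks: Peaks, bin_width: float = 1.0) -> Dict[int, float]:
    """Sum intensities of peaks falling into the same m/z bin."""
    binned: Dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(round(mz / bin_width))] += intensity
    return binned


def cosine_similarity(a: Peaks, b: Peaks, bin_width: float = 1.0) -> float:
    """Cosine of the angle between two binned spectra (0.0 to 1.0)."""
    vec_a, vec_b = bin_peaks(a, bin_width), bin_peaks(b, bin_width)
    dot = sum(vec_a[k] * vec_b[k] for k in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query: Peaks, candidates: Dict[str, Peaks]):
    """Return (name, similarity) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query, peaks))
        for name, peaks in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up spectra for illustration only.
query = [(100.0, 50.0), (150.0, 100.0)]
candidates = {
    "identical": [(100.0, 50.0), (150.0, 100.0)],
    "partial": [(100.0, 50.0), (200.0, 100.0)],
    "unrelated": [(300.0, 10.0)],
}
for name, score in rank_candidates(query, candidates):
    print(f"{name}: {score:.2f}")
```

Because cosine similarity depends only on relative intensities, spectra recorded at different absolute intensity scales can still be compared directly.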
Method Searches
AMOS offers two ways to search for methods. The first is a simple table that lists all methods in the database (not pictured). The list can be filtered by several fields, including matrix, analyte, and method name, allowing quick discovery of methods that cover a known topic.
The second, shown in Fig. 3, is a search for methods containing similar substances, which provides a starting point even for chemicals without methods of their own. If methods exist for the searched substance, they are returned. If none exist, AMOS returns all methods that contain at least one substance with a sufficiently high Tanimoto structural similarity coefficient to the searched substance. This is especially useful for newer substances: the drug in the Fig. 3 example became available only in 2015, so there has been relatively little time to develop and publish methods for it.
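The fallback logic described above can be sketched as follows. The Tanimoto coefficient of two fingerprints is the number of features they share divided by the number of features present in either. Everything else here is an assumption for illustration: fingerprints are represented as plain sets of integers instead of being computed from structures, the substance and method names are placeholders, and the threshold values are arbitrary because the poster does not say what counts as "sufficiently high".

```python
from typing import Dict, FrozenSet, List, Set

Fingerprint = FrozenSet[int]  # indices of the features that are present


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Shared features divided by features present in either."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_methods(
    query: str,
    fingerprints: Dict[str, Fingerprint],
    methods: Dict[str, Set[str]],
    threshold: float,
) -> List[str]:
    """Return methods for the query, or for similar substances if none exist."""
    direct = [name for name, subs in methods.items() if query in subs]
    if direct:
        return sorted(direct)
    # No direct hits: fall back to structurally similar substances.
    query_fp = fingerprints[query]
    similar = [
        name
        for name, subs in methods.items()
        if any(
            s in fingerprints
            and tanimoto(query_fp, fingerprints[s]) >= threshold
            for s in subs
        )
    ]
    return sorted(similar)


# Placeholder fingerprints and methods, for illustration only.
fingerprints = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({9, 10}),
    "Q": frozenset({1, 2, 3, 4, 6}),  # substance with no methods
}
methods = {
    "method-1": {"A"},
    "method-2": {"C"},
    "method-3": {"B", "C"},
}
print(find_methods("A", fingerprints, methods, threshold=0.6))
print(find_methods("Q", fingerprints, methods, threshold=0.6))
```

Lowering the threshold widens the set of methods returned for a substance without methods, at the cost of including less closely related chemistry.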
Disclaimers
This tool is currently internal to the US EPA and still under development. Plans to release it to the public have not been finalized, but the process is hoped to be complete by early 2024. The data used in this application have not been thoroughly reviewed by the EPA, and users need to exercise judgement in their use of the results.
The views expressed in this poster are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA.
Figure 1: The list of methods and LC-MS or GC-MS spectra associated with perfluorooctanesulfonic acid (PFOS).
Figure 2: A spectral similarity search result, showing the similarity score for each matching spectrum and the list of associated chemical compounds.
Figure 3: When a search finds no methods for a chemical, its structure is passed to a Tanimoto structural similarity search, which returns methods that contain structurally similar substances.