High-resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are advancing the identification of emerging contaminants in environmental matrices, improving the means by which exposure analyses can be conducted. However, establishing confidence in the structure identification of unknowns in NTA remains a challenge for analytical chemists. Structure identification requires integration of complementary data types such as reference databases, fragmentation prediction tools, and retention time prediction models. The goal of this research is to optimize and implement structure identification functionality within the US EPA’s CompTox Chemistry Dashboard, an open chemistry resource and web application containing data for ~760,000 substances. Rank-ordering chemical records within the Dashboard by the number of sources associated with each record (data source ranking) improves the identification of unknowns by bringing the most likely candidate structures to the top of a search results list. Database searching has been further optimized with the generation of MS-Ready structures. MS-Ready structures are desalted, stripped of stereochemistry, and separated from mixtures into individual components to replicate the form of a chemical observed via HRMS. Functionality for batch searching of molecular formulae and monoisotopic masses was designed and released so that many unknowns can be searched at once. Finally, a scoring-based identification scheme was developed, optimized, and surfaced via the Dashboard using multiple data streams contained within the database underlying the Dashboard. The scoring-based identification scheme improved the identification of unknowns over previous efforts that used data source ranking alone. Combining these steps within an open chemistry resource provides a freely available software tool for structure identification and NTA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
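The abstract does not describe how the scoring-based identification scheme combines its data streams, so the following is only a minimal sketch of the general idea: normalize each stream across the candidates and sum the results. The stream names (`data_sources`, `literature_refs`), the equal weighting, and the candidate values are all hypothetical and are not taken from the Dashboard.

```python
def score_candidates(candidates, streams):
    """Rank candidates by the sum of their max-normalized stream values.

    candidates: dict mapping a candidate name to a dict of raw stream values.
    streams: names of the streams to combine (hypothetical names, equal weights).
    """
    # Largest value seen for each stream; fall back to 1 to avoid dividing by zero.
    maxima = {s: max(c.get(s, 0) for c in candidates.values()) or 1 for s in streams}
    scores = {
        name: sum(values.get(s, 0) / maxima[s] for s in streams)
        for name, values in candidates.items()
    }
    # Highest combined score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up values for two candidates sharing one formula.
example = {
    "candidate-A": {"data_sources": 40, "literature_refs": 10},
    "candidate-B": {"data_sources": 50, "literature_refs": 2},
}
ranked = score_candidates(example, ["data_sources", "literature_refs"])
```

In this toy case candidate-B would rank first on source count alone, while the combined score places candidate-A first, which illustrates how an additional data stream can change the ordering.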
1. Using the US EPA’s CompTox Chemistry Dashboard for structure identification and non-targeted analyses
Antony Williams (1), Andrew D. McEachran (3), Seth Newton (2),
Kristin Isaacs (2), Katherine Phillips (2), Nancy Baker (1),
Chris Grulke (1) and Jon R. Sobus (2)
1) National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2) National Exposure Research Laboratory, U.S. Environmental Protection Agency, RTP, NC
3) Oak Ridge Institute for Science and Education (ORISE) Research Participant, Research Triangle Park, NC
March 2018
ACS Spring Meeting, New Orleans
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
2. The CompTox Chemistry Dashboard
• A publicly accessible website delivering access to:
– ~760,000 chemicals with related property data
– Experimental and predicted physicochemical property data
– Experimental human and ecological hazard data
– Integration with “biological assay data” for thousands of chemicals
– Information regarding consumer products containing chemicals
– Links to other agency websites and public data resources
– “Literature” searches for chemicals using public resources
– “Batch searching” for thousands of chemicals
– DOWNLOADABLE Open Data for reuse and repurposing
13. Dashboard for Structure ID
• Structure Identification using the dashboard
– Formula/mass-based searching – 1 chemical at a time
– Distilling structures into “MS-Ready form”
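As a rough illustration of what distilling a structure into MS-Ready form involves (desalting, removing stereochemistry, separating mixtures), here is a minimal sketch that works directly on SMILES strings. It is not the Dashboard's workflow: the counterion list is a small hypothetical sample, charged parents are not neutralized, and a real implementation would use a cheminformatics toolkit rather than string operations.

```python
# Illustrative counterion fragments only; not the Dashboard's actual list.
COUNTERIONS = {"[Na+]", "[K+]", "[Cl-]", "[Br-]", "Cl", "Br"}


def strip_stereo(smiles):
    """Remove SMILES stereo markers: '@' (tetrahedral) and '/', '\\' (double bond)."""
    return smiles.replace("@", "").replace("/", "").replace("\\", "")


def ms_ready(smiles):
    """Return the unique, stereo-free, non-counterion components of a SMILES string."""
    components = []
    for part in smiles.split("."):  # '.' separates disconnected components
        if part in COUNTERIONS:     # desalting step
            continue
        flat = strip_stereo(part)
        if flat and flat not in components:
            components.append(flat)
    return components


# An amino acid hydrochloride salt with one stereocentre reduces to a single flat parent.
parent = ms_ready("C[C@H](N)C(=O)O.Cl")
```

Each returned component could then be linked back to the original record, so that a mass or formula search hits the form that is actually observed in the instrument.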
17. Dashboard for Structure ID
• Structure Identification using the dashboard
– Formula/mass-based searching – 1 chemical at a time
– Distilling structures into “MS-Ready form”
– Ranking based on metadata
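Ranking on metadata can be pictured as a simple sort of the candidates returned for one formula or mass, with the number of associated data sources as the key. The sketch below assumes a hypothetical `Candidate` record and uses made-up source counts; it does not reflect the Dashboard's schema or real values.

```python
from collections import namedtuple

# Hypothetical record layout for one search hit.
Candidate = namedtuple("Candidate", ["name", "formula", "source_count"])


def rank_by_sources(candidates):
    """Order candidates so that those with the most data sources come first."""
    return sorted(candidates, key=lambda c: c.source_count, reverse=True)


# Made-up hits that share one molecular formula.
hits = [
    Candidate("isomer-1", "C8H10N4O2", 3),
    Candidate("isomer-2", "C8H10N4O2", 57),
    Candidate("isomer-3", "C8H10N4O2", 12),
]
ordered = rank_by_sources(hits)
```

The assumption behind this ordering is that a chemical appearing in many sources is more likely to be encountered in a sample than an obscure isomer with the same formula.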
26. Dashboard for Structure ID
• Structure Identification using the dashboard
– Formula/mass-based searching – 1 chemical at a time
– Distilling structures into “MS-Ready form”
– Ranking based on metadata
– Batch searching of formulae and masses
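Batch searching of masses can be sketched locally as matching each query mass against the monoisotopic masses of a list of candidate formulae within a ppm tolerance. The function names, the 5 ppm default, and the short element table below are assumptions made for illustration; the example handles neutral masses only and ignores adducts, and it is not the Dashboard's own search code.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes for a few common elements.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def batch_search(query_masses, formulas, ppm=5.0):
    """Map each query mass to the formulas whose mass lies within +/- ppm of it."""
    table = [(f, monoisotopic_mass(f)) for f in formulas]
    hits = {}
    for query in query_masses:
        tolerance = query * ppm / 1e6
        hits[query] = [f for f, mass in table if abs(mass - query) <= tolerance]
    return hits


results = batch_search([194.0804, 100.0], ["C8H10N4O2", "C6H12O6"])
```

Precomputing the mass table once and reusing it for every query is what makes the batch form cheaper than repeating single searches.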
39. Conclusion
• The CompTox Chemistry Dashboard provides access to data for ~760,000 chemicals
• High-quality curated data and rich metadata facilitate mass spectrometry analysis
• “MS-Ready” processed data enable structure identification
40. Acknowledgments
• The CompTox Chemistry Dashboard team
• NERL colleagues:
– Jon Sobus, Elin Ulrich, Mark Strynar, Seth Newton (NTA Analysis)
– Katherine Phillips, Kathie Dionisio, Kristin Isaacs (Consumer Products Database)
• Emma Schymanski – Luxembourg Center for Systems Biomedicine (MS-ready/NTA)
41. Contact
Antony Williams
US EPA Office of Research and Development
National Center for Computational Toxicology (NCCT)
Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821