Characterising unknown metabolites talk from the ASMS Fall Metabolomics Informatics Workshop 2018 in San Francisco, California.
https://www.asms.org/conferences/fall-workshop/program
Slides with active hyperlinks accessible via tinyurl on the front page.
2. 2
Turning Unknowns into Knowns
o Knowns and Unknowns
o Overview of Resources
• Compound databases
• “Make your own” molecules
• Spectral libraries
o Walk-through Swiss Wastewater
• Targets
• Suspect screening approaches
• Annotation of non-targets with MetFrag
o Exchanging information for annotating unknowns…
o Take home messages
3. 3
Knowns and Unknowns …
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
o Known known
• Expected in sample
• Confirmed by mass spectrometry
• Reference standard available
o Unknown known
• Known as part of expert knowledge or a mixture
• Undocumented as an individual compound
o Known unknown
• “Suspected” or unknown to investigator
• Documented in databases, literature
o Unknown unknown
• Compound not previously documented
• Full elucidation and confirmation required
4. 4
Searching for Known Small Molecules …
o Compound Databases
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
PubChem: >96 million
https://pubchem.ncbi.nlm.nih.gov/
ChemSpider: >69 million
http://www.chemspider.com/
CompTox Chemicals Dashboard: >765 000
https://comptox.epa.gov/dashboard/
Human Metabolome DB (HMDB): >114 000
http://www.hmdb.ca/
5. 5
Searching for Known Small Molecules …
o Compound Databases … isn’t 96 million enough?
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
Quick answer … NO!
E. coli data: N. Zamboni, IMSB, ETH Zürich
in silico prediction
6. 6
Searching for More (Un)Known Small Molecules …
Jeffryes et al, 2015, MINEs, J. Cheminf, 7:44. DOI: 10.1186/s13321-015-0087-1
o In silico metabolite prediction – example of MINE (2015)
KEGG MINE: 13,307 => 571,368
EcoCyc MINE: 1,832 => 54,719
YMDB MINE: 1,978 => 100,755
HMDB MINE: 23,035 => 400,414
7. 7
Searching for More (Un)Known Small Molecules …
Jeffryes et al, 2015, MINEs, J. Cheminf, 7:44. DOI: 10.1186/s13321-015-0087-1
o In silico metabolite prediction – example of MINE (2015)
• First generation only … combinatorial explosion!
Speculation …
PubChem MINE: 95 million => 1.6 billion … first generation only?!?!
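The speculative PubChem figure can be checked with simple arithmetic. The sketch below computes the fold increase for each of the four first-generation MINE expansions quoted for KEGG, EcoCyc, YMDB and HMDB, then applies the smallest ratio to 95 million PubChem entries. Using the smallest ratio is an assumption made here for illustration; the slide does not say how the 1.6 billion estimate was derived.

```python
# First-generation MINE expansions quoted on the slides:
# database -> (source compounds, predicted compounds)
mine_expansions = {
    "KEGG": (13_307, 571_368),
    "EcoCyc": (1_832, 54_719),
    "YMDB": (1_978, 100_755),
    "HMDB": (23_035, 400_414),
}

# Fold increase per database (predicted / source)
fold_increase = {
    name: predicted / source
    for name, (source, predicted) in mine_expansions.items()
}

# Assumption: extrapolate with the most conservative (smallest) ratio
conservative_ratio = min(fold_increase.values())
pubchem_entries = 95_000_000
pubchem_mine_estimate = pubchem_entries * conservative_ratio

for name, ratio in fold_increase.items():
    print(f"{name}: {ratio:.1f}x")
print(f"PubChem MINE, first generation (speculative): {pubchem_mine_estimate:.2e}")
```

Even the most conservative ratio lands in the billions after a single generation, which is the combinatorial explosion the slide warns about.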
8. 8
Searching for MORE (Un)Known Small Molecules…
Source: A. Kerber, R. Laue, M. Meringer, C. Rücker (2005) MATCH 54 (2), 301-312.
o Structure Generation
• But of course most of these do not exist
[Figure: number of structures (log scale, 1 to 100,000,000) versus molecular mass (50 to 150 Da) for molecular graphs from structure generation, the Beilstein Registry and the NIST MS Library]
Structure Generation: 100 million at mass = 150 Da
NIST MS Library: ~1-200 at mass = 150 Da
Spectral Libraries
9. 9
Searching for Small Molecules in Spectral Libraries
o … to find what is “on record” with MS “fingerprint”
• Too many different MS/MS libraries (and they are still too small)
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
10. 10
Do we need all these libraries?
Vinaixa, Schymanski, Navarro, Neumann, Salek, Yanes, 2016, TrAC, DOI: 10.1016/j.trac.2015.09.005
o Yes … most libraries still have many unique entries
= HMDB, GNPS, MassBank, ReSpect (figure legend)
Compound lists provided by: S. Stein, R. Mistrik, Agilent
11. 11
Mind the Gap!
Frainay, C. et al. (2018) “Mind the Gap: …” Metabolites: http://www.mdpi.com/2218-1989/8/3/51
o Only 23-60 % of (defined) metabolites in Genome-Scale Metabolic Networks are covered by (combined!) Mass Spectral Libraries
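A coverage figure of this kind can be pictured as a set intersection between the metabolites of a network and the combined library contents. The sketch below matches on the first (connectivity) block of the InChIKey, which is an assumption made here and not necessarily the matching rule used in the cited study. All InChIKeys and library names are hypothetical placeholders.

```python
def first_block(inchikey: str) -> str:
    """Return the 14-character connectivity block of an InChIKey."""
    return inchikey.split("-")[0]


def coverage(network_keys, library_keys_by_name):
    """Fraction of network metabolites present in the union of all libraries."""
    network = {first_block(k) for k in network_keys}
    if not network:
        return 0.0
    combined = set()
    for keys in library_keys_by_name.values():
        combined |= {first_block(k) for k in keys}
    return len(network & combined) / len(network)


# Hypothetical placeholder InChIKeys (not real compounds)
network = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "BBBBBBBBBBBBBB-UHFFFAOYSA-N",
    "CCCCCCCCCCCCCC-UHFFFAOYSA-N",
    "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
]
libraries = {
    "library_a": ["AAAAAAAAAAAAAA-UHFFFAOYSA-N", "EEEEEEEEEEEEEE-UHFFFAOYSA-N"],
    # Same connectivity as the second network entry, different second block
    "library_b": ["BBBBBBBBBBBBBB-ABCDEFGHSA-N"],
}

covered_fraction = coverage(network, libraries)
print(f"Combined library coverage: {covered_fraction:.0%}")
```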
12. 12
Mind the Gap!
Frainay, C. et al. (2018) “Mind the Gap: …” Metabolites: http://www.mdpi.com/2218-1989/8/3/51
o Best library to choose depends highly on your dataset
• Example: MSforID (https://msforid.com/) is poor for metabolic networks – but great for forensic toxicology!
13. 13
Environmental Chemistry and Metabolomics …
Source: Fenner et al. (2013) Science, 341(6147), 752-758. DOI: 10.1126/science.1236281
…have surprisingly many things in common …
15. 15
Target, Suspect and Non-Target Screening
[Workflow diagram: sampling -> extraction (SPE) -> HPLC separation -> HR-MS/MS, then three routes]
• KNOWNS -> TARGET ANALYSIS -> targets found
• SUSPECTS -> SUSPECT SCREENING -> suspects found; SPECTRUM SEARCH -> spectral match
• No prior knowledge -> NON-TARGET SCREENING -> masses of interest (molecular formula) -> DATABASE SEARCH or STRUCTURE GENERATION
Candidate selection (retention time, MS/MS, calculated properties)
Confirmation and quantification of compounds present
Time, Effort & Number of Compounds …
16. 16
Identification Strategies and Confidence
Schymanski et al, 2014, ES&T. DOI: 10.1021/es5002105 & Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7
[Workflow diagram: non-target HR-MS(/MS) acquisition -> peak picking or XICs (target list, suspect list) -> target, suspect or non-target screening]
o Level 1 Confirmed Structure: by reference standard (start for target screening)
o Level 2 Probable Structure: by library/diagnostic evidence
o Level 3 Tentative Candidate(s): suspect, substructure, class (start for suspect screening)
o Level 4 Unequivocal Molecular Formula: insufficient structural evidence
o Level 5 Mass of Interest: multiple detection, trends, … (start for non-target screening)
Identification confidence increases from Level 5 to Level 1; “downgrading” with contradictory evidence
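The level scheme can be read as a decision cascade from the strongest to the weakest evidence. The sketch below is a simplified illustration of that reading. The `Evidence` fields are hypothetical names chosen here, and real assignments rest on expert judgement as described in the cited papers.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the evidence available for one feature."""
    reference_standard_match: bool = False     # confirmed against a reference standard
    library_or_diagnostic_match: bool = False  # library spectrum or diagnostic evidence
    tentative_candidates: bool = False         # suspect, substructure or class information
    unequivocal_formula: bool = False          # unambiguous molecular formula


def confidence_level(evidence: Evidence) -> int:
    """Return the identification confidence level (1 = highest, 5 = lowest)."""
    if evidence.reference_standard_match:
        return 1  # confirmed structure
    if evidence.library_or_diagnostic_match:
        return 2  # probable structure
    if evidence.tentative_candidates:
        return 3  # tentative candidate(s)
    if evidence.unequivocal_formula:
        return 4  # unequivocal molecular formula
    return 5      # mass of interest only


print(confidence_level(Evidence(unequivocal_formula=True)))
print(confidence_level(Evidence(library_or_diagnostic_match=True)))
```

Contradictory evidence would be modelled by clearing the corresponding flag, which "downgrades" the feature to the next level that still holds.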
17. 17
Target Analysis: Status Quo (>364 targets)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
[Workflow diagram: sampling -> extraction (SPE) -> HPLC separation -> HR-MS/MS -> TARGET ANALYSIS (target list) -> targets found -> confirmation and quantification of compounds present]
TPs!
18. 18
Target Analysis: Status Quo (>364 targets)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
[Workflow diagram: sampling -> extraction (SPE) -> HPLC separation -> HR-MS/MS -> TARGET ANALYSIS (target list) -> targets found -> confirmation and quantification of compounds present]
Labels: m/z, RT
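Target analysis pairs each measured peak with a target list entry using m/z and retention time (RT). The sketch below shows one simple way to do that matching. The tolerances, retention times and peak list are hypothetical, and the m/z values are the [M-H]- ions of two compounds named later in the talk.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def match_targets(peaks, targets, ppm_tol=5.0, rt_tol_min=0.5):
    """Return (name, m/z, RT) for every peak matching a target in m/z and RT.

    peaks:   list of (m/z, RT in minutes)
    targets: dict of name -> (theoretical m/z, expected RT in minutes)
    """
    hits = []
    for mz, rt in peaks:
        for name, (target_mz, target_rt) in targets.items():
            mz_ok = abs(ppm_error(mz, target_mz)) <= ppm_tol
            rt_ok = abs(rt - target_rt) <= rt_tol_min
            if mz_ok and rt_ok:
                hits.append((name, mz, rt))
    return hits


# Hypothetical target list: [M-H]- m/z with made-up retention times
targets = {
    "acesulfame": (161.9867, 3.0),
    "saccharin": (181.9917, 4.2),
}
# Hypothetical peak list: the second peak matches saccharin in m/z but not in RT
peaks = [(161.9866, 3.1), (181.9918, 9.0), (250.0000, 4.2)]

hits = match_targets(peaks, targets)
print(hits)
```

Requiring both criteria is what separates a target hit from an isobaric peak eluting elsewhere in the chromatogram.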
19. 19
Swiss Wastewater: Top 30 Peaks (ESI-)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Artificial Sweeteners
Diclofenac
Pictures: www.coca-cola.com; www.rivella.ch; www.voltargengel.com
20. 20
Suspect Screening: Different Approaches
[Workflow diagram: sampling -> extraction (SPE) -> HPLC separation -> HR-MS/MS, then two routes]
• Target list -> TARGET ANALYSIS -> targets found
• Suspect list -> SUSPECT SCREENING -> suspects found
Candidate selection (retention time, MS/MS, calculated properties)
Confirmation and quantification of compounds present
o Screen for predicted transformation products of known parent compounds
o Look for “well known” substances without reference standards
o Screen for known homologue series
o Search in mass spectral libraries
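Without reference standards, a suspect list usually starts from molecular formulas, from which the expected ion masses are calculated and searched for in the peak list. The sketch below does this for [M-H]- ions in negative mode. The peak list and the 5 ppm tolerance are hypothetical, and the small formula parser handles only simple formulas without brackets.

```python
import re

# Monoisotopic masses (Da) of the elements needed for this example
MONOISOTOPIC = {
    "C": 12.0, "H": 1.00782503, "N": 14.0030740,
    "O": 15.9949146, "S": 31.9720707, "Cl": 34.9688527,
}
PROTON = 1.00727646  # mass of a proton (Da)


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C4H5NO4S'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def deprotonated_mz(formula: str) -> float:
    """m/z of the singly charged [M-H]- ion."""
    return monoisotopic_mass(formula) - PROTON


def screen_suspects(peak_mzs, suspects, ppm_tol=5.0):
    """Return suspect name -> list of peak m/z values within the ppm tolerance."""
    hits = {}
    for name, formula in suspects.items():
        target = deprotonated_mz(formula)
        matched = [mz for mz in peak_mzs if abs(mz - target) / target * 1e6 <= ppm_tol]
        if matched:
            hits[name] = matched
    return hits


suspects = {
    "acesulfame": "C4H5NO4S",
    "saccharin": "C7H5NO3S",
    "diclofenac": "C14H11Cl2NO2",
}
peak_mzs = [161.9866, 294.0093, 300.1234]  # hypothetical ESI- peak list

hits = screen_suspects(peak_mzs, suspects)
print(hits)
```

A mass match alone only yields a suspect hit; retention time, MS/MS and calculated properties are still needed for candidate selection.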
21. 21
Suspect Screening: Benzotriazole TPs
Huntscha et al. 2014, ES&T, 48(8), 4435-4443. DOI: 10.1021/es405694z
Prediction: [UM-PPS] -> Eawag-PPS -> [enviPath]
28 suspects -> HPLC separation and HR-MS/MS -> SUSPECT SCREENING
-> 11 masses for 6 suspect formulas
-> 7 with MS/MS
-> 1 reference std.
-> 1 TP confirmed
-> 1 TP “likely”, no std.
22. 22
Suspect Screening: Benzotriazole TPs
Huntscha et al. 2014, ES&T, 48(8), 4435-4443. DOI: 10.1021/es405694z
Prediction: [UM-PPS] -> Eawag-PPS -> [enviPath]
28 suspects -> HPLC separation and HR-MS/MS -> SUSPECT SCREENING
-> 11 masses for 6 suspect formulas
-> 7 with MS/MS
-> 1 reference std.
-> 1 TP confirmed
-> 1 TP “likely”, no std.
[Structures of the two benzotriazole TPs]
o TP “likely”
• Predicted with Eawag-PPS
• No standard
• Not in ChemSpider
• In the Dashboard: DTXSID10212177
o TP confirmed
• Confirmed with reference std.
• Observed in WWTP effluents
24. 24
Suspect Screening – “Screen Smart”
Moschet et al. 2013, Anal. Chem. DOI: 10.1021/ac4021598
o Screened 213 pesticides & TPs without standards => confirm 19 new IDs
o Browse: https://comptox.epa.gov/dashboard/chemical_lists/swisspest
25. 25
NORMAN Network Suspect List Exchange
o http://www.norman-network.com/?q=node/236
References, Full Lists, InChIKeys
26. 26
Lists on CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard/chemical_lists/
More lists become available with every release
28. 28
RECAP: Target Analysis: Status Quo (>364 targets)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
[Workflow diagram: sampling -> extraction (SPE) -> HPLC separation -> HR-MS/MS -> TARGET ANALYSIS (target list) -> targets found -> confirmation and quantification of compounds present]
Labels: m/z, RT
31. 31
Swiss Wastewater: Top 30 Peaks (ESI-)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
[Structures: two sulfur- and oxygen-containing fragment ions, m/z = 79.96 and m/z = 183.01]
Picture: www.momsteam.com
32. 32
Surfactant Screening From Literature
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Literature sources
o Formulas, masses (ions), retention times and intensities
o Spectra of selected compounds (different instruments)
Gonzalez et al. Rapid Comm. Mass Spec. 2008, 22: 1445-54
Lara-Martin et al. EST. 2010, 44: 1670-1676
33. 33
Homologous Series Detection
M. Loos & H Singer, 2017. J. Cheminf. DOI: 10.1186/s13321-017-0197-z & Schymanski et al. 2014, ES&T DOI: 10.1021/es4044374
http://www.envihomolog.eawag.ch/
Search for discrete mass differences
[Structures: two example homologous series with variable repeating units m and n]
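Searching for discrete mass differences amounts to chaining peaks whose m/z values differ by a fixed repeating unit. The sketch below is a minimal m/z-only illustration using a CH2 spacing. It is not the envihomolog algorithm, and the peak list, tolerance and minimum series length are hypothetical.

```python
CH2 = 14.01565  # monoisotopic mass of a CH2 unit (12.00000 + 2 * 1.007825)


def find_homologue_series(mzs, delta=CH2, tol=0.002, min_length=3):
    """Group m/z values into chains whose neighbours differ by delta +/- tol."""
    ordered = sorted(mzs)
    used = set()
    series = []
    for start in ordered:
        if start in used:
            continue
        chain = [start]
        while True:
            # Look for an unused peak one repeating unit above the chain end
            nxt = [m for m in ordered
                   if m not in used and abs(m - chain[-1] - delta) <= tol]
            if not nxt:
                break
            chain.append(nxt[0])
        if len(chain) >= min_length:
            series.append(chain)
            used.update(chain)
    return series


# Hypothetical peak list: four peaks spaced by CH2 plus two unrelated peaks
peak_mzs = [161.9867, 294.0094, 297.1530, 311.1686, 325.1843, 339.1999]

series = find_homologue_series(peak_mzs)
print(series)
```

Other repeating units can be searched by changing `delta`, for example 49.99681 for a CF2 spacing. A real workflow would also check that retention time changes systematically along the series.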
34. 34
Homologous Series Detection
M. Loos & H Singer, 2017. J. Cheminf. DOI: 10.1186/s13321-017-0197-z & Schymanski et al. 2014, ES&T DOI: 10.1021/es4044374
[Structures of three homologous series with variable chain lengths n and m: DATS, SPAC, STAC]
http://www.envihomolog.eawag.ch/
35. 35
Swiss Wastewater: Top 30 Peaks (ESI-)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Acesulfame
Diclofenac
Cyclamate
Saccharin
C10DATS
C10SPAC
SPA5C
C15DATS
STA6C
C9DATS
SPA2DC
[Structures: SPAC and DATS, with variable chain lengths n and m]
36. 36
Supporting Evidence for Homologues
Stravs et al. (2013), J. Mass Spectrom, 48(1):89-99. DOI: 10.1002/jms.3131
[Structure: SPA-9C, m+n=6]
Formulas: http://sourceforge.net/projects/genform/
Meringer et al, 2011, MATCH 65, 259-290
Data: Schymanski et al. 2014, ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Chromatography and MS/MS Annotation
Literature: LIT00034,35
Sample: ETS00002
Standard: ETS00016,17,19,20
https://github.com/MassBank/RMassBank/
37. 37
Cross-Linking Homologues in the Dashboard
Schymanski, Grulke, Williams et al, in prep. & Williams et al. 2017 J. Cheminformatics 9:61 DOI: 10.1186/s13321-017-0247-6
https://comptox.epa.gov/dashboard/chemical_lists/eawagsurf
39. 39
Searching for Small Molecules in Spectral Libraries
Peisl, Schymanski & Wilmes, 2018 Anal. Chim. Acta, DOI: 10.1016/j.aca.2017.12.034
40. 40
What about Non-Target Screening?
[Workflow diagram: sampling -> extraction (SPE) -> HPLC separation -> HR-MS/MS, then three routes]
• Target list -> TARGET ANALYSIS -> targets found
• Suspect list -> SUSPECT SCREENING -> suspects found
• (no prior information) -> NON-TARGET SCREENING -> masses of interest (molecular formula) -> DATABASE SEARCH or STRUCTURE GENERATION
Candidate selection (retention time, MS/MS, calculated properties)
Confirmation and quantification of compounds present
Number of compounds
43. 43
MetFrag2.3: Non-target Identification
Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9
Test set of 473 Eawag Target Substances
                               MetFrag 2010   MetFrag2.3         MetFrag2.3
                                              (Fragments only)   (+References, +Retention time)
ChemSpider (1)  Top 1 Ranks    73             105                420
                % Top 1 Ranks  15 %           22 %               89 %
PubChem (2)     Top 1 Ranks    -              30                 336
                % Top 1 Ranks  -              6 %                71 %
(1) www.chemspider.com; ~34 million entries
(2) https://pubchem.ncbi.nlm.nih.gov/; ~74 million entries
http://c-ruttkies.github.io/MetFrag/
Similar results with 3 independent datasets of 310, 289 and 225 substances from Eawag and UFZ (www.massbank.eu)
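One way to picture how reference counts and retention time information can re-rank candidates is a weighted sum of max-normalised score terms. The sketch below illustrates that idea under this assumption. It is not MetFrag's implementation, and the candidate names, score values and weights are all hypothetical.

```python
def rank_candidates(candidates, weights):
    """Rank candidates by a weighted sum of max-normalised score terms.

    candidates: list of {"name": str, "scores": {term: value}}
    weights:    dict of term -> weight; only listed terms are used
    """
    # Normalise each term by its maximum over all candidates (guard against 0)
    maxima = {
        term: max(c["scores"][term] for c in candidates) or 1.0
        for term in weights
    }
    ranked = []
    for c in candidates:
        total = sum(w * c["scores"][term] / maxima[term] for term, w in weights.items())
        ranked.append((c["name"], round(total, 3)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical candidates sharing one molecular formula
candidates = [
    {"name": "candidate_A", "scores": {"fragments": 0.9, "references": 2, "retention_time": 0.5}},
    {"name": "candidate_B", "scores": {"fragments": 0.8, "references": 150, "retention_time": 0.9}},
    {"name": "candidate_C", "scores": {"fragments": 1.0, "references": 0, "retention_time": 0.2}},
]

fragments_only = rank_candidates(candidates, {"fragments": 1.0})
with_metadata = rank_candidates(
    candidates, {"fragments": 1.0, "references": 1.0, "retention_time": 1.0}
)
print(fragments_only[0][0], with_metadata[0][0])
```

In this toy case the well-documented candidate moves from last to first once metadata is included, mirroring the jump in Top 1 ranks in the table.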
44. 44
The Power of the Metadata (Top 1 ranks)
Schymanski et al, 2017, J Cheminf., DOI: 10.1186/s13321-017-0207-1 www.casmi-contest.org
45. 45
MetFrag2.3: Non-target Identification
Ruttkies, Schymanski, Wolf, Hollender, Neumann (2016) J. Cheminf., 2016, DOI: 10.1186/s13321-016-0115-9
Try with the Web Interface: http://msbi.ipb-halle.de/MetFragBeta/
47. 47
Swiss Wastewater: Top 30 Peaks (ESI-)
Schymanski et al. (2014), ES&T, 48: 1811-1818. DOI: 10.1021/es4044374
Acesulfame
Diclofenac
Cyclamate
Saccharin
C10DATS
C10SPAC
SPA5C
C15DATS
STA6C
C9DATS
SPA2DC
Now 13 of the top 30 (tentatively) identified
48. 48
We still have many unknowns …
(l) Data from Schymanski et al 2014, ES&T DOI: 10.1021/es4044374. (r) E. coli data provided by N. Zamboni, IMSB, ETH Zürich.
Environment
Cells
50. 50
Exchanging Knowledge … Open Science Helps!
We need to be able to find and annotate the unexpected!
C23F48O7
+CF2
52. 52
Take Home Messages
Unknowns and High Resolution Mass Spectrometry
o Over 60 % of HR-MS peaks are potentially relevant but unknown
Environment
Cells
53. 53
Take Home Messages
o Over 60 % of HR-MS peaks are potentially relevant but unknown
o Annotating unknowns requires data and evidence from many different sources
o Many excellent workflows available to collate this information
o Incorporation of all available metadata is critical to success!
o E.g. MetFrag2.3 has greatly improved the speed and success of tentative identification of “known unknowns”: 15 % => 89 % Ranked Number 1
o http://c-ruttkies.github.io/MetFrag/
Unknowns and High Resolution Mass Spectrometry
54. 54
Take Home Messages
o Over 60 % of HR-MS peaks are potentially relevant but unknown
o Annotating unknowns requires data and evidence from many different sources
o Exchange expert knowledge worldwide
o Community efforts contribute greatly to improved cross-annotation
o Information in the public domain helps everyone!
o You never know when it will help you
Unknowns and High Resolution Mass Spectrometry
Schymanski et al. 2015, ABC, DOI: 10.1007/s00216-015-8681-7; Alygizakis et al. 2018 ES&T, DOI: 10.1021/acs.est.8b00365