ChemAxon UGM, San Diego, USA 25th September 2013
Recent improvements in Marvin v6:
Reaction Atom Mapping and its Application to
Reaction Validation in Pharmaceutical ELNs
Daniel Lowe and Roger Sayle
NextMove Software
Cambridge, UK
What is Atom-Mapping?
[Figure: mapping algorithm]
Why Perform Atom-Mapping?
• Assigning roles to reagents
• Normalization of reactions for registration
Why Perform Atom-Mapping?
• More precise database searches
– Solvents/catalysts can be distinguished from reactants
– Allows the relationship between the reactant atoms and product atoms to be made explicit
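As a rough illustration of how atom maps support role assignment, the sketch below labels each species on the left-hand side of a mapped reaction SMILES as a reactant if it contributes at least one mapped atom to the product, and as a reagent otherwise. The function names, the regular expression and the example reaction are assumptions made for illustration; this is not Marvin's method, and a real implementation would parse the SMILES with a cheminformatics toolkit rather than with string operations.

```python
import re

# Matches the atom-map suffix of a bracket atom, e.g. ":3]" in "[O:3]".
MAP_RE = re.compile(r":(\d+)\]")


def map_numbers(smiles):
    """Return the set of atom-map numbers found in a SMILES fragment."""
    return {int(n) for n in MAP_RE.findall(smiles)}


def assign_roles(mapped_rxn):
    """Label each left-hand species of 'reactants>agents>products'.

    A species is a 'reactant' if any of its mapped atoms appears in the
    products, otherwise a 'reagent' (solvent, catalyst, base, ...).
    Splitting on '.' is a simplification: salts drawn as several
    disconnected components are treated as separate species.
    """
    left, _agents, right = mapped_rxn.split(">")
    product_maps = map_numbers(right)
    roles = {}
    for species in left.split("."):
        contributes = map_numbers(species) & product_maps
        roles[species] = "reactant" if contributes else "reagent"
    return roles


# Hypothetical example: acetyl chloride + methanol, with triethylamine
# drawn on the reactant side but contributing no atoms to the ester.
rxn = ("[CH3:1][C:2](=[O:3])Cl.[OH:4][CH3:5].CCN(CC)CC"
       ">>[CH3:1][C:2](=[O:3])[O:4][CH3:5]")
roles = assign_roles(rxn)
for species, role in roles.items():
    print(role, species)
```

Here the triethylamine carries no map numbers, so it is labelled a reagent even though it was drawn alongside the reactants.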
Example
• I want to find reactions converting an alkene to a cyclopropane, so I search for C=C>>C1CC1
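The sketch below illustrates why an unmapped query of this kind can over-match. It assumes the substructure hits have already been found by some toolkit and are given as sets of atom-map numbers (hand-written here); the function names and both example reactions are hypothetical. A mapped query such as `[C:1]=[C:2]>>[C:1]1[C:2]C1` expresses the same constraint in reaction SMARTS: the two alkene carbons must end up in the three-membered ring.

```python
def unmapped_match(alkene_hits, ring_hits):
    """Unmapped query: an alkene somewhere in the reactants and a
    cyclopropane somewhere in the products is enough to match."""
    return bool(alkene_hits) and bool(ring_hits)


def mapped_match(alkene_hits, ring_hits):
    """Mapped query: both alkene carbons must be members of the same
    product cyclopropane ring, judged by their atom-map numbers."""
    return any(alkene <= ring for alkene in alkene_hits for ring in ring_hits)


# Hypothetical reaction 1: a genuine cyclopropanation. The alkene
# carbons (maps 1 and 2) become ring atoms together with carbon 3.
cyclopropanation = {"alkenes": [frozenset({1, 2})],
                    "rings": [frozenset({1, 2, 3})]}

# Hypothetical reaction 2: the molecule already carries a cyclopropane
# (maps 7, 8, 9) and the alkene reacts in some unrelated way.
bystander_ring = {"alkenes": [frozenset({1, 2})],
                  "rings": [frozenset({7, 8, 9})]}

for name, rxn in [("cyclopropanation", cyclopropanation),
                  ("bystander ring", bystander_ring)]:
    print(name,
          unmapped_match(rxn["alkenes"], rxn["rings"]),
          mapped_match(rxn["alkenes"], rxn["rings"]))
```

Both reactions satisfy the unmapped query, but only the first satisfies the mapped one.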
Why Perform Atom-Mapping?
• Identifying suspect reactions:
ChemAxon atom mapping
Atom mapping modes
• Complete
• Changing
• Matching
Methodology
Test set                                                        Reactions
Pharmaceutical ELN subset                                          18,244
ChemReact68 database                                               67,926
SPRESI database subset                                              5,230
Reactions extracted from 2008-2011 USPTO patent applications*     562,872
* Lowe, D. M. Automated Extraction of Reactions from the Patent Literature. 243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.
Metrics used
• Were all product atoms mapped?
– Measures recall
• How many C-C bonds were broken?
– Measures precision
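The two metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: bonds are represented as pairs of atom-map numbers, a map number of 0 means "unmapped", and a reactant C-C bond counts as broken only when both of its atoms appear in the product but are no longer bonded to each other. All names and the example mappings are hypothetical.

```python
def bond(a, b):
    """An undirected bond between two atom-map numbers."""
    return frozenset((a, b))


def all_product_atoms_mapped(product_atom_maps):
    """Recall-style check: every product atom has a non-zero map number."""
    return all(m > 0 for m in product_atom_maps)


def cc_bonds_broken(reactant_bonds, product_bonds, elements):
    """Precision-style check: count reactant C-C bonds whose two atoms
    both survive into the product but are no longer bonded there."""
    product_atoms = set().union(*product_bonds)
    count = 0
    for b in reactant_bonds:
        if b in product_bonds:
            continue  # bond is preserved by this mapping
        if not b <= product_atoms:
            continue  # an atom left with a by-product; not counted
        if all(elements[m] == "C" for m in b):
            count += 1
    return count


# Hypothetical example: propanoyl chloride + methanol -> methyl propanoate.
# Maps 1-3 are the acyl carbons, 4 the carbonyl O, 5 and 6 the methanol O and C.
elements = {1: "C", 2: "C", 3: "C", 4: "O", 5: "O", 6: "C"}
reactant_bonds = {bond(1, 2), bond(2, 3), bond(3, 4), bond(5, 6)}

# Chemically sensible mapping: only the new C-O bond (3-5) is formed.
good_mapping = {bond(1, 2), bond(2, 3), bond(3, 4), bond(3, 5), bond(5, 6)}
# Poor mapping: the two terminal carbons (1 and 6) are swapped.
poor_mapping = {bond(6, 2), bond(2, 3), bond(3, 4), bond(3, 5), bond(5, 1)}

print(cc_bonds_broken(reactant_bonds, good_mapping, elements))
print(cc_bonds_broken(reactant_bonds, poor_mapping, elements))
print(all_product_atoms_mapped([1, 2, 3, 4, 5, 6]))
print(all_product_atoms_mapped([1, 2, 3, 4, 0, 0]))
```

Both mappings account for every product atom, so the first metric alone cannot tell them apart; the poor mapping is exposed only because it implies a broken C-C bond.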
Ability to map all product atoms
[Bar chart: percent of reactions with all product atoms mapped (y-axis, 0 to 80) for the PharmaELN, ChemReact68, SPRESI and USPTO test sets; series: Marvin 5.10, Marvin 6.0, ChemDraw 12]
C-C bonds broken
[Bar chart: average number of C-C bonds broken per mapping, lower is better (y-axis, 0.0 to 1.2) for the PharmaELN, ChemReact68, SPRESI and USPTO test sets; series: Marvin 5.10, Marvin 6.0, ChemDraw 12]
[Figure: example mappings labelled Marvin 5.10, ChemDraw 12 and Marvin 6.0]
Speed Comparison
[Bar chart: reactions mapped per second (y-axis, 0 to 350); bars: Marvin 5.12, Marvin 6.0, Marvin 6.0 (multithreaded)]
* Comparison performed on the PharmaELN dataset on an i7-2600
Difficult cases
ΔT
Areas for improvement:
Implicit stoichiometry
Areas for improvement:
Many choices for reactant atom mapping
Consensus Methods
[Bar chart: percent of reactions with all product atoms mapped (y-axis, 0 to 100) on the PharmaELN test set; bars: Marvin 6.0, ChemDraw 12, Marvin6 + ChemDraw12, Consensus Result*]
* Marvin 6.0 + ChemDraw12 + 2 variants of GGA’s Indigo toolkit + InfoChem ICMap + Pipeline Pilot + MDL Cheshire
Beyond atom mapping
• Missing reactants (often for routine reactions)
Beyond atom mapping
• Change of stereoisomer or chiral resolution
(E)-3-{8-[2-(4-Isopropyl-1,3-thiazol-2-yl)ethyl]-2-methoxy-4-oxo-4H-pyrido[1,2-a]pyrimidin-3-yl}-2-propenoic acid (1 mg) was dissolved in CDCl3 (0.5 ml) and irradiated with light from a fluorescent lamp for 19 hours. The solvent was evaporated to obtain the title compound (1 mg).
Atom mapping + classification
[Bar chart: percent of reactions with all product atoms mapped (y-axis, 0 to 100); groups: atom mapping algorithms alone, combined with NameRXN; series: Marvin 6.0, ChemDraw 12, Consensus Result; annotation: Verified / Recognised by NameRXN (71%)]
Conclusions
• Marvin v6’s atom mapping algorithm provides large improvements in recall, precision and speed over v5
• Atom mapping in some cases isn’t as simple as finding a maximum common subgraph mapping
• Classification algorithms can be useful for the validation of some reactions
Acknowledgements
• Zsolt Mohacsi and Istvan Rabel, ChemAxon
• Ed Griffen and Nick Tomkinson, AstraZeneca
• Andrew Wooster, GSK
• Hans Kraut, InfoChem
• Thank you for your time.
