Technical Evaluation

                NMR Prediction Accuracy Validation
                                                                         ACD/CNMR Predictor
                                                                              Version 10.05
                            Kirill Blinov, Mikhail Kvasha, Brent Lefebvre, Ryan Sasaki and Antony Williams
                                                                     Advanced Chemistry Development, Inc.
                                                                                     Toronto, ON, Canada
                                                                                         www.acdlabs.com



Introduction
The validation of performance of NMR chemical shift prediction algorithms is a challenging problem for a
number of reasons. These will be discussed only at a general level in this technical evaluation since they
have been discussed elsewhere [1]. The central challenge associated with the validation of NMR shift
prediction algorithms is obtaining a quality data set for validation of the prediction accuracy.
If the validation data set consists mainly of simple structures, or of structures that are well represented in
the database used as the basis of the prediction algorithms, then the validation exercise will not truly
represent the challenges of prediction. The most valid test would be conducted on a validation set containing
chemical structures that are very different from those contained within the training dataset. Ideally, an
independent party without knowledge of the structures in the training set should choose the validation set,
so as to avoid any bias.
The quality of a validation database is important but difficult to prove in most cases. The ideal validation set
contains no assignment errors and covers the whole range of structural diversity found in present-day
chemistry, as well as any that may arise in the future. While this is clearly impossible to attain, large
diverse datasets do exist and, while not ideal, can be used for the purpose of validation. Every large dataset
contains errors, but for comparing prediction accuracy between different algorithms this is largely irrelevant,
since the same errors are equally challenging for all algorithms.
A resource is available on the Internet that meets the above criteria of size and quality and can serve as a
fair and reliable validation set for evaluating the accuracy of NMR prediction by ACD/CNMR Predictor.
This resource is a database called NMRShiftDB [2], created as a collaborative effort by
chemists and spectroscopists submitting data to the database. This document is an analysis of the
performance of ACD/CNMR Predictor using the NMRShiftDB database as the validation set. Owing to
the availability of a comparison test issued by Wolfgang Robien, we also have an opportunity to compare
performance with another commercial product, NMRPredict, provided by Modgraph Consultants, Ltd.


NMRShiftDB
The NMRShiftDB is an open source collection of chemical structures and their associated NMR shift
assignments. The database is generated from contributions by the public and has been described
in detail elsewhere [3,4]. Currently [5], the database contains 19,958 structures with 214,136 assigned carbon
chemical shifts. Based on a cursory examination of the structural diversity within the database, these data
represent a statistically relevant set to use in an evaluation of predictive accuracy and form the first large
dataset available from an independent source that we could use for this purpose.
Robien has already published an analysis of performance of his neural network predictions [6]. This review
provides an evaluation of the NMR prediction algorithms he has developed over many years. These
algorithms have been the basis of a number of software products including a commercially available
product, NMRPredict [6], offered by Modgraph Consultants, Ltd. Robien focused his analysis on the presence
of a number of outliers but gave no specific review of the quality of the dataset, focusing only on the problem
assignments.
Technical Note
NMR Prediction Validation
The NMRShiftDB website offers visitors the opportunity to download a file in SDF format containing all of the
structures and chemical shifts that compose the NMRShiftDB database [5]. This file was downloaded and the
structures and shifts were imported into an ACD/Labs format.
As a first step, an analysis of the degree of overlap between the structures in the training set within the
ACD/CNMR Predictor and the validation set of NMRShiftDB was undertaken. It was found that 57% of the
carbon chemical shifts in the NMRShiftDB were already in the ACD/Labs database. Using this information
the NMRShiftDB database was then stripped of these chemical shifts, since they have been used as the
basis of the prediction algorithms in ACD/CNMR Predictor. The statistics comparing the full dataset and the
validation subset are shown below.
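
The overlap-removal step described above can be illustrated with a minimal sketch. It assumes that each shift can be keyed by a canonical structure identifier plus an atom index; the `Shift` type, the `split_by_overlap` helper and the toy data are hypothetical and are not the actual ACD/Labs procedure or data format.

```python
from typing import NamedTuple


class Shift(NamedTuple):
    structure_key: str  # hypothetical canonical structure identifier
    atom_index: int     # index of the carbon atom within the structure
    ppm: float          # experimental 13C chemical shift


def split_by_overlap(validation, training_keys):
    """Split validation shifts into (overlapping, novel) lists.

    A shift counts as overlapping when its (structure_key, atom_index)
    pair also occurs in the training set.
    """
    overlapping, novel = [], []
    for shift in validation:
        key = (shift.structure_key, shift.atom_index)
        (overlapping if key in training_keys else novel).append(shift)
    return overlapping, novel


# Toy data only; not taken from NMRShiftDB or the ACD/Labs database.
validation = [
    Shift("A", 1, 128.5),
    Shift("A", 2, 21.3),
    Shift("B", 1, 170.1),
    Shift("C", 1, 55.0),
]
training_keys = {("A", 1), ("A", 2)}

overlapping, novel = split_by_overlap(validation, training_keys)
overlap_fraction = len(overlapping) / len(validation)
print(f"overlap: {overlap_fraction:.0%}, novel shifts kept: {len(novel)}")
```

Only the novel shifts would then be passed to the predictor for the second validation study.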


Results Summary
As mentioned above, two sets of results were obtained: the average deviation of the predicted vs.
experimental values based on the entire data set, and the same statistics for the subset of chemical shifts
that were unique. On the entire data set, ACD/CNMR Predictor outperforms Robien’s program by a clear margin.
The average deviation obtained by CNMR Predictor (1.59 ppm) was about 28% lower than that obtained by
Robien (2.22 ppm); equivalently, Robien’s value was about 40% higher.
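
The size of the margin depends on which value is taken as the reference. The short check below uses only the two average deviations reported for the entire dataset (1.59 ppm and 2.22 ppm); the variable names are illustrative.

```python
acd_avg_dev = 1.59     # ppm, ACD/CNMR Predictor v10.05, entire dataset
robien_avg_dev = 2.22  # ppm, value reported by Robien (uncorrected)

difference = robien_avg_dev - acd_avg_dev

# Relative to Robien's value: how much lower the ACD/CNMR deviation is.
lower_by = difference / robien_avg_dev
# Relative to the ACD/CNMR value: how much higher Robien's deviation is.
higher_by = difference / acd_avg_dev

print(f"ACD/CNMR deviation is {lower_by:.0%} lower than Robien's")
print(f"Robien's deviation is {higher_by:.0%} higher than ACD/CNMR's")
```

The two statements describe the same 0.63 ppm gap: roughly 28% when expressed against 2.22 ppm and roughly 40% when expressed against 1.59 ppm.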


Validation on Entire Dataset
 Entire Dataset        Shift       Average       Standard       Outliers (ppm Difference)
 Comparison            Count       Deviation     Deviation
                                   (ppm) [5]     (ppm) [5]      >10 ppm         >25 ppm        >50 ppm
 ACD/CNMR v10.05       214,136     1.59          2.76           1,040 (0.5%)    141 (0.07%)    31 (0.01%)
 CSEARCH (Modgraph)    209,412     2.22          N/A            N/A             194 (0.09%)    56 (0.03%)


Only 203,284 unique carbon centers were represented in the database, but some had multiple assignments.
All redundant assignments were included in case there was disagreement between them, and therefore over
214,000 assignments were considered. This is more than the 209,412 chemical shifts used by Robien, and we
assume this is because we downloaded the data file later than Robien, after new data had been added.
Consultation with Steinbeck indicates that some of the errors
identified by Robien in his analysis have been corrected. At present the original source file used by Robien
in his analysis is being obtained in order to allow a direct comparison of performance using the exact same
dataset, and we will report on that comparison in a separate publication. The web posting of Robien [6] did not
provide a standard deviation or the number of chemical shift predictions that were more than 10 ppm
from their experimental value, which is why these entries are marked N/A in the table.
       Note     Robien quoted an average deviation of 2.19 ppm after correction of some errors, but for
                comparison purposes we have used the 2.22 ppm value with no corrections, since no
                corrections were made to the dataset that was run through ACD/CNMR Predictor.


Validation on Completely Novel Chemical Shifts
Obtaining a good result with the full dataset was a useful exercise, yet a more rigorous comparison was
also conducted. The data used to train the ACD/Labs NMR prediction algorithms include those collected from
recent literature articles, and an overlap with a significant number of structures in the NMRShiftDB was
expected. In order to provide an estimate of the performance of
the predictor on novel structures, the NMRShiftDB was cross-referenced with the internal database of
ACD/CNMR Predictor to remove duplicates. This exercise revealed that 57% of the chemical shifts in
the NMRShiftDB were also found in the ACD/Labs database. This left 43% of the shifts, a total of
92,927 chemical shifts in NMRShiftDB, to use as the dataset for the second validation study. The results are
shown below.

 Dataset               Shift       Average       Standard       Outliers (ppm Difference)
 Comparison            Count       Deviation     Deviation
                                   (ppm) [5]     (ppm) [5]      >10 ppm         >25 ppm        >50 ppm
 Data Subset           92,927      1.74          3.22           695 (0.7%)      89 (0.1%)      24 (0.03%)
 Entire Dataset        214,136     1.59          2.76           1,040 (0.5%)    141 (0.07%)    31 (0.01%)


The average deviation associated with the data subset has increased only slightly. The question as to
whether the use of slightly different versions of the dataset leads to significant differences in performance
has also been examined. The expectation would be that the correction of a few data points in a dataset of over
200,000 shifts would have a very small impact on the statistics presented in the table above. In fact, ignoring
all data points with an error of >25 ppm reduces the average deviation to a value of 1.56 ppm, a difference of
0.03 ppm. Clearly the removal of a few erroneous points makes only a small difference to the overall statistics.
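
As a rough plausibility check, the figures quoted above can be combined to estimate the typical size of the removed errors. The sketch below uses only the reported values (214,136 shifts, 141 points above 25 ppm, average deviations of 1.59 and 1.56 ppm). Because those averages are rounded to two decimals, the result is an order-of-magnitude estimate rather than an exact figure.

```python
n_total = 214_136    # shifts in the entire dataset
n_removed = 141      # points with an error above 25 ppm
mean_all = 1.59      # ppm, average deviation over all points (rounded)
mean_trimmed = 1.56  # ppm, average deviation after removal (rounded)

# Sum of absolute errors before and after removing the outliers.
sum_all = n_total * mean_all
sum_trimmed = (n_total - n_removed) * mean_trimmed

# Average absolute error of the removed points implied by these values.
implied_outlier_mean = (sum_all - sum_trimmed) / n_removed
print(f"implied mean error of removed points: {implied_outlier_mean:.0f} ppm")
```

The implied value lies above the 25 ppm cut-off, as it must, which suggests the reported averages are mutually consistent.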


Conclusion
The NMRShiftDB is an excellent resource for the purpose of evaluating chemical shift prediction accuracy, as
evidenced by this work and the previous work of Robien. As identified initially by Robien, and later in this
work, there are certainly outliers in the dataset requiring review and correction. Our previous work has
shown that the literature itself contains about 8% errors in the form of mis-assignments, transcription errors
and incorrect structures. The obvious errors in NMRShiftDB are certainly below this level, and this is a
testament to the value of this resource. The NMRShiftDB dataset is large and structurally diverse and
continues to grow as scientists contribute.
Despite a large overlap between the NMRShiftDB and the ACD/Labs carbon NMR database, a statistically
relevant validation set of over 92,000 chemical shifts was extracted from the NMRShiftDB and used to test
the algorithms. The data presented here show that the ACD/Labs prediction algorithms have an average
deviation of less than 1.8 ppm on the validation set and outperform the algorithms of Robien
presented in his review. This work will be discussed in further detail in a future publication, and validation is
presently being performed on the proton NMR shift data.


Acknowledgements
We thank Christoph Steinbeck and the members of his team for the provision of the NMRShiftDB service
and dataset. This has provided an invaluable resource for the testing of NMR prediction algorithms.


References
1.    Jens Meiler et al., J. Magn. Reson. 157, 242–252 (2002).
2.    NMRShiftDB, http://nmrshiftdb.ice.mpg.de/
3.    C. Steinbeck, S. Krause, and S. Kuhn, J. Chem. Inf. Comput. Sci. 43, 1733–1739 (2003).
4.    C. Steinbeck and S. Kuhn, Phytochemistry 65, 2711–2717 (2004).
5.    Information available on NMRShiftDB as of May 7th, 2007.
6.    http://nmrpredict.orc.univie.ac.at/csearchlite/enjoy_its_free.html
7.    http://www.modgraph.co.uk/product_nmr.htm
8.    http://nmrshiftdb.pharmazie.uni-marburg.de/nmrshiftdbhtml/NmrshiftdbWithSignals.sdf.zip

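\[
\text{Average Deviation} = \frac{\sum_{i=1}^{n} \left| \mathrm{exp}_i - \mathrm{calc}_i \right|}{n}
\qquad
\text{Standard Deviation} = \sqrt{\frac{\sum_{i=1}^{n} \left( \mathrm{exp}_i - \mathrm{calc}_i \right)^2}{n-1}}
\]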
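
The statistics reported in the tables can be computed as in the following minimal sketch. It reads the definitions above as a mean absolute error and as the square root of the summed squared errors divided by n-1; the square root and the summation over all shifts are assumptions about the original notation. The function names and the toy shift values are illustrative only.

```python
import math


def average_deviation(exp, calc):
    """Mean absolute difference between experimental and calculated shifts."""
    return sum(abs(e - c) for e, c in zip(exp, calc)) / len(exp)


def standard_deviation(exp, calc):
    """Root of the summed squared differences divided by (n - 1)."""
    squared = sum((e - c) ** 2 for e, c in zip(exp, calc))
    return math.sqrt(squared / (len(exp) - 1))


def outlier_counts(exp, calc, thresholds=(10.0, 25.0, 50.0)):
    """Number of predictions whose absolute error exceeds each threshold."""
    errors = [abs(e - c) for e, c in zip(exp, calc)]
    return {t: sum(err > t for err in errors) for t in thresholds}


# Toy shifts in ppm; not taken from NMRShiftDB.
exp = [128.5, 21.3, 170.1, 55.0, 77.2]
calc = [127.9, 22.0, 158.1, 55.4, 77.2]

avg = average_deviation(exp, calc)
std = standard_deviation(exp, calc)
counts = outlier_counts(exp, calc)
print(f"average deviation: {avg:.2f} ppm")
print(f"standard deviation: {std:.2f} ppm")
print(f"outliers: {counts}")
```

In this toy set a single 12 ppm error dominates both statistics, which shows how strongly a few outliers can influence a small sample compared with a dataset of over 200,000 shifts.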
                                                             3

Contenu connexe

Tendances

Use of DynoChem in Process Development. Wilfried Hoffmann.
Use of DynoChem in Process Development. Wilfried Hoffmann.Use of DynoChem in Process Development. Wilfried Hoffmann.
Use of DynoChem in Process Development. Wilfried Hoffmann.Scale-up Systems
 
Deformulating Complex Polymer Mixtures By GPC-IR Technology
Deformulating Complex Polymer Mixtures By GPC-IR TechnologyDeformulating Complex Polymer Mixtures By GPC-IR Technology
Deformulating Complex Polymer Mixtures By GPC-IR Technologymzhou45
 
Webinar: How to Figure Out Your Competitors Formula by LC-IR Deformulation
Webinar: How to Figure Out Your Competitors Formula by LC-IR DeformulationWebinar: How to Figure Out Your Competitors Formula by LC-IR Deformulation
Webinar: How to Figure Out Your Competitors Formula by LC-IR Deformulationmzhou45
 
GPC-IR To Charaterize Polymer Mixtures--Akron Workshop
GPC-IR To Charaterize Polymer Mixtures--Akron WorkshopGPC-IR To Charaterize Polymer Mixtures--Akron Workshop
GPC-IR To Charaterize Polymer Mixtures--Akron Workshopmzhou45
 
Revisions to EPA Method 624 for Analysis of VOCs by GCMS
Revisions to EPA Method 624 for Analysis of VOCs by GCMSRevisions to EPA Method 624 for Analysis of VOCs by GCMS
Revisions to EPA Method 624 for Analysis of VOCs by GCMSShimadzu Scientific Instruments
 
Webinar - Pharmacopeial Modernization: How Will Your Chromatography Workflow ...
Webinar - Pharmacopeial Modernization: How Will Your Chromatography Workflow ...Webinar - Pharmacopeial Modernization: How Will Your Chromatography Workflow ...
Webinar - Pharmacopeial Modernization: How Will Your Chromatography Workflow ...Waters Corporation
 
LC-IR Applications In Polymer Related Industries
LC-IR Applications In Polymer Related IndustriesLC-IR Applications In Polymer Related Industries
LC-IR Applications In Polymer Related Industriesmzhou45
 
Waters Oligonucleotide Analysis Solutions
Waters Oligonucleotide Analysis SolutionsWaters Oligonucleotide Analysis Solutions
Waters Oligonucleotide Analysis SolutionsWaters Corporation
 
New LC-IR Technique To Characterize Polymeric Excipients In Pharmaceutical Fo...
New LC-IR Technique To Characterize Polymeric Excipients In Pharmaceutical Fo...New LC-IR Technique To Characterize Polymeric Excipients In Pharmaceutical Fo...
New LC-IR Technique To Characterize Polymeric Excipients In Pharmaceutical Fo...mzhou45
 
Determination of Impurities in Organic Solvents used in the Semiconductor Ind...
Determination of Impurities in Organic Solvents used in the Semiconductor Ind...Determination of Impurities in Organic Solvents used in the Semiconductor Ind...
Determination of Impurities in Organic Solvents used in the Semiconductor Ind...PerkinElmer, Inc.
 
New 2D-LC/MS approaches for the analysis of therapeutic proteins in a regulat...
New 2D-LC/MS approaches for the analysis of therapeutic proteins in a regulat...New 2D-LC/MS approaches for the analysis of therapeutic proteins in a regulat...
New 2D-LC/MS approaches for the analysis of therapeutic proteins in a regulat...ArnaudDelobel1
 
Property methods and calculations
Property methods and calculationsProperty methods and calculations
Property methods and calculationsMourad KORICHI
 
Trace metal analysis in sediments using aas and asv techniques
Trace metal analysis in sediments using aas and asv techniquesTrace metal analysis in sediments using aas and asv techniques
Trace metal analysis in sediments using aas and asv techniquesAwad Albalwi
 
Beacon Presentation EPA NARPM Annual Training
Beacon Presentation EPA NARPM Annual TrainingBeacon Presentation EPA NARPM Annual Training
Beacon Presentation EPA NARPM Annual TrainingHarryONeill
 
Nasa Nnc05 Ca90 C Final Report
Nasa Nnc05 Ca90 C Final ReportNasa Nnc05 Ca90 C Final Report
Nasa Nnc05 Ca90 C Final Reportinscore
 
Passive Soil Gas Testing - Standard for Site Characterization
Passive Soil Gas Testing  - Standard for Site CharacterizationPassive Soil Gas Testing  - Standard for Site Characterization
Passive Soil Gas Testing - Standard for Site CharacterizationHarryONeill
 

Tendances (20)

Use of DynoChem in Process Development. Wilfried Hoffmann.
Use of DynoChem in Process Development. Wilfried Hoffmann.Use of DynoChem in Process Development. Wilfried Hoffmann.
Use of DynoChem in Process Development. Wilfried Hoffmann.
 
Deformulating Complex Polymer Mixtures By GPC-IR Technology
Deformulating Complex Polymer Mixtures By GPC-IR TechnologyDeformulating Complex Polymer Mixtures By GPC-IR Technology
Deformulating Complex Polymer Mixtures By GPC-IR Technology
 
Webinar: How to Figure Out Your Competitors Formula by LC-IR Deformulation
Webinar: How to Figure Out Your Competitors Formula by LC-IR DeformulationWebinar: How to Figure Out Your Competitors Formula by LC-IR Deformulation
Webinar: How to Figure Out Your Competitors Formula by LC-IR Deformulation
 
GPC-IR To Charaterize Polymer Mixtures--Akron Workshop
GPC-IR To Charaterize Polymer Mixtures--Akron WorkshopGPC-IR To Charaterize Polymer Mixtures--Akron Workshop
GPC-IR To Charaterize Polymer Mixtures--Akron Workshop
 
Revisions to EPA Method 624 for Analysis of VOCs by GCMS
Revisions to EPA Method 624 for Analysis of VOCs by GCMSRevisions to EPA Method 624 for Analysis of VOCs by GCMS
Revisions to EPA Method 624 for Analysis of VOCs by GCMS
 
Webinar - Pharmacopeial Modernization: How Will Your Chromatography Workflow ...
Webinar - Pharmacopeial Modernization: How Will Your Chromatography Workflow ...Webinar - Pharmacopeial Modernization: How Will Your Chromatography Workflow ...
Webinar - Pharmacopeial Modernization: How Will Your Chromatography Workflow ...
 
LC-IR Applications In Polymer Related Industries
LC-IR Applications In Polymer Related IndustriesLC-IR Applications In Polymer Related Industries
LC-IR Applications In Polymer Related Industries
 
Lee, Charles - Organic Materials Chemistry - Spring Review 2012
Lee, Charles - Organic Materials Chemistry - Spring Review 2012Lee, Charles - Organic Materials Chemistry - Spring Review 2012
Lee, Charles - Organic Materials Chemistry - Spring Review 2012
 
Waters Oligonucleotide Analysis Solutions
Waters Oligonucleotide Analysis SolutionsWaters Oligonucleotide Analysis Solutions
Waters Oligonucleotide Analysis Solutions
 
New LC-IR Technique To Characterize Polymeric Excipients In Pharmaceutical Fo...
New LC-IR Technique To Characterize Polymeric Excipients In Pharmaceutical Fo...New LC-IR Technique To Characterize Polymeric Excipients In Pharmaceutical Fo...
New LC-IR Technique To Characterize Polymeric Excipients In Pharmaceutical Fo...
 
26 microsampling tim schlabach - agilent
26 microsampling tim schlabach - agilent26 microsampling tim schlabach - agilent
26 microsampling tim schlabach - agilent
 
Determination of Impurities in Organic Solvents used in the Semiconductor Ind...
Determination of Impurities in Organic Solvents used in the Semiconductor Ind...Determination of Impurities in Organic Solvents used in the Semiconductor Ind...
Determination of Impurities in Organic Solvents used in the Semiconductor Ind...
 
New 2D-LC/MS approaches for the analysis of therapeutic proteins in a regulat...
New 2D-LC/MS approaches for the analysis of therapeutic proteins in a regulat...New 2D-LC/MS approaches for the analysis of therapeutic proteins in a regulat...
New 2D-LC/MS approaches for the analysis of therapeutic proteins in a regulat...
 
Property methods and calculations
Property methods and calculationsProperty methods and calculations
Property methods and calculations
 
Trace metal analysis in sediments using aas and asv techniques
Trace metal analysis in sediments using aas and asv techniquesTrace metal analysis in sediments using aas and asv techniques
Trace metal analysis in sediments using aas and asv techniques
 
Next Generation Ultra High Pressure Liquid Chromatography (UHPLC) Technologie...
Next Generation Ultra High Pressure Liquid Chromatography (UHPLC) Technologie...Next Generation Ultra High Pressure Liquid Chromatography (UHPLC) Technologie...
Next Generation Ultra High Pressure Liquid Chromatography (UHPLC) Technologie...
 
Beacon Presentation EPA NARPM Annual Training
Beacon Presentation EPA NARPM Annual TrainingBeacon Presentation EPA NARPM Annual Training
Beacon Presentation EPA NARPM Annual Training
 
Innovation in size based polymer separations
Innovation in size based polymer separationsInnovation in size based polymer separations
Innovation in size based polymer separations
 
Nasa Nnc05 Ca90 C Final Report
Nasa Nnc05 Ca90 C Final ReportNasa Nnc05 Ca90 C Final Report
Nasa Nnc05 Ca90 C Final Report
 
Passive Soil Gas Testing - Standard for Site Characterization
Passive Soil Gas Testing  - Standard for Site CharacterizationPassive Soil Gas Testing  - Standard for Site Characterization
Passive Soil Gas Testing - Standard for Site Characterization
 

En vedette

W H Y P R A Y E R D R
W H Y  P R A Y E R  D RW H Y  P R A Y E R  D R
W H Y P R A Y E R D Rghanyog
 
T W O P O I N T S D R
T W O  P O I N T S  D RT W O  P O I N T S  D R
T W O P O I N T S D Rghanyog
 
W H A T I S N A M A S M A R A N D R
W H A T  I S  N A M A S M A R A N  D RW H A T  I S  N A M A S M A R A N  D R
W H A T I S N A M A S M A R A N D Rghanyog
 
A L C O H O L A N D T O B A C C O Dr
A L C O H O L  A N D  T O B A C C O  DrA L C O H O L  A N D  T O B A C C O  Dr
A L C O H O L A N D T O B A C C O Drghanyog
 
Y O G A A N D S U P E R J O Y D R
Y O G A  A N D  S U P E R J O Y  D RY O G A  A N D  S U P E R J O Y  D R
Y O G A A N D S U P E R J O Y D Rghanyog
 
Karir Apa Yang Anda Pilih?
Karir Apa Yang Anda Pilih?Karir Apa Yang Anda Pilih?
Karir Apa Yang Anda Pilih?Djadja Sardjana
 
神州泰岳测试试题(笔试)
神州泰岳测试试题(笔试)神州泰岳测试试题(笔试)
神州泰岳测试试题(笔试)yiditushe
 
V I C T O R Y O V E R N E C K A N D B A C K P A I N D R S H R I N I W...
V I C T O R Y  O V E R  N E C K  A N D  B A C K  P A I N   D R  S H R I N I W...V I C T O R Y  O V E R  N E C K  A N D  B A C K  P A I N   D R  S H R I N I W...
V I C T O R Y O V E R N E C K A N D B A C K P A I N D R S H R I N I W...ghanyog
 
Tubettoni E Fagioli Con Le Cozze
Tubettoni E Fagioli Con Le CozzeTubettoni E Fagioli Con Le Cozze
Tubettoni E Fagioli Con Le CozzeMy own sweet home
 
True Love Of The Prophet
True Love Of The ProphetTrue Love Of The Prophet
True Love Of The Prophetrabubakar
 
web 2.0 seconda parte
web 2.0 seconda parteweb 2.0 seconda parte
web 2.0 seconda parteAngelo Panini
 
Asbo Vendor Performance Eval Chicago 092509
Asbo Vendor Performance Eval   Chicago   092509Asbo Vendor Performance Eval   Chicago   092509
Asbo Vendor Performance Eval Chicago 092509Richard Gay, CPPO, RSBO
 
Oktopartial Introduction
Oktopartial IntroductionOktopartial Introduction
Oktopartial IntroductionTakeshi AKIMA
 

En vedette (20)

Navigating the Complex Web of Chemistry Using ChemSpider
Navigating the Complex Web of Chemistry Using ChemSpiderNavigating the Complex Web of Chemistry Using ChemSpider
Navigating the Complex Web of Chemistry Using ChemSpider
 
W H Y P R A Y E R D R
W H Y  P R A Y E R  D RW H Y  P R A Y E R  D R
W H Y P R A Y E R D R
 
Jeanine
JeanineJeanine
Jeanine
 
T W O P O I N T S D R
T W O  P O I N T S  D RT W O  P O I N T S  D R
T W O P O I N T S D R
 
W H A T I S N A M A S M A R A N D R
W H A T  I S  N A M A S M A R A N  D RW H A T  I S  N A M A S M A R A N  D R
W H A T I S N A M A S M A R A N D R
 
A L C O H O L A N D T O B A C C O Dr
A L C O H O L  A N D  T O B A C C O  DrA L C O H O L  A N D  T O B A C C O  Dr
A L C O H O L A N D T O B A C C O Dr
 
Workshop offline netwerken
Workshop offline netwerkenWorkshop offline netwerken
Workshop offline netwerken
 
Y O G A A N D S U P E R J O Y D R
Y O G A  A N D  S U P E R J O Y  D RY O G A  A N D  S U P E R J O Y  D R
Y O G A A N D S U P E R J O Y D R
 
Karir Apa Yang Anda Pilih?
Karir Apa Yang Anda Pilih?Karir Apa Yang Anda Pilih?
Karir Apa Yang Anda Pilih?
 
National NL Plant ID L-P
National NL Plant ID L-PNational NL Plant ID L-P
National NL Plant ID L-P
 
神州泰岳测试试题(笔试)
神州泰岳测试试题(笔试)神州泰岳测试试题(笔试)
神州泰岳测试试题(笔试)
 
V I C T O R Y O V E R N E C K A N D B A C K P A I N D R S H R I N I W...
V I C T O R Y  O V E R  N E C K  A N D  B A C K  P A I N   D R  S H R I N I W...V I C T O R Y  O V E R  N E C K  A N D  B A C K  P A I N   D R  S H R I N I W...
V I C T O R Y O V E R N E C K A N D B A C K P A I N D R S H R I N I W...
 
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
ChemSpider - Building a Foundation for the Semantic Web by Hosting a Crowd So...
 
Tubettoni E Fagioli Con Le Cozze
Tubettoni E Fagioli Con Le CozzeTubettoni E Fagioli Con Le Cozze
Tubettoni E Fagioli Con Le Cozze
 
True Love Of The Prophet
True Love Of The ProphetTrue Love Of The Prophet
True Love Of The Prophet
 
Love2
Love2Love2
Love2
 
Peru Expeditions 2009 2010
Peru Expeditions 2009 2010Peru Expeditions 2009 2010
Peru Expeditions 2009 2010
 
web 2.0 seconda parte
web 2.0 seconda parteweb 2.0 seconda parte
web 2.0 seconda parte
 
Asbo Vendor Performance Eval Chicago 092509
Asbo Vendor Performance Eval   Chicago   092509Asbo Vendor Performance Eval   Chicago   092509
Asbo Vendor Performance Eval Chicago 092509
 
Oktopartial Introduction
Oktopartial IntroductionOktopartial Introduction
Oktopartial Introduction
 

Similaire à NMR Prediction Accuracy Validation

Getting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsGetting the Big Picture by Joining up the SAR dots
Getting the Big Picture by Joining up the SAR dotsSorel Muresan
 
Optimisation process guide
Optimisation process guideOptimisation process guide
Optimisation process guidekillerkitties
 
Master thesispresentation
Master thesispresentationMaster thesispresentation
Master thesispresentationMatthew Urffer
 
Fault detection of feed water treatment process using PCA-WD with parameter o...
Fault detection of feed water treatment process using PCA-WD with parameter o...Fault detection of feed water treatment process using PCA-WD with parameter o...
Fault detection of feed water treatment process using PCA-WD with parameter o...ISA Interchange
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Implementation of algorithms using WEKA
Implementation of algorithms using WEKAImplementation of algorithms using WEKA
Implementation of algorithms using WEKAAbhishek Nandgaonkar
 
Presentazione L.M. Rinaldi Ivan
Presentazione L.M. Rinaldi IvanPresentazione L.M. Rinaldi Ivan
Presentazione L.M. Rinaldi IvanIvan Rinaldi
 
1 NPP VIIRS Pre-Launch Performance and SDR Validation- IGARSS 2011.pptx
1 NPP VIIRS Pre-Launch Performance and SDR Validation- IGARSS 2011.pptx1 NPP VIIRS Pre-Launch Performance and SDR Validation- IGARSS 2011.pptx
1 NPP VIIRS Pre-Launch Performance and SDR Validation- IGARSS 2011.pptxgrssieee
 
A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...A Hierarchical Feature Set optimization for effective code change based Defec...
A Hierarchical Feature Set optimization for effective code change based Defec...IOSR Journals
 
2015 Frankfurt_World ADC summit -Lan-16Feb2015
2015 Frankfurt_World ADC summit -Lan-16Feb20152015 Frankfurt_World ADC summit -Lan-16Feb2015
2015 Frankfurt_World ADC summit -Lan-16Feb2015Lan Dai
 
The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...The influence of data curation on QSAR Modeling – Presented at American Chemi...
The influence of data curation on QSAR Modeling – Presented at American Chemi...Kamel Mansouri
 
Overview of DuraMat software tool development (poster version)
Overview of DuraMat software tool development(poster version)Overview of DuraMat software tool development(poster version)
Overview of DuraMat software tool development (poster version)Anubhav Jain
 
Translating Clinical Guidelines into Knowledge-guided Decision Support
Translating Clinical Guidelines into Knowledge-guided Decision SupportTranslating Clinical Guidelines into Knowledge-guided Decision Support
Translating Clinical Guidelines into Knowledge-guided Decision SupportPlan de Calidad para el SNS
 
PerkinElmer: Whole Tablet Measurements Using the Frontier Tablet Autosampler ...
PerkinElmer: Whole Tablet Measurements Using the Frontier Tablet Autosampler ...PerkinElmer: Whole Tablet Measurements Using the Frontier Tablet Autosampler ...
PerkinElmer: Whole Tablet Measurements Using the Frontier Tablet Autosampler ...PerkinElmer, Inc.
 
