Benchmarking and Validation of JChem ECFP and FCFP Fingerprints
Roger Sayle, NextMove Software Ltd, Cambridge, UK
roger@nextmovesoftware.co.uk


1. Overview
The cornerstone of pharmaceutical chemistry is Crum Brown’s observation that
similar compounds have similar therapeutic benefits. Cheminformatics tries to
capture this insight by defining measures of similarity between the computer
representations of two molecules, with the hope of capturing a medicinal
chemist’s intuitive sense of “likeness”, and thereby correlate with bioactivity.
This poster evaluates the chemical similarity measures offered by ChemAxon on a
standard reference benchmark. Any such benchmark must by necessity be
flawed; the similarity between two molecules is influenced by the framework by
which they are compared [2]. However a robust similarity measure should
typically perform better on such benchmarks, whilst a weaker model of chemical
similarity would be expected to perform worse (on average).

6. Fingerprint Saturation
A common failing with binary fingerprints is caused by their inability to represent
the number of times a feature (such as a path or substructure) occurs. The
fingerprints for decane (C10), undecane (C11) and dodecane (C12) are typically
identical, as are those for many protein and DNA sequences. A more powerful
representation that solves these issues is to replace occurrence bits with counts,
turning binary fingerprints into occurrence histograms.
LINGOs similarity achieves better results on the Briem & Lessel benchmark by
using counts instead of bits. However, as described in the Continuous Tanimoto
section below, care has to be taken to use a suitable similarity measure for
comparing histograms.
ChemAxon have announced that the upcoming release of JChem, version 5.5, will
support ECFP and FCFP fingerprints with counts.
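The saturation effect described under Fingerprint Saturation can be illustrated with a small sketch. It assumes a simplified LINGO-style feature (every overlapping 4-character substring of a SMILES string, without the normalisation a real LINGOs implementation applies) and hypothetical helper names; it is not the JChem or LINGOs code. The count-based comparison uses the min/max form that the Continuous Tanimoto section calls T1.

```python
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count every overlapping q-character substring of a SMILES string."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def binary_tanimoto(a: Counter, b: Counter) -> float:
    """Tanimoto on presence/absence only: shared features / all features."""
    keys_a, keys_b = set(a), set(b)
    return len(keys_a & keys_b) / len(keys_a | keys_b)


def count_tanimoto(a: Counter, b: Counter) -> float:
    """Min/max form on occurrence counts (missing features count as zero)."""
    keys = set(a) | set(b)
    return sum(min(a[k], b[k]) for k in keys) / sum(max(a[k], b[k]) for k in keys)


# Linear alkanes written as SMILES: decane, undecane, dodecane.
decane, undecane, dodecane = "C" * 10, "C" * 11, "C" * 12

for name, smi in [("undecane", undecane), ("dodecane", dodecane)]:
    fp_a, fp_b = lingos(decane), lingos(smi)
    print(name, binary_tanimoto(fp_a, fp_b), round(count_tanimoto(fp_a, fp_b), 3))
```

With presence/absence only, all three alkanes reduce to the single feature "CCCC" and are indistinguishable; with counts, decane contributes 7 occurrences against 8 and 9 for the longer chains, so the similarity drops below one.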
2. Briem & Lessel Benchmark
The benchmark employed in this evaluation is the commonly used Briem and
Lessel benchmark [1]. This test assesses a method’s ability to identify near
neighbours with the same biological activity from decoy molecules and molecules
with different biological activities. Five classes of active compounds are used: 40
ACE inhibitors, 49 TXA antagonists, 110 HMG-CoA reductase inhibitors, 133 PAF
antagonists and 48 5HT3 antagonists. In addition to these 380 active compounds,
the data set contains 573 “random” MDDR compounds, for a total of 953
molecules. The benchmark proceeds by determining the 10 nearest neighbours
for each of the 380 active compounds. The query is not considered a neighbour
of itself. The score for each activity class is the fraction of these neighbours that
have the same activity as the query. Finally, the overall score is the average of the
scores of the five activity classes.

7. Continuous Tanimoto
Although there is universal agreement on how the Tanimoto coefficient should be
interpreted for binary values, its application to continuous values, such as
histogram counts, has been implemented differently by different authors [4,5].
Consider the two alternate definitions T0 and T1 given below.

$$T_0(x, y) = \frac{\sum_{i}^{N} x_i y_i}{\sum_{i}^{N} \left( x_i^2 + y_i^2 - x_i y_i \right)} \qquad T_1(x, y) = \frac{\sum_{i}^{N} \min(x_i, y_i)}{\sum_{i}^{N} \max(x_i, y_i)}$$

Both definitions agree for binary valued vectors, and are guaranteed to return
increasing fractional values between zero and one. Notice, however, that for
x = {3} and y = {4}, T1 = 3/4 = 0.75 but T0 = 12/13 ≈ 0.923.
In experiments with LINGO’s histograms, T1 was found to be superior (producing
an improvement of ~0.9%) whereas T0 actually made the results worse (by ~3%).
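The two continuous Tanimoto definitions, T0 (dot-product form) and T1 (min/max form), can be compared directly with a short sketch. The function names are illustrative only, and the code assumes equal-length, non-negative count vectors with at least one non-zero entry.

```python
from typing import Sequence


def tanimoto_t0(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot-product form: sum(x*y) / sum(x^2 + y^2 - x*y)."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = sum(a * a + b * b - a * b for a, b in zip(x, y))
    return numerator / denominator


def tanimoto_t1(x: Sequence[float], y: Sequence[float]) -> float:
    """Min/max form: sum(min(x, y)) / sum(max(x, y))."""
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    return numerator / denominator


# The single-count example from the text: x = {3}, y = {4}.
print(round(tanimoto_t0([3], [4]), 3))  # 0.923 (12/13)
print(tanimoto_t1([3], [4]))            # 0.75

# On binary vectors the two definitions coincide (one shared bit out of three set).
bits_x, bits_y = [1, 1, 0], [1, 0, 1]
print(tanimoto_t0(bits_x, bits_y) == tanimoto_t1(bits_x, bits_y))  # True
```

The difference arises only when counts exceed one: T0 rewards a near match in magnitude quadratically, whereas T1 stays a plain "fraction in common" of the two histograms.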
3. Fingerprint Methods
Historically, the similarity method underlying ChemAxon’s JChem search engine
relied upon Chemical Fingerprints (“CF”). These are path-based fingerprints
similar to Daylight fingerprints, which allow a number of variants depending upon
parameters for the number of bits in the fingerprint, the longest bond path to
encode and the number of bits set by each path. The “Marvin FP” below uses the
generatemd defaults of 1024 bits, paths of up to 7 bonds, and 3 bits per path. The
“JChem FP” below uses the JChemManager defaults of 512 bits, paths up to
length 6 and 2 bits per path.
Recently, in v5.4, ChemAxon has added support for ECFP and FCFP fingerprints
originally introduced by Scitegic, now Accelrys [8]. These are termed “ECFP_4”
and “FCFP_8” below indicating the ChemAxon implementation with diameter
parameters of 4 bonds and 8 bonds respectively.
For reference comparison to other methods, also shown are LINGOs similarity [4],
MACCS 166-bit keys [3], Daylight fingerprints and IBM’s patented InChI-based
chemical similarity (US20080004810A1) as used in their SIMPLE product [7].

8. Conclusions
• ChemAxon’s Chemical Fingerprints perform comparably with other path and
  feature-based fingerprints (including MACCS 166-bit keys, Daylight fingerprints
  and PubChem/CACTVS fingerprints). All these methods perform equivalently.
• ECFP fingerprints, originally developed by Scitegic/Accelrys and as recently
  implemented by ChemAxon, perform exceptionally well on the standard Briem
  and Lessel benchmark.
• The announced ECFP histograms would be anticipated to set new records in 2D
  chemical similarity.

9. Acknowledgements
To Miklos Vargyas and Alex Allardyce for the invitation to present a poster at the
ChemAxon UGM, to Peter Kovacs for JChem ECFP support and rapid bug fixing,
and to AstraZeneca and Vertex Pharmaceuticals for their interest in 2D similarity.
4. Tanimoto Coefficient
Many ways of comparing similarity between binary fingerprints have been
discussed in the literature [9]; generally the best performing of these is the
Tanimoto coefficient, $T = |X \cap Y| / |X \cup Y|$. This definition has almost magical
properties, normalizing the differences between two feature sets by their sizes,
intuitively “the fraction in common”. Experimentally this correlates well to the
chemical and biological notion of what makes two molecules similar.

5. Evaluation Results
Overall benchmark score by method (values read from the bar chart):

Method      Score
ECFP_4      79.4%
FCFP_8      76.0%
LINGOs      75.6%
MACCS       66.2%
Marvin CF   65.2%
JChem CF    63.7%
Daylight    64.0%
IBM         42.0%

10. Bibliography
1. Hans Briem and Uta F. Lessel, “In vitro and in silico Affinity Fingerprints: Finding
   Similarities beyond Structural Classes”, Perspectives in Drug Discovery and Design, Vol.
   20, pp. 231-244, 2000.
2. Robert D. Brown and Yvonne C. Martin, “Use of Structure-Activity Data to Compare
   Structure-based Clustering Methods and Descriptors for Use in Compound Selection”,
   JCICS, Vol. 36, No. 3, pp. 572-582, 1996.
3. Joseph L. Durant, Burton A. Leland, Douglas R. Henry and James G. Nourse,
   “Reoptimization of MDL Keys for Use in Drug Discovery”, JCICS, Vol. 42, pp. 1273-1280,
   2002.
4. J. Andrew Grant, James A. Haigh, Barry T. Pickup, Anthony Nicholls and Roger A. Sayle,
   “Lingos, Finite State Machines and Fast Similarity Searching”, JCIM, Vol. 46, No. 5, pp.
   1912-1918, 2006.
5. Thierry Kogej, Ola Engkvist, Niklas Blomberg and Sorel Muresan, “Multifingerprint Based
   Similarity Searches for Targeted Class Compound Selection”, JCIM, Vol. 46, No. 3, pp.
   1201-1213, 2006.
6. Steven W. Muchmore, Derek A. Debe, James T. Metz, Scott P. Brown, Yvonne C. Martin and
   Philip J. Hajduk, “Application of Belief Theory to Similarity Data Fusion for Use in Analog
   Searching and Lead Hopping”, JCIM, Vol. 48, No. 5, pp. 941-948, 2008.
7. James Rhodes, Stephen Boyer, Jeffrey Kreulen, Ying Chen and Patricia Ordonez, “Mining
   Patents using Molecular Similarity Search”, Pacific Symposium on Biocomputing, Vol. 12,
   pp. 304-315, 2007.
8. David Rogers and Mathew Hahn, “Extended Connectivity Fingerprints”, JCIM, Vol. 50, No.
   5, pp. 742-754, 2010.
9. Peter Willett, John M. Barnard and Geoffrey M. Downs, “Chemical Similarity Searching”,
   JCICS, Vol. 38, No. 6, pp. 983-996, 1998.



NextMove Software Limited
Innovation Centre (Unit 23)
Cambridge Science Park
Milton Road, Cambridge
England CB4 0EY
www.nextmovesoftware.co.uk
www.nextmovesoftware.com

Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 šŸ’ž Full Nigh...Pooja Nehwal
Ā 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
Ā 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
Ā 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
Ā 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
Ā 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
Ā 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
Ā 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
Ā 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...anjaliyadav012327
Ā 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
Ā 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
Ā 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
Ā 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
Ā 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
Ā 

Recently uploaded (20)

Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
Ā 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
Ā 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
Ā 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 šŸ’ž Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 šŸ’ž Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 šŸ’ž Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 šŸ’ž Full Nigh...
Ā 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
Ā 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Ā 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
Ā 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
Ā 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
Ā 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
Ā 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
Ā 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
Ā 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
Ā 
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
JAPAN: ORGANISATION OF PMDA, PHARMACEUTICAL LAWS & REGULATIONS, TYPES OF REGI...
Ā 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
Ā 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
Ā 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Ā 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Ā 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
Ā 

Benchmarking and Validation of JChem ECFP and FCFP Fingerprints

Roger Sayle, NextMove Software Ltd, Cambridge, UK
roger@nextmovesoftware.co.uk

1. Overview
The cornerstone of pharmaceutical chemistry is Crum Brown’s observation that similar compounds have similar therapeutic benefits. Cheminformatics tries to capture this insight by defining measures of similarity between the computer representations of two molecules, with the hope of capturing a medicinal chemist’s intuitive sense of “likeness”, and thereby correlating with bioactivity.
This poster evaluates the chemical similarity measures offered by ChemAxon on a standard reference benchmark. Any such benchmark must by necessity be flawed; the similarity between two molecules is influenced by the framework by which they are compared [2]. However, a robust similarity measure should typically perform better on such benchmarks, whilst a weaker model of chemical similarity would be expected to perform worse (on average).

2. Briem & Lessel Benchmark
The benchmark employed in this evaluation is the commonly used Briem and Lessel benchmark [1]. This test assesses a method’s ability to identify near neighbours with the same biological activity from decoy molecules and molecules with different biological activities. Five classes of active compounds are used: 40 ACE inhibitors, 49 TXA antagonists, 110 HMG-CoA reductase inhibitors, 133 PAF antagonists and 48 5HT3 antagonists. In addition to these 380 active compounds, the data set contains 573 “random” MDDR compounds, for a total of 953 molecules.
The benchmark proceeds by determining the 10 nearest neighbours for each of the 380 active compounds. The query is not considered a neighbour of itself. The score for each activity class is the fraction of these neighbours that have the same activity as the query. Finally, the overall score is the average of the scores of the five activity classes.

3. Fingerprint Methods
Historically, the similarity method underlying ChemAxon’s JChem search engine relied upon Chemical Fingerprints (“CF”). These are path-based fingerprints similar to Daylight fingerprints, which allow a number of variants depending upon parameters for the number of bits in the fingerprint, the longest bond path to encode and the number of bits set by each path. The “Marvin FP” below uses the generatemd defaults of 1024 bits, paths of up to 7 bonds, and 3 bits per path. The “JChem FP” below uses the JChemManager defaults of 512 bits, paths up to length 6 and 2 bits per path.
Recently, in v5.4, ChemAxon has added support for ECFP and FCFP fingerprints originally introduced by Scitegic, now Accelrys [8]. These are termed “ECFP_4” and “FCFP_8” below, indicating the ChemAxon implementation with diameter parameters of 4 bonds and 8 bonds respectively.
For reference comparison to other methods, also shown are LINGOs similarity [4], MACCS 166-bit keys [3], Daylight fingerprints and IBM’s patented InChI-based chemical similarity (US20080004810A1) as used in their SIMPLE product [7].

4. Tanimoto Coefficient
Many ways of comparing similarity between binary fingerprints have been discussed in the literature [9]; generally the best performing of these is the Tanimoto coefficient,

$$T = \frac{|X \cap Y|}{|X \cup Y|}$$

This definition has almost magical properties, normalizing the differences between two feature sets by their sizes, intuitively “the fraction in common”. Experimentally this correlates well with the chemical and biological notion of what makes two molecules similar.

5. Evaluation Results
[Figure: bar chart of Briem & Lessel benchmark scores, vertical axis from 0% to 90%, for the methods ECFP_4, FCFP_8, LINGOs, MACCS, Marvin CF, JChem CF, Daylight and IBM. Bar labels recovered from the chart, pairing with methods not preserved: 79.4%, 76.0%, 75.6%, 66.2%, 65.2%, 63.7%, 64.0%, 42.0%.]

6. Fingerprint Saturation
A common failing with binary fingerprints is caused by their inability to represent the number of times a feature (such as a path or substructure) occurs. The fingerprints for decane (C10), undecane (C11) and dodecane (C12) are typically identical, as are those for many protein and DNA sequences. A more powerful representation that solves these issues is to replace occurrence bits with counts, turning binary fingerprints into occurrence histograms.
LINGOs similarity achieves better results on the Briem & Lessel benchmark by using counts instead of bits. However, as described in the Continuous Tanimoto section below, care has to be taken to use a suitable similarity measure for comparing histograms.
ChemAxon have announced that the upcoming release of JChem, version 5.5, will support ECFP and FCFP fingerprints with counts.

7. Continuous Tanimoto
Although there is universal agreement on how the Tanimoto coefficient should be interpreted for binary values, its application to continuous values, such as histogram counts, has been implemented differently by different authors [4,5]. Consider the two alternative definitions T0 and T1 given below.

$$T_0(x, y) = \frac{\sum_i^N x_i y_i}{\sum_i^N \left( x_i^2 + y_i^2 - x_i y_i \right)} \qquad T_1(x, y) = \frac{\sum_i^N \min(x_i, y_i)}{\sum_i^N \max(x_i, y_i)}$$

Both definitions agree for binary valued vectors, and are guaranteed to return increasing fractional values between zero and one. Notice however that for x = { 3 } and y = { 4 }, T1 = 3/4 = 0.75 but T0 = 12/13 ≈ 0.923.
In experiments with LINGO’s histograms, T1 was found to be superior (producing an improvement of ~0.9%) whereas T0 actually made the results worse (by ~3%).

8. Conclusions
• ChemAxon’s Chemical Fingerprints perform comparably with other path and feature-based fingerprints (including MACCS 166-bit keys, Daylight fingerprints and PubChem/CACTVS fingerprints). All these methods perform equivalently.
• ECFP fingerprints, originally developed by Scitegic/Accelrys and as recently implemented by ChemAxon, perform exceptionally well on the standard Briem and Lessel benchmark.
• The announced ECFP histograms would be anticipated to set new records in 2D chemical similarity.

9. Acknowledgements
To Miklos Vargyas and Alex Allardyce for the invitation to present a poster at the ChemAxon UGM, to Peter Kovacs for JChem ECFP support and rapid bug fixing, and to AstraZeneca and Vertex Pharmaceuticals for their interest in 2D similarity.

10. Bibliography
1. Hans Briem and Uta F. Lessel, “In vitro and in silico Affinity Fingerprints: Finding Similarities beyond Structural Classes”, Perspectives in Drug Discovery and Design, Vol. 20, pp. 231-244, 2000.
2. Robert D. Brown and Yvonne C. Martin, “Use of Structure-Activity Data to Compare Structure-based Clustering Methods and Descriptors for Use in Compound Selection”, JCICS, Vol. 36, No. 3, pp. 572-582, 1996.
3. Joseph L. Durant, Burton A. Leland, Douglas R. Henry and James G. Nourse, “Reoptimization of MDL Keys for Use in Drug Discovery”, JCIM, Vol. 42, pp. 1273-1280, 2002.
4. J. Andrew Grant, James A. Haigh, Barry T. Pickup, Anthony Nicholls and Roger A. Sayle, “Lingos, Finite State Machines and Fast Similarity Searching”, JCIM, Vol. 46, No. 5, pp. 1912-1918, 2006.
5. Thierry Kogej, Ola Engkvist, Niklas Blomberg and Sorel Muresan, “Multifingerprint Based Similarity Searches for Targeted Class Compound Selection”, JCIM, Vol. 46, No. 3, pp. 1201-1213, 2006.
6. Steven W. Muchmore, Derek A. Debe, James T. Metz, Scott P. Brown, Yvonne C. Martin and Philip J. Hajduk, “Application of Belief Theory to Similarity Data Fusion for Use in Analog Searching and Lead Hopping”, JCIM, Vol. 48, No. 5, pp. 941-948, 2008.
7. James Rhodes, Stephen Boyer, Jeffrey Kreulen, Ying Chen and Patricia Ordonez, “Mining Patents using Molecular Similarity Search”, Pacific Symposium on Biocomputing, Vol. 12, pp. 304-315, 2007.
8. David Rogers and Mathew Hahn, “Extended Connectivity Fingerprints”, JCIM, Vol. 50, No. 5, pp. 742-754, 2010.
9. Peter Willett, John M. Barnard and Geoffrey M. Downs, “Chemical Similarity Searching”, JCICS, Vol. 38, No. 6, pp. 983-996, 1998.

NextMove Software Limited, Innovation Centre (Unit 23), Cambridge Science Park, Milton Road, Cambridge, England CB4 0EY
www.nextmovesoftware.co.uk www.nextmovesoftware.com
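
The similarity measures and scoring procedure described in the poster (the binary Tanimoto coefficient, the two continuous definitions T0 and T1, and the 10-nearest-neighbour benchmark score) can be sketched in a few lines of Python. This is only an illustrative sketch, not ChemAxon's or NextMove's implementation: the function names, the representation of binary fingerprints as sets of feature identifiers, the representation of histograms as aligned lists of counts, and the use of `None` to mark decoys are all assumptions made here for the example.

```python
from collections import defaultdict


def tanimoto_binary(x, y):
    """Binary Tanimoto |X n Y| / |X u Y| on sets of feature identifiers."""
    x, y = set(x), set(y)
    union = len(x | y)
    return len(x & y) / union if union else 1.0


def tanimoto_t0(x, y):
    """T0: sum(x*y) / sum(x^2 + y^2 - x*y) over aligned count vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = sum(a * a + b * b - a * b for a, b in zip(x, y))
    return num / den if den else 1.0


def tanimoto_t1(x, y):
    """T1: sum(min(x, y)) / sum(max(x, y)) over aligned count vectors."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 1.0


def benchmark_score(items, similarity, k=10):
    """Nearest-neighbour score in the style of the benchmark description.

    items is a list of (fingerprint, activity_class) pairs with at least
    two entries; decoys carry activity_class None (an assumption of this
    sketch) and are never used as queries.
    """
    per_class = defaultdict(list)
    for q, (q_fp, q_cls) in enumerate(items):
        if q_cls is None:
            continue
        # Rank every other molecule by similarity; the query is excluded.
        others = [(similarity(q_fp, fp), cls)
                  for j, (fp, cls) in enumerate(items) if j != q]
        others.sort(key=lambda pair: pair[0], reverse=True)
        top = others[:k]
        hits = sum(1 for _, cls in top if cls == q_cls)
        per_class[q_cls].append(hits / len(top))
    # Average within each activity class, then across the classes.
    class_scores = [sum(v) / len(v) for v in per_class.values()]
    return sum(class_scores) / len(class_scores)
```

For the single-feature histograms x = [3] and y = [4] used in the poster, `tanimoto_t1([3], [4])` gives 0.75 while `tanimoto_t0([3], [4])` gives 12/13, and on 0/1 vectors the two functions return the same value as the binary definition.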