September/October 2000, p. 31
artificial neural networks (ANN) to relate
structure to function. The ability to analyze
molecular structure and predict effectiveness
helps Dr. Danter look for existing drugs to
battle diseases like HIV, as well as to develop
potential new medications. Analyzing
chemical structures, CHEMSAS™ uses
hybrid ANN systems to predict the in vitro
response of HIV-1 to potential antiviral
drugs.
The results to date are impressive. In a
recent study, Dr. Danter analyzed 311 drugs
with known in vitro activity against HIV-1.
The system correctly classified more than
96% of the molecules.
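To put the reported figure in concrete terms, the short calculation below converts "more than 96%" of 311 drugs into whole molecules. It is a back-of-the-envelope sketch that assumes the percentage is meant literally rather than rounded; the function name `min_correct` is made up for this illustration and the article does not report exact counts.

```python
def min_correct(total, percent):
    """Smallest whole number of items that is strictly more than `percent` of `total`."""
    # Integer arithmetic avoids floating-point rounding: 311 * 96 = 29856.
    return (total * percent) // 100 + 1


total_drugs = 311  # number of drugs analyzed, from the article
correct_at_least = min_correct(total_drugs, 96)
misclassified_at_most = total_drugs - correct_at_least

print(correct_at_least, misclassified_at_most)
```

Under that assumption, at least 299 of the 311 molecules were classified correctly and at most 12 were misclassified.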
One of the great strengths of a data-
mining tool like CART is its ability to pick
out the significant variables – even when they
are hidden among hundreds or thousands of
irrelevant variables. It also clearly identifies
complex interactions among study variables,
and permits Dr. Danter to obtain more
accurate results in minutes – rather than days.
Mining In Other Areas
During the past several months, Dr.
Danter has also used CART in developing
models to study central nervous system
receptors, anti-arthritic medications, and
antibiotics, among others. As an artificial
intelligence tool, CART continues to play a
vital role in predicting specific biological
activity in his research at Critical Outcome, Inc.
To view detailed study results and modeling
procedures, review their research at
www.criticaloutcome.com.
Richard Burnham can be reached at (651) 773-0619 or at
published@att.net
Dr. Wayne Danter, MD, FRCPC, is an Associate Professor of
Medicine and Director, LRI Neural Computing Lab at the
University of Western Ontario, London, Ontario, Canada, and
can be reached at (519) 851-0035 or
wdanter@criticaloutcome.com
Salford Systems (www.salford-systems.com) can be reached
at (619) 543-8880 or info@salford-systems.com
Figure 2: The overtrained maximal
tree has a relative error rate of .505
(red line); the optimal tree relative
error is .435 (green line). The
highlighted nodes on the left of the tree
contribute least to performance and
will be the first to be pruned away.
Figure 1: The optimal CART tree. Red
nodes contain the greatest concentration
of the “High Risk” group and blue
nodes concentrate the “Low Risk”
group. Hovering the mouse over a
node displays its contents.
Molecular Data Mining Tool:
Advances In HIV Research
Pruning Decision Trees
Upon creating the structure, the system
prunes back the tree and uses a self-test
procedure to ensure that the model is not
over-fitting — that is, finding patterns that
apply only to training data. This produces a
smaller, optimal-sized tree. The tree’s
terminal nodes become the model used for
the remainder of the research process.
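The selection step can be sketched in a few lines. The example below is an illustration only, not Salford Systems' procedure: it assumes each candidate in the pruning sequence has already been scored on data held out from training, and it simply picks the candidate with the lowest relative error, preferring the smaller tree on a tie. The two error values .505 and .435 are taken from Figure 2; every node count and the remaining error values are hypothetical.

```python
def pick_optimal(sequence):
    """Pick the best tree from a pruning sequence.

    `sequence` is a list of (terminal_nodes, relative_error) pairs, where the
    error was measured on data not used to grow the tree. The pair with the
    lowest error wins; on a tie, the tree with fewer terminal nodes wins.
    """
    return min(sequence, key=lambda pair: (pair[1], pair[0]))


# Hypothetical pruning sequence, ordered from the maximal tree downward.
# Only 0.505 (maximal tree) and 0.435 (optimal tree) come from Figure 2.
pruning_sequence = [
    (40, 0.505),  # overtrained maximal tree
    (25, 0.470),
    (12, 0.435),  # optimal tree
    (5, 0.520),   # pruned too far: error rises again
]

best_nodes, best_error = pick_optimal(pruning_sequence)
print(best_nodes, best_error)
```

The point of the sketch is that the largest tree is not the most accurate one on unseen data: error falls as the weakest branches are removed, then rises again once useful structure is cut away.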
A list of variables, ranked by importance,
is automatically produced and is used to
develop the model.
This is crucial because many of the variables
turn out to be relatively unimportant. “You
may have a couple of hundred input
variables, but a subgroup of those variables
are the most important ones and the only
ones we really need to use,” says Dr. Danter.
Using all the variables throughout the
analysis would make the process needlessly
cumbersome — possibly skewing the results.
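As a rough illustration of how a ranked list narrows a couple of hundred inputs down to a working subset, the sketch below sorts variables by an importance score, rescales the scores so the top variable is 100, and keeps only those above a cutoff. The variable names, raw scores, rescaling convention, and the cutoff of 10 are all assumptions made for this example; the article does not say how the subset is chosen.

```python
def rank_variables(importance):
    """Return (name, score) pairs sorted from most to least important,
    with scores rescaled so that the top variable scores 100."""
    top = max(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [(name, 100.0 * score / top) for name, score in ranked]


def keep_important(importance, cutoff=10.0):
    """Keep only the variables whose rescaled score reaches `cutoff`."""
    return [(name, score) for name, score in rank_variables(importance)
            if score >= cutoff]


# Hypothetical molecular descriptors with made-up raw importance scores.
raw_importance = {
    "ring_count": 0.0042,
    "mol_weight": 0.42,
    "noise_1": 0.0,
    "logP": 0.21,
    "h_bond_donors": 0.105,
}

selected = keep_important(raw_importance)
print([name for name, _ in selected])
```

Here two of the five made-up variables fall below the cutoff and are dropped, leaving a shorter list to carry into later modeling stages.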
To satisfy his specialized modeling needs
in HIV research, Dr. Danter feeds the
results into another Salford Systems
product, MARS® (Multivariate Adaptive
Regression Splines), and then into a
neural network program from Ward Systems
Group, NeuroShell® Classifier. MARS is a
non-parametric regression procedure that
extends Dr. Danter’s work by improving the
accuracy of predictions. NeuroShell®
Classifier then categorizes a molecule’s
activity based on patterns derived from
CART and MARS.
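For readers unfamiliar with regression splines, the sketch below shows the building block that MARS models are generally described in terms of: a "hinge" function that is zero on one side of a knot and rises linearly on the other. It is a hand-written toy, not the MARS® product or its fitting algorithm; the knot at 3.0 and the coefficients are hypothetical, chosen only to show how two hinges produce a curve that bends at the knot.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis function: max(0, x - knot) when direction is +1,
    max(0, knot - x) when direction is -1."""
    return max(0.0, direction * (x - knot))


def toy_spline_model(x):
    """Piecewise-linear model built from a mirrored pair of hinges.

    Hypothetical fit: slope 2.0 below the knot, slope 0.5 above it,
    and a value of 1.0 at the knot itself.
    """
    return 1.0 + 0.5 * hinge(x, 3.0, +1) - 2.0 * hinge(x, 3.0, -1)


for x in (1.0, 3.0, 5.0):
    print(x, toy_spline_model(x))
```

A fitted model of this kind chooses the knots and coefficients from the data, which lets it follow nonlinear relationships without the analyst specifying their shape in advance.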
Honing The Data
The results are honed to specific
research needs using a proprietary algorithm
Dr. Danter developed called CHEMSAS™.
This process decomposes complex molecular
structures into key elements, teaching
Pharmaceutical companies may have
as many as a million molecules in their
databases. Modeling each molecule and
predicting its effectiveness using standard
statistical methods is virtually impossible
because of the enormous number of
variables. Dr. Danter uses CART®
(Classification and Regression Trees), a
software package from Salford Systems, to
help build models that isolate the most
important variables. Working with
public-domain molecular HIV data, Dr. Danter
trains CART and complementary systems
to predict whether a given molecular structure
is biologically active against a disease. Says
Dr. Danter, “Once we have such a model,
we can screen almost any molecule with a
molecular weight up to 1700 daltons (an
atomic mass unit). It’s an area called
molecular mining. We’ve developed it as a
generic tool, so that if there is a specific
target biological activity, we can screen for
it.”
To build a model, CART generates a
binary decision tree based on yes/no
answers. It generates nodes until it has
created the largest tree that fits the data.
This ensures that the node-generating
process is not halted too soon and
important structures are not overlooked.
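The idea of growing a tree from yes/no questions until nothing more can be split can be sketched as follows. This is a simplified illustration, not the CART® product: it assumes numeric inputs, uses Gini impurity to score each candidate question of the form "is feature j at most t?", and stops only when a node is pure or no question reduces impurity. The four data rows, with two made-up descriptor values each, are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Find the yes/no question (feature index, threshold) that most reduces
    impurity, or return None if no question improves on the current node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            yes = [y for row, y in zip(rows, labels) if row[j] <= t]
            no = [y for row, y in zip(rows, labels) if row[j] > t]
            if not yes or not no:
                continue  # a question must send rows down both branches
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best


def grow(rows, labels):
    """Grow the largest tree this splitting rule allows (no early stopping)."""
    split = best_split(rows, labels)
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    j, t = split
    yes = [(row, y) for row, y in zip(rows, labels) if row[j] <= t]
    no = [(row, y) for row, y in zip(rows, labels) if row[j] > t]
    return {
        "question": (j, t),
        "yes": grow([row for row, _ in yes], [y for _, y in yes]),
        "no": grow([row for row, _ in no], [y for _, y in no]),
    }


def predict(tree, row):
    """Follow the yes/no answers down to a terminal node."""
    while "leaf" not in tree:
        j, t = tree["question"]
        tree = tree["yes"] if row[j] <= t else tree["no"]
    return tree["leaf"]


# Hypothetical molecules: (descriptor_a, descriptor_b) and an activity label.
rows = [(1.0, 200.0), (2.0, 300.0), (3.0, 250.0), (4.0, 1500.0)]
labels = ["inactive", "inactive", "active", "active"]

tree = grow(rows, labels)
print(tree["question"], predict(tree, (3.5, 100.0)))
```

On this toy data a single question on the first descriptor separates the two classes, so the maximal tree has only two terminal nodes; on real data with noise, growing to the maximum typically yields a much larger tree that fits the training rows too closely.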
Figure 3: Summary reports include a
variable importance ranking, gains and
lift charts and tables, misclassification
reports, and an overall summary of all
trees grown in a session.
The ability to predict biological activity
based on molecular structure is leading
researchers to breakthroughs in the most
complex challenges of medicine. Using a
combination of artificial intelligence tools,
Dr. Wayne Danter of Critical Outcome
Technologies (London, Ontario, Canada)
has developed a method to predict whether
specific molecular structures are effective
against a disease. Currently under study is
HIV-1.
