Guide for reproducing the results of the Bioassay paper using Weka
Important points to remember before starting a run:
  • All datasets must be in ARFF format; otherwise Weka will report an incompatible
    format during training and testing.
  • Standard classifiers are used for confirmatory screen data, which is smaller and less
    imbalanced, whereas cost-sensitive classifiers are used with the primary and mixed
    datasets, which are more imbalanced.
  • We have two goals:
       1. To find the most robust and versatile classifier for imbalanced bioassay data.
       2. To find the optimal misclassification cost setting for a classifier.
  • The misclassification cost for False Negatives has to be set so as to achieve the
    maximum number of True Positives with a False Positive rate below 20%.
  • Each dataset is randomly split into an 80% training and validation set and a 20%
    independent test set, so there should be two files per dataset: one for training the
    classifier and one for testing the model built by that classifier.
  • Use 5-fold cross-validation for the larger datasets (primary and mixed screens) and
    10-fold cross-validation for the smaller datasets (confirmatory screens).
  • CostSensitiveClassifier is used for the base classifiers Naïve Bayes, SMO (Sequential
    Minimal Optimization) and Random Forest, as it outperforms other meta-learners.
  • MetaCost with J48 produces better results than other meta-learners.
  • For Naïve Bayes and Random Forest, the default options are used.
  • For SMO, the option BuildLogisticModels was set to true.
  • For J48, the option Unpruned was set to true.
  • For more details, please refer to the paper.
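The first point above (training and test files must both be valid, mutually compatible ARFF) can be checked before opening Weka. The following is a minimal sketch, assuming simple unquoted attribute names; the attribute names MW and Outcome and the two tiny documents are made up for illustration, and the check is only a rough stand-in for the header comparison Weka performs itself.

```python
def arff_attributes(text):
    """Return the (name, type) pairs declared in the header of an ARFF document.

    Simplification: attribute names are assumed to be unquoted, without spaces.
    """
    attributes = []
    saw_relation = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()                  # ARFF keywords are case-insensitive
        if lower.startswith("@relation"):
            saw_relation = True
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind.strip()))
        elif lower.startswith("@data"):
            if not saw_relation or not attributes:
                raise ValueError("header is missing @relation or @attribute lines")
            return attributes
    raise ValueError("no @data section found")


def compatible(train_text, test_text):
    """True when both files declare the same attributes in the same order."""
    return arff_attributes(train_text) == arff_attributes(test_text)


# Hypothetical miniature train/test files (not the real AID1608 data).
TRAIN = """@relation toy_train
@attribute MW numeric
@attribute Outcome {active,inactive}
@data
312.4,inactive
"""

TEST = """@relation toy_test
@attribute MW numeric
@attribute Outcome {active,inactive}
@data
298.1,active
"""

print(compatible(TRAIN, TEST))
```

A file that lacks the @data line, or a test file whose attributes differ from the training file, is flagged here before any model is built.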
Step-by-step guide to setting up a Weka run:
1. Start the Weka Explorer.
2. In the Preprocess tab, click Open file…
3. Select a training file in ARFF format and click Open.
4. For example, AID1608red_train.arff.
5. After opening, the file should look like this:
   [Screenshot: Weka Explorer after loading the file]
6. Now click the Classify tab.
7. We will first train a model using the Naïve Bayes classifier. Because AID1608 is a
   confirmatory screen, we first apply standard classifiers; if the False Positive rate is
   below 20%, a cost-sensitive classifier is then used.
8. Click the Choose button to select a classifier. From the bayes folder, choose NaiveBayes.
9. Your window should appear as below, with Cross-validation selected and Folds set to 10:
   [Screenshot: Classify tab with 10-fold cross-validation selected]
10. Now click the Start button; the model will start building.
11. Because we are using 10-fold cross-validation, Weka builds models for 10 folds.
    [Screenshot: the status bar shows progress and reports when the run has completed]
12. Look at the output section and scroll to the bottom, as shown:
    [Screenshot: bottom of the classifier output]
13. This is the model generated by the Naïve Bayes classifier using the training set
    AID1608red_train.
14. The next step is to test this model on the independent test set AID1608red_test.
15. In the Test options section, select Supplied test set and click Set.
16. Open the test file AID1608red_test.
17. After the file has been read, close the Test Instances dialog by clicking Close.
18. Now right-click your model in the Result list and choose Re-evaluate model on current
    test set.
19. Within a fraction of a second, the results appear in the same output window.
    [Screenshot: confusion matrix with the True Positive, False Negative, False Positive
    and True Negative cells labelled]
20. We obtained a False Positive rate of 14.5%, which is below 20%, and a True Positive
    rate of 15.4%, which is very low. We will now set up a cost-sensitive classifier to
    improve the results.
21. As noted in the important points at the start of this guide, for Naïve Bayes we will
    use Weka's CostSensitiveClassifier.
22. The author used incremental costing, in which the cost was increased in stages from 2
    to 1000000 until a 20% False Positive rate was reached.
23. So we will set up a cost matrix, starting with a misclassification cost of 2.
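The incremental costing described in steps 20-23 can be sketched as a simple loop: try each False Negative cost in turn and keep the last setting whose False Positive rate is still below 20%. This is only an illustration under stated assumptions. The guide does not list the intermediate cost stages, so the schedule below is hypothetical, and the confusion-matrix counts are invented stand-ins for what a real Weka run on the test set would report.

```python
def rates(tp, fn, fp, tn):
    """True Positive rate and False Positive rate from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # fraction of actives that were found
    fpr = fp / (fp + tn)  # fraction of inactives wrongly flagged as active
    return tpr, fpr


def incremental_costing(evaluate, costs, max_fpr=0.20):
    """Raise the False Negative cost stage by stage.

    `evaluate(cost)` must return (tp, fn, fp, tn) for a model built with that
    cost. Returns (cost, tpr, fpr) for the last stage whose False Positive
    rate is below `max_fpr`, or None if even the first stage exceeds it.
    """
    best = None
    for cost in costs:
        tpr, fpr = rates(*evaluate(cost))
        if fpr >= max_fpr:
            break  # limit reached: keep the previous setting
        best = (cost, tpr, fpr)
    return best


# Invented counts (tp, fn, fp, tn) per cost setting, for illustration only.
TOY_RESULTS = {
    2: (20, 80, 150, 850),
    5: (40, 60, 180, 820),
    10: (55, 45, 230, 770),
    50: (70, 30, 400, 600),
}


def toy_evaluate(cost):
    """Stand-in for building and testing a cost-sensitive model in Weka."""
    return TOY_RESULTS[cost]


print(incremental_costing(toy_evaluate, [2, 5, 10, 50]))
```

With these invented numbers the loop stops at a cost of 10, where the False Positive rate crosses 20%, and reports the previous stage as the chosen setting.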
24. Click the Choose button and select CostSensitiveClassifier from the meta folder.
25. Click the text box next to the Choose button to open the GenericObjectEditor dialog
    box, as shown:
    [Screenshot: GenericObjectEditor dialog for CostSensitiveClassifier]
26. In this dialog box, click Choose next to classifier and select NaiveBayes.
27. Next, click costMatrix to set up the misclassification cost.
28. We have 2 classes in our dataset, actives and inactives, so we will set up a 2×2
    matrix (covering TP, FP, TN and FN).
      • In Classes, enter 2.
      • Click Resize to create a 2×2 matrix.
      • Change the misclassification cost for false negatives to 2 (write 2 in place of 1).
      • Then close the dialog box.
29. Leave all other options at their defaults and close the GenericObjectEditor dialog by
    clicking OK.
30. Click Start to begin building the cost-sensitive model.
31. Repeat steps 13-19 as described above for testing.
32. The results improve: the number of True Positives has increased while the False
    Positive rate stays within the 20% limit.
33. We stop here, as we have achieved our goal.
34. Similarly, you can build models using SMO, Random Forest and J48. Check their
    settings in the important points at the start of this guide before starting the run.
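To make the effect of the 2×2 cost matrix from step 28 more concrete, here is a small sketch of cost-based reweighting, which is the general idea behind running CostSensitiveClassifier with its default options (training instances are reweighted by class cost rather than predictions being adjusted). It is an illustration only: Weka's own implementation may differ in detail, the class order (active first) is an assumption that depends on the ARFF header, and the class counts are made up.

```python
def class_weights(cost_matrix, class_counts):
    """Per-class instance weights proportional to each class's total
    misclassification cost (row sum), rescaled so that the total weight
    still equals the number of training instances."""
    row_costs = [sum(row) for row in cost_matrix]
    total = sum(class_counts)
    weighted = sum(c * n for c, n in zip(row_costs, class_counts))
    return [c * total / weighted for c in row_costs]


# Rows are the actual class, columns the predicted class.
# Assumed class order: index 0 = active, index 1 = inactive.
COST_MATRIX = [
    [0, 2],  # actual active:   predicted inactive is a False Negative, cost 2
    [1, 0],  # actual inactive: predicted active is a False Positive, cost 1
]

# Hypothetical class counts: 100 actives, 900 inactives.
COUNTS = [100, 900]

weights = class_weights(COST_MATRIX, COUNTS)
print(weights)
```

Each active instance ends up counting twice as much as each inactive one, which pushes the base classifier towards finding more actives at the price of more False Positives. Raising the False Negative cost further strengthens this effect.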
