Statistical terms for
Classification in WEKA
Surabhi Dwivedi
Important Statistics to
select a model
•Confusion Matrix
•TP Rate
•FP Rate
•Precision
•Recall
•F-measure
•ROC Area
•Kappa Statistics
•Test Options for the Classifier
– Cross Validation
– Supplied test set
– Percentage split
Confusion Matrix
• In machine learning, a confusion matrix (also known as a contingency table or an error matrix) is:
–A table layout that allows visualization of
the performance of an algorithm
–Each column of the matrix represents the
instances in a predicted class
–Each row represents the instances in an
actual class
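The row/column layout described above can be made concrete with a short sketch. This is a minimal Python illustration, not WEKA's own code; the function names `confusion_matrix` and `tp_fp_rates` and the PASS/FAIL data are hypothetical. It builds the matrix with actual classes as rows and predicted classes as columns, then derives the TP rate, TP / (TP + FN), and the FP rate, FP / (FP + TN), for one class treated as positive.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def tp_fp_rates(matrix, k):
    """TP rate and FP rate for class index k, treating k as the positive class."""
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                        # actually k, predicted as another class
    fp = sum(matrix[i][k] for i in range(n)) - tp   # predicted k, actually another class
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical PASS/FAIL predictions for five instances.
m = confusion_matrix(
    ["PASS", "PASS", "FAIL", "PASS", "FAIL"],
    ["PASS", "FAIL", "FAIL", "PASS", "PASS"],
    ["PASS", "FAIL"],
)
print(m)  # [[2, 1], [1, 1]]
print(tp_fp_rates(m, 0))  # TP rate 2/3, FP rate 1/2 for PASS
```

For a multi-class problem the same per-class rates are obtained by treating each class in turn as "positive" and all others as "negative".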
[Figures: a confusion matrix and a sample confusion matrix]
Explanation - Terms
•Precision (also called positive predictive value)
– the fraction of retrieved instances that are relevant
•Recall (also known as sensitivity or true positive rate)
– the fraction of relevant instances that are retrieved
•Example:
•A program for recognizing PASS/FAIL labels 7 instances as PASS in a set that contains 9 actual PASS instances. If 4 of those identifications are correct but 3 are actually FAIL, the program's precision is 4/7 while its recall is 4/9.
•High precision means that an algorithm returned substantially more relevant results than irrelevant ones.
•High recall means that an algorithm returned most of the relevant results.
•F-measure (the weighted harmonic mean of precision and recall)
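The PASS/FAIL example can be checked with a few lines of arithmetic. In this minimal Python sketch (the function name `precision_recall_f1` is hypothetical), the counts come straight from the example: 7 predicted PASS with 4 correct gives TP = 4 and FP = 3, and 9 actual PASS instances gives FN = 9 - 4 = 5. Exact fractions are used so the results match the ratios in the text.

```python
from fractions import Fraction

def precision_recall_f1(tp, fp, fn):
    precision = Fraction(tp, tp + fp)  # fraction of retrieved instances that are relevant
    recall = Fraction(tp, tp + fn)     # fraction of relevant instances that are retrieved
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# PASS/FAIL example: tp=4, fp=3 (FAIL labelled PASS), fn=5 (PASS instances missed)
p, r, f1 = precision_recall_f1(tp=4, fp=3, fn=5)
print(p, r, f1)  # 4/7 4/9 1/2
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN) = 8 / 16 = 1/2.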
F-score
•In statistical analysis of binary classification, the F-score or F-measure is
–a measure of a test's accuracy
–computed from both the precision p and the recall r of the test
–a weighted harmonic mean of precision and recall
–an F1 score reaches its best value at 1 and its worst at 0
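The weighting in the F-measure is usually written as F_β = (1 + β²)·p·r / (β²·p + r), where β > 1 weights recall more heavily and β < 1 weights precision more heavily; β = 1 gives F1 = 2pr / (p + r). A minimal Python sketch of that formula (the function name `f_beta` is hypothetical; returning 0 when both inputs are 0 is a convention assumed here to avoid division by zero):

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 favours recall, beta < 1 favours precision, beta = 1 gives F1.
    """
    if precision == 0 and recall == 0:
        return 0.0  # worst possible score
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(4 / 7, 4 / 9), 4))  # 0.5
```

With the PASS/FAIL values, F2 (about 0.465) sits closer to the recall of 4/9, while F0.5 (about 0.541) sits closer to the precision of 4/7.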
Kappa Statistics
•Kappa is a chance-corrected measure of agreement between the classifications and the true classes.
– A value of 1 implies perfect agreement, values less than 1 imply less than perfect agreement, and a value of 0 means agreement no better than chance
• Poor agreement = less than 0.20
• Fair agreement = 0.20 to 0.40
• Moderate agreement = 0.40 to 0.60
• Good agreement = 0.60 to 0.80
• Very good agreement = 0.80 to 1.00
•The kappa statistic is a means of evaluating the prediction performance of classifiers
– it gives a better indicator than raw accuracy of how the classifier performed across all instances, because it discounts the agreement expected by chance
Kappa Statistics
•The row indicates the true (actual) class, and the column indicates the predicted class
[Figure: confusion matrix with actual classes as rows and predicted classes as columns]
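Using a matrix in this orientation, Cohen's kappa (the usual form of this statistic) is κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the fraction of instances on the diagonal) and p_e is the agreement expected by chance, computed from the row and column totals. The following Python is a minimal sketch under that definition; the function name `cohens_kappa` and the example matrix are hypothetical, and it assumes p_e < 1.

```python
from fractions import Fraction

def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix (rows = actual, cols = predicted)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = Fraction(sum(matrix[i][i] for i in range(k)), n)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: sum over classes of P(actual = c) * P(predicted = c).
    expected = Fraction(sum(r * c for r, c in zip(row_totals, col_totals)), n * n)
    return (observed - expected) / (1 - expected)

print(cohens_kappa([[20, 5], [10, 15]]))  # 2/5
```

Here 70% of instances are classified correctly, but 50% agreement would be expected by chance alone, so kappa is 0.4 rather than 0.7.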
ROC Curve
•An ROC (receiver operating characteristic) curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold is varied.
•The curve is created by plotting the true positive rate against
the false positive rate at various threshold settings.
•The true-positive rate is also known as sensitivity or recall in
machine learning. The false-positive rate is also known as the
fall-out and can be calculated as 1 - specificity.
•Specificity (sometimes called the true negative rate)
measures the proportion of negatives which are correctly
identified as such (e.g., the percentage of healthy people who
are correctly identified as not having the condition), and is
complementary to the false positive rate.
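The threshold sweep described above can be sketched directly: each distinct score is used as a threshold, instances scoring at or above it are predicted positive, and the resulting (FPR, TPR) pair becomes one point on the curve. This minimal Python sketch (hypothetical function names `roc_points` and `roc_auc`, hypothetical data) also computes the area under the curve with the trapezoidal rule. It is not WEKA's implementation; WEKA's threshold curve may treat ties and interpolation differently.

```python
from fractions import Fraction

def roc_points(labels, scores):
    """(FPR, TPR) points, starting at (0, 0), as the threshold is lowered.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier confidence that each instance is positive.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(Fraction(0), Fraction(0))]
    for t in sorted(set(scores), reverse=True):
        # Predict positive when score >= t.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((Fraction(fp, neg), Fraction(tp, pos)))
    return points

def roc_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = Fraction(0)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.3]
print(roc_auc(roc_points(labels, scores)))  # 8/9
```

An area of 1 corresponds to a curve through the upper-left corner (all positives ranked above all negatives); 0.5 corresponds to random ranking.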
[Figures: sample receiver operating characteristic (ROC) curves, including ones for the Class XII data and the Class X data]
•In an ROC curve, a better model lies closer to the upper-left corner.
•In WEKA, the ROC curve is produced as the threshold curve output.
Margin Curve
•The margin curve plots the cumulative frequency of the margin: the difference between the probability predicted for the actual class and the highest probability predicted for any other class
•For a two-class problem, if an instance is predicted to belong to its actual class with probability p, the margin is p - (1 - p) = 2p - 1.
•Negative values denote classification errors, meaning that the dominant (highest-probability) class is not the correct one
•Margin contains the margin value (plotted on the x-axis)
•Cumulative contains the count of instances with margin less than or equal to the current margin (plotted on the y-axis)
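The margin definition and the cumulative count can be sketched as follows. This minimal Python example uses hypothetical function names (`margins`, `margin_curve`) and hypothetical predicted distributions; it does not reproduce WEKA's MarginCurve output exactly, but it computes the same two quantities: each instance's margin, and for each margin value the number of instances whose margin is less than or equal to it.

```python
from bisect import bisect_right

def margins(distributions, actual):
    """Per-instance margin: P(actual class) minus the highest P(any other class).

    distributions: per-class probability lists, one per instance.
    actual: index of the true class for each instance.
    """
    result = []
    for dist, cls in zip(distributions, actual):
        best_other = max(p for i, p in enumerate(dist) if i != cls)
        result.append(dist[cls] - best_other)
    return result

def margin_curve(margin_values):
    """(margin, cumulative count of instances with margin <= that value) pairs."""
    ordered = sorted(margin_values)
    # bisect_right counts every value <= m, so tied margins share one point.
    return [(m, bisect_right(ordered, m)) for m in sorted(set(ordered))]

# Hypothetical two-class predictions; negative margins are misclassifications.
ms = margins([[0.8, 0.2], [0.4, 0.6], [0.3, 0.7], [0.9, 0.1]], actual=[0, 0, 1, 1])
print(sum(1 for m in ms if m < 0))  # 2 errors
```

For the first instance, p = 0.8 for its actual class, so the margin is 2(0.8) - 1 = 0.6, matching the two-class formula.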
[Figures: two sample margin curves and a sample of the ideal output]
Test Options - Classifier
•A test set is a portion of the data, held out from training, that is used to test whether the model has learned the concept properly; with a percentage split, it is a given percentage of the data
•In WEKA you can run an evaluation that splits your data set into training data (to build the tree, in the case of J48) and test data (to test the model and determine whether the concept has been learned).
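The two data-splitting test options listed at the start (percentage split and cross-validation) can be sketched as index bookkeeping. This minimal Python example is an assumption-laden illustration, not WEKA's procedure: the function names `percentage_split` and `k_fold`, the floor rounding of the split point, and the round-robin fold assignment are all hypothetical choices, and WEKA's own cross-validation may additionally stratify folds by class, which this sketch does not do.

```python
import random

def percentage_split(n, train_percent, seed=0):
    """Shuffle indices 0..n-1 and return (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n * train_percent // 100  # floor; the rounding rule is an assumption
    return idx[:cut], idx[cut:]

def k_fold(n, k, seed=0):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

train, test = percentage_split(10, 66, seed=1)
print(len(train), len(test))  # 6 4
```

A percentage split evaluates the model once on a single held-out portion, whereas k-fold cross-validation trains k models and uses every instance for testing exactly once, which generally gives a less variable estimate on small data sets.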
Thank you