Statistical Terms for
Classification in WEKA
Surabhi Dwivedi
Important Statistics to
select a model
•Confusion Matrix
•TP Rate
•FP Rate
•Precision
•Recall
•F-measure
•ROC Area
•Kappa Statistics
•Test Option for Classifier
– Cross Validation
– Supplied test set
– Percentage split
Confusion Matrix
• In machine learning, a confusion
matrix (also known as a contingency
table or an error matrix) is a table layout
that allows visualization of the
performance of an algorithm
–Each column of the matrix represents the
instances in a predicted class
–Each row represents the instances in an
actual class
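The row/column convention above can be sketched in a few lines of Python. This is only an illustration, not WEKA's own code: the function names `confusion_matrix` and `rates` and the six PASS/FAIL labels are made up for the example. It also derives the TP rate and FP rate for one class from the matrix.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def rates(matrix, k):
    """Return (TP rate, FP rate) for the class at row/column k."""
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                   # actual k, predicted otherwise
    fp = sum(row[k] for row in matrix) - tp    # predicted k, actually otherwise
    tn = sum(map(sum, matrix)) - tp - fn - fp
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical data, invented for this sketch.
actual = ["PASS", "PASS", "PASS", "FAIL", "FAIL", "FAIL"]
predicted = ["PASS", "PASS", "FAIL", "FAIL", "FAIL", "PASS"]

matrix = confusion_matrix(actual, predicted, ["PASS", "FAIL"])
tp_rate, fp_rate = rates(matrix, 0)  # rates for the PASS class
print(matrix, tp_rate, fp_rate)
```

Here the first row holds the three instances that are actually PASS: two predicted PASS and one predicted FAIL.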
Confusion Matrix
Sample
Confusion Matrix
Explanation - Terms
•Precision (also called positive predictive value)
– the fraction of retrieved instances that are relevant
•Recall (also known as sensitivity or true positive rate)
– the fraction of relevant instances that are retrieved
•E.g.:
•A program for recognizing pass/fail identifies 7 PASS in a
set containing 9 actual PASS instances. If 4 of the identifications are
correct, but 3 are actually FAIL, the program's precision is 4/7
while its recall is 4/9.
•High precision means that an algorithm returned substantially
more relevant results than irrelevant ones
•High recall means that an algorithm returned most of the
relevant results
•F-measure (the weighted harmonic mean of precision and
recall)
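The pass/fail example can be checked with a short calculation. The sketch below assumes that the 9 instances in the example are the actual PASS instances, so 4 are found and 5 are missed. The variable names are illustrative.

```python
from fractions import Fraction

# Counts taken from the example: 7 instances predicted PASS,
# 4 of them correct, 3 actually FAIL.
tp = 4          # predicted PASS and actually PASS
fp = 3          # predicted PASS but actually FAIL
fn = 9 - tp     # actual PASS instances that were missed (assumes 9 actual PASS)

precision = Fraction(tp, tp + fp)  # fraction of predicted PASS that are correct
recall = Fraction(tp, tp + fn)     # fraction of actual PASS that were found

print(precision, recall)
```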
F-score
•In statistical analysis of binary
classification, the F-score or F-measure is
–a measure of a test's accuracy
–computed from both the precision p and the
recall r of the test
–the harmonic mean of precision and recall
(F1), or a weighted harmonic mean in general
–an F1 score reaches its best value at 1 and
worst value at 0
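As a minimal sketch, the F-score can be computed with the standard formula F_beta = (1 + beta^2) * p * r / (beta^2 * p + r), where beta = 1 gives F1. The `beta` parameter and the function name `f_score` are not from the slides. The example values reuse the precision 4/7 and recall 4/9 from the pass/fail example.

```python
def f_score(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r (F1 when beta == 1)."""
    denom = beta ** 2 * p + r
    if denom == 0:
        return 0.0  # convention used here when both p and r are zero
    return (1 + beta ** 2) * p * r / denom


f1 = f_score(4 / 7, 4 / 9)
print(f1)  # approximately 0.5
```

With beta = 1 the formula reduces to 2pr / (p + r), so the result is high only when both precision and recall are high.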
Kappa Statistics
Kappa Statistics
•Kappa is a chance-corrected measure of agreement
between the classifications and the true classes.
– A value of 1 implies perfect agreement, values
less than 1 imply less than perfect agreement, and
0 implies agreement no better than chance
• Poor agreement = Less than 0.20
• Fair agreement = 0.20 to 0.40
• Moderate agreement = 0.40 to 0.60
• Good agreement = 0.60 to 0.80
• Very good agreement = 0.80 to 1.00
•Kappa statistic - a means of evaluating the prediction
performance of classifiers
– gives a better indicator than plain accuracy of how the
classifier performed across all instances, because it
discounts the agreement expected by chance
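A minimal sketch of the kappa computation from a confusion matrix, using the standard definition kappa = (observed - expected) / (1 - expected). The function name `kappa` and the 2x2 matrix are hypothetical, and this is not WEKA's own code.

```python
def kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows actual, columns predicted)."""
    total = sum(map(sum, matrix))
    n = len(matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical matrix: 9 actual positives, 11 actual negatives.
example = [[4, 5],
           [3, 8]]
k = kappa(example)
print(round(k, 4))
```

In this example the accuracy is 0.6, but kappa is only about 0.18 ("poor" on the scale above) because a large part of that agreement is expected by chance.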
Kappa Statistics
•The row indicates the true (actual) class,
the column indicates the predicted class
ROC Curve
•A ROC (receiver operating characteristic) curve is a graphical
plot that illustrates the performance of a binary classifier
system as its discrimination threshold is varied.
•The curve is created by plotting the true positive rate against
the false positive rate at various threshold settings.
•The true positive rate is also known as sensitivity or recall in
machine learning. The false positive rate is also known as the
fall-out and can be calculated as 1 - specificity.
•Specificity (sometimes called the true negative rate)
measures the proportion of negatives which are correctly
identified as such (e.g., the percentage of healthy people who
are correctly identified as not having the condition), and is
complementary to the false positive rate.
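The threshold sweep described above can be sketched as follows. The labels (1 = positive, 0 = negative) and scores are made up for illustration, and the function names `roc_points` and `area_under` are assumptions for this example. The area under the curve, reported as "ROC Area", is computed with the trapezoid rule.

```python
def roc_points(labels, scores):
    """Return (FP rate, TP rate) points as the threshold is lowered."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Count every instance tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under(points):
    """Trapezoid-rule area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier output.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc = area_under(points)
print(points, auc)
```

The curve starts at (0, 0), where nothing is predicted positive, and ends at (1, 1), where everything is.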
ROC Curve
Receiver Operating Characteristic (ROC) Curve –
Class XII Data
Sample
Receiver Operating Characteristic (ROC) Curve –
Class X Data
Sample
ROC Curve
Sample
•In a ROC curve, a better model lies closer to the
upper-left corner (high true positive rate at a low
false positive rate).
•Output – Threshold Curve
Margin Curve
•The margin curve plots the cumulative frequency of the
difference between the actual class probability and the highest
probability predicted for the other classes
•With two classes, if the actual class is predicted with
probability p, the margin is p - (1-p) = 2p-1.
•Negative values denote classification errors,
meaning that the dominant class is not the correct one
•Margin contains the margin value (plotted as the
x-coordinate)
•Cumulative contains the count of instances with margin
less than or equal to the current margin (plotted as the y-axis)
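A minimal sketch of how the Margin and Cumulative values could be derived from per-instance class probabilities. The probability rows, class indices and function names are hypothetical, and this is not WEKA's implementation.

```python
def margins(prob_rows, actual):
    """Margin per instance: P(actual class) minus the highest other probability."""
    result = []
    for probs, a in zip(prob_rows, actual):
        best_other = max(p for j, p in enumerate(probs) if j != a)
        result.append(probs[a] - best_other)
    return result


def cumulative_curve(margin_values):
    """Return (margin, count of instances with margin <= that value) pairs."""
    counts = {}
    for i, m in enumerate(sorted(margin_values)):
        counts[m] = i + 1  # later ties overwrite, leaving the full count
    return sorted(counts.items())


# Hypothetical two-class predictions: each row is [P(class 0), P(class 1)].
prob_rows = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8]]
actual = [0, 0, 1]  # index of the true class for each instance

ms = margins(prob_rows, actual)
curve = cumulative_curve(ms)
errors = sum(1 for m in ms if m < 0)  # negative margin = misclassified
print(curve, errors)
```

The second instance has true class 0 but a higher probability for class 1, so its margin is negative and it counts as an error.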
Margin Curve
Sample
Margin Curve
Sample
Ideal Output
Sample
Test Options - Classifier
•The test set is the portion of the
data that is used to test whether
the model has learned the concept
properly
•In WEKA you can run an evaluation that
splits your data set into training
data (to build the tree in the case of
J48) and test data (to test the model in
order to determine whether the concept
has been learned).
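The percentage split and cross-validation options can be sketched as index splits. This is a generic illustration rather than WEKA's own procedure: the function names, the seed, and the 66% and 5-fold values are assumptions chosen for the example.

```python
import random


def percentage_split(n_instances, train_pct, seed=1):
    """Shuffle instance indices and split them into (train, test)."""
    idx = list(range(n_instances))
    random.Random(seed).shuffle(idx)
    cut = round(n_instances * train_pct / 100)
    return idx[:cut], idx[cut:]


def k_fold(n_instances, k, seed=1):
    """Yield (train, test) index lists; each instance is tested exactly once."""
    idx = list(range(n_instances))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


train, test = percentage_split(100, 66)   # 66% train, 34% test
cv = list(k_fold(10, 5))                  # 5 folds over 10 instances
print(len(train), len(test), [len(t) for _, t in cv])
```

A supplied test set needs no splitting: the model is built on all of the training data and evaluated on the separate file.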
Thank you
