On Comparing Classifiers:
Pitfalls to Avoid and a
Recommended Approach
(cited by 581)
Author: Steven L. Salzberg
Presented by: Mehmet Ali Abbasoğlu &
Mustafa İlker Saraç
10.04.2014
Contents
1. Motivation
2. Comparing Algorithms
3. Definitions
4. Problems
5. Recommended Approach
6. Conclusion
Motivation
● Be careful about comparative studies of classification
and other algorithms.
○ It is easy to reach statistically invalid conclusions.
● How should one choose which algorithm to use for a new
problem?
● Using brute force one can easily find a phenomenon or
pattern that looks impressive.
○ REALLY?
Motivation
● You have lots of data
○ Choose one from UCI repository
● You have many classification methods to compare
But,
● Should every difference in classification accuracy that
reaches statistical significance be reported as important?
○ Think again!
Comparing Algorithms
● Many papers on new algorithms have evaluation problems,
according to a survey conducted by Prechelt.
○ 29% were not evaluated on a real problem
○ Only 8% were compared to more than one alternative on
real data
● A survey by Flexer of experimental neural network
papers in leading journals
○ Only 3 out of 43 used a separate data set for tuning
parameters.
Comparing Algorithms
● Drawbacks of reporting results on a well-studied data
set, e.g. a data set from the UCI repository
○ It is hard to improve on published results
○ Apparent improvements are prone to statistical accidents
○ Such data sets are fine for obtaining initial results
for a new algorithm
● It is tempting to change a known algorithm slightly and
then use comparisons to report improved results.
○ High risk of statistical invalidity
○ Better to apply new algorithms
Definitions
● Statistical significance
○ In statistics, a result is considered significant not because
it is important or meaningful, but because it is judged
unlikely to have occurred by chance alone.
● t-test
○ Used to determine whether the means of two sets of data
are significantly different from each other
● p-value
○ Probability, assuming the null hypothesis is true, of obtaining
a result at least as extreme as the one actually observed.
● null hypothesis
○ The default position that there is no real effect, e.g. that
two classifiers are equally accurate
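To make the p-value and null hypothesis definitions concrete, here is a minimal sketch of an exact sign-flip (permutation) test on paired differences. It is not taken from the paper: the function name `sign_flip_p_value` and the per-fold accuracy differences are made up for illustration. The null hypothesis assumed here is that each difference is equally likely to be positive or negative.

```python
from itertools import product


def sign_flip_p_value(diffs):
    """Exact two-sided p-value for paired differences.

    Null hypothesis (assumed for this sketch): neither method is really
    better, so each difference could equally well have the opposite sign.
    """
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Enumerate every possible assignment of signs to the differences.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped_sum = sum(s * d for s, d in zip(signs, diffs))
        # Count outcomes at least as extreme as the observed one.
        if abs(flipped_sum) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical accuracy differences (A minus B) on 8 folds, for illustration only.
diffs = [0.02, 0.01, 0.03, 0.015, 0.025, 0.005, 0.035, 0.04]
print(f"p-value = {sign_flip_p_value(diffs):.4f}")
```

Because all eight made-up differences favour A, only 2 of the 256 sign assignments are as extreme as the observed one, so the p-value is 2/256.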
Problem 1 :
Small repository of datasets
● It is difficult to produce major new results using well-studied
and widely shared data.
● Suppose 100 people are studying the effect of
algorithms A and B, and there is no real difference between them
● About 5 of them can be expected to get results statistically
significant at p <= 0.05
● Clearly these results are due to chance.
○ The ones who get significant results will publish
○ While the others will simply move on to other experiments.
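The 100-researchers argument can be checked with a small simulation. This is only a sketch under a stated assumption: when the null hypothesis is true and the test statistic is continuous, a p-value is uniformly distributed on [0, 1], so each researcher's p-value is drawn from `random.random()`. The function name and the number of repetitions are illustrative choices.

```python
import random


def count_false_positives(n_researchers=100, alpha=0.05, rng=None):
    """Number of researchers who see p <= alpha although A and B are equal."""
    if rng is None:
        rng = random.Random()
    # Assumption: under the null hypothesis each p-value is uniform on [0, 1].
    return sum(rng.random() <= alpha for _ in range(n_researchers))


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [count_false_positives(rng=rng) for _ in range(2000)]
mean_hits = sum(trials) / len(trials)
at_least_one = sum(t > 0 for t in trials) / len(trials)
print(f"average number of 'significant' results: {mean_hits:.2f}")
print(f"fraction of runs with at least one: {at_least_one:.3f}")
```

The average comes out close to 100 * 0.05 = 5, and almost every run contains at least one spurious "significant" result that could end up published.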
Problem 2 :
Statistical validity
● Statistics offers many tests that are designed to measure
the significance of a difference
● These tests were not designed with computational
experiments in mind.
● For example
○ 14 different variations of classifier algorithms
○ 11 different datasets
○ 14 × 11 = 154 comparisons, i.e. 154 chances to be significant
○ Expected number of results significant by chance alone is
154 * 0.05 = 7.7
○ This is the multiplicity effect
Problem 2 :
Statistical validity
● Let the significance level for each experiment be α
● Chance of drawing the right conclusion in one experiment
is (1 - α)
● Assuming experiments are independent of one another,
the chance of getting all n experiments right is (1 - α)^n
● Chance of drawing at least one incorrect conclusion is 1 - (1 - α)^n
● Substituting α = 0.05 and n = 154
● Chance of drawing at least one incorrect conclusion is 0.9996
● To obtain results significant at the 0.05 level with 154 tests
1 - (1 - α)^154 < 0.05
α < 0.0003
● This adjustment is known as the Bonferroni adjustment.
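The arithmetic on the two "Statistical validity" slides can be reproduced in a few lines. This is a sketch using the numbers from the slides (14 algorithm variations, 11 data sets, α = 0.05); the variable names are illustrative. Solving 1 - (1 - α)^n = 0.05 exactly for α is sometimes called the Šidák correction, and dividing 0.05 by n is the usual simple form of the Bonferroni adjustment; the two are nearly identical here.

```python
n_tests = 14 * 11   # 154 algorithm/data-set combinations
alpha = 0.05        # per-test significance level

# Expected number of "significant" results if no real differences exist.
expected_false_positives = n_tests * alpha

# Probability of at least one wrong conclusion, assuming independent tests.
p_any_error = 1 - (1 - alpha) ** n_tests

# Per-test level needed to keep the overall error probability below 0.05.
alpha_exact = 1 - (1 - 0.05) ** (1 / n_tests)   # solves 1 - (1 - a)^n = 0.05
alpha_bonferroni = 0.05 / n_tests               # simple Bonferroni division

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one error):    {p_any_error:.4f}")
print(f"per-test alpha (exact):   {alpha_exact:.6f}")
print(f"per-test alpha (0.05/n):  {alpha_bonferroni:.6f}")
```

Both per-test levels come out slightly above 0.0003.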
Problem 3 :
Experiments are not independent
● The t-test assumes that the test sets for
each algorithm are independent.
● Generally two algorithms are compared on
the same data set
○ Obviously the test sets are not independent.
Problem 4 :
Only considers overall accuracy
● The comparison must consider four numbers when a common
test set is used for comparing two algorithms
○ A got right and B got wrong ( A > B )
○ B got right and A got wrong ( B > A )
○ Both algorithms got right
○ Both algorithms got wrong
● If only two algorithms are compared
○ Throw out ties
○ Compare A > B vs B > A
● If more than two algorithms are compared
○ Use “Analysis of Variance” (ANOVA)
○ Apply the Bonferroni adjustment for multiple tests
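One common way to "compare A > B vs B > A" after throwing out ties is an exact binomial (sign) test: if the two algorithms are equally good, each disagreement is a fair coin flip. The sketch below assumes that test; the function name and the example counts are hypothetical and do not come from the slides.

```python
from math import comb


def sign_test_p_value(a_wins, b_wins):
    """Two-sided exact binomial p-value on the examples where A and B disagree.

    a_wins: test examples A classified correctly and B did not.
    b_wins: test examples B classified correctly and A did not.
    Ties (both right or both wrong) are assumed to be discarded already.
    """
    n = a_wins + b_wins
    if n == 0:
        return 1.0  # no disagreements, so no evidence either way
    k = max(a_wins, b_wins)
    # Probability of k or more wins out of n fair coin flips.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: A wins on 15 examples, B wins on 5.
print(f"p = {sign_test_p_value(15, 5):.4f}")
```

With these made-up counts the p-value is about 0.041, which would pass an unadjusted 0.05 threshold but not a Bonferroni-adjusted one.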
Problem 5 :
Repeated tuning
● Researchers tune their algorithms repeatedly to perform
optimally on a data set.
● Whenever tuning takes place, every adjustment should
really be considered as a separate experiment.
○ For example, if 10 tuning experiments were
attempted, then the significance level should be
0.05 / 10 = 0.005 instead of 0.05.
● When one uses an algorithm that has been used before,
the algorithm may already have been tuned on public
databases.
Problem 5 :
Repeated tuning
● Recommended approach:
○ Reserve a portion of the training set as a tuning set
○ Repeatedly test the algorithm and adjust its parameters on the
tuning set.
○ Only then measure accuracy on the test data.
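The recommended split can be sketched as follows. The function name `three_way_split` and the 60/20/20 proportions are assumptions for illustration; the slides do not prescribe particular sizes. The point is that the test portion is set aside first and never used while parameters are adjusted.

```python
import random


def three_way_split(examples, tune_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, tuning and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_tune = int(len(shuffled) * tune_fraction)
    test_set = shuffled[:n_test]                  # touched only once, at the end
    tune_set = shuffled[n_test:n_test + n_tune]   # used for parameter tuning
    train_set = shuffled[n_test + n_tune:]        # used for fitting
    return train_set, tune_set, test_set


train_set, tune_set, test_set = three_way_split(range(100))
print(len(train_set), len(tune_set), len(test_set))
```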
Problem 6 :
Generalizing results
● Common methodological approach
○ pick several datasets from UCI repository
○ perform series of experiments
■ measuring classification accuracy
■ learning rates
● It is not valid to make general statements about other
datasets.
○ The repository is not an unbiased sample of classification
problems.
● Someone can write an algorithm that works very well on
some of the known datasets
○ Anyone familiar with the data may be biased.
A Recommended Approach
1. Choose other algorithms to include in the comparison.
2. Choose a benchmark data set.
3. Divide the data set into k subsets for cross-validation
○ Typically k = 10
○ For small data sets, choose a larger k.
A Recommended Approach
4. Run cross-validation
○ For each of the k subsets D_i of the data set D, create a
training set T = D - D_i
○ Divide T into two subsets: T1 (training) and T2 (tuning),
and use them to optimize the parameters
○ Once parameters are optimized, re-run training on all of T
○ Measure accuracy on the held-out subset D_i
○ Overall accuracy is averaged across all k partitions.
5. Compare algorithms
● In case of multiple data sets, Bonferroni adjustment
should be applied.
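Steps 3 and 4 can be sketched end to end with a toy classifier. Everything concrete here is an assumption made for illustration: the synthetic 1-D data, the nearest-neighbour classifier, the candidate values for its `n_neighbors` parameter, and the 2/3 to 1/3 split of T into T1 and T2. Only the overall procedure follows the slides.

```python
import random


def make_toy_data(n_per_class=50, seed=0):
    """Synthetic 1-D points: class 0 around 0.0, class 1 around 4.0."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(4.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def knn_predict(train, x, n_neighbors):
    """Majority label among the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, held_out, n_neighbors):
    hits = sum(knn_predict(train, x, n_neighbors) == y for x, y in held_out)
    return hits / len(held_out)


def cross_validate(data, k=10, candidates=(1, 3, 5), seed=0):
    """k-fold cross-validation with parameter tuning inside each fold."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k disjoint subsets of D
    scores = []
    for i, held_out in enumerate(folds):
        # T = D minus the held-out subset.
        T = [p for j, fold in enumerate(folds) if j != i for p in fold]
        # Split T into T1 (training) and T2 (tuning); the ratio is an assumption.
        cut = (2 * len(T)) // 3
        T1, T2 = T[:cut], T[cut:]
        best = max(candidates, key=lambda c: accuracy(T1, T2, c))
        # Re-train on all of T with the tuned parameter, then test once.
        scores.append(accuracy(T, held_out, best))
    return sum(scores) / len(scores)


cv_accuracy = cross_validate(make_toy_data())
print(f"10-fold accuracy: {cv_accuracy:.3f}")
```

The held-out subset is never seen while `n_neighbors` is being chosen, which is the property the procedure is designed to guarantee.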
Conclusion
● The author does not mean to discourage empirical
comparisons
● He tries to provide suggestions to avoid pitfalls
● He suggests that
○ Statistical tools should be used carefully.
○ Every detail of the experiment should be reported.
Thank you!

CS550 Presentation - On comparing classifiers by Salzberg

  • 1. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach (cited by 581) Author: Steven L. Salzberg Presented by: Mehmet Ali Abbasoğlu & Mustafa İlker Saraç 10.04.2014
  • 2. Contents 1. Motivation 2. Comparing Algorithms 3. Definitions 4. Problems 5. Recommended Approach 6. Conclusion
  • 3. Motivation ● Be careful about comparative studies of classification and other algorithms. ○ It is easy to reach statistically invalid conclusions. ● How do you choose which algorithm to use for a new problem? ● Using brute force, one can easily find a phenomenon or pattern that looks impressive. ○ REALLY?
  • 4. Motivation ● You have lots of data ○ Choose one from UCI repository ● You have many classification methods to compare But, ● Should any difference in classification accuracy that reaches statistical significance be reported as important? ○ Think again!
  • 5. Comparing Algorithms ● Many new algorithms have evaluation problems, according to a survey conducted by Prechelt. ○ 29% were not evaluated on a real problem ○ Only 8% were compared to more than one alternative on real data ● A survey by Flexer on experimental neural network papers in leading journals ○ Only 3 out of 43 used a separate data set for tuning parameters.
  • 6. Comparing Algorithms ● Drawbacks of reporting results on a well-studied data set, e.g. a data set from UCI repository ○ It is hard to improve results ○ Prone to statistical accidents ○ Such data sets are fine for getting initial results for your new algorithm ● It is tempting to change a known algorithm slightly and then use comparisons to report improved results. ○ High risk of statistical invalidity ○ Better to apply new algorithms
  • 7. Definitions ● Statistical significance ○ In statistics, a result is considered significant not because it is important or meaningful, but because it has been judged unlikely to have occurred by chance alone. ● t-test ○ Used to determine whether the means of two sets of data are significantly different from each other ● p-value ○ The probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. ● null hypothesis ○ The default position, e.g. that there is no real difference between the two algorithms being compared
  • 8. Problem 1 : Small repository of datasets ● It is difficult to produce major new results using well-studied and widely shared data. ● Suppose 100 people are studying the effect of algorithms A and B ● Even if there is no real difference, about 5 are expected to get results statistically significant at p <= 0.05 ● Clearly these results are due to chance. ○ The ones who get significant results will publish ○ While others will simply move on to other experiments.
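The arithmetic behind the 100-researcher scenario can be sketched as follows. This is a minimal illustration, not something from the slides: the function names are hypothetical, and the simulation assumes that under the null hypothesis each study's p-value is uniformly distributed on [0, 1], which holds for continuous test statistics.

```python
import random


def expected_false_positives(num_studies, alpha):
    """Expected number of studies that look significant by chance alone."""
    return num_studies * alpha


def prob_at_least_one(num_studies, alpha):
    """Chance that at least one independent study is a false positive."""
    return 1.0 - (1.0 - alpha) ** num_studies


def simulate_significant_studies(num_studies, alpha, seed=0):
    """Draw one null p-value per study and count those below alpha.

    Assumption: p-values are uniform on [0, 1] when A and B do not differ.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(num_studies) if rng.random() < alpha)


if __name__ == "__main__":
    print(expected_false_positives(100, 0.05))
    print(prob_at_least_one(100, 0.05))
    print(simulate_significant_studies(100, 0.05))
```

The count from any single simulated run varies around the expected value, which is why "about 5" is a statement about the average rather than a guaranteed minimum.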
  • 9. Problem 2 : Statistical validity ● Statistics offer many tests that are designed to measure the significance of any difference ● These tests are not designed with computational experiments in mind. ● For example ○ 14 different variations of classifier algorithms ○ 11 different datasets ○ 154 combinations, 154 chances to be significant ○ Expected number of results significant at the 0.05 level by chance alone is 154 * 0.05 = 7.7 ○ This is the multiplicity effect
  • 10. Problem 2 : Statistical validity ● Let the significance level for each experiment be α ● Chance of making the right conclusion for one experiment is (1 - α) ● Assuming experiments are independent of one another, chance of getting n experiments correct is (1 - α)^n ● Chance of making at least one incorrect conclusion is 1 - (1 - α)^n ● Substituting α = 0.05 and n = 154 ● Chance of making an incorrect conclusion is 0.9996 ● To obtain results significant at the 0.05 level with 154 tests, require 1 - (1 - α)^n < 0.05, which gives α < 0.0003 ● This adjustment is known as the Bonferroni adjustment.
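The multiplicity numbers on this slide can be checked with a few lines of Python. This is a sketch with hypothetical function names; it assumes the n experiments are independent, as the slide does. The last function shows the simpler α/n form that is commonly called the Bonferroni correction, which gives nearly the same threshold.

```python
def familywise_error(alpha, n):
    """Chance of at least one false positive in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n


def per_test_alpha(target, n):
    """Per-test level solving 1 - (1 - a)^n = target for a."""
    return 1.0 - (1.0 - target) ** (1.0 / n)


def bonferroni_alpha(target, n):
    """Simpler, slightly more conservative per-test level: target / n."""
    return target / n


if __name__ == "__main__":
    n_tests = 14 * 11  # 14 algorithm variations on 11 data sets
    print(n_tests * 0.05)                   # expected chance "discoveries"
    print(familywise_error(0.05, n_tests))  # chance of at least one
    print(per_test_alpha(0.05, n_tests))    # required per-test level
    print(bonferroni_alpha(0.05, n_tests))  # alpha / n approximation
```

Both per-test thresholds come out a little above 0.0003 for 154 tests, roughly 150 times stricter than the usual 0.05.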
  • 11. Problem 3 : Experiments are not independent ● The t-test assumes that the test sets for each algorithm are independent. ● Generally two algorithms are compared on the same data set ○ Obviously the test sets are not independent.
  • 12. Problem 4 : Only considers overall accuracy ● Comparison must consider four numbers when a common test set is used for comparing two algorithms ○ A got right and B got wrong ( A > B ) ○ B got right and A got wrong ( B > A ) ○ Both algorithms got right ○ Both algorithms got wrong ● If only two algorithms are compared ○ Throw out ties ○ Compare A > B vs B > A ● If more than two algorithms are compared ○ Use “Analysis of Variance” (ANOVA) ○ Bonferroni adjustment for multiple tests
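One way to "compare A > B vs B > A" after throwing out ties is an exact two-sided binomial (sign) test on the discordant examples. The sketch below is an illustration under that assumption, with hypothetical function names; the slide itself does not name a specific test. Under the null hypothesis each discordant example is equally likely to favour either algorithm.

```python
from math import comb


def discordant_counts(truth, pred_a, pred_b):
    """Count examples where exactly one of the two algorithms is correct."""
    a_only = sum(1 for t, a, b in zip(truth, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(truth, pred_a, pred_b) if b == t and a != t)
    return a_only, b_only


def sign_test_p_value(a_only, b_only):
    """Two-sided exact binomial test with success probability 0.5.

    Ties (both right or both wrong) are assumed to be discarded already.
    """
    n = a_only + b_only
    if n == 0:
        return 1.0  # no discordant examples, so no evidence either way
    k = min(a_only, b_only)
    # Probability of a split at least as lopsided as k versus n - k.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 1, 0]
    pred_a = [1, 1, 0, 0, 1, 1]
    pred_b = [1, 0, 1, 0, 0, 1]
    a_only, b_only = discordant_counts(truth, pred_a, pred_b)
    print(a_only, b_only, sign_test_p_value(a_only, b_only))
```

Because the test uses only the discordant counts, two algorithms with identical overall accuracy but very different error patterns are still handled sensibly.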
  • 13. Problem 5 : Repeated tuning ● Researchers tune their algorithms repeatedly to perform optimally on a data set. ● Whenever tuning takes place, every adjustment should really be considered as a separate experiment. ○ For example, if 10 tuning experiments were attempted, then the significance threshold should be 0.005 instead of 0.05. ● When one uses an algorithm that has been used before, the algorithm may already have been tuned on public databases.
  • 14. Problem 5 : Repeated tuning ● Recommended approach: ○ Reserve a portion of the training set as a tuning set ○ Repeatedly test the algorithm and adjust parameters on tuning set. ○ Measure accuracy on the test data.
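The split recommended here can be sketched as a small helper. The function name, the default fractions, and the fixed seed are illustrative assumptions, not values from the slides. The key property is that the tuning set is carved out of the training data, so the test set is never touched while parameters are adjusted.

```python
import random


def train_tune_test_split(examples, tune_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, hold out a test set, then reserve part of the rest for tuning."""
    items = list(examples)
    random.Random(seed).shuffle(items)

    n_test = int(len(items) * test_fraction)
    test = items[:n_test]
    remaining = items[n_test:]

    # The tuning set is a portion of the training data, not of the test data.
    n_tune = int(len(remaining) * tune_fraction)
    tune = remaining[:n_tune]
    train = remaining[n_tune:]
    return train, tune, test


if __name__ == "__main__":
    train, tune, test = train_tune_test_split(range(100))
    print(len(train), len(tune), len(test))
```

Parameters are then adjusted using only `train` and `tune`, and accuracy on `test` is measured once at the end.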
  • 15. Problem 6 : Generalizing results ● Common methodological approach ○ pick several datasets from UCI repository ○ perform series of experiments ■ measuring classification accuracy ■ learning rates ● It is not valid to make general statements about other datasets. ○ The repository is not an unbiased sample of classification problems. ● Someone can write an algorithm that works very well on some of the known datasets ○ Anyone familiar with the data may be biased.
  • 16. A Recommended Approach 1. Choose other algorithms to include in the comparison. 2. Choose a benchmark data set. 3. Divide the data set into k subsets for cross validation ○ Typically k = 10 ○ For small data sets, choose a larger k.
  • 17. A Recommended Approach 4. Run cross-validation ○ For each of the k subsets D_i of the data set D, create a training set T = D - D_i ○ Divide T into two subsets: T1 (training) and T2 (tuning) ○ Once parameters are optimized on T2, re-run training on all of T ○ Measure accuracy on the held-out subset D_i ○ Overall accuracy is averaged across all k partitions. 5. Compare algorithms ● In case of multiple data sets, Bonferroni adjustment should be applied.
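The cross-validation procedure in step 4 can be sketched in plain Python. Everything here is illustrative: `cross_validate`, `fit_knn`, the 25% tuning fraction, and the toy one-dimensional data are assumptions made for the example, not part of the presentation. The learner is passed in as a function `fit(train, param)` that returns a prediction function.

```python
import random


def accuracy(model, examples):
    """Fraction of (x, label) pairs the model classifies correctly."""
    return sum(1 for x, y in examples if model(x) == y) / len(examples)


def cross_validate(data, k, fit, candidate_params, tune_fraction=0.25, seed=0):
    """k-fold cross-validation with parameter tuning inside each fold."""
    items = list(data)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]

    scores = []
    for i, held_out in enumerate(folds):
        # T is the whole data set minus the held-out fold.
        T = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        # Split T into T2 (tuning) and T1 (training).
        n_tune = max(1, int(len(T) * tune_fraction))
        T2, T1 = T[:n_tune], T[n_tune:]
        # Choose the parameter using T1 and T2 only.
        best = max(candidate_params, key=lambda p: accuracy(fit(T1, p), T2))
        # Retrain on all of T, then score once on the held-out fold.
        scores.append(accuracy(fit(T, best), held_out))
    return sum(scores) / len(scores)


def fit_knn(train, n_neighbors):
    """Toy k-nearest-neighbour learner on 1-D inputs (illustration only)."""
    def predict(x):
        nearest = sorted(train, key=lambda ex: abs(ex[0] - x))[:n_neighbors]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)
    return predict


if __name__ == "__main__":
    toy_data = [(x, 1 if x >= 20 else 0) for x in range(40)]
    print(cross_validate(toy_data, k=10, fit=fit_knn, candidate_params=(1, 3, 5)))
```

The held-out fold is used exactly once per iteration, after the parameter has been fixed, so the averaged score is not inflated by tuning.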
  • 18. Conclusion ● The author does not mean to discourage empirical comparisons ● He tries to provide suggestions to avoid pitfalls ● He suggests that ○ Statistical tools should be used carefully. ○ Every detail of the experiment should be reported.