Model Selection Using Conformal Predictors


Work presented at CPRML 2015 on model selection using the efficiency of conformal predictors. Work done in collaboration with Ritvik Jaiswal and Dr. Vineeth Balasubramanian of IIT Hyderabad.

MODEL SELECTION USING CONFORMAL PREDICTORS
ABHAY GUPTA
INDIAN INSTITUTE OF TECHNOLOGY HYDERABAD

PROPOSED APPROACH
• Efficiency as a means of selecting model parameters in classifiers
• Narrow conformal prediction regions are desirable
• Model selection is posed as an optimisation problem
• Objective: optimise the value of the S-criterion of efficiency
• k-Nearest Neighbour (k-NN) classifier used to validate the approach

PROPOSED METHODOLOGY
• k-Nearest Neighbour (k-NN) classifier used to validate the idea
• Objective: minimise the S-criterion of efficiency
• The k that minimises the objective function generally gives high accuracy and efficiency
• The S-criterion of efficiency is defined as: [equation not preserved in the slide transcript]
• where the p-values are defined as follows: [equation not preserved in the slide transcript]
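
The formulas on this slide did not survive in the transcript. As a minimal sketch, assuming the standard definitions from the conformal prediction literature rather than the slide's exact notation: the p-value of a candidate label is the fraction of the m + 1 examples (m training points plus the test point itself) whose conformity score is no larger than the test point's, and the S-criterion is the average, over test points, of the sum of p-values across all candidate labels. The function names and toy scores below are illustrative assumptions.

```python
def p_value(train_scores, test_score):
    """Conformal p-value: share of the m + 1 scores (training points plus
    the test point itself) that are <= the test point's conformity score."""
    at_most = sum(1 for a in train_scores if a <= test_score)
    return (at_most + 1) / (len(train_scores) + 1)


def s_criterion(p_values_per_test_point):
    """Average, over test points, of the sum of p-values across labels."""
    n = len(p_values_per_test_point)
    return sum(sum(p.values()) for p in p_values_per_test_point) / n


# Toy example (made-up scores): two test points, two candidate labels each.
# In full conformal prediction the training scores are recomputed for every
# candidate label; fixed lists are used here to keep the sketch short.
p_vals = [
    {"a": p_value([0.5, 1.2, 2.0, 3.1], 2.5), "b": p_value([0.4, 0.9, 1.5, 2.2], 0.3)},
    {"a": p_value([0.5, 1.2, 2.0, 3.1], 0.1), "b": p_value([0.4, 0.9, 1.5, 2.2], 1.8)},
]
print(s_criterion(p_vals))
```

The S-criterion does not use the true labels of the test points, which is what makes it usable as a selection objective.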

PROPOSED METHODOLOGY
• Smaller values of the S-criterion are preferable
• Smaller values ensure a smaller prediction set
• Intuition: for an incoming test point, we want most of the training points to have a higher conformity score than the test point
• Small values of the expression [expression not preserved in the slide transcript] are desirable
• where [symbols not preserved in the slide transcript] are the conformity scores of the test and training points respectively
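
To illustrate why small p-values go together with small prediction sets, here is a minimal sketch assuming the usual construction in which the prediction set at a given confidence level contains every label whose p-value exceeds the significance level (1 minus the confidence). The function name and the p-values are made up for illustration.

```python
def prediction_set(p_values, confidence=0.8):
    """Labels whose p-value exceeds the significance level 1 - confidence."""
    epsilon = 1.0 - confidence
    return {label for label, p in p_values.items() if p > epsilon}


# Two hypothetical test points with three candidate labels each.
loose = {"0": 0.75, "1": 0.05, "2": 0.30}   # sum of p-values = 1.10
tight = {"0": 0.75, "1": 0.05, "2": 0.10}   # sum of p-values = 0.90

print(prediction_set(loose))  # two labels survive at 80% confidence
print(prediction_set(tight))  # only one label survives
```

Lowering the p-values of the wrong labels shrinks both the sum of p-values and the prediction set, which is the link between the S-criterion and efficiency that the slide describes.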

PROPOSED METHODOLOGY
• This gives a proxy for the S-criterion, which is to be minimised: [equation not preserved in the slide transcript]
where n and m are the numbers of test and training points respectively.
• The conformity score of an incoming point for the k-NN classifier is defined as: [equation not preserved in the slide transcript]
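
The slide's formula for the k-NN conformity score is not preserved. A common choice in the conformal prediction literature, used here as an assumption, is the ratio of the summed distances to the k nearest neighbours from other classes to the summed distances to the k nearest neighbours from the same class, so that a larger value means the point conforms better to its label. The helper name and the toy points are hypothetical.

```python
import math


def knn_conformity(x, label, train_points, train_labels, k):
    """Summed distance to the k nearest other-class points divided by the
    summed distance to the k nearest same-class points (higher = more
    conforming). Assumes the same-class distance sum is non-zero."""
    same = sorted(math.dist(x, p) for p, y in zip(train_points, train_labels) if y == label)
    other = sorted(math.dist(x, p) for p, y in zip(train_points, train_labels) if y != label)
    return sum(other[:k]) / sum(same[:k])


train_points = [(0.0, 0.0), (0.0, 1.0), (3.0, 4.0), (6.0, 8.0)]
train_labels = ["a", "a", "b", "b"]
x = (3.0, 0.0)

score_a = knn_conformity(x, "a", train_points, train_labels, k=1)
score_b = knn_conformity(x, "b", train_points, train_labels, k=1)
print(score_a, score_b)  # the point is closer to class "a", so score_a > score_b
```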

PROPOSED METHODOLOGY
• This leads us to the objective function: [equation not preserved in the slide transcript]
where n, m and k are the number of test points, the number of training points and the number of nearest neighbours respectively.
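
The exact objective function is not preserved in the transcript, so the sketch below does not reproduce it. Instead, under stated assumptions, it selects k by directly minimising the S-criterion (average sum of p-values) on a held-out set, using a distance-ratio k-NN conformity score. As a simplification, training scores are computed once by leave-one-out rather than recomputed for every candidate label. All names and data are illustrative.

```python
import math


def conformity(x, label, points, labels, k):
    """Other-class over same-class summed k-NN distances (assumed score)."""
    same = sorted(math.dist(x, p) for p, y in zip(points, labels) if y == label)
    other = sorted(math.dist(x, p) for p, y in zip(points, labels) if y != label)
    return sum(other[:k]) / max(sum(same[:k]), 1e-12)  # guard against zero


def s_criterion_for_k(k, train_x, train_y, val_x):
    """Average sum of p-values over the validation points for a given k."""
    classes = sorted(set(train_y))
    # Conformity of each training point with respect to the others.
    train_scores = []
    for i, (x, y) in enumerate(zip(train_x, train_y)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        train_scores.append(conformity(x, y, rest_x, rest_y, k))
    total = 0.0
    for x in val_x:
        for c in classes:
            a = conformity(x, c, train_x, train_y, k)
            at_most = sum(1 for s in train_scores if s <= a)
            total += (at_most + 1) / (len(train_scores) + 1)
    return total / len(val_x)


def select_k(candidate_ks, train_x, train_y, val_x):
    """Return the candidate k with the smallest S-criterion."""
    return min(candidate_ks, key=lambda k: s_criterion_for_k(k, train_x, train_y, val_x))


# Made-up two-cluster data, for illustration only.
train_x = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
           (2.0, 2.0), (2.2, 2.1), (2.1, 2.3), (2.3, 2.2)]
train_y = ["a"] * 4 + ["b"] * 4
val_x = [(0.1, 0.1), (2.1, 2.1)]

best_k = select_k([1, 2, 3], train_x, train_y, val_x)
print(best_k, s_criterion_for_k(best_k, train_x, train_y, val_x))
```

Note that the validation labels are never used: the criterion depends only on the p-values, so any unlabelled held-out sample would do in this sketch.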

EMPIRICAL STUDY
• The following datasets were used: [dataset table not preserved in the slide transcript]
• All results are averaged over 5 trials

EMPIRICAL STUDY → USPS DATASET
[Figure: k vs objective function, ρ = 0.8589]
[Figure: k vs accuracy, ρ = -0.9271]
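
The slides do not say which correlation coefficient ρ denotes. Assuming it is the Pearson correlation between k and the plotted quantity, it could be computed as in the sketch below. The numbers are made up for illustration and are not the values behind the plots.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


ks = [1, 3, 5, 7, 9]
objective = [0.90, 0.95, 1.05, 1.10, 1.20]   # hypothetical objective values
accuracy = [0.96, 0.95, 0.93, 0.92, 0.90]    # hypothetical accuracies

print(pearson(ks, objective))  # positive: the objective grows with k
print(pearson(ks, accuracy))   # negative: accuracy falls with k
```

Opposite signs for the two coefficients indicate that the k with the lowest objective value tends to be the k with the highest accuracy.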

EMPIRICAL STUDY → USPS DATASET (EFFICIENCY RESULTS)
[Figure: k vs objective function]
[Figure: k vs prediction set size (80% confidence)]

EMPIRICAL STUDY → STANDARD WAVEFORM DATASET
[Figure: k vs objective function, ρ = -0.6501]
[Figure: k vs accuracy, ρ = 0.7989]

EMPIRICAL STUDY → STANDARD WAVEFORM DATASET (EFFICIENCY RESULTS)
[Figure: k vs objective function]
[Figure: k vs prediction set size (80% confidence)]

CONCLUSIONS AND FUTURE/ONGOING WORK
• While validity is guaranteed, efficiency varies with the classifier parameters
• The proposed approach shows promise, though it is only a first step
– k vs objective function (validation set) and k vs accuracy (test set) are negatively correlated
– As the value of the objective function decreases, efficiency increases (as expected)
• Future/ongoing work
– What would other measures of efficiency lead to?
– Can we frame this as a convex/submodular/other objective function with guaranteed performance bounds?
