Overview | Semi-Supervised Learning (SSL) | Single-View CoBC | Experimental Results | Conclusion | Future Work
Combining Committee-based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition
Mohamed Farouk Abdel Hady, Friedhelm Schwenker
Institute of Neural Information Processing
University of Ulm, Germany
{mohamed.abdel-hady|friedhelm.schwenker}@uni-ulm.de
April 8, 2010
Semi-Supervised Learning
In many domains, training examples are plentiful but unlabeled.
The data labeling process is often tedious, expensive and time-consuming because it requires the effort of human experts.
Research directions of SSL
Semi-Supervised Clustering
Semi-Supervised Classification
Semi-Supervised Regression
Semi-Supervised Dimensionality Reduction
Semi-Supervised Learning
Description                                         SSL algorithm
Single-view, Single-learner, Single-classifier      EM (Nigam and Ghani, 2000)
                                                    Self-Training (Nigam and Ghani, 2000)
Multi-view, Single-learner, Multiple classifiers    Co-EM (Nigam and Ghani, 2000)
                                                    Co-Training (Blum and Mitchell, COLT'98)
Single-view, Multi-learner, Multiple classifiers    Statistical Co-Learning (Goldman et al., 2000)
                                                    Democratic Co-Learning (Y. Zhou et al., 2004)
Single-view, Single-learner, Multiple classifiers   Tri-Training (Z.-H. Zhou, TKDE'05)
                                                    Co-Forest (Li and Z.-H. Zhou, TSMC'07)
                                                    Co-Training by Committee
Z.-H. Zhou and M. Li, Semi-supervised learning by disagreement, Knowledge and Information Systems, in press.
How can unlabeled data be helpful?
Self-Training
However, the most confident examples often lie far from the target decision boundary and are therefore uninformative. As a result, in many cases Self-Training does not build a representative training set, because the examples it selects add little information.
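The selection step criticized above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function `select_most_confident` and the toy probability vectors are hypothetical, and confidence is assumed to be the largest predicted class probability.

```python
def select_most_confident(pool_probs, n):
    """Return the indices of the n pool examples with the highest confidence.

    pool_probs: one list of class probabilities per unlabeled example.
    Confidence is assumed to be the largest class probability.
    """
    confidences = [max(probs) for probs in pool_probs]
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: confidences[i], reverse=True)
    return ranked[:n]


# Toy pool (hypothetical values): examples 0 and 2 are far from the decision
# boundary, example 1 sits close to it (probabilities near 0.5).
pool = [[0.97, 0.03], [0.52, 0.48], [0.10, 0.90]]
chosen = select_most_confident(pool, n=2)
```

In this toy pool the example nearest the boundary is never selected, which is the weakness the slide points out.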
Multi-View Co-Training
Blum and Mitchell (1998)
Like any multi-view learning algorithm, Co-Training requires that each training example be represented by multiple sufficient and redundant views, i.e. two or more sets of features that are conditionally independent given the class label, each of which is sufficient for learning.
Example for web page classification: 1) the text appearing on the page itself, and 2) the text attached to hyperlinks pointing to this page from other pages.
Single-View Co-Training by Committee
Contribution
A single-view variant of Co-Training is proposed for application domains in which redundant and independent views are not available.
Two learning frameworks are proposed that combine the merits of active learning with semi-supervised learning.
Motivation
For many real-world applications, the requirement for two sufficient and independent views cannot be fulfilled.
Co-Training does not work well without an appropriate feature split (Nigam and Ghani, 2000).
Measuring the labeling confidence is not a straightforward task.
How to measure confidence
Inaccurate confidence estimation
→ mislabeled examples are selected and added to the training set
→ the classification accuracy degrades
Confidence is measured by the class probability estimates (CPE) provided by the companion committee:
$$\mathrm{Confidence}(x_u, H_i^{(t-1)}) = \max_{1 \le c \le C} H_i^{(t-1)}(x_u, \omega_c)$$
Unfortunately, in many cases the classifier does not provide an accurate CPE. For instance, a decision tree provides piecewise constant probability estimates. That is, all unlabeled examples $x_u$ that fall into a particular leaf have the same CPE, because the exact value of $x_u$ is not used in determining its CPE.
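The piecewise constant behaviour can be seen with a one-split tree. The sketch below is only an illustration under stated assumptions: `stump_cpe`, the threshold and the leaf class counts are hypothetical, and each leaf's CPE is assumed to be the plain class frequency in that leaf.

```python
def stump_cpe(x, threshold, left_counts, right_counts):
    """Class probability estimate of a one-split tree for a scalar input x.

    The estimate is the class frequency in the leaf that x falls into,
    so it does not depend on where x lies inside that leaf.
    """
    counts = left_counts if x <= threshold else right_counts
    total = sum(counts)
    return [count / total for count in counts]


def confidence(class_probs):
    """Confidence of one example: its largest class probability estimate."""
    return max(class_probs)


# Hypothetical leaf class counts: left leaf [8, 2], right leaf [1, 9].
near_boundary = confidence(stump_cpe(0.49, 0.5, [8, 2], [1, 9]))
far_from_boundary = confidence(stump_cpe(0.01, 0.5, [8, 2], [1, 9]))
```

Both inputs land in the left leaf and receive the same confidence, even though one of them lies right next to the split.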
Improving CPE of Decision Trees
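Laplace Correction, Probability Estimation Tree (PET) (Provost, Machine Learning 2003):
$$P(\omega_c \mid x_u) = \frac{n_c + 1}{N + C}$$
where $n_c$ is the number of training examples of class $\omega_c$ in the leaf that contains $x_u$, $N$ is the total number of training examples in that leaf and $C$ is the number of classes.
Bagging of PET
Retrofitting Decision Tree Classifiers Using Kernel Density Estimation (Fayyad, ICML'95)
Improve Decision Trees for Probability-Based Ranking by Lazy Learners (Liang, ICTAI'06)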
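As a small worked example of the Laplace correction listed first, the sketch below applies the formula to the class counts of a single leaf. The function name `laplace_cpe` and the counts are illustrative assumptions.

```python
def laplace_cpe(leaf_counts):
    """Laplace-corrected class probabilities (n_c + 1) / (N + C) for one leaf.

    leaf_counts: number of training examples of each class in the leaf.
    """
    n_total = sum(leaf_counts)      # N
    n_classes = len(leaf_counts)    # C
    return [(n_c + 1) / (n_total + n_classes) for n_c in leaf_counts]


# Two pure leaves (hypothetical counts) that the raw frequency estimate
# would both score as probability 1.0 for class 0.
small_leaf = laplace_cpe([2, 0])
large_leaf = laplace_cpe([20, 0])
```

The correction pulls the estimate of the small leaf further towards the uniform distribution than that of the large leaf, so leaves supported by few examples no longer look maximally confident.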
Estimating local competence
The local competence of an unlabeled example $x_u$ given $H_i^{(t-1)}$ is defined as follows:
$$\mathrm{Comp}(x_u, H_i^{(t-1)}) = \sum_{x_n \in N(x_u),\; x_n \in \omega_{pred}} \frac{H_i^{(t-1)}(x_n, \omega_{pred})}{\|x_n - x_u\|_2 + \epsilon}$$
where $\omega_{pred}$ is the class label assigned to $x_u$ by $H_i^{(t-1)}$; $H_i^{(t-1)}(x_n, \omega_{pred})$ is the probability given by $H_i^{(t-1)}$ that neighbor $x_n$ belongs to class $\omega_{pred}$; $N(x_u)$ is the set of nearest neighbors of $x_u$; and $\epsilon$ is a constant added to avoid a zero denominator.
It is inspired by the decision-dependent, distance-based k-NN estimate of competence that was proposed for dynamic classifier selection (Woods, PAMI'97).
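The competence formula can be sketched in a few lines. This is an illustration under assumptions the slides do not spell out: the neighbors are taken from a labeled set, a neighbor counts as belonging to the predicted class when its label equals that class, and `committee_proba` is a hypothetical stand-in for the companion committee's class probability estimates.

```python
import math


def k_nearest(x_u, labeled, k):
    """Return the k (x_n, label_n) pairs of `labeled` closest to x_u."""
    return sorted(labeled, key=lambda item: math.dist(item[0], x_u))[:k]


def local_competence(x_u, pred_class, labeled, committee_proba, k=10, eps=1e-6):
    """Sum, over the neighbors labeled pred_class, of the committee's
    probability for pred_class divided by (distance to x_u + eps)."""
    total = 0.0
    for x_n, label_n in k_nearest(x_u, labeled, k):
        if label_n == pred_class:
            distance = math.dist(x_n, x_u)
            total += committee_proba(x_n)[pred_class] / (distance + eps)
    return total


# Toy data (hypothetical): two neighbors of class 0 and one of class 1.
labeled = [((0.0, 0.0), 0), ((1.0, 0.0), 0), ((0.0, 3.0), 1)]


def committee_proba(x):
    # Hypothetical committee that always outputs the same estimate.
    return [0.9, 0.1]


comp = local_competence((0.0, 1.0), 0, labeled, committee_proba, k=3)
```

Close neighbors of the predicted class that the committee classifies confidently contribute most to the score; neighbors of other classes contribute nothing.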
Figure: Estimating the local competence of an unlabeled example given the companion committee.
Handwritten Digits Recognition
The Handwritten Digits data set is described by four sets of features and is publicly available at the UCI Repository. The digits were extracted from a collection of Dutch utility maps. A total of 2,000 patterns (200 patterns per class) have been digitized in binary images.
Name        Description
mfeat-pix   240 pixel averages in 2 x 3 windows
mfeat-kar   64 Karhunen-Loève coefficients
mfeat-fac   216 profile correlations
mfeat-fou   76 Fourier coefficients of the character shapes
Experimental Setup
WEKA
4 runs of 10-fold cross-validation
For SSL, 10% of the training examples (180 patterns) are randomly selected as the initial labeled data set L, while the remaining examples are used as the unlabeled data set U.
The Random Subspace Method constructs an ensemble of ten pruned C4.5 decision trees (with Laplace Correction), where each tree uses only 50% of the features.
We set the pool size u = 100, the sample size n = 1 and the number of nearest neighbors used to estimate local competence k = 10.
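The feature sampling behind the Random Subspace Method can be sketched as below. This is not the WEKA implementation used in the experiments: `random_subspaces` is a hypothetical helper that only draws the feature subsets, and training one tree per subset is left out.

```python
import random


def random_subspaces(n_features, n_members=10, fraction=0.5, seed=0):
    """Draw one random feature subset per ensemble member.

    Each subset holds `fraction` of the feature indices, sampled without
    replacement; different members generally receive different subsets.
    """
    rng = random.Random(seed)
    subset_size = int(round(n_features * fraction))
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_members)]


# Ten members, each seeing half of the 76 Fourier features (mfeat-fou).
subspaces = random_subspaces(n_features=76, n_members=10, fraction=0.5)
```

Each committee member would then be trained only on the columns listed in its subset, which gives the members different views of the same single-view data.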
Experimental Results
Comparison between forests and individual trees.
Comparison between CoBC and Self-Training.
Comparison between CPE and local competence
confidence measures.
Comparison between CoBC and Co-Forest.
•: corrected paired t-test implemented in WEKA at the 0.05 significance level.
Combining QBC and CoBC
Semi-supervised learning and active learning tackle the same problem from different directions.
QBC-then-CoBC: Query by Committee (QBC) provides CoBC with a better starting point than randomly selected labeled examples.
QBC-with-CoBC: In QBC-then-CoBC, QBC does not benefit from CoBC. In QBC-with-CoBC, by contrast, both algorithms benefit from each other.
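To make the active-learning half concrete, the sketch below ranks pool examples by committee disagreement. The slides do not state which disagreement measure QBC uses here; vote entropy is one common choice and is an assumption of this sketch, as are the function names and the toy votes.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement of a committee on one example.

    votes: the class label predicted by each committee member.
    Returns 0 when all members agree and grows as the votes spread out.
    """
    n = len(votes)
    return -sum((count / n) * math.log(count / n)
                for count in Counter(votes).values())


def query_most_disputed(pool_votes, n_queries):
    """Indices of the pool examples the committee disagrees on most."""
    ranked = sorted(range(len(pool_votes)),
                    key=lambda i: vote_entropy(pool_votes[i]), reverse=True)
    return ranked[:n_queries]


# Hypothetical votes of a four-member committee on three digit images.
pool_votes = [[3, 3, 3, 3], [3, 8, 3, 8], [1, 1, 1, 7]]
to_label = query_most_disputed(pool_votes, n_queries=1)
```

QBC would pass the most disputed example to the human expert for labeling, whereas CoBC adds examples that the companion committee labels with high confidence, which is why the two selection criteria complement each other.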
Conclusion
A new single-view committee-based semi-supervised learning framework is proposed.
An ensemble of diverse and accurate classifiers can effectively exploit the unlabeled data to improve the recognition accuracy.
The random subspace method not only enforces diversity but also reduces the dimensionality, which is desirable when the training set is small.
CoBC outperforms Self-Training.
The local competence estimate is an effective confidence measure that outperforms the class probability estimates for sample selection.
Future Work
Influence of ensemble size, random subspace size
Different ensemble learners, base learners such as SVM
or kNN
CoBC depends only on the companion committee $H_j^{(t-1)}$ constructed at the previous iteration to measure confidence. We will study the influence of depending on all the previous versions ($H_j^{(t')}$, $t' = t-1, t-2, \ldots, 0$).
Thanks for your attention
Questions?
24 / 24

Contenu connexe

Similaire à Combining Committee-Based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition

ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksSeunghyun Hwang
 
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...YONG ZHENG
 
Tutorial rpo
Tutorial rpoTutorial rpo
Tutorial rpomosi2005
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learningbutest
 
A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...o_almasi
 
3ways to improve semantic segmentation
3ways to improve semantic segmentation3ways to improve semantic segmentation
3ways to improve semantic segmentationFrozen Paradise
 
AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012OptiModel
 
Practical tips for handling noisy data and annotaiton
Practical tips for handling noisy data and annotaitonPractical tips for handling noisy data and annotaiton
Practical tips for handling noisy data and annotaitonRyuichiKanoh
 
ASS_SDM2012_Ali
ASS_SDM2012_AliASS_SDM2012_Ali
ASS_SDM2012_AliMDO_Lab
 
An approach for improved students’ performance prediction using homogeneous ...
An approach for improved students’ performance prediction  using homogeneous ...An approach for improved students’ performance prediction  using homogeneous ...
An approach for improved students’ performance prediction using homogeneous ...IJECEIAES
 
Manta ray optimized deep contextualized bi-directional long short-term memor...
Manta ray optimized deep contextualized bi-directional long  short-term memor...Manta ray optimized deep contextualized bi-directional long  short-term memor...
Manta ray optimized deep contextualized bi-directional long short-term memor...IJECEIAES
 
Timetable management system(chapter 3)
Timetable management system(chapter 3)Timetable management system(chapter 3)
Timetable management system(chapter 3)Emeer95
 
Towards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate ComputingTowards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate ComputingGianluca Aguzzi
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directionsTao He
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionSafaa Alnabulsi
 
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...ijcsit
 
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...Daniel Davis
 
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...IEEEFINALYEARSTUDENTPROJECTS
 
Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...André Gonçalves
 

Similaire à Combining Committee-Based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition (20)

ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
 
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
[RecSys 2014] Deviation-Based and Similarity-Based Contextual SLIM Recommenda...
 
Tutorial rpo
Tutorial rpoTutorial rpo
Tutorial rpo
 
LNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine LearningLNCS 5050 - Bilevel Optimization and Machine Learning
LNCS 5050 - Bilevel Optimization and Machine Learning
 
A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...
 
3ways to improve semantic segmentation
3ways to improve semantic segmentation3ways to improve semantic segmentation
3ways to improve semantic segmentation
 
AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012AIAA-SDM-SequentialSampling-2012
AIAA-SDM-SequentialSampling-2012
 
Practical tips for handling noisy data and annotaiton
Practical tips for handling noisy data and annotaitonPractical tips for handling noisy data and annotaiton
Practical tips for handling noisy data and annotaiton
 
ASS_SDM2012_Ali
ASS_SDM2012_AliASS_SDM2012_Ali
ASS_SDM2012_Ali
 
An approach for improved students’ performance prediction using homogeneous ...
An approach for improved students’ performance prediction  using homogeneous ...An approach for improved students’ performance prediction  using homogeneous ...
An approach for improved students’ performance prediction using homogeneous ...
 
Manta ray optimized deep contextualized bi-directional long short-term memor...
Manta ray optimized deep contextualized bi-directional long  short-term memor...Manta ray optimized deep contextualized bi-directional long  short-term memor...
Manta ray optimized deep contextualized bi-directional long short-term memor...
 
Timetable management system(chapter 3)
Timetable management system(chapter 3)Timetable management system(chapter 3)
Timetable management system(chapter 3)
 
Towards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate ComputingTowards Reinforcement Learning-based Aggregate Computing
Towards Reinforcement Learning-based Aggregate Computing
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Testing survey by_directions
Testing survey by_directionsTesting survey by_directions
Testing survey by_directions
 
Comparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit RecognitionComparison of Learning Algorithms for Handwritten Digit Recognition
Comparison of Learning Algorithms for Handwritten Digit Recognition
 
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
 
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommen...
 
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
IEEE 2014 JAVA DATA MINING PROJECTS Active learning of constraints for semi s...
 
Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...
 

Dernier

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Dernier (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Combining Committee-Based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition

  • 1. Overview Semi-Supervised Learning(SSL) Single-View CoBC Experimental Results Conclusion Future Work Combining Committee-based Semi-supervised and Active Learning and Its Application to Handwritten Digits Recognition Mohamed Farouk Abdel Hady, Friedhelm Schwenker Institute of Neural Information Processing University of Ulm, Germany {mohamed.abdel-hady|friedhelm.schwenker}@uni-ulm.de April 8, 2010 1 / 24
  • 2. Overview Semi-Supervised Learning(SSL) Single-View CoBC Experimental Results Conclusion Future Work Overview 2 / 24
  • 3. Overview Semi-Supervised Learning(SSL) Single-View CoBC Experimental Results Conclusion Future Work Semi-Supervised Learning In many domains, the amount of training examples is large but unlabeled. Data labeling process is often tedious, expensive and time consuming because it requires the effort of human experts. Research directions of SSL Semi-Supervised Clustering Semi-Supervised Classification Semi-Supervised Regression Semi-Supervised Dimensionality Reduction 3 / 24
  • 4. Overview Semi-Supervised Learning(SSL) Single-View CoBC Experimental Results Conclusion Future Work Semi-Supervised Learning Description SSL algorithm Single-view, Single-learner EM (Nigam and Ghani, 2000) Single-classifier Self-Training (Nigam and Ghani, 2000) Multi-view, Single-learner Co-EM (Nigam and Ghani, 2000) Multiple classifiers Co-Training (Blum and Mitchell, COLT’98) Single-view, Multi-learner Statistical Co-Learning (Goldman et al., 2000) Multiple classifiers Democratic Co-Learning (Y. Zhou et al., 2004) Single-view, Single-learner Tri-Training (Z.-H. Zhou, TKDE’05) Multiple classifiers Co-Forest (Li and Z.-H. Zhou, TSMC’07) Co-Training by Committee Z.-H. Zhou and M. Li, Semi-supervised learning by disagreement, Knowledge and Information Systems, in press. 4 / 24
  • 5. Overview Semi-Supervised Learning(SSL) Single-View CoBC Experimental Results Conclusion Future Work How can unlabeled data be helpful? 5 / 24
  • 6. Overview Semi-Supervised Learning(SSL) Single-View CoBC Experimental Results Conclusion Future Work How can unlabeled data be helpful? 6 / 24
  • 7. Overview Semi-Supervised Learning(SSL) Single-View CoBC Experimental Results Conclusion Future Work Self-Training But the most confident examples often lie away from the target decision boundary (non informative examples). Therefore, in many cases this process does not create representative training sets as it selects non informative examples. 7 / 24
  • 8. Overview Semi-Supervised Learning(SSL) Single-View CoBC Experimental Results Conclusion Future Work Multi-View Co-Training Blum and Mitchell (1998) As any multi-view learning algorithm, it requires that each training example is represented by multiple sufficient and redundant views, i.e. two or more sets of features that are conditionally independent given the class label and each is sufficient for learning. For web page classification: 1) the text appearing on the page itself, and 2) the text attached to hyperlinks pointing to this page, from other pages. 8 / 24
  • 9. Multi-View Co-Training
  • 10. Single-View Co-Training by Committee. Contribution: A single-view variant of Co-Training is proposed for application domains in which redundant and independent views are not available. Two learning frameworks are proposed for combining the merits of active learning with semi-supervised learning. Motivation: For many real-world applications, the requirement for two sufficient and independent views cannot be fulfilled. Co-Training does not work well without an appropriate feature split (Nigam and Ghani, 2000). Measuring the labeling confidence is not a straightforward task.
  • 11. Single-View Co-Training by Committee
  • 12. How to measure confidence. Inaccurate confidence estimation → mislabeled examples are selected and added to the training set → classification accuracy degrades. Class probability estimates (CPE) provided by the companion committee: $\mathrm{Confidence}(x_u, H_i^{(t-1)}) = \max_{1 \le c \le C} H_i^{(t-1)}(x_u, \omega_c)$. Unfortunately, in many cases the classifier does not provide an accurate CPE. For instance, a decision tree provides piecewise constant probability estimates: all unlabeled examples $x_u$ that fall into a particular leaf have the same CPE, because the exact value of $x_u$ is not used in determining its CPE.
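The CPE confidence measure on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the committee output $H_i^{(t-1)}(x_u, \omega_c)$ is the plain average of its members' class-probability vectors, which the slide does not state, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def committee_probabilities(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors of the committee members.

    Assumption: the committee combines members by simple averaging.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def cpe_confidence(member_probs: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (predicted class index, confidence) for one unlabeled example.

    The confidence is the maximum committee probability over all classes.
    """
    avg = committee_probabilities(member_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return best, avg[best]


# Two committee members, three classes, one unlabeled example.
label, confidence = cpe_confidence([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
```

If every member is a decision tree, examples that fall into the same leaves receive identical vectors, so this measure cannot rank them against each other. That is the weakness the slide points out.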
  • 13. Improving CPE of Decision Trees. Laplace correction, Probability Estimation Tree (PET) (Provost, Machine Learning 2003): $P(\omega_c \mid x_u) = \frac{n_c + 1}{N + C}$. Bagging of PETs. Retrofitting Decision Tree Classifiers Using Kernel Density Estimation (Fayyad, ICML’95). Improve Decision Trees for Probability-Based Ranking by Lazy Learners (Liang, ICTAI’06).
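A small worked example may help show what the Laplace correction does at a leaf. The sketch below assumes the usual reading of the symbols, which the slide does not spell out: $n_c$ is the number of training examples of class $\omega_c$ in the leaf reached by $x_u$, $N$ is the total number of training examples in that leaf, and $C$ is the number of classes. The function names are hypothetical.

```python
from typing import List, Sequence


def laplace_probability(n_c: int, n_total: int, n_classes: int) -> float:
    """Laplace-corrected estimate (n_c + 1) / (N + C) for one class at a leaf."""
    return (n_c + 1) / (n_total + n_classes)


def leaf_distribution(class_counts: Sequence[int]) -> List[float]:
    """Laplace-corrected class distribution for a leaf given per-class counts."""
    n_total = sum(class_counts)
    n_classes = len(class_counts)
    return [laplace_probability(n_c, n_total, n_classes) for n_c in class_counts]


# A pure leaf holding only 3 examples, all of class 0, in a 10-class problem.
# The raw frequency estimate would be 3/3 = 1.0; the corrected one is 4/13.
corrected = leaf_distribution([3, 0, 0, 0, 0, 0, 0, 0, 0, 0])
```

The correction pulls estimates from small leaves toward the uniform distribution, so a pure leaf with few examples no longer reports full confidence.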
  • 14. Estimating local competence. The local competence of an unlabeled example $x_u$ given $H_i^{(t-1)}$ is defined as follows: $\mathrm{Comp}(x_u, H_i^{(t-1)}) = \sum_{x_n \in N(x_u),\, x_n \in \omega_{pred}} \frac{H_i^{(t-1)}(x_n, \omega_{pred})}{\|x_n - x_u\|_2 + \epsilon}$, where $\omega_{pred}$ is the class label assigned to $x_u$ by $H_i^{(t-1)}$; $H_i^{(t-1)}(x_n, \omega_{pred})$ is the probability given by $H_i^{(t-1)}$ that neighbor $x_n$ belongs to class $\omega_{pred}$; and $\epsilon$ is a constant added to avoid a zero denominator. It is inspired by the decision-dependent, distance-based k-nn estimate of competence that was proposed for dynamic classifier selection (Woods, PAMI’97).
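The local competence formula can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the k nearest neighbors $N(x_u)$ are taken as already found, each neighbor carries a known class label, and the committee probability for the predicted class is supplied as a precomputed number. The `Neighbor` tuple layout and the function name are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (feature vector x_n, class label of x_n, committee probability that
#  x_n belongs to the class predicted for x_u)
Neighbor = Tuple[Sequence[float], int, float]


def local_competence(
    x_u: Sequence[float],
    neighbors: Sequence[Neighbor],
    predicted_class: int,
    eps: float = 1e-6,
) -> float:
    """Distance-weighted committee support among neighbors of the predicted class.

    Only neighbors whose label equals the class predicted for x_u contribute;
    each contributes its committee probability divided by its Euclidean
    distance to x_u plus eps (eps avoids a zero denominator).
    """
    total = 0.0
    for x_n, label_n, prob_pred in neighbors:
        if label_n != predicted_class:
            continue
        total += prob_pred / (math.dist(x_n, x_u) + eps)
    return total


# Three neighbors of x_u = (0, 0); the committee predicted class 1 for x_u.
score = local_competence(
    (0.0, 0.0),
    [((3.0, 4.0), 1, 0.8), ((0.0, 1.0), 1, 0.5), ((1.0, 0.0), 2, 0.9)],
    predicted_class=1,
)
```

Unlike the leaf-level CPE, this score depends on where $x_u$ lies relative to its neighbors, so two examples that share the same leaves can still receive different confidence values.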
  • 15. Estimating local competence. Estimating the local competence of an unlabeled example given the companion committee.
  • 16. Handwritten Digits Recognition. The handwritten digits data set is described by four sets of features and is publicly available at the UCI Repository. The digits were extracted from a collection of Dutch utility maps. A total of 2,000 patterns (200 patterns per class) have been digitized in binary images. Feature sets (name: description): mfeat-pix: 240 pixel averages in 2 x 3 windows; mfeat-kar: 64 Karhunen-Loève coefficients; mfeat-fac: 216 profile correlations; mfeat-fou: 76 Fourier coefficients of the character shapes.
  • 17. Experimental Setup. WEKA; 4 runs of 10-fold cross-validation. For SSL, 10% of the training examples (180 patterns) are randomly selected as the initial labeled data set L, while the remaining examples are used as the unlabeled data set U. The Random Subspace Method constructs an ensemble of ten pruned C4.5 decision trees (with Laplace correction), where each tree uses only 50% of the features. We set the pool size u = 100, the sample size n = 1, and the number of nearest neighbors used to estimate local competence k = 10.
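The feature-sampling step of the Random Subspace Method in this setup can be sketched as below. It is a minimal illustration that only draws the feature subsets; training the C4.5 trees is left out, and the function name, the seed, and the choice of sampling each subset without replacement are assumptions rather than details taken from the slides.

```python
import random
from typing import List


def random_subspaces(
    n_features: int,
    n_members: int = 10,
    fraction: float = 0.5,
    seed: int = 0,
) -> List[List[int]]:
    """Draw one random feature-index subset per ensemble member.

    Each subset holds `fraction` of the features, sampled without
    replacement, so every member sees a different half of the input space.
    """
    rng = random.Random(seed)
    size = max(1, int(n_features * fraction))
    return [sorted(rng.sample(range(n_features), size)) for _ in range(n_members)]


# Ten subsets of 38 features each for the 76-dimensional mfeat-fou set.
subspaces = random_subspaces(n_features=76)
```

Each tree would then be trained on the labeled set restricted to its own subset of columns, which is where the diversity among committee members comes from.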
  • 18. Experimental Results. Comparison between forests and individual trees. Comparison between CoBC and Self-Training. Comparison between the CPE and local competence confidence measures. Comparison between CoBC and Co-Forest.
  • 19. Experimental Results • : corrected paired t-test implemented in WEKA at 0.05 significance level.
  • 20. Combining QBC and CoBC. Semi-supervised learning and active learning tackle the same problem from different directions. QBC-then-CoBC: QBC provides CoBC with a better starting point than randomly selected labeled examples. QBC-with-CoBC: in QBC-then-CoBC, QBC does not benefit from CoBC, whereas in QBC-with-CoBC both algorithms benefit from each other.
  • 21. Experimental Results • : corrected paired t-test implemented in WEKA at 0.05 significance level.
  • 22. Conclusion. A new single-view committee-based semi-supervised learning framework is proposed. An ensemble of diverse and accurate classifiers can effectively exploit the unlabeled data to improve the recognition accuracy. The random subspace method not only enforces diversity but also reduces the dimensionality, which is desirable when the training set is small. CoBC outperforms Self-Training. The local competence estimate is an effective confidence measure that outperforms the class probability estimates for sample selection.
  • 23. Future Work. Influence of ensemble size and random subspace size. Different ensemble learners and base learners, such as SVM or kNN. CoBC depends only on the companion committee $H_j^{(t-1)}$ constructed at the previous iteration to measure confidence. We will study the influence of depending on all the previous versions ($H_j^{(t')}$, $t' = t-1, t-2, \ldots, 0$).
  • 24. Thanks for your attention. Questions?