Ethan Bowen
Neural Networks
12/4/2011
Multihyperkernel Customization and Analysis on One Versus All Support Vector
Machines
Abstract
Kernels map data into a higher-dimensional space so that data which is not linearly
separable becomes separable. By applying multihyperkernels it is possible to increase
the rate of correct classification for Support Vector Machines while preserving their
structure. This is done by combining kernels to form a new kernel.
Introduction
Support Vector Machines perform binary classification by constructing an N-
dimensional hyperplane that best separates the data into two distinct classes. Every
SVM uses a function called a kernel to do this mapping. Kernel functions are written in
the form K(x,w) = <ɸ(x), ɸ(w)>, where ɸ maps the input space into the feature space
and x and w come from the input space. Two common kernels are the polynomial
kernel and the radial basis function (RBF) kernel. I will cover several more kernels, but
primarily these two types.
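As a minimal sketch of how these two kernels can be computed (illustrative Python, with made-up defaults rather than the exact settings used in this project):

```python
import numpy as np

def polynomial_kernel(x, w, degree=3, c=1.0):
    # Inhomogeneous polynomial kernel: k(x, w) = (x . w + c)^degree
    return (np.dot(x, w) + c) ** degree

def gaussian_rbf_kernel(x, w, sigma=1.0):
    # Gaussian RBF kernel: k(x, w) = exp(-||x - w||^2 / (2 * sigma^2))
    r2 = np.sum((np.asarray(x) - np.asarray(w)) ** 2)
    return np.exp(-r2 / (2.0 * sigma ** 2))
```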
The dataset I used is a multiclass classification problem from the UCI Machine
Learning Repository that uses 5 features to classify teachers by their performance. The
class labels are low, medium, and high, and the features are English speaking, course
instructor, course, summer or regular semester, and class size.
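A hedged sketch of loading this data; the file name tae.data and the column order are assumptions based on the usual UCI layout, not details confirmed in this report:

```python
import numpy as np

# Assumed layout: 5 comma-separated feature columns followed by the class label.
data = np.loadtxt("tae.data", delimiter=",")
X = data[:, :5]  # english_speaker, instructor, course, semester, class_size
y = data[:, 5]   # 1 = low, 2 = medium, 3 = high
```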
Instead of using a multiclass SVM, which essentially requires Y binary classifying
SVMs where Y is the number of classes, I decided to use a One-Versus-All (OVA)
approach,
which classifies each class against the rest; a sketch of the scheme follows below. This
approach allowed me to show easily how the choice of kernel affects the percentage of
correct label classification, and how creating custom kernels can improve it. My
research and knowledge of the subject were obtained by reading several published
papers (which are cited) and from the course of this Neural Networks class.
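A sketch of one way to implement the OVA scheme using scikit-learn's SVC (a reconstruction for illustration; the helper names are made up):

```python
import numpy as np
from sklearn.svm import SVC

def train_ova(X, y, classes, **svc_kwargs):
    # Train one binary SVM per class: +1 for the class, -1 for all the rest.
    return {c: SVC(**svc_kwargs).fit(X, np.where(y == c, 1, -1)) for c in classes}

def predict_ova(models, X):
    # Assign each sample to the class whose SVM reports the largest margin.
    classes = list(models)
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array([classes[i] for i in scores.argmax(axis=1)])
```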
Methods
When considering which kernel method to use, it is best to know the data you are
working with in order to choose an optimal kernel function. For instance, a linear kernel
would do more harm than good if you knew your data was not linearly separable. My
approach when choosing a kernel for this dataset was simply to search for the best-
performing one, so I tested the kernels mentioned earlier against some new kernels I
created. Since I was given no testing sample, my testing sample is a subset of the
training sample.
For testing the polynomial kernel I created two kernel functions. The first is
BowenPoly, a kernel of the form k(xi,x) = k(xi,x)^d, where d is the number of features in
the dataset. The second is BowenN1, a kernel of the form k(xi,x) = k(xi,x)^(d+1), with d
again the number of features.
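Under one plausible reading of these definitions, where the base kernel is the homogeneous polynomial xi·x and the power is then applied, the two kernels could be written as Gram-matrix callables that SVC accepts directly (a sketch, not the exact code used here):

```python
import numpy as np

D = 5  # number of features in this dataset

def bowen_poly(X, W, d=D):
    # BowenPoly: base kernel k(xi, x) = xi . x raised to the d-th power.
    return (X @ W.T) ** d

def bowen_n1(X, W, d=D):
    # BowenN1: the same base kernel raised to the (d+1)-th power.
    return (X @ W.T) ** (d + 1)

# Usage with the OVA sketch above, e.g.: SVC(kernel=bowen_poly)
```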
For testing the RBF kernel I created BowenRBF, a combination of two RBF
kernels. I denote r = ||xi − x||2 (the Euclidean distance) and ɛ = 1/(2σ), where σ is
sigma. α is an N×1 vector of weights, where N is the number of kernels combined in
BowenRBF (here N = 2), and the αi sum to 1 over i = 1,…,N. For my testing α1 = 0.5
and α2 = 0.5, meaning each kernel is weighted at 50% of its original value.
Using the Laplacian, Exponential, Multiquadric, and Gaussian RBFs, I swept
sigma from 1 to 10 in 0.01 increments to see how each kernel classified. I chose the
Laplacian and Exponential kernels since they gave the best results compared to the
other kernels, including the Gaussian (see Figure (4)). So, with the Laplacian kernel
(LAP) of the form k(xi,x) = e^(−r/σ) and the Exponential kernel (EXP) of the form
k(xi,x) = e^(−r/(2σ^2)), I created BowenRBF in the form k(xi,x) = α1·LAP + α2·EXP. I
took this process from the multihyperkernel, a "kernel on kernel" notion that implicitly
performs kernel optimization within a set family of kernels (such as Gaussian kernels
with different sigmas) and takes the form k(xi,x) = Σ (i = 1 to N) αi·Ki(xi,x). In my case
sigma is the same for both kernels in BowenRBF; a sketch of BowenRBF follows
below.
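A minimal sketch of BowenRBF as a Gram-matrix function, assuming r is the pairwise Euclidean distance and the two weights form a convex combination:

```python
import numpy as np
from scipy.spatial.distance import cdist

def bowen_rbf(X, W, sigma=1.0, alpha1=0.5, alpha2=0.5):
    r = cdist(X, W)                        # r = ||xi - x||_2 for every pair
    lap = np.exp(-r / sigma)               # Laplacian kernel, e^(-r/sigma)
    exp = np.exp(-r / (2.0 * sigma ** 2))  # Exponential kernel, e^(-r/(2 sigma^2))
    return alpha1 * lap + alpha2 * exp     # convex combination, alpha1 + alpha2 = 1
```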
Once I had the classification rate of each existing kernel and of the new kernels
for each class (low, medium, high), I examined the results to determine, first, which
kernel gave the better average classification correctness and, second, how useful the
new kernels were compared to the original kernels; a sketch of this evaluation follows
below.
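Building on the sketches above, the evaluation could look roughly like this (a hedged reconstruction; X_train, y_train, X_test, and y_test stand for the split of the training sample described in Methods):

```python
import numpy as np
from sklearn.svm import SVC

results = {}
for sigma in np.arange(1.0, 10.0, 0.01):
    kernel = lambda A, B, s=sigma: bowen_rbf(A, B, sigma=s)
    accs = []
    for c in (1, 2, 3):  # the low, medium, and high OVA problems
        clf = SVC(kernel=kernel).fit(X_train, np.where(y_train == c, 1, -1))
        accs.append(clf.score(X_test, np.where(y_test == c, 1, -1)))
    results[sigma] = np.mean(accs)  # average correctness across the three classes
```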
Results
For the linear case I found that even the most optimal kernel, k(xi,x) = xi^T·x,
classified only about 70% correctly, so I quickly switched to non-linear kernels for
testing. Figure (1) shows that for this dataset BowenPoly and BowenN1 did not
consistently map better than the polynomial kernel, so I cannot accurately state that my
kernels would map better on different testing sets. For the non-linear data I tested
against the most commonly used RBF, the Gaussian RBF of the form
k(xi,x) = e^(−(ɛr)^2), whose classification was 80% correct. Figure (2) shows that for
each OVA SVM, BowenRBF shows
improvement over the Gaussian RBF for sigmas from 0.7 to 10, and Figure (3) shows
that the averages of the OVA SVMs across the class labels (low, medium, and high)
place BowenRBF at a much higher percentage of correct classification than the
Gaussian RBF over the same range of sigmas. Therefore, for this dataset I cannot say
there is evidence that BowenPoly and BowenN1 will regularly classify at a higher
percentage than the homogeneous polynomial kernel, but I can say there is evidence
(Figure (3)) that BowenRBF will regularly classify at a higher percentage than the
Gaussian RBF kernel, which suggests I could use BowenRBF to obtain a high
percentage of correct classification on further testing samples.
Discussion
It must be said that a few classification tests on a single dataset do not justify
calling BowenRBF a better kernel than the Gaussian, and much more research into
multihyperkernels is needed before a concrete justification can be given. I found this
research very interesting, and I was often learning new ways of using kernels and
applying them to specific applications. In this project α was simply chosen as 0.5 for
each element, but I learned that there are algorithms for learning these weights as well;
a simple illustration follows below. Overall this was a very fun project, and I enjoyed the
process of discovering a custom kernel that worked better than other known kernels.
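As a simple illustration of what learning the weights could look like, here is a grid search over α1 with cross-validation, using bowen_rbf and the X, y arrays from the earlier sketches (my sketch, not an algorithm taken from the cited papers):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

best_alpha, best_acc = 0.5, -np.inf
labels = np.where(y == 1, 1, -1)  # e.g. the "low" OVA problem
for a1 in np.linspace(0.0, 1.0, 11):
    # Constrain the weights to sum to 1: alpha2 = 1 - alpha1.
    kernel = lambda A, B, a=a1: bowen_rbf(A, B, sigma=2.0, alpha1=a, alpha2=1.0 - a)
    acc = cross_val_score(SVC(kernel=kernel), X, labels, cv=5).mean()
    if acc > best_acc:
        best_alpha, best_acc = a1, acc
```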
References
[1] Andrew Oliver Hatch. Kernel Optimization for Support Vector Machines: Application
to Speaker Verification. PhD thesis, EECS Department, University of California,
Berkeley, Dec. 2006.
[2] C. S. Ong and A. J. Smola. Machine learning using hyperkernels. In Proceedings of
the International Conference on Machine Learning, pages 568–575, 2003.
[3] Souza, César R. "Kernel Functions for Machine Learning Applications." 17 Mar.
2010. Web. <http://crsouza.blogspot.com/2010/03/kernel-functions-for-machine-
learning.html>.
(1) [figure omitted]
(2) [figure omitted]
(3) [figure omitted]
(4) [figure omitted] Sigma from 0-10. Graph is scaled.