Machine Learning for Language Technology
Lecture 9: Perceptron
Marina Santini
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2014
Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials
Inputs and Outputs
Feature Representation
Features and Classes
Examples (i)
Examples (ii)
Block Feature Vectors
Representation (Linear Classifiers: Repetition & Extension)
Linear classifiers (atomic classes)
• Assumption: the data must be linearly separable (for example, XOR-labeled points in the plane are not).
Perceptron
Perceptron (i)
Perceptron Learning Algorithm
Separability and Margin (i)
Separability and Margin (ii)
• Given a training instance (x_t, y_t), let Ȳ_t be the set of all labels that are incorrect for that instance, i.e. the set of all labels minus the correct one.
• We then say that a training set is separable with margin gamma (γ) if there exists a weight vector w of a certain norm (i.e. ‖w‖ = 1) such that, for every training instance, the score we get when we use this vector w for the correct label minus the score of every incorrect label is at least γ (a sketch of this check in code follows below).
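As an illustration (not from the slides), here is a minimal Python sketch of this condition; the function and variable names, the feature function feature_fn(x, y), and the data layout are all assumptions made for the example:

import numpy as np

def separation_margin(w, feature_fn, data, labels):
    """Smallest gap between the correct label's score and the best
    incorrect label's score, over all training instances (x_t, y_t).
    The set is separable with margin gamma by w iff this is >= gamma."""
    w = w / np.linalg.norm(w)  # enforce the unit-norm convention ||w|| = 1
    gaps = []
    for x, y in data:
        correct = w @ feature_fn(x, y)           # score of the correct label
        best_wrong = max(w @ feature_fn(x, yp)   # best incorrect score
                         for yp in labels if yp != y)
        gaps.append(correct - best_wrong)
    return min(gaps)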
Separability and Margin (iii)
• IMPORTANT: for every training instance, the score that we get when we use the weight vector w minus the score of every incorrect label is at least a certain margin gamma (γ). That is, the margin γ is the smallest difference between the score of the correct class and the best score among the incorrect classes.
The higher the weights, the greater the norm; we want the norm to be 1 (normalization).
There are different ways of measuring the length/magnitude of a vector, and they are known as norms. The Euclidean norm (or L2 norm) says: take all the values of the weight vector, square them, sum them up, then take the square root.
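In code this computation is a one-liner; a minimal sketch with an arbitrary example vector:

import numpy as np

w = np.array([3.0, -4.0])                 # example weight vector
l2 = np.sqrt(np.sum(w ** 2))              # square, sum, square root -> 5.0
assert np.isclose(l2, np.linalg.norm(w))  # matches NumPy's built-in L2 norm
w_unit = w / l2                           # normalized so that ||w_unit|| = 1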
Perceptron
Perceptron Learning Algorithm
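The algorithm itself appears on the slide as an image; as a hedged reconstruction of the standard mistake-driven multiclass perceptron (the function and variable names and the feature function are my assumptions, not taken from the slides):

import numpy as np

def perceptron(data, labels, feature_fn, dim, epochs=10):
    """Multiclass perceptron: predict with the current weights; on a
    mistake, move the weights toward the correct label's features and
    away from the predicted incorrect label's features."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y in data:
            # highest-scoring label under the current weights
            y_hat = max(labels, key=lambda yp: w @ feature_fn(x, yp))
            if y_hat != y:  # mistake-driven update
                w += feature_fn(x, y) - feature_fn(x, y_hat)
    return w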
Main Theorem

Perceptron Theorem
• For any training set that is separable with some margin, we can prove that the number of mistakes made during training (if we keep iterating over the training set) is bounded by a quantity that depends on the size of the margin (see the proofs in the Appendix and in the slides of Lecture 3).
• R depends on the norm of the largest difference you can have between feature vectors. The larger R is, the more spread out the data, and the more errors we can potentially make. Conversely, if gamma is larger, we will make fewer mistakes.
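In symbols, this is the classic perceptron mistake bound (Novikoff's theorem, stated here from the standard result rather than from the slides), writing f(x_t, y) for the feature vector of instance x_t with label y:

\[
\text{mistakes} \;\le\; \frac{R^2}{\gamma^2},
\qquad
R \;=\; \max_{t,\; y' \in \bar{Y}_t} \bigl\lVert \mathbf{f}(x_t, y_t) - \mathbf{f}(x_t, y') \bigr\rVert .
\]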
Summary

Basically…
.... if it is possible to find such a weight vector for some positive margin gamma, then the training set is separable.
So... if the training set is separable, the Perceptron will eventually find a weight vector that separates the data. The time it takes depends on the properties of the data, but after a finite number of iterations the number of mistakes on the training set will converge to 0.
However... although we find the perfect weight vector for separating the training data, it might be the case that the classifier does not generalize well (do you remember the difference between empirical error and generalization error?).
So, with the Perceptron, we have a fixed norm (= 1) and a variable margin (> 0). A toy run of the learner above is sketched below.
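As a usage sketch (the toy data and all names are hypothetical), training on a small separable set until a mistake-free pass, which the theorem guarantees will happen after finitely many updates:

import numpy as np

# Toy separable task: the label is the sign of the first coordinate.
data = [(np.array([2.0, 1.0]), +1), (np.array([-1.5, 0.5]), -1),
        (np.array([3.0, -2.0]), +1), (np.array([-2.0, -1.0]), -1)]

def feats(x, y):
    return y * x  # simple feature map: label-scaled input

w = np.zeros(2)
mistakes = 1
while mistakes:  # iterate until one full pass makes no mistakes
    mistakes = 0
    for x, y in data:
        y_hat = max((+1, -1), key=lambda yp: w @ feats(x, yp))
        if y_hat != y:
            w += feats(x, y) - feats(x, y_hat)
            mistakes += 1
print(w)  # a weight vector that separates the toy data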
Appendix: Proofs and Derivations
