SlideShare une entreprise Scribd logo
1  sur  1
Télécharger pour lire hors ligne
http://multimedialab.elis.ugent.be 
Ghent University – iMinds, ELIS Department/Multimedia Lab 
Gaston Crommenlaan 8 bus 201 
B-9050 Ledeberg – Ghent, Belgium 
Fréderic Godin, Baptist Vandersmissen, Azarakhsh Jalalvand, Wesley De Neve and Rik Van de Walle 
Workshop on Machine Learning and NLP, NIPS 2014 
Alleviating Manual Feature Engineering for Part-of-Speech Tagging 
of Twitter Microposts using Distributed Word Representations 
12/12/2014, Montreal, Canada 
Research Question 
Vote-Constrained Bootstrapping* 
Can we avoid manual feature engineering when developing 
a Part-of-Speech tagger for Twitter microposts? 
@frederic_godin, @BaptistV, @wmdeneve and @rvdwalle 
Solution 
Automatically learn features on 400 million raw Twitter microposts that 
capture syntactic and semantic patterns and feed them to a neural network 
Learn Features Train the Part-of-Speech Tagger 
400 million 
Word2vec Skip-gram 
400D vector 
400D 
400D vector 
400D 400D 
444000000DDD v vVeeecccttotoorr r 
Hidden Layer (500D) 
Output Layer (52D) 
im doin good 
VBG 
Evaluation 
ARK tagger 
GATE tagger 
im 
doin 
good VBG 
V 
Agree? 
Automatically generate 
high confidence labeled data 
Use this data to pre-train 
the neural network 
*Derczynski et al., 2013. "Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data" 
Word2vec 
dataset 
Pre-training 
dataset 
Accuracy 
validation set 
Accuracy 
test set 
150M / 87.95% 87.46% 
150M 50K 89.64% 88.82% 
400M 50K 89.73% 88.95% 
400M 125K 90.09% 88.90% 
Ritter et al. (2011) 84.55% 
Derczynski et al. (2013) 88.69%

Contenu connexe

Similaire à Alleviating Manual Feature Engineering for Part-of-Speech Tagging of Twitter Microposts using Distributed Word Representations

Resume upto august 2016
Resume upto august 2016Resume upto august 2016
Resume upto august 2016
Chandan Raj
 
resume_15450424_1453448806 (1)
resume_15450424_1453448806 (1)resume_15450424_1453448806 (1)
resume_15450424_1453448806 (1)
Akhil Mohan
 
requirements engineering - technologies
requirements engineering - technologiesrequirements engineering - technologies
requirements engineering - technologies
Katrien Verbert
 

Similaire à Alleviating Manual Feature Engineering for Part-of-Speech Tagging of Twitter Microposts using Distributed Word Representations (20)

CV VD Mohire
CV VD MohireCV VD Mohire
CV VD Mohire
 
Amit_Resume
Amit_ResumeAmit_Resume
Amit_Resume
 
VEDANT GHODKE - RESUME
VEDANT GHODKE - RESUMEVEDANT GHODKE - RESUME
VEDANT GHODKE - RESUME
 
VEDANT GHODKE - RESUME
VEDANT GHODKE - RESUMEVEDANT GHODKE - RESUME
VEDANT GHODKE - RESUME
 
DB
DBDB
DB
 
Resume upto august 2016
Resume upto august 2016Resume upto august 2016
Resume upto august 2016
 
Deepak Kumar Gautam
Deepak Kumar GautamDeepak Kumar Gautam
Deepak Kumar Gautam
 
Naisahdh nimbark copy
Naisahdh nimbark   copyNaisahdh nimbark   copy
Naisahdh nimbark copy
 
DILIP M NAIR
DILIP M NAIRDILIP M NAIR
DILIP M NAIR
 
Yashwanth Krishnan - Resume
Yashwanth Krishnan - ResumeYashwanth Krishnan - Resume
Yashwanth Krishnan - Resume
 
Rushabh_Doshi_1_
Rushabh_Doshi_1_Rushabh_Doshi_1_
Rushabh_Doshi_1_
 
resume_15450424_1453448806 (1)
resume_15450424_1453448806 (1)resume_15450424_1453448806 (1)
resume_15450424_1453448806 (1)
 
requirements engineering - technologies
requirements engineering - technologiesrequirements engineering - technologies
requirements engineering - technologies
 
Resume Ajay Neethi Kannan
Resume Ajay Neethi KannanResume Ajay Neethi Kannan
Resume Ajay Neethi Kannan
 
Rohit Ahlawat
Rohit AhlawatRohit Ahlawat
Rohit Ahlawat
 
Pravalika Resume
Pravalika ResumePravalika Resume
Pravalika Resume
 
Next Generation IoT Architectures_Hans Salomonsson
Next Generation IoT Architectures_Hans SalomonssonNext Generation IoT Architectures_Hans Salomonsson
Next Generation IoT Architectures_Hans Salomonsson
 
Resume_Anu
Resume_AnuResume_Anu
Resume_Anu
 
Mohamed-Rashad-Resume
Mohamed-Rashad-ResumeMohamed-Rashad-Resume
Mohamed-Rashad-Resume
 
Resume
ResumeResume
Resume
 

Plus de fgodin

Plus de fgodin (7)

Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They...
Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They...Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They...
Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They...
 
Skip, residual and densely connected RNN architectures
Skip, residual and densely connected RNN architecturesSkip, residual and densely connected RNN architectures
Skip, residual and densely connected RNN architectures
 
Improving Language Modeling using Densely Connected Recurrent Neural Networks
Improving Language Modeling using Densely Connected Recurrent Neural NetworksImproving Language Modeling using Densely Connected Recurrent Neural Networks
Improving Language Modeling using Densely Connected Recurrent Neural Networks
 
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
Named Entity Recognition for Twitter Microposts (only) using Distributed Word...
 
The Normalized Freebase Distance (NFD)
The Normalized Freebase Distance (NFD)The Normalized Freebase Distance (NFD)
The Normalized Freebase Distance (NFD)
 
Msm2013challenge
Msm2013challengeMsm2013challenge
Msm2013challenge
 
Using Topic Models for Twitter hashtag recommendation
Using Topic Models for Twitter hashtag recommendationUsing Topic Models for Twitter hashtag recommendation
Using Topic Models for Twitter hashtag recommendation
 

Dernier

Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Monica Sydney
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
Asmae Rabhi
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
ydyuyu
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
JOHNBEBONYAP1
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
ydyuyu
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Monica Sydney
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Monica Sydney
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
ayvbos
 

Dernier (20)

"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Power point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria IuzzolinoPower point inglese - educazione civica di Nuria Iuzzolino
Power point inglese - educazione civica di Nuria Iuzzolino
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
 
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrStory Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
Story Board.pptxrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdfpdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
pdfcoffee.com_business-ethics-q3m7-pdf-free.pdf
 
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
哪里办理美国迈阿密大学毕业证(本硕)umiami在读证明存档可查
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.Meaning of On page SEO & its process in detail.
Meaning of On page SEO & its process in detail.
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 

Alleviating Manual Feature Engineering for Part-of-Speech Tagging of Twitter Microposts using Distributed Word Representations

  • 1. http://multimedialab.elis.ugent.be Ghent University – iMinds, ELIS Department/Multimedia Lab Gaston Crommenlaan 8 bus 201 B-9050 Ledeberg – Ghent, Belgium Fréderic Godin, Baptist Vandersmissen, Azarakhsh Jalalvand, Wesley De Neve and Rik Van de Walle Workshop on Machine Learning and NLP, NIPS 2014 Alleviating Manual Feature Engineering for Part-of-Speech Tagging of Twitter Microposts using Distributed Word Representations 12/12/2014, Montreal, Canada Research Question Vote-Constrained Bootstrapping* Can we avoid manual feature engineering when developing a Part-of-Speech tagger for Twitter microposts? @frederic_godin, @BaptistV, @wmdeneve and @rvdwalle Solution Automatically learn features on 400 million raw Twitter microposts that capture syntactic and semantic patterns and feed them to a neural network Learn Features Train the Part-of-Speech Tagger 400 million Word2vec Skip-gram 400D vector 400D 400D vector 400D 400D 444000000DDD v vVeeecccttotoorr r Hidden Layer (500D) Output Layer (52D) im doin good VBG Evaluation ARK tagger GATE tagger im doin good VBG V Agree? Automatically generate high confidence labeled data Use this data to pre-train the neural network *Derczynski et al., 2013. "Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data" Word2vec dataset Pre-training dataset Accuracy validation set Accuracy test set 150M / 87.95% 87.46% 150M 50K 89.64% 88.82% 400M 50K 89.73% 88.95% 400M 125K 90.09% 88.90% Ritter et al. (2011) 84.55% Derczynski et al. (2013) 88.69%