2. Why This Paper Was Selected
• Featured in the ACL 2012 tutorial Deep Learning for NLP
• Applies Deep Learning to representative NLP tasks
– POS tagging
– Chunking
– Named Entity Recognition
– Semantic Role Labeling
• Written by leading researchers in NLP with Deep Learning
– Chris Manning
– Ronan Collobert
3. Summary of the Paper
Goal
Propose a unified neural network architecture and
learning algorithm that can be applied to various
NLP tasks:
POS tagging, Chunking, NER, SRL
Conclusion
Instead of hand-crafting features, the system learns internal
representations from large amounts of labeled and unlabeled training data.
This work lays the groundwork for a freely available tagging
system with high accuracy and low computational cost.
36. Proposed Method
• Transforming Words into Feature Vectors
• Extracting Higher-Level Features from Word
Feature Vectors
• Training
• Benchmark Result
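The first step, transforming words into feature vectors, is a lookup table: each dictionary word indexes a learned vector, and the vectors of the words in a context window are concatenated into the network's input. A minimal sketch of this window approach, with a toy vocabulary and randomly initialized (untrained) vectors; the names `vocab`, `lookup`, and `window_features` are illustrative, not from the paper:

```python
import numpy as np

# Toy lookup table: each dictionary word maps to a d-dimensional
# feature vector (randomly initialized here; learned in the paper).
rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "PADDING": 3, "UNKNOWN": 4}
d = 5                                   # word feature dimension
lookup = rng.normal(size=(len(vocab), d))

def window_features(words, center, size=3):
    """Concatenate feature vectors of the words in a window centered
    on position `center`, padding at the sentence edges."""
    half = size // 2
    idxs = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(words):
            idxs.append(vocab.get(words[pos], vocab["UNKNOWN"]))
        else:
            idxs.append(vocab["PADDING"])
    return np.concatenate([lookup[i] for i in idxs])

x = window_features(["the", "cat", "sat"], center=1)
print(x.shape)   # (15,) = window size 3 × feature dimension 5
```

The concatenated vector then feeds the higher layers of the network; the paper's networks use a much larger window (11) and dictionary.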
37. Preprocessing
• use lower-case words in the dictionary
• add a “caps” feature to words that have at least one
non-initial capital letter
• numbers within a word are replaced with the
string “NUMBER”
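The three preprocessing steps above can be sketched in a few lines; the function name `preprocess` and the return format are illustrative assumptions, not from the paper:

```python
import re

def preprocess(word):
    """Normalize a token as on the slide: flag a non-initial capital
    letter as the 'caps' feature, lower-case the word, and replace
    digit runs with the string 'NUMBER'."""
    has_noninitial_cap = any(c.isupper() for c in word[1:])
    normalized = re.sub(r"\d+", "NUMBER", word.lower())
    return normalized, has_noninitial_cap

print(preprocess("iPhone6"))   # ('iphoneNUMBER', True)
```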
43. Lots of Unlabeled Data
• Two window-approach (11) networks (100 HU) trained on
two corpora
• LM1
– Wikipedia: 631M words
– order dictionary words by frequency
– increase dictionary size: 5,000, 10,000, 30,000, 50,000,
100,000
– 4 weeks of training
• LM2
– Wikipedia + Reuters = 631 + 221 = 852M words
– initialized with LM1, dictionary size is 130,000
– 30,000 additional most frequent Reuters words
– 3 additional weeks of training
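The curriculum of ordering dictionary words by frequency and growing the dictionary in stages can be sketched as follows; `frequency_dictionary` and the toy corpus are illustrative stand-ins, not the paper's code:

```python
from collections import Counter

def frequency_dictionary(tokens, size):
    """Keep the `size` most frequent words as the dictionary;
    everything else would map to UNKNOWN. Mirrors the slide's
    'order dictionary words by frequency, increase dictionary size'
    schedule."""
    counts = Counter(tokens)
    words = [w for w, _ in counts.most_common(size)]
    return {w: i for i, w in enumerate(words)}

corpus = "a a a b b c d d d d".split()
for size in (2, 3):   # toy stand-ins for 5,000 ... 100,000
    print(frequency_dictionary(corpus, size))
```

Each stage keeps the embeddings learned so far and only adds rows for the newly admitted words, which is also how LM2 is bootstrapped from LM1.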
51. Additional Tricks
• Suffix Features
– use the last two characters as a feature
• Gazetteers
– 8,000 locations, person names, organizations, and
misc entries from CoNLL-2003
• POS
– use POS as a feature for CHUNK & NER
• CHUNK
– use CHUNK as a feature for SRL
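The suffix and gazetteer features from the slide above are simple discrete lookups; a sketch, where `extra_features` and the tiny gazetteer are illustrative stand-ins for the CoNLL-2003 lists:

```python
def extra_features(word, gazetteer):
    """Discrete features: the last two characters as a suffix
    feature, and a flag for gazetteer membership."""
    suffix = word[-2:].lower()
    in_gazetteer = word.lower() in gazetteer
    return {"suffix": suffix, "gazetteer": in_gazetteer}

locations = {"paris", "london"}   # toy stand-in for 8,000 entries
print(extra_features("Paris", locations))
# {'suffix': 'is', 'gazetteer': True}
```

Like word indices, each discrete feature gets its own lookup table, and its vector is concatenated to the word's feature vector.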
54. Conclusion
• Achievements
– “All purpose” neural network architecture for NLP tagging
– Limit task-specific engineering
– Rely on very large unlabeled datasets
– We do not plan to stop here
• Criticisms
– Why trade NLP expertise for neural network training
skills?
• NLP goals are not limited to existing NLP tasks
• Excessive task-specific engineering is not desirable
– Why neural networks?
• Scale to massive datasets
• Discover hidden representations
• Most neural network technology already existed in 1997 (Bottou, 1997)