SlideShare une entreprise Scribd logo
1  sur  1
Télécharger pour lire hors ligne
CoNLL 2017 UD Shared Task
Mul -Model and Crosslingual
Dependency Analysis
Johannes Heinecke, Munshi Asadullah
Orange Labs, Lannion, France
Architecture
BistParser modifica ons: One depencency tree per sentence
Training:
• hidden layer size 40, 50 or 100, depending on language
• other BistParser op ons used: --k 3 --lstmdims 125
--lstmlayers 2 --bibi-lstm --usehead --userl
• word embeddings for all languages (except Gothic)
– all words in lowercase (if applicable)
– punctua on separated from words
– word2vec standard op ons except -size {300,500}
and -window 10
Surprise Languages
Two crosslingual approaches: training (1) on a mix of 23 languages and (2) on a typologically close language
(hsb → cs, sme → fi, kmr → fa, bxr → hi), both without word embeddings: (2) gave much be er results.
Example of modified CONLL (cols. 1, 2 and 4) used for training (i.e.
cs, shown below le ) and predic on (in this case hsb, below right):
training data (cs)
1 Manažeři NOUN
2 rozhodují VERB
3 ADV ADV
4 ADP ADP
5 místě NOUN
6 PUNCT PUNCT
test data (hsb)
1 Njejsu VERB
2 DET DET
3 archeologiske ADJ
4 dokłady NOUN
5 ADP ADP
…
23 language mix
Upper Sorbian (1001
) 63.2%
Northern Sami (50) 49.2%
Kurmanji (100) 29.2%
Buryat (50) 26.3%
Upper Sorbian (hsb)
cs (100) 69.5%
cs (50) 67.5%
pl (50) 56.9%
pl (100) 51.9%
Northern Sami (sme)
fi (100) 52.9%
fiu2
(100) 51.7%
fiu (50) 49.7%
fi (50) 50.8%
Kurmanji (kmr)
fa (100) 36.7%
fa (50) 35.8%
hi (50) 22.2%
ur (50) 20.6%
Buryat (bxr)
hi (50) 32.0%
ur (50) 28.0%
tr (100) 27.6%
fi (100) 21.8%
ja (50) 18.0%
1
Number indicates hidden layer size. 2
Mix of Fenno-Ugric languages, here fi, et and hu.
Results
• 10th
posi on with LAS 68.61% (improved to 69.75% a er bug fixes)
• 9th
posi on with Content Word LAS (CLAS) as evalua on metrics: 64.15%
• 8th
posi on on surprise languages: 38.72% (7th
posi on with CLAS: 34.28%)
Run me (on Tira VM, Ubuntu Xenial): 3 hours (all treebanks), using <16GB
Contact: johannes.heinecke@orange.com

Contenu connexe

Plus de Association for Computational Linguistics

Plus de Association for Computational Linguistics (20)

Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
 
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
 
Toshiaki Nakazawa - 2015 - Overview of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa  - 2015 - Overview of the 2nd Workshop on Asian TranslationToshiaki Nakazawa  - 2015 - Overview of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa - 2015 - Overview of the 2nd Workshop on Asian Translation
 
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT SystemHua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
 
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
 
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
 
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...
 

Dernier

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Dernier (20)

Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 

Johannes Heinecke - 2017 - Multi-Model and Crosslingual Dependency Analysis

  • 1. CoNLL 2017 UD Shared Task Mul -Model and Crosslingual Dependency Analysis Johannes Heinecke, Munshi Asadullah Orange Labs, Lannion, France Architecture BistParser modifica ons: One depencency tree per sentence Training: • hidden layer size 40, 50 or 100, depending on language • other BistParser op ons used: --k 3 --lstmdims 125 --lstmlayers 2 --bibi-lstm --usehead --userl • word embeddings for all languages (except Gothic) – all words in lowercase (if applicable) – punctua on separated from words – word2vec standard op ons except -size {300,500} and -window 10 Surprise Languages Two crosslingual approaches: training (1) on a mix of 23 languages and (2) on a typologically close language (hsb → cs, sme → fi, kmr → fa, bxr → hi), both without word embeddings: (2) gave much be er results. Example of modified CONLL (cols. 1, 2 and 4) used for training (i.e. cs, shown below le ) and predic on (in this case hsb, below right): training data (cs) 1 Manažeři NOUN 2 rozhodují VERB 3 ADV ADV 4 ADP ADP 5 místě NOUN 6 PUNCT PUNCT test data (hsb) 1 Njejsu VERB 2 DET DET 3 archeologiske ADJ 4 dokłady NOUN 5 ADP ADP … 23 language mix Upper Sorbian (1001 ) 63.2% Northern Sami (50) 49.2% Kurmanji (100) 29.2% Buryat (50) 26.3% Upper Sorbian (hsb) cs (100) 69.5% cs (50) 67.5% pl (50) 56.9% pl (100) 51.9% Northern Sami (sme) fi (100) 52.9% fiu2 (100) 51.7% fiu (50) 49.7% fi (50) 50.8% Kurmanji (kmr) fa (100) 36.7% fa (50) 35.8% hi (50) 22.2% ur (50) 20.6% Buryat (bxr) hi (50) 32.0% ur (50) 28.0% tr (100) 27.6% fi (100) 21.8% ja (50) 18.0% 1 Number indicates hidden layer size. 2 Mix of Fenno-Ugric languages, here fi, et and hu. Results • 10th posi on with LAS 68.61% (improved to 69.75% a er bug fixes) • 9th posi on with Content Word LAS (CLAS) as evalua on metrics: 64.15% • 8th posi on on surprise languages: 38.72% (7th posi on with CLAS: 34.28%) Run me (on Tira VM, Ubuntu Xenial): 3 hours (all treebanks), using <16GB Contact: johannes.heinecke@orange.com