UniMorph and Morphological Inflection Task: Past, Present, and Future
Ekaterina Vylomova
University of Melbourne
ekaterina.vylomova@unimelb.edu.au
20 August 2021
Ekaterina Vylomova UniMorph and Morphological Inflection Task 20 August 2021 1 / 115
PART I: The UniMorph Project
Increasing Multilinguality
Speech is Special
Charles F. Hockett on Essential Properties of Human Languages
Displacement
Ability to refer to things in space and time and communicate about things that are not present
Productivity
Ability to create new and unique meanings of utterances from previously existing utterances and
sounds
Duality of Patterning
Meaningless phonic segments (phonemes) are combined to make meaningful words, etc.
Learnability
A speaker of a language can learn another language
Linguistic Diversity
Roman Jakobson on Differences between Languages
“Languages differ essentially in what they must convey and not in what they may convey”
Languages differ in many ways!
(1) Chinese (Isolating)
wǒmen    xué    le     zhè   xiē  shēngcí.
I.PL.AN  learn  .PAST  this  .PL  new word.
“We learned these new words.”
(2) Russian (Synthetic)
My      vyučili        eti          novyje      slova.
We.NOM  learn.PAST.PL  this.ACC.PL  new.ACC.PL  word.ACC.PL
“We learned these new words.”
An example of West Greenlandic taken from Fortescue (2017):
(3) West Greenlandic (Polysynthetic)
Nannu-n-niuti-kkuminar-tu-rujussu-u-vuq.
Polar.bear-catch-instrument.for.achieving-something.good.for-PART-big-be-3SG.INDIC
“It (a dog) is good for catching polar bears with.”
An example of Kunwinjku taken from Evans (2003):
(4) Kunwinjku (Polysynthetic)
Aban-yawoith-warrgah-marne-ganj-ginje-ng.
1/3PL-again-wrong-BEN-meat-cook-PP
“I cooked the wrong meat for them again”
Discussion of what should be considered as a word:
John Mansfield’s “The word as a unit of internal predictability”
Some exhibit rich grammatical case systems (e.g., 12 in Erzya and 24 in Veps)
Some mark possessiveness
Others might have complex verbal morphology (e.g., Oto-Manguean languages)
Some even “decline” nouns for tense (e.g., Tupi–Guarani languages)
Let’s Discuss The Following Dimensions:
Fusion
Inflectional Synthesis
Position of Case Affixes
Fusion (WALS 20A)
From isolating to concatenative
Concatenative morphology is the most common system
Non-linearities such as ablaut or tonal morphology can also be present
Isolating languages: the Sahel Belt in West Africa, Southeast Asia and the Pacific
Tonal–concatenative morphology can be found in Mesoamerican languages
Inflectional Synthesis of the Verb (WALS 22A)
Analytic expressions are common in Eurasia
Synthetic expressions are used to a high degree in the Americas
Position of Case Affixes (WALS 51A)
Case affixes can variably surface as prefixes, suffixes, infixes, or circumfixes
Suffixation: most Eurasian and Australian languages and, to a lesser extent, South American and New Guinean languages
Prefixation: Mesoamerican languages and African languages spoken south of the Sahara
The Earliest Approach to Morphology (Sanskrit)
Pāṇini’s kārakas
Formalize regularities in the words
Inflectional Morphology is Paradigmatic
... or Russian Morphology
Morphological Inflection
Formalizations differ: the number of cases may vary from 6 to 11 (Zaliznyak, 1967)
Inflectional Morphology: Paradigms (nouns)
Morphological Inflection
беглец “fugitive” + pos=N,case=ACC,num=SG → беглеца
Inflectional Morphology: Classes (nouns) differ in the EN/RU editions of Wiktionary
EN Wiktionary: ru-noun-table | b | беглец | a=an
RU Wiktionary: сущ ru m a 5b|основа=беглец|основа1=беглец|слоги=по-слогам|бег|лец
Inflectional Morphology: Wiktionary annotation is not cross-linguistically consistent
Other Languages
Hungarian
Wiktionary: Inconsistent annotation across languages
Within a single language: across different editions (en; ru; de; etc)
Many language-specific features
Linguistic Diversity and Universals
Universal Grammar
Evans and Levinson, 2009: The Myth of Language Universals
“Diversity can be found at almost every level of linguistic organization”
Languages vary greatly on phonological, morphological, semantic, and
syntactic levels
Typology: describe the limits of cross-linguistic variation
Haspelmath, 2010
Descriptive categories (specific to languages) vs. comparative concepts.
UniMorph – Universal Annotation
Universal Annotation (by John Sylak-Glassman and David Yarowsky)
1) 23 dimensions of meaning (TAM, case, number, animacy), 212 features
2) A-morphous (word-based) morphology (Anderson, 1992)
3) Initial paradigms were mainly extracted from the English edition of
Wiktionary (Kirov et al., 2016)
https://unimorph.github.io/
[Sylak-Glassman, 2016]
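UniMorph data files pair each inflected form with its lemma and a feature bundle. A minimal sketch of reading one such entry, assuming the tab-separated lemma / form / features layout with ";"-separated features. The `Entry` type and the `parse_unimorph_line` helper are hypothetical names, and the sample line is illustrative rather than copied from a release file:

```python
from typing import FrozenSet, NamedTuple


class Entry(NamedTuple):
    """One row of a UniMorph-style file (hypothetical container type)."""
    lemma: str
    form: str
    features: FrozenSet[str]


def parse_unimorph_line(line: str) -> Entry:
    """Split a 'lemma<TAB>form<TAB>FEAT;FEAT;...' line into its parts."""
    lemma, form, tags = line.rstrip("\n").split("\t")
    return Entry(lemma, form, frozenset(tags.split(";")))


# Illustrative line, modelled on the беглец example used in these slides.
entry = parse_unimorph_line("беглец\tбеглеца\tN;ACC;SG")
print(entry.lemma, entry.form, sorted(entry.features))
```

Storing the features as a set reflects the schema's view of a tag as an unordered bundle of feature values drawn from different dimensions.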
PART II: SIGMORPHON Shared Tasks on Morphological (Re-)inflection
Morphological (Re-)Inflection
SIGMORPHON Shared Task 2016–2019
Inflection: PLAY + PRESENT PARTICIPLE → playing
ReInflection: played + PRESENT PARTICIPLE → playing
Lemma Tag Form
RUN PAST ran
RUN PRES;1SG run
RUN PRES;2SG run
RUN PRES;3SG runs
RUN PRES;PL run
RUN PART running
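The table above can be read as a lookup from (lemma, tag) pairs to forms. The sketch below contrasts inflection and reinflection on exactly that table. `inflect` and `reinflect` are hypothetical helper names, and a shared-task system has to generalize to lemmas it has never seen rather than look them up:

```python
# The RUN paradigm from the table, keyed by (lemma, tag).
PARADIGM = {
    ("RUN", "PAST"): "ran",
    ("RUN", "PRES;1SG"): "run",
    ("RUN", "PRES;2SG"): "run",
    ("RUN", "PRES;3SG"): "runs",
    ("RUN", "PRES;PL"): "run",
    ("RUN", "PART"): "running",
}


def inflect(lemma: str, tag: str) -> str:
    """Inflection: lemma + target tag -> form."""
    return PARADIGM[(lemma, tag)]


def reinflect(source_form: str, target_tag: str) -> str:
    """Reinflection: an already inflected form + target tag -> form."""
    lemmas = {lemma for (lemma, _), form in PARADIGM.items() if form == source_form}
    if len(lemmas) != 1:
        raise KeyError(f"cannot identify a unique lemma for {source_form!r}")
    return inflect(lemmas.pop(), target_tag)


print(inflect("RUN", "PRES;3SG"))
print(reinflect("ran", "PART"))
```

Reinflection is the harder variant because the source form first has to be related back to its paradigm before the target form can be produced.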
2018: ~96% accuracy on average in the high-resource setting,
but much lower in the low-resource setting
SIGMORPHON 2016 Shared Task (Cotterell et al., 2016)
Morphological (Re-)Inflection (10 Languages)
Task1: беглец + pos=N,case=ACC,num=SG → беглеца
Task2: беглецами + pos=N,case=INS,num=PL + pos=N,case=ACC,num=SG → беглеца
Task3: беглецами + pos=N,case=ACC,num=SG → беглеца
[Cotterell et al., 2016]
LMU+BIU+Helsinki: Neural (seq2seq) +/- aligner
MSU+Col/NYU: rule-based/heuristics
Others: external aligner+WFST/CRF
SIGMORPHON 2016 Shared Task (Cotterell et al., 2016)
Morphological (Re-)Inflection (10 Languages): Neural
encoder–decoders
1) character-level input: <s> r u n OUT_POS=V OUT_NUM=SG
OUT_TENSE=PRES </s> Output: <s> r u n s </s>
2) Ensembles of seq2seq (GRUs + soft attention (Bahdanau et al., 2015))
3) Enriching the data with combinations of other (non-lemma) forms
[Kann and Schuetze, 2016]
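The character-level input in point 1) can be built mechanically from a lemma and its target features. A minimal sketch that reproduces the symbol sequence shown on the slide; `encode_source` and `encode_target` are hypothetical helper names, and the vocabulary handling and model itself are left out:

```python
from typing import Dict, List


def encode_source(lemma: str, target_tags: Dict[str, str]) -> List[str]:
    """Lemma characters followed by one OUT_<feature>=<value> symbol per tag."""
    tag_symbols = [f"OUT_{name}={value}" for name, value in target_tags.items()]
    return ["<s>", *lemma, *tag_symbols, "</s>"]


def encode_target(form: str) -> List[str]:
    """Target side: the characters of the inflected form."""
    return ["<s>", *form, "</s>"]


source = encode_source("run", {"POS": "V", "NUM": "SG", "TENSE": "PRES"})
target = encode_target("runs")
print(" ".join(source))
print(" ".join(target))
```

Treating each feature-value pair as a single symbol lets one encoder-decoder serve every slot of the paradigm instead of training one model per tag.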
SIGMORPHON 2016 Shared Task (Cotterell et al., 2016)
Morphological (Re-)Inflection (10 Languages): Neural
encoder–decoders
1) Extract input–output string alignments; 2) Train seq2seq (LSTM-based)
models to learn a sequence of operations (hard monotonic attention)
[Aharoni and Goldberg, 2017]
Errors
глядеть pos=V,tense=PRS,per=1,num=SG,aspect=IPFV gold: гляжу predicted: глядею
увлекаться pos=V,tense=PRS,per=1,num=SG,aspect=IPFV gold: увлекаюсь predicted: увлеклюсь
звать pos=V,tense=PRS,per=3,num=SG,aspect=IPFV gold: зовёт predicted: звает
зять pos=N,case=GEN,num=PL gold: зятьёв predicted: зятей
перстень pos=N,case=GEN,num=PL gold: перстней predicted: перстеее
телекамера pos=N,case=GEN,num=PL gold: телекамер predicted: телекаморо
лоботряс pos=N,case=ACC,num=PL gold: лоботрясов predicted: лоботрясы
львица pos=N,case=ACC,num=PL gold: львиц predicted: львица
милиционер pos=N,case=ACC,num=PL gold: милиционеров predicted: милиционеры
светлячок pos=N,case=ACC,num=PL gold: светлячков predicted: светлячки
скот pos=N,case=ACC,num=PL gold: скотов predicted: скоты
счёт pos=N,case=ACC,num=PL gold: счета predicted: счеты
CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017)
Universal Morphological Reinflection (52 Languages)
Task1: Morphological Inflection
Task2: Paradigm Completion
3 Settings: Low (100 samples), Medium (1000), High (10,000)
Sampled based on their token frequency in Wikipedia corpus (with
resampling for syncretic slots)
[Cotterell et al., 2017]
CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017)
Universal Morphological Reinflection (52 Languages): Neural
encoder–decoders
Align & Copy (based on Aharoni and Goldberg, 2017):
1) Extract input–output string alignments (add COPY/edit operations)
2) Train seq2seq (LSTM-based) models to learn a sequence of operations (hard
monotonic attention)
[Makarov et al., 2017]
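The idea of turning an aligned lemma-form pair into a sequence of COPY and edit operations can be sketched as follows. This is only an illustration: it uses Python's `difflib.SequenceMatcher` as a stand-in aligner, not the aligner used by the cited systems, and the action names `COPY`, `DELETE`, and `INSERT(c)` are assumptions made for this example:

```python
from difflib import SequenceMatcher
from typing import List


def edit_actions(lemma: str, form: str) -> List[str]:
    """Derive a left-to-right action sequence that rewrites lemma into form."""
    actions: List[str] = []
    matcher = SequenceMatcher(a=lemma, b=form, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            actions += ["COPY"] * (i2 - i1)
        else:
            # 'replace', 'delete' and 'insert' all reduce to deletes plus inserts.
            actions += ["DELETE"] * (i2 - i1)
            actions += [f"INSERT({char})" for char in form[j1:j2]]
    return actions


def apply_actions(lemma: str, actions: List[str]) -> str:
    """Replay an action sequence over the lemma, moving strictly left to right."""
    output, position = [], 0
    for action in actions:
        if action == "COPY":
            output.append(lemma[position])
            position += 1
        elif action == "DELETE":
            position += 1
        else:
            output.append(action[len("INSERT("):-1])
    return "".join(output)


actions = edit_actions("run", "runs")
print(actions)
print(apply_actions("run", actions))
```

A model trained on such action sequences only has to predict the few non-COPY steps, which is one reason this family of approaches tends to hold up in low-resource settings.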
CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017)
Error taxonomy
What are common errors that neural systems make?
[Gorman et al., 2019]
Types of Errors
Free variation error: more than one acceptable form exists
Extraction errors: flaws in UniMorph’s parsing of Wiktionary
Wiktionary errors: errors in the Wiktionary data itself
Silly errors: “bizarre” errors which defy any purely linguistic characterization (“*membled”
instead of “mailed” or enters a loop such as “ynawemaylmyylmyylmyylmyylmyylmyym...” instead
of “ysnewem”)
Allomorphy errors: misapplication of existing allomorphic patterns
Spelling errors: forms that do not follow language-specific orthographic conventions
Allomorphy Errors
Stem-final vowels in Finnish (*pohjanpystykorvojen); Consonant gradation in Finnish (*ei
kiemurda)
Ablaut in Dutch and German (*pront; *saufte)
Umlaut (*Einwohnerzähle, *Förmer), plural suffixes, Verbal prefixes in German (*umkehre)
Linking vowels in Hungarian (*masszázsakból instead of masszázsokból)
Yers (*klęsek instead of klęsk), genitive singular suffixes in Polish (*izotopa)
Animacy in Polish and Russian (грузин vs. магазин in ACC.SG )
Aspect in Russian (*будешь сорвать)
Internal inflection in Russian compounds (*государствах-донорах, *лёгких промышленности
(ACC.PL))
CoNLL–SIGMORPHON 2018 Shared Task (Cotterell et al., 2018)
Universal Morphological Reinflection (103 Languages)
Task1: Morphological Inflection (Low, Medium, High)
Task2: Inflection in Context (Vylomova et al., 2019)
[Cotterell et al., 2018]
Track 1: With morphosyntactic annotation
Track 2: Without morphosyntactic annotation
Requires capturing agreement and inferring inherent vs. contextual categories (Vylomova et al., 2019)
SIGMORPHON 2019 Shared Task (McCarthy et al., 2019)
Morphological Analysis in Context and Cross-Lingual Transfer for
Inflection (100 Language Pairs)
Task1: Cross-lingual Transfer for Morphological Inflection (10k high-resource + 100 low-resource training examples)
Task2: Morphological Analysis in Context
[McCarthy et al., 2019]
[Anastasopoulos and Neubig, 2019]
So...
SIGMORPHON Shared Tasks 2016–2019
PLAY + PRESENT PARTICIPLE → playing
played + PRESENT PARTICIPLE → playing
Lemma Tag Form
RUN PAST ran
RUN PRES;1SG run
RUN PRES;2SG run
RUN PRES;3SG runs
RUN PRES;PL run
RUN PART running
Also see Ling Liu’s 2021 Overview
“Computational Morphology with Neural Network Approaches”
2018: ~96% accuracy on average in the high-resource setting,
but much lower in the low-resource setting
PART III: Scaling up and increasing UniMorph Collaboration!
From Wiktionary to more linguistic resources: Including grammar books, Apertium data,
text/glossed corpora.
Language-Specific Biases
As Bender (2009, 2016) notes, architectures and training and tuning
algorithms still exhibit language-specific biases
SIGMORPHON 2020 SHARED TASK 0 (Vylomova et al., 2020)
Let’s focus on typological diversity and aim to investigate systems’ ability to
generalize across typologically distinct languages!
If a model works well for a sample of IE languages, should the same model
also work well for Tupi–Guarani languages?
90 languages from 13 language families
Three Phases
Development
2 months; train & dev: 45 languages from 5 families (Austronesian, Niger-Congo, Oto-Manguean,
Uralic, IE)
Generalization
1 week; train & dev: 45 more languages, adding 10 new families (13 families in this phase:
Afro-Asiatic, Algic, Dravidian, Indo-European, Niger-Congo, Sino-Tibetan, Siouan, Songhay,
Southern Daly, Tungusic, Turkic, Uralic, and Uto-Aztecan)
Evaluation
1 week; test: all 90 languages
Data
Preparation
Manually converted each resource's features (tags) into the UniMorph format
Canonicalized the converted language data (https://github.com/unimorph/um-canonicalize)
Splitting
Used only noun, verb, and adjective forms to construct training, development, and evaluation
sets.
Randomly sampled 70%, 10%, and 20% for train, development, and test, respectively.
Zarma, Tajik, Lingala, Ludian, Māori, Sotho, Võro, Anglo-Norman, and Zulu contain fewer than
400 training samples
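The splitting procedure above can be sketched in a few lines. This is a minimal illustration, assuming entries are (lemma, form, tags) triples whose first tag is the part of speech (`N`, `V`, or `ADJ`). The helper names, the fixed seed, and the integer rounding are assumptions, not the organizers' script:

```python
import random
from typing import List, Sequence, Tuple

Entry = Tuple[str, str, str]  # (lemma, form, "POS;FEAT;FEAT")

KEPT_POS = {"N", "V", "ADJ"}


def keep_open_class(entries: Sequence[Entry]) -> List[Entry]:
    """Keep only noun, verb, and adjective forms."""
    return [e for e in entries if e[2].split(";")[0] in KEPT_POS]


def split_dataset(entries: Sequence[Entry], seed: int = 0):
    """Shuffle once, then cut into 70% train, 10% dev, and the remainder as test."""
    items = list(entries)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    n_dev = len(items) // 10
    return items[:n_train], items[n_train:n_train + n_dev], items[n_train + n_dev:]


data = [(f"lemma{i}", f"form{i}", "N;NOM;SG") for i in range(100)]
data += [("quickly", "quickly", "ADV")] * 5  # filtered out before splitting
train, dev, test = split_dataset(keep_open_class(data), seed=1)
print(len(train), len(dev), len(test))
```

Because the split is over individual forms, the same lemma can appear in both training and test data under different tags.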
Systems: Baselines
Non-neural
Simple alignment-based as in previous years (Cotterell et al., 2017;2018)
Neural
Neural transducer (Wu et al., 2019), which is essentially a hard monotonic attention model
(mono-*)
Transformer adapted for character-level tasks (Wu et al., 2020; trm-*), state of the art on the 2017 shared task
+ data augmentation technique used by Anastasopoulos and Neubig (2019; *-aug-*)
+ family-wise shared parameters (*-shared)
Team      Description     Systems
Baseline  wu2019exact     mono-single, mono-aug-single, mono-shared, mono-aug-shared
Baseline  wu2020applying  trm-single, trm-aug-single, trm-shared, trm-aug-shared
Model features tracked per system: Neural, Ensemble, Multilingual, Hallucination
Systems: Teams
10 teams submitted 22 systems in total, out of which 19 were neural
Team           Description               Systems
CMU Tartan     Jayarao et al. (2020)     cmu_tartan_00-0, cmu_tartan_00-1, cmu_tartan_01-0, cmu_tartan_01-1, cmu_tartan_02-1
CU7565         Beemer et al. (2020)      CU7565-01-0, CU7565-02-0
CULing         Liu et al. (2020)         CULing-01-0
DeepSpin       Peters et al. (2020)      deepspin-01-1, deepspin-02-1
ETH Zurich     Forster et al. (2020)     ETHZ-00-1, ETHZ-02-1
Flexica        Scherbakov (2020)         flexica-01-0, flexica-02-1, flexica-03-1
IMS            Yu et al. (2020)          IMS-00-0
LTI            Murikinati et al. (2020)  LTI-00-1
NYU-CUBoulder  Singer et al. (2020)      NYU-CUBoulder-01-0, NYU-CUBoulder-02-0, NYU-CUBoulder-03-0, NYU-CUBoulder-04-0
UIUC           Canby et al. (2020)       uiuc-01-0
Model features tracked per system: Neural, Ensemble, Multilingual, Hallucination
Systems: Description (* – winning system)
Improving neural baselines
*UIUC: transformers with synchronous bidirectional decoding technique (Zhou et al., 2019)
and family-wise fine-tuning
ETH Zurich: exact decoding strategy that uses Dijkstra’s search algorithm
Improving previous years’ models: Hard Monotonic Attention
IMS: L2R+R2L models with a genetic algorithm for ensemble search and data hallucination
Flexica: multilingual (family-wise) model with improved alignment strategy
+ new data hallucination technique based on phonotactic modelling
Improving their 2019 models
LTI: multi-source encoder–decoder with two-step attention architecture + cross-lingual
transfer + data hallucination + romanization of scripts
*DeepSpin: massively multilingual (all languages) gated sparse two-headed attention model
with sparsemax + 1.5-entmax
Transformer vs. LSTMs
CMU Tartan: compared transformer- and LSTM-based encoder–decoders trained mono- and
multilingually with data hallucination
Ensembles of Transformers
NYU-CUBoulder: compared vanilla and pointer-generator (monolingual) transformers
+ ensembles of three and five pointer-generator transformers + data hallucination (less than
1,000 samples)
*CULing: ensemble of three (monolingual) transformers + augmented the initial input (that
only used the lemma as a source form) with entries corresponding to other (non-lemma) slots
(reinflection) to improve learning of principal parts of paradigm
Non-neural systems
CU7565: manually developed finite-state grammars for 25 languages
+ hierarchical paradigm clustering (based on similarity of string transformation rules)
Flexica: a method similar to Hulden (2014) but with transformation rules treated
independently and assigned a score based on their frequency, specificity and diversity of
surrounding characters
Evaluation
Per-language accuracy
Per-language Levenshtein distance
Takes into account the statistical significance of differences between systems
Ranking
Any system that is statistically indistinguishable from the best-performing one is also ranked 1st
for that language.
For genus/family:
We aggregate the systems’ ranks and re-rank them based on the number of times they ranked
1st, 2nd, etc.
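The two per-language metrics can be computed as in the sketch below: exact-match accuracy and mean character-level Levenshtein distance between predicted and gold forms. The function names are hypothetical, the significance testing used for ranking is not shown, and the sample predictions reuse an error example from the 2016 slides:

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def evaluate(gold: Sequence[str], predicted: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy, mean Levenshtein distance) for one language."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    distance = sum(levenshtein(g, p) for g, p in zip(gold, predicted)) / len(gold)
    return accuracy, distance


accuracy, distance = evaluate(["гляжу", "runs", "played"], ["глядею", "runs", "played"])
print(accuracy, distance)
```

Accuracy treats a near miss the same as a wildly wrong output, so the edit distance is a useful complement for judging how far off the errors are.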
Results: 4 winning systems (outperform baselines)
System, average rank, average accuracy (%)
System                    Rank   Accuracy (%)
uiuc-01-0                  2.4   90.5
deepspin-02-1              2.9   90.9
BASE: trm-single           2.8   90.1
CULing-01-0                3.2   91.2
deepspin-01-1              3.8   90.5
BASE: trm-aug-single       3.7   90.3
NYU-CUBoulder-04-0         7.1   88.8
NYU-CUBoulder-03-0         8.9   88.8
NYU-CUBoulder-02-0         8.9   88.7
IMS-00-0                  10.6   89.2
NYU-CUBoulder-01-0         9.6   88.6
BASE: trm-shared          10.3   85.9
BASE: mono-aug-single      7.5   88.8
cmu_tartan_00-0            8.7   87.1
BASE: mono-single          7.9   85.8
cmu_tartan_01-1            9.0   87.1
BASE: trm-aug-shared      12.5   86.5
BASE: mono-shared         10.8   86.0
cmu_tartan_00-1            9.4   86.5
LTI-00-1                  12.0   86.6
BASE: mono-aug-shared     12.8   86.8
cmu_tartan_02-1           10.6   86.1
cmu_tartan_01-0           10.9   86.6
flexica-03-1              16.7   79.6
ETHZ-00-1                 20.1   75.6
*CU7565-01-0              24.1   90.7
flexica-02-1              17.1   78.5
*CU7565-02-0              19.2   83.6
ETHZ-02-1                 17.0   80.9
flexica-01-0              24.4   70.8
Oracle (Baselines)           -   96.1
Oracle (Submissions)         -   97.7
Oracle (All)                 -   97.9
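The oracle rows can be read as follows: a test example counts as solved if at least one system in the pool predicted it correctly. A minimal sketch, assuming predictions are stored as lists aligned with the gold forms (the function name and the toy data are hypothetical):

```python
def oracle_accuracy(gold, system_predictions):
    """Share of examples that at least one system in the pool gets right.

    gold: list of gold forms.
    system_predictions: dict mapping a system name to its list of
    predictions, aligned index by index with `gold`.
    """
    hits = sum(
        any(preds[i] == form for preds in system_predictions.values())
        for i, form in enumerate(gold)
    )
    return hits / len(gold)

# Toy data: the two pools solve different examples.
gold = ["a", "b", "c", "d"]
baselines = {"base": ["a", "b", "x", "x"]}
submissions = {"sub": ["a", "x", "c", "x"]}

print(oracle_accuracy(gold, baselines))                      # 0.5
print(oracle_accuracy(gold, submissions))                    # 0.5
print(oracle_accuracy(gold, {**baselines, **submissions}))   # 0.75
```

In the toy data each pool solves half of the examples, but together they solve three quarters, which is the sense in which two pools of systems are complementary.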
The baselines and the submissions are complementary:
adding them together increases the oracle score
The largest gaps in oracle systems are observed in the Algic, Oto-Manguean,
Sino-Tibetan, Southern Daly, Tungusic, and Uto-Aztecan families
Accuracy by language averaged across all submissions
A significant effect of dataset size was observed
Relatively easy: Austronesian and Niger-Congo
Difficult: some Uralic and Oto-Manguean languages
Challenging: Ludic, Norwegian Nynorsk, Middle Low German, Evenki, and O’odham
Accuracy by Language
Has morphological inflection become a solved problem in certain scenarios?
We have classified test examples into four categories:
Very Easy: all submitted systems predicted it correctly
Easy: predicted correctly by 80% of systems
Hard: predicted correctly by 20% of systems
Very Hard: no submitted system predicted it correctly
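The four categories can be sketched in code. The slide does not say how the 80% and 20% figures are applied, so this sketch assumes "at least 80%" for Easy and "at most 20%" for Hard; the catch-all label `Other` for everything in between is also an assumption.

```python
def difficulty(correct_flags, easy_at=0.8, hard_at=0.2):
    """Label one test example from per-system correctness flags.

    correct_flags: one boolean per submitted system.
    easy_at / hard_at: assumed thresholds (not confirmed by the slides).
    """
    if all(correct_flags):
        return "VeryEasy"
    if not any(correct_flags):
        return "VeryHard"
    share = sum(correct_flags) / len(correct_flags)
    if share >= easy_at:
        return "Easy"
    if share <= hard_at:
        return "Hard"
    return "Other"  # hypothetical label for the unnamed middle band

print(difficulty([True] * 10))                # VeryEasy
print(difficulty([True] * 8 + [False] * 2))   # Easy
print(difficulty([True] * 2 + [False] * 8))   # Hard
print(difficulty([False] * 10))               # VeryHard
```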
Noun Samples Difficulty
[Figure: share of noun test samples in each difficulty category, per language (y-axis 0.00 to 1.00). Languages: ang aze bak ben crh dan deu est evn gmh gml isl izh kan kjh kpv krl liv mdf mhr mlt myv nno nob olo ood pus san sme swe syc tel udm urd vep vot vro. Legend: VeryEasy, Easy, Hard, VeryHard]
Verb Samples Difficulty
[Figure: share of verb test samples in each difficulty category, per language (y-axis 0.00 to 1.00). Languages: aka ang ast aze azg ben bod cat ceb cly cpa cre crh ctp czn dak dan deu eng est evn fas frm frr fur gaa glg gmh gml gsw hil hin isl kan kaz kir kon kpv krl lin liv lld lug mao mdf mhr mlg mlt mwf myv nld nno nob nya olo ood orm ote otm pei pus sme sna sot swa swe tel tgl tuk udm uig urd uzb vec vep xno xty zpv zul. Legend: VeryEasy, Easy, Hard, VeryHard]
Adjective Samples Difficulty
[Figure: share of adjective test samples in each difficulty category, per language (y-axis 0.00 to 1.00). Languages: ang bak crh evn gml izh kpv krl liv mdf mhr myv nld nno nob olo pus san sme swe syc udm vep. Legend: VeryEasy, Easy, Hard, VeryHard]
Questions Addressed in Papers
Is developing morphological grammars manually worthwhile?
CU7565 manually designed finite-state grammars for 25 languages
Paradigms of some languages were relatively easy to describe but neural networks also
performed quite well
For Ingrian and Tagalog (both low-resource languages, LRLs), the grammars demonstrate superior performance, but this comes at
the expense of a significant number of person-hours
Questions Addressed in Papers
What is the best training strategy for low-resource languages?
Data hallucination proved useful for LRLs.
Augmenting the data with tuples where lemmas are replaced with non-lemma forms and their
tags
Multilingual training
Ensembles
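Data hallucination can be illustrated with a deliberately simplified sketch: treat the longest substring shared by the lemma and the inflected form as a stand-in for the stem, and replace it with random characters in both strings so that the affix pattern is preserved. The systems in the shared task used their own, more careful procedures; the function name, the `min_stem` threshold and the use of `difflib` here are assumptions made for illustration.

```python
import random
from difflib import SequenceMatcher

def hallucinate(lemma, form, alphabet, rng, min_stem=3):
    """Return a synthetic (lemma, form) pair, or None if no usable stem.

    The longest common substring is taken as the stem (a simplification)
    and replaced by random symbols drawn from `alphabet`.
    """
    matcher = SequenceMatcher(None, lemma, form, autojunk=False)
    m = matcher.find_longest_match(0, len(lemma), 0, len(form))
    if m.size < min_stem:
        return None  # too little shared material, e.g. suppletive pairs
    fake_stem = "".join(rng.choice(alphabet) for _ in range(m.size))
    new_lemma = lemma[:m.a] + fake_stem + lemma[m.a + m.size:]
    new_form = form[:m.b] + fake_stem + form[m.b + m.size:]
    return new_lemma, new_form

rng = random.Random(0)  # fixed seed so runs are repeatable
alphabet = "abcdefghijklmnopqrstuvwxyz"
print(hallucinate("walk", "walked", alphabet, rng))  # random stem + "ed"
print(hallucinate("go", "went", alphabet, rng))      # None: no shared stem
```

The morphological tags of the original pair are reused unchanged for the synthetic pair, so the model sees more examples of the same transformation applied to unseen stems.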
Error Analysis
Systematic Errors:
Data Inconsistency
The train, development and test sets contain 2%, 0.3%, and 0.6% inconsistent entries, respectively
Highest rates: Azerbaijani, Old English, Cree, Danish, Middle Low German, Kannada,
Norwegian Bokmål, Chichimec, and Veps
Dialectal variations in Finno-Ugric and Tungusic
Language-Specific Errors
Algic (Cree)
Mean accuracy across systems was 65.1% (41.5% to 73%)
Systems struggled with the choice of preverbal auxiliary (‘kitta’ could refer to future, imperfective, or
imperative)
The paradigms were very large, but there were very few lemmas (28 impersonal verbs and 14
transitive verbs)
Language-Specific Errors
Austronesian
Mean accuracy across systems was 80.5% (39.5% to 100%)
Baseline: Cebuano (84%) and Hiligaynon (96%)
Cebuano only has partial reduplication while Hiligaynon has full reduplication
The prefix choice for Cebuano is more irregular, making it more difficult to predict the correct
conjugation of the verb
In Maori, passive voice endings are difficult to predict as the language has undergone a loss of
word-final consonants and there is no clear link between a stem and the passive suffix that it
employs
Language-Specific Errors
Niger-Congo
Mean accuracy across systems was very good at 96.4% (62.8% to 100%)
Most languages in this family are considered low resource, and the resources used for data
gathering may have been biased towards the languages’ regular forms
Language-Specific Errors
Sino–Tibetan (Tibetan)
Mean accuracy across systems was 82.1% (67.9% to 85.1%)
Majority of errors are related to allomorphy
Nonce words and impossible combinations of component units (Di et al., 2019)
Language-Specific Errors
Siouan (Dakota)
Mean accuracy across systems was 89.4% (0% to 95.7%)
Variable prefixing and infixing of person morphemes, along with some complexities related to
fortition processes
Determining the factor(s) that governed variation in affix position was difficult from a
linguist’s perspective, though many systems were largely successful
Issues with first and second person singular allomorphy
Language-Specific Errors
Tungusic (Evenki)
Mean accuracy across systems was 53.8% (43.5% to 59.0%)
The dataset was created from oral speech samples in various dialects of the language; there
was little attempt at any standardization in the oral speech transcription
Annotation: various past tense forms are all annotated as PST, or there are several comitative
suffixes all annotated as COM
Annotation: some features are present in the word form but they receive no annotation at all
Language-Specific Errors
Uto-Aztecan (O’odham)
Mean accuracy across systems was 76.4% (54.8% to 82.5%)
Systems with higher accuracy may have benefited from better recall of suppletive forms
relative to lower accuracy systems.
SM2020ST0 (Vylomova et al., 2020): Conclusion
AND.....TO CONCLUDE:
Submissions were able to make productive use of multilingual training
Data augmentation techniques such as hallucination helped
Combined with architecture tweaks like sparsemax, it resulted in excellent overall performance
on many languages
Some morphology types and language families (Tungusic, Oto-Manguean, Southern Daly) are
still challenging
In some languages (Ingrian, Tajik, Tagalog, Zarma, and Lingala), hand-encoding linguists’
knowledge in finite-state grammars resulted in the best performance
A Case Study on Nen (Papua New Guinea); Muradoglu et al., 2020
Spoken in the village of Bimadbn in the Western Province of PNG, by approx 400 people
Verbs: prefixing, middle, and ambifixing
Distributed Exponence (DE): “morphosyntactic feature values can only be
determined after unification of multiple structural positions”
Low accuracy on small number of samples (<1000)
Allomorphy: vowel harmony
Variation in forms/spelling
Looping: *ynawemaylmyylmyylmyylmy-ylmyylmyymayamawemyymamya
Shcherbakov et al., 2020
How well do the models generalize?
Syncretism Test: all the TAM categories exhibit syncretism across the second and third-person
singular actor. Exception: The past perfective slot (where they take different forms)
Not observing the past perfective forms, systems tend to predict the forms as syncretic
(generalizing from observed slots), resulting in the misprediction of the actual forms (exceptions)
SIGMORPHON 2021 Shared Task 0 (Pimentel, Ryskina et al., 2021): More
under-resourced languages!
SIGMORPHON 2021 Shared Task 0 (Pimentel, Ryskina et al., 2021)
Allomorphy
Spelling errors
Most errors are due to limited data
Very sparse data without complete paradigms (e.g., Eibela)
Misprediction in unseen lemmas (also see Goldman et al., 2021)
Multi-Word Lemmas
Complex transformation patterns
Language-Specific Errors
Russian
Mean accuracy across systems was 97.4% (94.31% to 98.06%)
Incorrect prediction of the instrumental case forms, even when other parts of the same
paradigm (for the same lemma) were observed
Incorrect prediction of the accusative forms. The forms differ for animate and inanimate
nouns, and animacy should be inferred (from observing other slots of the same case, such as PL or
SG)
Errors in the inflection of multi-word lemmas, which requires inferring dependency information.
As in the cases above, this information could be inferred from other slots of the same
paradigm
Language-Specific Errors
Kunwinjku
Accuracy across systems ranges from 14.75% to 63.93%
Because data was limited, augmentation significantly improved performance
Systems mispredict *ngurriborlbme instead of ngurriborle.
Looping effects (Shcherbakov et al., 2020) are observed in RNN-based architectures:
*ngar-rrrrrrrrrrrrrmbbbijj (should be karribelbmerrinj), ngadjarridarrkddrrdddrrmerri (should be
karriyawoyhdjarrkbidyikarrmerrimeninj)
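Looping outputs like the ones above can be flagged automatically with a simple heuristic: look for any substring that repeats at least three times in a row. This is only an illustrative filter, not the analysis used by Shcherbakov et al. (2020); the function name and the threshold of three repetitions are assumptions, and a language with legitimate triple repetition would need a different threshold.

```python
import re

# A unit of one or more characters followed by at least two more copies of itself.
_LOOP = re.compile(r"(.+?)\1{2,}")

def has_loop(form: str) -> bool:
    """True if some substring occurs three or more times consecutively."""
    return _LOOP.search(form) is not None

# Predicted and gold forms quoted on the slides.
predicted = [
    "ngar-rrrrrrrrrrrrrmbbbijj",
    "ngadjarridarrkddrrdddrrmerri",
    "ynawemaylmyylmyylmyylmy-ylmyylmyymayamawemyymamya",
]
gold = ["karribelbmerrinj", "karriyawoyhdjarrkbidyikarrmerrimeninj"]

for form in predicted + gold:
    print(form, has_loop(form))
```

On these examples the heuristic flags all three predictions and neither gold form.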
PART IV: Current Challenges and Future Directions
Challenges in Data Conversion/Annotation
Case compounding and stacking (e.g., Kayardild)
I gave the book to my brother’s wife: ‘wife+DAT+ABL, my+GEN+DAT+ABL,
brother+GEN+DAT+ABL’
Clitics: exponential growth of paradigm tables
Polysynthetic languages and paradigms
Derivation – Inflection continuum: some paradigms contain
derivations (participle formation, masdars, etc) and require multi-step transformation
(PL: similar to ‘to run’ → ‘runners’).
Multi-word lemmas that might require dependency information
Which features should be added (not language-specific)?
Future Directions
Develop a framework for error analysis, e.g. measuring the percentage of allomorphy errors by providing a
set of tasks specifically for allomorphy (e.g., following Elsner and Sims, 2019; Malouf et al., 2020)
Increase interpretability of the models, design a methodology to extract the patterns learned by
the model
Make more typologically plausible language samples
A pipeline to augment UniMorph with new morphosyntactic features
An approach to estimate how representative a paradigm sample is for a specific language
(an estimate of language coverage)
... And ST0 Part 2: Human-like generalization and WUGS!
Thank you! Questions?
Please join us: https://groups.google.com/g/unimorph
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 

Dernier (20)

pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 

The UniMorph Project and Morphological Reinflection Task: Past, Present, and Future

  • 1. UniMorph and Morphological Inflection Task: Past, Present, and Future. Ekaterina Vylomova, University of Melbourne, ekaterina.vylomova@unimelb.edu.au. 20 August 2021. (115 slides)
  • 2. PART I: The UniMorph Project
  • 3. Increasing Multilinguality
  • 4. Speech is Special. Charles F. Hockett on Essential Properties of Human Languages. Displacement: ability to refer to things in space and time and communicate about things that are not present. Productivity: ability to create new and unique meanings of utterances from previously existing utterances and sounds.
  • 5. Speech is Special. Charles F. Hockett on Essential Properties of Human Languages. Duality of Patterning: meaningless phonic segments (phonemes) are combined to make meaningful words, etc. Learnability: a speaker of a language can learn another language.
  • 6. Linguistic Diversity. Roman Jakobson on Differences between Languages: “Languages differ essentially in what they must convey and not in what they may convey”
  • 7. Languages differ in many ways!
    (1) Chinese (Isolating)
        wǒmen    xué    le     zhè   xiē  shēngcí.
        I.PL.AN  learn  .PAST  this  .PL  new word.
        “We learned these new words.”
    (2) Russian (Synthetic)
        My      vyučili        eti          novyje      slova.
        We.NOM  learn.PAST.PL  this.ACC.PL  new.ACC.PL  word.ACC.PL
        “We learned these new words.”
  • 8. Languages differ in many ways! An example of West Greenlandic taken from Fortescue (2017):
    (3) West Greenlandic (Polysynthetic)
        Nannu-n-niuti-kkuminar-tu-rujussu-u-vuq.
        Polar.bear-catch-instrument.for.achieving-something.good.for-PART-big-be-3SG.INDIC
        “It (a dog) is good for catching polar bears with.”
  • 9–10. Languages differ in many ways! An example of Kunwinjku taken from Evans (2003):
    (4) Kunwinjku (Polysynthetic)
        Aban-yawoith-warrgah-marne-ganj-ginje-ng.
        1/3PL-again-wrong-BEN-meat-cook-PP
        “I cooked the wrong meat for them again”
    Discussion of what should be considered a word: John Mansfield’s “The word as a unit of internal predictability”
  • 11. Languages differ in many ways! Some exhibit rich grammatical case systems (e.g., 12 in Erzya and 24 in Veps). Some mark possessiveness. Others have complex verbal morphology (e.g., Oto-Manguean languages). Some even “decline” nouns for tense (e.g., Tupi–Guarani languages).
  • 12. Languages differ in many ways! Let’s discuss the following dimensions: fusion, inflectional synthesis, and position of case affixes.
  • 13–14. Fusion (WALS 20A). From isolating to concatenative. Concatenative morphology is the most common system. Non-linearities such as ablaut or tonal morphology can also be present. Isolating languages: the Sahel Belt in West Africa, Southeast Asia, and the Pacific. Tonal–concatenative morphology can be found in Mesoamerican languages.
  • 15–16. Inflectional Synthesis of the Verb (WALS 22A). Analytic expressions are common in Eurasia. Synthetic expressions are used to a high degree in the Americas.
  • 17–18. Position of Case Affixes (WALS 51A). Case affixes can variably surface as prefixes, suffixes, infixes, or circumfixes. Suffixation: most Eurasian and Australian languages and, to a lesser extent, South American and New Guinean languages. Prefixation: Mesoamerican languages and African languages spoken below the Sahara.
  • 19. The Earliest Approach to Morphology (Sanskrit). Pāṇini’s karakas: formalize regularities in the words. Inflectional morphology is paradigmatic.
  • 20. ...or Russian Morphology. Morphological inflection: formalize regularities in the words. Inflectional morphology is paradigmatic. Formalizations differ: the number of cases may vary from 6 to 11 (Zaliznyak, 1967).
  • 21. Inflectional Morphology: Paradigms (nouns). Morphological inflection: беглец “runner” + pos=N,case=ACC,num=SG → беглеца. ru-noun-table | b | беглец | a=an
  • 22. Inflectional Morphology: Classes (nouns). Morphological inflection: беглец + pos=N,case=ACC,num=SG → беглеца. EN Wiktionary: ru-noun-table | b | беглец | a=an
  • 23. Inflectional Morphology: Classes (nouns); differs in the En/Ru editions of Wiktionary. Morphological inflection: беглец + pos=N,case=ACC,num=SG → беглеца. EN Wiktionary: ru-noun-table | b | беглец | a=an. RU Wiktionary: сущ ru m a 5b|основа=беглец|основа1=беглец|слоги=по-слогам|бег|лец
  • 24. Inflectional Morphology: Wiktionary annotation is not cross-linguistically consistent. Other languages: Hungarian. Wiktionary: inconsistent annotation across languages and, within a single language, across different editions (en, ru, de, etc.); many language-specific features.
  • 25–27. Linguistic Diversity and Universals. Universal Grammar. Evans and Levinson, 2009, The Myth of Language Universals: “Diversity can be found at almost every level of linguistic organization”. Languages vary greatly on phonological, morphological, semantic, and syntactic levels. Haspelmath, 2010: descriptive categories (specific to languages) vs. comparative concepts. Typology: describe the limits of cross-linguistic variation.
  • 28–31. UniMorph – Universal Annotation (by John Sylak-Glassman and David Yarowsky) [Sylak-Glassman, 2016]: 1) 23 dimensions of meaning (e.g., TAM, case, number, animacy), 212 features; 2) a-morphous (word-based) morphology (Anderson, 1992); 3) initial paradigms were mainly extracted from the English edition of Wiktionary (Kirov et al., 2016). https://unimorph.github.io/
  • 32. PART II: SIGMORPHON Shared Tasks on Morphological (Re-)inflection
  • 33. Morphological (Re-)Inflection: SIGMORPHON Shared Task 2016–2019
    Inflection: PLAY + PRESENT PARTICIPLE → playing
    Reinflection: played + PRESENT PARTICIPLE → playing
        Lemma  Tag       Form
        RUN    PAST      ran
        RUN    PRES;1SG  run
        RUN    PRES;2SG  run
        RUN    PRES;3SG  runs
        RUN    PRES;PL   run
        RUN    PART      running
    2018: ~96% accuracy on average in the high-resource setting, but much lower in the low-resource setting.
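The difference between inflection and reinflection on this slide can be illustrated with a toy lookup table. This is a minimal sketch, not a shared-task system: the `PARADIGM` dictionary only stores the RUN rows from the table, and the function names `inflect` and `reinflect` are hypothetical. A real system has to generalise to unseen lemmas, which a lookup cannot do.

```python
# Toy paradigm built from the RUN table on the slide: (lemma, tag) -> form.
PARADIGM = {
    ("RUN", "PAST"): "ran",
    ("RUN", "PRES;1SG"): "run",
    ("RUN", "PRES;2SG"): "run",
    ("RUN", "PRES;3SG"): "runs",
    ("RUN", "PRES;PL"): "run",
    ("RUN", "PART"): "running",
}


def inflect(lemma, tag):
    """Inflection: lemma + target tag -> inflected form."""
    return PARADIGM[(lemma, tag)]


def reinflect(source_form, tag):
    """Reinflection: an inflected form + target tag -> another form.

    The lemma is not given, so it is recovered from the source form first.
    """
    for (lemma, _), form in PARADIGM.items():
        if form == source_form:
            return inflect(lemma, tag)
    raise KeyError(f"unknown form: {source_form}")


print(inflect("RUN", "PRES;3SG"))  # runs
print(reinflect("ran", "PART"))    # running
```

Reinflection is the harder setting because the source form may be ambiguous (here, "run" fills several slots) and the lemma has to be inferred.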
  • 34. SIGMORPHON 2016 Shared Task (Cotterell et al., 2016): Morphological (Re-)Inflection (10 Languages)
    Task 1: беглец + pos=N,case=ACC,num=SG → беглеца
    Task 2: беглецами + pos=N,case=INS,num=PL + pos=N,case=ACC,num=SG → беглеца
    Task 3: беглецами + pos=N,case=ACC,num=SG → беглеца
    Systems: LMU+BIU+Helsinki: neural (seq2seq) +/- aligner; MSU+Col/NYU: rule-based/heuristics; others: external aligner + WFST/CRF.
  • 35–38. SIGMORPHON 2016 Shared Task (Cotterell et al., 2016): Morphological (Re-)Inflection (10 Languages): Neural encoder–decoders [Kann and Schuetze, 2016]
    1) Character-level input: <s> r u n OUT_POS=V OUT_NUM=SG OUT_TENSE=PRES </s>; output: <s> r u n s </s>
    2) Ensembles of seq2seq models (GRUs + soft attention (Bahdanau et al., 2015))
    3) Enriching the data with combinations of other (non-lemma) forms
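The character-level input format described above can be sketched as a small encoding helper. This is a minimal illustration under stated assumptions: the function names `encode_input` and `encode_output` are hypothetical, and only the symbol layout (`<s>`, characters, `OUT_KEY=VALUE` tags, `</s>`) is taken from the slide.

```python
def encode_input(lemma, tags):
    """Build the source sequence: start symbol, characters, OUT_ tags, end symbol.

    `tags` is an ordered list of (key, value) pairs such as ("POS", "V").
    """
    tag_symbols = [f"OUT_{key}={value}" for key, value in tags]
    return ["<s>", *lemma, *tag_symbols, "</s>"]


def encode_output(form):
    """Build the target sequence: the characters of the inflected form."""
    return ["<s>", *form, "</s>"]


source = encode_input("run", [("POS", "V"), ("NUM", "SG"), ("TENSE", "PRES")])
target = encode_output("runs")
print(" ".join(source))  # <s> r u n OUT_POS=V OUT_NUM=SG OUT_TENSE=PRES </s>
print(" ".join(target))  # <s> r u n s </s>
```

Treating each morphological tag as a single symbol in the same vocabulary as the characters lets one encoder–decoder handle every slot of the paradigm.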
  • 39–40. SIGMORPHON 2016 Shared Task (Cotterell et al., 2016): Morphological (Re-)Inflection (10 Languages): Neural encoder–decoders [Aharoni and Goldberg, 2017]
    1) Extract input–output string alignments
    2) Train seq2seq (LSTM-based) models to learn a sequence of operations (hard monotonic attention)
  • 41. SIGMORPHON 2016 Shared Task (Cotterell et al., 2016): Morphological (Re-)Inflection (10 Languages): Neural encoder–decoders with hard monotonic attention [Aharoni and Goldberg, 2017]. Errors:
    глядеть + pos=V,tense=PRS,per=1,num=SG,aspect=IPFV; gold: гляжу; predicted: глядею
    увлекаться + pos=V,tense=PRS,per=1,num=SG,aspect=IPFV; gold: увлекаюсь; predicted: увлеклюсь
    звать + pos=V,tense=PRS,per=3,num=SG,aspect=IPFV; gold: зовёт; predicted: звает
  • 42. SIGMORPHON 2016 Shared Task (Cotterell et al., 2016): Morphological (Re-)Inflection (10 Languages): Neural encoder–decoders with hard monotonic attention [Aharoni and Goldberg, 2017]. Errors:
    зять + pos=N,case=GEN,num=PL; gold: зятьёв; predicted: зятей
    перстень + pos=N,case=GEN,num=PL; gold: перстней; predicted: перстеее
    телекамера + pos=N,case=GEN,num=PL; gold: телекамер; predicted: телекаморо
  • 43. SIGMORPHON 2016 Shared Task (Cotterell et al., 2016): Morphological (Re-)Inflection (10 Languages): Neural encoder–decoders with hard monotonic attention [Aharoni and Goldberg, 2017]. Errors:
    лоботряс + pos=N,case=ACC,num=PL; gold: лоботрясов; predicted: лоботрясы
    львица + pos=N,case=ACC,num=PL; gold: львиц; predicted: львица
    милиционер + pos=N,case=ACC,num=PL; gold: милиционеров; predicted: милиционеры
    светлячок + pos=N,case=ACC,num=PL; gold: светлячков; predicted: светлячки
    скот + pos=N,case=ACC,num=PL; gold: скотов; predicted: скоты
    счёт + pos=N,case=ACC,num=PL; gold: счета; predicted: счеты
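Gold and predicted forms like those listed above are typically compared by exact match, with edit distance as a softer companion measure. The following is a minimal sketch; the function names are hypothetical, the first two pairs come from the error examples on the slides, and the third (a correct prediction) is added purely for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def exact_match_accuracy(pairs):
    """Fraction of (gold, predicted) pairs that are string-identical."""
    return sum(gold == pred for gold, pred in pairs) / len(pairs)


def mean_edit_distance(pairs):
    """Average Levenshtein distance over (gold, predicted) pairs."""
    return sum(levenshtein(gold, pred) for gold, pred in pairs) / len(pairs)


pairs = [
    ("лоботрясов", "лоботрясы"),  # error example from the slides
    ("львиц", "львица"),          # error example from the slides
    ("беглеца", "беглеца"),       # illustrative correct prediction (assumption)
]
print(exact_match_accuracy(pairs))
print(mean_edit_distance(pairs))
```

Exact match treats a near miss such as "львица" for "львиц" the same as a wildly wrong string, which is why an edit-distance score is useful alongside it.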
  • 44. CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017): Universal Morphological Reinflection (52 Languages). Task 1: Morphological Inflection. Task 2: Paradigm Completion.
  • 45–46. CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017): Universal Morphological Reinflection (52 Languages). 3 settings: Low (100 samples), Medium (1,000), High (10,000). Samples were drawn based on their token frequency in a Wikipedia corpus (with resampling for syncretic slots). [Cotterell et al., 2017]
  • 47. CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017): Universal Morphological Reinflection (52 Languages): Neural encoder–decoders [Makarov et al., 2017]
    1) Align & Copy: based on Aharoni and Goldberg, 2017
    2) Extract input–output string alignments (add COPY/edit operations)
    3) Train seq2seq (LSTM-based) models to learn a sequence of operations (hard monotonic attention)
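The idea of turning an input–output pair into a left-to-right sequence of COPY and edit operations can be sketched as follows. This is not the aligner used by Aharoni and Goldberg (2017) or Makarov et al. (2017): as a stand-in, the sketch uses Python's standard `difflib.SequenceMatcher`, and the operation names `COPY`, `DEL`, and `INS` as well as the function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def edit_script(source, target):
    """Turn a source/target pair into COPY / DEL / INS operations."""
    ops = []
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops += [("COPY", ch) for ch in source[i1:i2]]
        else:
            # "replace", "delete" and "insert" become deletions, then insertions.
            ops += [("DEL", ch) for ch in source[i1:i2]]
            ops += [("INS", ch) for ch in target[j1:j2]]
    return ops


def apply_script(source, ops):
    """Replay the operations over the source, left to right, to get the target."""
    out, pos = [], 0
    for op, ch in ops:
        if op == "COPY":
            out.append(source[pos])
            pos += 1
        elif op == "DEL":
            pos += 1
        else:  # INS
            out.append(ch)
    return "".join(out)


ops = edit_script("run", "runs")
print(ops)
print(apply_script("run", ops))  # runs
```

Because the operations consume the source strictly from left to right, a model trained to predict them never needs to look back, which is the intuition behind hard monotonic attention.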
  • 48. CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017): Universal Morphological Reinflection (52 Languages). 3 settings: Low (100 samples), Medium (1,000), High (10,000). Samples were drawn based on their token frequency in a Wikipedia corpus (with resampling for syncretic slots). [Makarov et al., 2017]
  • 49–51. CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017): Error taxonomy [Gorman et al., 2019]. What are common errors that neural systems make? Types of errors:
    Free variation errors: more than one acceptable form exists
    Extraction errors: flaws in UniMorph’s parsing of Wiktionary
    Wiktionary errors: errors in the Wiktionary data itself
    Silly errors: “bizarre” errors which defy any purely linguistic characterization (“*membled” instead of “mailed”, or the system enters a loop such as “ynawemaylmyylmyylmyylmyylmyylmyym...” instead of “ysnewem”)
    Allomorphy errors: misapplication of existing allomorphic patterns
    Spelling errors: forms that do not follow language-specific orthographic conventions
  • 52. CoNLL–SIGMORPHON 2017 Shared Task (Cotterell et al., 2017): Error taxonomy [Gorman et al., 2019]. What are common errors that neural systems make? Allomorphy errors:
    Stem-final vowels in Finnish (*pohjanpystykorvojen)
    Consonant gradation in Finnish (*ei kiemurda)
    Ablaut in Dutch and German (*pront; *saufte)
    Umlaut (*Einwohnerzähle, *Förmer), plural suffixes, and verbal prefixes in German (*umkehre)
    Linking vowels in Hungarian (*masszázsakból instead of masszázsokból)
    Yers (*klęsek instead of klęsk) and genitive singular suffixes in Polish (*izotopa)
    Animacy in Polish and Russian (грузин vs. магазин in ACC.SG)
    Aspect in Russian (*будешь сорвать)
    Internal inflection in Russian compounds (*государствах-донорах, *лёгких промышленности (ACC.PL))
  • 53–56. CoNLL–SIGMORPHON 2018 Shared Task (Cotterell et al., 2018): Universal Morphological Reinflection (103 Languages) [Cotterell et al., 2018]
    Task 1: Morphological Inflection (Low, Medium, High)
    Task 2: Inflection in Context (Vylomova et al., 2019). Track 1: with morphosyntactic annotation. Track 2: without morphosyntactic annotation. The task requires capturing agreement and inferring inherent vs. contextual categories (Vylomova et al., 2019).
  • 57–59. SIGMORPHON 2019 Shared Task (McCarthy et al., 2019): Morphological Analysis in Context and Cross-Lingual Transfer for Inflection (100 Language Pairs) [McCarthy et al., 2019]
    Task 1: Cross-lingual Transfer for Morphological Inflection (10k high-resource + 100 low-resource examples)
    Task 2: Morphological Analysis in Context
  • 60–62. SIGMORPHON 2019 Shared Task (McCarthy et al., 2019): Morphological Analysis in Context and Cross-Lingual Transfer for Inflection (100 Language Pairs) [Anastasopoulos and Neubig, 2019]
    Task 1: Cross-lingual Transfer for Morphological Inflection (10k high-resource + 100 low-resource examples)
    Task 2: Morphological Analysis in Context
  • 63. So... SIGMORPHON Shared Tasks 2016–2019 PLAY + PRESENT PARTICIPLE → playing played + PRESENT PARTICIPLE → playing Lemma Tag Form RUN PAST ran RUN PRES;1SG run RUN PRES;2SG run RUN PRES;3SG runs RUN PRES;PL run RUN PART running Ekaterina Vylomova UniMorph and Morphological Inflection Task 20 августа 2021 г. 63 / 115 2018 :∼ 96% accuracy on avg. in high-resource setting But much less well in low-resource setting
  • 64. So... SIGMORPHON Shared Tasks 2016–2019 PLAY + PRESENT PARTICIPLE → playing played + PRESENT PARTICIPLE → playing Lemma Tag Form RUN PAST ran RUN PRES;1SG run RUN PRES;2SG run RUN PRES;3SG runs RUN PRES;PL run RUN PART running Ekaterina Vylomova UniMorph and Morphological Inflection Task 20 августа 2021 г. 64 / 115 Also see Ling Liu’s 2021 Overview “Computational Morphology with Neural Network Approaches” 2018 :∼ 96% accuracy on avg. in high-resource setting But much less well in low-resource setting
  • 65. PART III: Scaling up and increasing UniMorph Collaboration! From Wiktionary to more linguistic resources, including grammar books, Apertium data, and text/glossed corpora.
  • 66. Language-Specific Biases: As Bender (2009, 2016) notes, architectures and training and tuning algorithms still present language-specific biases.
  • 68. SIGMORPHON 2020 Shared Task 0 (Vylomova et al., 2020): As Bender (2009, 2016) notes, architectures and training and tuning algorithms still present language-specific biases. Let’s focus on typological diversity and aim to investigate systems’ ability to generalize across typologically distinct languages! If a model works well for a sample of Indo-European (IE) languages, should the same model also work well for Tupi–Guarani languages?
  • 69. SIGMORPHON 2020 Shared Task 0 (Vylomova et al., 2020): 90 languages from 13 language families.
  • 70. Three Phases. Development (2 months): train & dev data for 45 languages from 5 families (Austronesian, Niger-Congo, Oto-Manguean, Uralic, IE). Generalization (1 week): train & dev data for 45 languages from 10 families (Afro-Asiatic, Algic, Dravidian, Indo-European, Niger-Congo, Sino-Tibetan, Siouan, Songhay, Southern Daly, Tungusic, Turkic, Uralic, and Uto-Aztecan). Evaluation (1 week): test data for all 90 languages.
  • 71. Data Preparation: Manually converted their features (tags) into the UniMorph format; canonicalized the converted language data (https://github.com/unimorph/um-canonicalize). Splitting: used only noun, verb, and adjective forms to construct training, development, and evaluation sets; randomly sampled 70%, 10%, and 20% for train, development, and test, respectively. Zarma, Tajik, Lingala, Ludian, Māori, Sotho, Võro, Anglo-Norman, and Zulu contain fewer than 400 training samples.
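The splitting step can be sketched as follows. This is an illustration only: it assumes entries are (lemma, form, tags) triples whose first tag is the part of speech (N, V, or ADJ), and the function name, the fixed seed, and the integer rounding are choices made here, not details from the shared task.

```python
import random
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (lemma, form, tags); assumed layout

KEPT_POS = {"N", "V", "ADJ"}  # nouns, verbs, adjectives only


def split_entries(entries: List[Entry], seed: int = 0):
    """Filter by part of speech, shuffle, and cut 70/10/20."""
    # Assumption: the part of speech is the first ';'-separated tag.
    kept = [e for e in entries if e[2].split(";")[0] in KEPT_POS]
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_train = len(kept) * 7 // 10
    n_dev = len(kept) // 10
    train = kept[:n_train]
    dev = kept[n_train:n_train + n_dev]
    test = kept[n_train + n_dev:]
    return train, dev, test
```

Integer arithmetic is used for the cut points so that the split sizes do not depend on floating-point rounding; any remainder falls into the test portion.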
  • 72. Systems: Baselines. Non-neural: simple alignment-based baseline, as in previous years (Cotterell et al., 2017; 2018).
  • 73. Systems: Baselines. Neural:
      Neural transducer (Wu et al., 2019), which is essentially a hard monotonic attention model (mono-*)
      Transformer adapted for character-level tasks (Wu et al., 2020; trm-*), state of the art on the 2017 shared task
      + data augmentation technique used by Anastasopoulos and Neubig (2019; -aug-)
      + family-wise shared parameters (*-shared)
      Baseline systems (table columns: Team, Description, System, Model Features: Neural, Ensemble, Multilingual, Hallucination; per-system feature marks not recoverable):
      wu2019exact: mono-single, mono-aug-single, mono-shared, mono-aug-shared
      wu2020applying: trm-single, trm-aug-single, trm-shared, trm-aug-shared
  • 74. Systems: Teams. 10 teams submitted 22 systems in total, of which 19 were neural.
      (Table columns: Team, Description, System, Model Features: Neural, Ensemble, Multilingual, Hallucination; per-system feature marks not recoverable.)
      Team            Description               Systems
      CMU Tartan      Jayarao et al. (2020)     cmu_tartan_00-0, cmu_tartan_00-1, cmu_tartan_01-0, cmu_tartan_01-1, cmu_tartan_02-1
      CU7565          Beemer et al. (2020)      CU7565-01-0, CU7565-02-0
      CULing          Liu et al. (2020)         CULing-01-0
      DeepSpin        Peters et al. (2020)      deepspin-01-1, deepspin-02-1
      ETH Zurich      Forster et al. (2020)     ETHZ00-1, ETHZ02-1
      Flexica         Scherbakov (2020)         flexica-01-0, flexica-02-1, flexica-03-1
      IMS             Yu et al. (2020)          IMS-00-0
      LTI             Murikinati et al. (2020)  LTI-00-1
      NYU-CUBoulder   Singer et al. (2020)      NYU-CUBoulder-01-0, NYU-CUBoulder-02-0, NYU-CUBoulder-03-0, NYU-CUBoulder-04-0
      UIUC            Canby et al. (2020)       uiuc-01-0
  • 75. Systems: Description (* = winning system). Improving neural baselines: *UIUC: transformers with a synchronous bidirectional decoding technique (Zhou et al., 2019) and family-wise fine-tuning; ETH Zurich: exact decoding strategy that uses Dijkstra’s search algorithm. Improving previous years’ models (hard monotonic attention): IMS: L2R+R2L models with a genetic algorithm for ensemble search and data hallucination; Flexica: multilingual (family-wise) model with an improved alignment strategy + a new data hallucination technique based on phonotactic modelling.
  • 76. Systems: Description (* = winning system). Improving their 2019 models: LTI: multi-source encoder–decoder with a two-step attention architecture + cross-lingual transfer + data hallucination + romanization of scripts; *DeepSpin: massively multilingual (all languages) gated sparse two-headed attention model with sparsemax + 1.5-entmax. Transformer vs. LSTMs: CMU Tartan: compared transformer- and LSTM-based encoder–decoders trained mono- and multilingually with data hallucination.
  • 77. Systems: Description (* = winning system). Ensembles of transformers: NYU-CUBoulder: compared vanilla and pointer-generator (monolingual) transformers + ensembles of three and five pointer-generator transformers + data hallucination (for fewer than 1,000 samples); *CULing: ensemble of three (monolingual) transformers + augmented the initial input (which used only the lemma as a source form) with entries corresponding to other (non-lemma) slots (reinflection) to improve learning of the principal parts of a paradigm.
  • 78. Systems: Description (* = winning system). Non-neural systems: CU7565: manually developed finite-state grammars for 25 languages + hierarchical paradigm clustering (based on similarity of string transformation rules); Flexica: a method similar to Hulden (2014), but with transformation rules treated independently and assigned a score based on their frequency, specificity, and the diversity of surrounding characters.
  • 79. Evaluation: per-language accuracy; per-language Levenshtein distance. Ranking: takes into account the statistical significance of differences between systems. Any system that is statistically indistinguishable from the best-performing one is also ranked 1st for that language. For genus/family: we aggregate the systems’ ranks and re-rank them based on the number of times they ranked 1st, 2nd, etc.
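The two per-language metrics can be sketched as below. This is a minimal illustration, assuming predictions and gold forms arrive as (predicted, gold) string pairs; the function names are chosen here, and the significance-based ranking step is not covered.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def evaluate(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (exact-match accuracy, mean Levenshtein distance)."""
    acc = sum(pred == gold for pred, gold in pairs) / len(pairs)
    dist = sum(levenshtein(pred, gold) for pred, gold in pairs) / len(pairs)
    return acc, dist


acc, dist = evaluate([("ran", "ran"), ("runed", "ran")])
```

Accuracy treats every wrong form alike, while the mean edit distance shows how far off the wrong forms are, which is why the two are reported together.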
  • 81. Results: 4 winning systems (outperform baselines)
      System                  Rank   Acc. (%)
      uiuc-01-0               2.4    90.5
      deepspin-02-1           2.9    90.9
      BASE: trm-single        2.8    90.1
      CULing-01-0             3.2    91.2
      deepspin-01-1           3.8    90.5
      BASE: trm-aug-single    3.7    90.3
      NYU-CUBoulder-04-0      7.1    88.8
      NYU-CUBoulder-03-0      8.9    88.8
      NYU-CUBoulder-02-0      8.9    88.7
      IMS-00-0                10.6   89.2
      NYU-CUBoulder-01-0      9.6    88.6
      BASE: trm-shared        10.3   85.9
      BASE: mono-aug-single   7.5    88.8
      cmu_tartan_00-0         8.7    87.1
      BASE: mono-single       7.9    85.8
      cmu_tartan_01-1         9.0    87.1
      BASE: trm-aug-shared    12.5   86.5
      BASE: mono-shared       10.8   86.0
      cmu_tartan_00-1         9.4    86.5
      LTI-00-1                12.0   86.6
      BASE: mono-aug-shared   12.8   86.8
      cmu_tartan_02-1         10.6   86.1
      cmu_tartan_01-0         10.9   86.6
      flexica-03-1            16.7   79.6
      ETHZ-00-1               20.1   75.6
      *CU7565-01-0            24.1   90.7
      flexica-02-1            17.1   78.5
      *CU7565-02-0            19.2   83.6
      ETHZ-02-1               17.0   80.9
      flexica-01-0            24.4   70.8
      Oracle (Baselines)             96.1
      Oracle (Submissions)           97.7
      Oracle (All)                   97.9
      The baselines and the submissions are complementary: adding them together increases the oracle score.
      The largest gaps in oracle systems are observed in the Algic, Oto-Manguean, Sino-Tibetan, Southern Daly, Tungusic, and Uto-Aztecan families.
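An oracle score of this kind is usually computed by counting a test item as correct if at least one system in the pool predicted it correctly. The sketch below assumes that definition and uses invented toy predictions; the system names `sys_a` and `sys_b` are hypothetical.

```python
from typing import Dict, List


def oracle_accuracy(predictions: Dict[str, List[str]],
                    gold: List[str]) -> float:
    """Share of items that at least one system in the pool got right."""
    hits = sum(
        any(system_preds[i] == target
            for system_preds in predictions.values())
        for i, target in enumerate(gold)
    )
    return hits / len(gold)


gold = ["ran", "runs", "running"]
pool = {
    "sys_a": ["ran", "runs", "runing"],      # misses the participle
    "sys_b": ["runned", "runs", "running"],  # misses the past form
}
only_a = oracle_accuracy({"sys_a": pool["sys_a"]}, gold)
combined = oracle_accuracy(pool, gold)
```

Because the two toy systems fail on different items, the combined oracle is higher than either alone, which mirrors the complementarity noted on the slide.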
  • 83. Accuracy by language averaged across all submissions: A significant effect of dataset size was observed. Relatively easy: Austronesian and Niger-Congo. Difficult: some Uralic and Oto-Manguean languages. Challenging: Ludic, Norwegian Nynorsk, Middle Low German, Evenki, and O’odham.
  • 84. Accuracy by Language: Has morphological inflection become a solved problem in certain scenarios? We classified test examples into four categories. Very Easy: all submitted systems predicted the form correctly. Easy: predicted correctly by 80% of systems. Hard: predicted correctly by 20% of systems. Very Hard: no submitted system predicted the form correctly.
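The four categories can be sketched as a function of how many systems predicted a test example correctly. The slide does not say whether 80% and 20% are exact values or thresholds, so this sketch assumes "at least 80%" and "at most 20%", and it adds a hypothetical `Other` label for examples in between.

```python
def difficulty(n_correct: int, n_systems: int) -> str:
    """Map a per-example success count to a difficulty category."""
    if n_correct == n_systems:
        return "VeryEasy"
    if n_correct == 0:
        return "VeryHard"
    # Assumed thresholds, compared with integers to avoid float issues:
    # n_correct / n_systems >= 0.8  <=>  5 * n_correct >= 4 * n_systems
    if 5 * n_correct >= 4 * n_systems:
        return "Easy"
    # n_correct / n_systems <= 0.2  <=>  5 * n_correct <= n_systems
    if 5 * n_correct <= n_systems:
        return "Hard"
    return "Other"  # hypothetical label, not one of the slide's categories


labels = [difficulty(k, 10) for k in (10, 8, 5, 2, 0)]
```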
  • 85. Noun Samples Difficulty. [Figure: per-language share (0.00 to 1.00) of noun test samples in each difficulty category (VeryEasy, Easy, Hard, VeryHard); languages given as ISO 639-3 codes, ang to vro.]
  • 86. Verb Samples Difficulty. [Figure: per-language share (0.00 to 1.00) of verb test samples in each difficulty category (VeryEasy, Easy, Hard, VeryHard); languages given as ISO 639-3 codes, aka to zul.]
  • 88. Questions Addressed in Papers: Is developing morphological grammars manually worthwhile? CU7565 manually designed finite-state grammars for 25 languages. Paradigms of some languages were relatively easy to describe, but neural networks also performed quite well. For Ingrian and Tagalog (low-resource languages, LRLs), the grammars demonstrate superior performance, but this comes at the expense of a significant number of person-hours.
  • 89. Questions Addressed in Papers: What is the best training strategy for low-resource languages (LRLs)? Data hallucination proved useful for LRLs. Augmenting the data with tuples in which lemmas are replaced with non-lemma forms and their tags. Multilingual training. Ensembles.
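The augmentation with non-lemma source forms can be sketched as follows: every filled slot of a paradigm serves in turn as the source for every other slot. This is only an illustration of the idea; the tuple layout (source form, source tag, target tag, target form) and the function name are assumptions made here, not the format used by any submitted system.

```python
from itertools import permutations
from typing import Dict, List, Tuple

ReinflectionTuple = Tuple[str, str, str, str]


def reinflection_tuples(paradigm: Dict[str, str]) -> List[ReinflectionTuple]:
    """Build (src_form, src_tag, tgt_tag, tgt_form) for all slot pairs."""
    return [
        (src_form, src_tag, tgt_tag, tgt_form)
        for (src_tag, src_form), (tgt_tag, tgt_form)
        in permutations(paradigm.items(), 2)
    ]


# Three known slots of one lemma (tag -> form).
known_slots = {"PAST": "ran", "PRES;3SG": "runs", "PART": "running"}
augmented = reinflection_tuples(known_slots)
```

With n known slots this yields n * (n - 1) training tuples per lemma, so even a small paradigm sample grows quickly.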
  • 90. Error Analysis. Systematic errors: data inconsistency. The train, development, and test sets contain 2%, 0.3%, and 0.6% inconsistent entries, respectively. Highest rates: Azerbaijani, Old English, Cree, Danish, Middle Low German, Kannada, Norwegian Bokmål, Chichimec, and Veps. Dialectal variation in Finno-Ugric and Tungusic.
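One way to measure such a rate is sketched below. It assumes that an entry counts as inconsistent when the same (lemma, tags) pair appears with more than one distinct form; the slide does not spell out the definition, so treat this as an assumption, along with the (lemma, form, tags) entry layout.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[str, str, str]  # (lemma, form, tags); assumed layout


def inconsistency_rate(entries: List[Entry]) -> float:
    """Share of entries whose (lemma, tags) pair maps to several forms."""
    forms: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for lemma, form, tags in entries:
        forms[(lemma, tags)].add(form)
    conflicting = sum(
        1 for lemma, _form, tags in entries
        if len(forms[(lemma, tags)]) > 1
    )
    return conflicting / len(entries)


sample = [
    ("run", "ran", "V;PST"),
    ("run", "runned", "V;PST"),   # conflicts with the entry above
    ("run", "runs", "V;PRS;3;SG"),
    ("walk", "walked", "V;PST"),
]
rate = inconsistency_rate(sample)
```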
  • 91. Language-Specific Errors: Algic (Cree). Mean accuracy across systems was 65.1% (range: 41.5% to 73%). Systems struggled with the choice of preverbal auxiliary (‘kitta’ could refer to future, imperfective, or imperative). The paradigms were very large, and there were very few lemmas (28 impersonal verbs and 14 transitive verbs).
  • 92. Language-Specific Errors: Austronesian. Mean accuracy across systems was 80.5% (range: 39.5% to 100%). Baseline: Cebuano (84%) and Hiligaynon (96%). Cebuano only has partial reduplication, while Hiligaynon has full reduplication. The prefix choice for Cebuano is more irregular, making it more difficult to predict the correct conjugation of the verb. In Māori, passive voice endings are difficult to predict, as the language has undergone a loss of word-final consonants and there is no clear link between a stem and the passive suffix that it employs.
  • 93. Language-Specific Errors: Niger-Congo. Mean accuracy across systems was very good at 96.4% (range: 62.8% to 100%). Most languages in this family are considered low-resource, and the resources used for data gathering may have been biased towards the languages’ regular forms.
  • 94. Language-Specific Errors: Sino-Tibetan (Tibetan). Mean accuracy across systems was 82.1% (range: 67.9% to 85.1%). The majority of errors are related to allomorphy. Nonce words and impossible combinations of component units (Di et al., 2019).
  • 95. Language-Specific Errors: Siouan (Dakota). Mean accuracy across systems was 89.4% (range: 0% to 95.7%). Variable prefixing and infixing of person morphemes, along with some complexities related to fortition processes. Determining the factor(s) that governed variation in affix position was difficult from a linguist’s perspective, though many systems were largely successful. Issues with first and second person singular allomorphy.
  • 96. Language-Specific Errors: Tungusic (Evenki). Mean accuracy across systems was 53.8% (range: 43.5% to 59.0%). The dataset was created from oral speech samples in various dialects of the language; there was little attempt at any standardization in the oral speech transcription. Annotation: various past tense forms are all annotated as PST, and several comitative suffixes are all annotated as COM. Annotation: some features are present in the word form but receive no annotation at all.
  • 97. Language-Specific Errors: Uto-Aztecan (O’odham). Mean accuracy across systems was 76.4% (range: 54.8% to 82.5%). Systems with higher accuracy may have benefited from better recall of suppletive forms relative to lower-accuracy systems.
  • 98. SIGMORPHON 2020 Shared Task 0 (Vylomova et al., 2020): Conclusion. Submissions were able to make productive use of multilingual training. Data augmentation techniques such as hallucination helped. Combined with architecture tweaks like sparsemax, this resulted in excellent overall performance on many languages. Some morphology types and language families (Tungusic, Oto-Manguean, Southern Daly) are still challenging. In some languages (Ingrian, Tajik, Tagalog, Zarma, and Lingala), hand-encoding linguists’ knowledge in finite-state grammars resulted in the best performance.
  • 100. A Case Study on Nen (Papua New Guinea); Muradoglu et al., 2020: Spoken in the village of Bimadbn in the Western Province of PNG by approximately 400 people. Verbs: prefixing, middle, and ambifixing. Distributed Exponence (DE): “morphosyntactic feature values can only be determined after unification of multiple structural positions”.
  • 103. A Case Study on Nen (Papua New Guinea); Muradoglu et al., 2020: Low accuracy with a small number of samples (<1000). Allomorphy: vowel harmony. Variation in forms/spelling. Looping (Shcherbakov et al., 2020): *ynawemaylmyylmyylmyylmy-ylmyylmyymayamawemyymamya.
  • 104. A Case Study on Nen (Papua New Guinea); Muradoglu et al., 2020: How well do the models generalize? Syncretism test: all the TAM categories exhibit syncretism across the second- and third-person singular actor. Exception: the past perfective slot, where the two take different forms. Without having observed the past perfective forms, systems tend to predict them as syncretic (generalizing from the observed slots), which results in mispredicting the actual, exceptional forms.
  • 105. SIGMORPHON 2021 Shared Task 0 (Pimentel, Ryskina et al., 2021): More under-resourced languages!
  • 109. SIGMORPHON 2021 Shared Task 0 (Pimentel, Ryskina et al., 2021). Error types: allomorphy; spelling errors; multi-word lemmas; complex transformation patterns. Most errors are due to limited data: very sparse data without complete paradigms (e.g., Eibela); misprediction on unseen lemmas (also see Goldman et al., 2021).
  • 110. Language-Specific Errors: Russian. Mean accuracy across systems was 97.4% (range: 94.31% to 98.06%). Incorrect prediction of instrumental case forms, even when other parts of the same paradigm (for the same lemma) were observed. Incorrect prediction of accusative forms: these differ for animate and inanimate nouns, so animacy has to be inferred (from observing another slot of the same case, such as PL or SG). Errors in the inflection of multi-word lemmas, which require dependency information to be inferred. As in the cases above, this information could be inferred from other slots of the same paradigm.
  • 111. Language-Specific Errors: Kunwinjku. Accuracy across systems ranges from 14.75% to 63.93%. Because the amount of data is limited, augmentation significantly improved performance. Systems mispredict *ngurriborlbme instead of ngurriborle. Looping effects (Shcherbakov et al., 2020) are observed in RNN-based architectures: *ngar-rrrrrrrrrrrrrmbbbijj (should be karribelbmerrinj), *ngadjarridarrkddrrdddrrmerri (should be karriyawoyhdjarrkbidyikarrmerrimeninj).
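Looping outputs like the ones above could be flagged automatically with a simple heuristic. The sketch below is a hypothetical check, not a method from the cited papers: it marks a form as looped when some substring occurs at least three times in a row, and the threshold of three is an arbitrary choice made here.

```python
import re

# Hypothetical heuristic: a unit of one or more characters, followed
# by at least two further consecutive copies of itself.
LOOP_PATTERN = re.compile(r"(.+?)\1{2,}")


def looks_looped(form: str) -> bool:
    """Return True if the form contains a unit repeated 3+ times in a row."""
    return LOOP_PATTERN.search(form) is not None


flagged = looks_looped("ngar-rrrrrrrrrrrrrmbbbijj")  # predicted form
clean = looks_looped("karribelbmerrinj")             # gold form
```

A check like this would also fire on legitimate triple reduplication or on long runs of a single letter, so the threshold would need tuning per language and orthography.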
  • 112. PART IV: Current Challenges and Future Directions
  • 113. Challenges in Data Conversion/Annotation. Case compounding and stacking (e.g., Kayardild): “I gave the book to my brother’s wife”: ‘wife+DAT+ABL, my+GEN+DAT+ABL, brother+GEN+DAT+ABL’. Clitics: exponential growth of paradigm tables. Polysynthetic languages and paradigms. Derivation–inflection continuum: some paradigms contain derivations (participle formation, masdars, etc.) and require multi-step transformation (PL: similar to ‘to run’ → ‘runners’). Multi-word lemmas that might require dependency information. Which features should be added (not language-specific)?
  • 114. Future Directions. Develop a framework for error analysis, e.g., measuring the percentage of allomorphy errors by providing a set of tasks specifically for allomorphy (e.g., following Elsner and Sims, 2019; Malouf et al., 2020). Increase the interpretability of the models; design a methodology to extract the patterns learned by the model. Make more typologically plausible language samples. A pipeline to augment UniMorph with new morphosyntactic features. An approach to estimate how representative a paradigm sample is for a specific language (an estimate of the language coverage). ... And ST0 Part 2: Human-like generalization and WUGS!
  • 115. Thank you! Questions? Please join us: https://groups.google.com/g/unimorph