What Do Neural Models "Know" About Natural Language?
Ekaterina Vylomova
1943: Artificial Neuron (McCulloch-Pitts)
... or, in other words, $\hat{y} = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$,
and the activation function might be the sigmoid: $\mathrm{sig}(x) = \frac{1}{1 + e^{-x}}$
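To make the formula concrete, here is a minimal sketch of such a neuron in Python (NumPy); the weights, bias, and inputs are illustrative values only:

```python
import numpy as np

def sigmoid(x):
    # sig(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def neuron(x, w, b, f=sigmoid):
    # y_hat = f(sum_i w_i * x_i + b)
    return f(np.dot(w, x) + b)

# Illustrative values only.
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.5, -0.3, 0.8])
print(neuron(x, w, b=-0.2))  # ~0.75
```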
1957: Simple Perceptron
The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
Trained with a trial-and-error method
It can:
– generalize over characters
– discover character-specific features
But:
– failed to recognize badly written, differently sized, or partially occluded characters
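Rosenblatt's trial-and-error training amounts to the perceptron error-correction rule; the toy sketch below learns a logical AND gate (the data and learning rate are illustrative assumptions, not the original character-recognition setup):

```python
import numpy as np

# Toy dataset: logical AND (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)
b = 0.0
for epoch in range(10):
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)   # step activation
        # Error-correction update: adjust weights only on mistakes.
        w += (target - pred) * xi
        b += (target - pred)

print([int(np.dot(w, xi) + b > 0) for xi in X])  # [0, 0, 0, 1]
```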
1960s: Single Layer Perceptron
The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
Perceptrons: An Introduction to Computational Geometry (Minsky & Papert)
XOR Problem: a single-layer perceptron cannot represent the XOR function
1980s: Multi-Layer Perceptrons with Back-Propagation
Learning Internal Representations by Error Propagation
Solving problems with data that is not linearly separable
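A small illustration of why the hidden layer matters: a two-layer network trained with backpropagation can fit XOR, which no single-layer perceptron can. A toy sketch with made-up hyperparameters (with this seed and enough steps it typically converges):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

sigmoid = lambda z: 1 / (1 + np.exp(-z))
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)          # hidden layer
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)          # output layer

for step in range(5000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass (squared error, chain rule).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out;  b2 -= d_out.sum(0)
    W1 -= X.T @ d_h;    b1 -= d_h.sum(0)

print(out.round(2).ravel())  # should approach [0, 1, 1, 0]
```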
1980s: The Past Tense Debate
Rumelhart & McClelland (1985): On learning the past tenses of English verbs
Pinker & Prince (1988): Extremely poor empirical performance!
1990s: RNNs
Finding structure in time
Exploring
– context-dependent learning
– structure in letter sequences
– learning lexical classes from word order
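The core of Elman's network is a hidden state fed back as "context units" at the next step; a minimal sketch of one recurrent step (dimensions are illustrative):

```python
import numpy as np

def elman_step(x_t, h_prev, W_xh, W_hh, b_h):
    # h_t = tanh(W_xh x_t + W_hh h_(t-1) + b_h): the previous hidden
    # state plays the role of Elman's context units.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Illustrative dimensions: 5-dim inputs, 3-dim hidden state.
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(3, 5)), rng.normal(size=(3, 3)), np.zeros(3)

h = np.zeros(3)
for x_t in rng.normal(size=(4, 5)):   # a toy sequence of 4 inputs
    h = elman_step(x_t, h, W_xh, W_hh, b_h)
print(h)
```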
1990s: CNNs
Backpropagation Applied to Handwritten Zip Code Recognition
Training data: 9,298 segmented numerals from U.S. mail
Misclassified: training – 0.14%; test – 5.0%
Meanwhile in NLP: Language Modelling (mostly n-grams with Kneser–Ney smoothing)
OK, Marvin, which word comes next: Two cats are ___
Hmmm, let me guess ...
sitting 3.01 × 10⁻⁴
play 2.87 × 10⁻⁴
running 2.53 × 10⁻⁴
nice 2.32 × 10⁻⁴
lost 1.97 × 10⁻⁴
playing 1.66 × 10⁻⁴
sat 1.54 × 10⁻⁴
plays 1.32 × 10⁻⁴
...
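A simplified sketch of interpolated Kneser–Ney for bigrams (toy corpus and discount; real systems use higher orders and handle unseen histories more carefully):

```python
from collections import Counter

def kn_bigram_lm(tokens, d=0.75):
    """Interpolated Kneser-Ney for bigrams (simplified sketch)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    unigram = Counter(tokens[:-1])                   # history counts
    followers = Counter(u for (u, w) in bigrams)     # |{w : c(u,w) > 0}|
    histories = Counter(w for (u, w) in bigrams)     # |{u : c(u,w) > 0}|
    n_bigram_types = len(bigrams)

    def prob(u, w):
        # Continuation probability: how many contexts does w complete?
        p_cont = histories[w] / n_bigram_types
        if unigram[u] == 0:
            return p_cont                            # back off entirely
        discounted = max(bigrams[(u, w)] - d, 0) / unigram[u]
        lam = d * followers[u] / unigram[u]          # leftover mass
        return discounted + lam * p_cont

    return prob

corpus = "two cats are sitting two cats are playing one cat is sitting".split()
p = kn_bigram_lm(corpus)
print(p("are", "sitting"), p("are", "playing"))
```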
2013: Word2Vec Skip-Gram
Distributed Representations of Words and Phrases and their Compositionality
Training Objective
$\frac{1}{T}\sum_{t=1}^{T}\sum_{-c \leq j \leq c,\, j \neq 0} \log p(w_{t+j} \mid w_t)$
$p(w_O \mid w_I) = \frac{\exp(v_{w_O}^{\top} v_{w_I})}{\sum_{w=1}^{W} \exp(v_{w}^{\top} v_{w_I})}$
For efficiency, the softmax was replaced with Negative Sampling.
Levy et al., 2015 experimented with a positive pointwise mutual information (PMI) matrix and showed that Word2Vec Skip-Gram with negative sampling is an implicit matrix factorization.
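A minimal sketch of one skip-gram update with negative sampling (toy vocabulary; the dimensionality, learning rate, and sampled negatives are illustrative):

```python
import numpy as np

def sgns_step(center, context, negatives, W_in, W_out, lr=0.025):
    """One skip-gram negative-sampling update (simplified sketch).

    W_in[center] is v_{w_I}; rows of W_out are the output vectors.
    Maximizes log sig(v_ctx . v_in) + sum_neg log sig(-v_neg . v_in).
    """
    sig = lambda z: 1 / (1 + np.exp(-z))
    v_in = W_in[center]
    grad_in = np.zeros_like(v_in)
    for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sig(W_out[w] @ v_in)
        g = score - label            # gradient of the logistic loss
        grad_in += g * W_out[w]
        W_out[w] -= lr * g * v_in
    W_in[center] -= lr * grad_in

# Toy usage: vocabulary of 10 words, 8-dim embeddings.
rng = np.random.default_rng(0)
W_in, W_out = rng.normal(0, 0.1, (10, 8)), rng.normal(0, 0.1, (10, 8))
sgns_step(3, 7, [1, 5], W_in, W_out)
```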
2013: Word2Vec CBOW
Efficient Estimation of Word Representations in Vector Space
Training Objective
$\frac{1}{T}\sum_{t=1}^{T} \log p(w_t \mid w_{[t-c,\, t+c]})$
$p(w_O \mid w_I) = \frac{\exp\left(v_{w_O}^{\top} \sum_{-c \leq j \leq c,\, j \neq 0} v_{w_{I+j}}\right)}{\sum_{w=1}^{W} \exp\left(v_{w}^{\top} \sum_{-c \leq j \leq c,\, j \neq 0} v_{w_{I+j}}\right)}$
2013: Word2Vec
Linear Relations and Compositionality
2013: Word2Vec: Word Analogies
Linear Relations and Compositionality: Russia + river = Volga_river
2013: Word2Vec: Word Analogies
Linear Relations and Compositionality: king − man + woman = queen?
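The analogy test itself is a nearest-neighbour search over vector offsets (the 3CosAdd formulation); a sketch with toy random vectors, since real use would load pre-trained embeddings:

```python
import numpy as np

def analogy(a, b, c, vectors, topn=1):
    """b - a + c, nearest by cosine (3CosAdd); the query words
    themselves are excluded, as in the original evaluation."""
    target = vectors[b] - vectors[a] + vectors[c]
    target /= np.linalg.norm(target)
    sims = {w: (v / np.linalg.norm(v)) @ target
            for w, v in vectors.items() if w not in (a, b, c)}
    return sorted(sims, key=sims.get, reverse=True)[:topn]

# Toy vectors for illustration only; real use would load pre-trained
# embeddings (e.g., via gensim's KeyedVectors).
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=50) for w in ["king", "man", "woman", "queen"]}
print(analogy("man", "king", "woman", vecs))  # "queen", with real vectors
```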
Word Analogies on other embeddings
Word Embeddings, Analogies, and Machine Learning: Beyond King − Man + Woman = Queen
Pre-trained Word2Vec (Google News): Bias and Stereotypes
Man is to Computer Programmer as Woman is to Homemaker?
Word2Vec trained on Reddit data: Bias and Stereotypes
Black is to Criminal as Caucasian is to Police
Data Bias and Stereotypes
Gendered Language
Positive adjectives describing women are often related to their bodies, while positive adjectives describing men are often related to their behavior.
Word2Vec and similar models
What do the models learn?
Morphology
– Capable of learning inflections, but not so much derivations (which are less regular and compositional)
Lexical Semantics
– Challenging, especially meronyms, antonyms, synonyms
Major Difficulties
– Polysemy (all word senses in a single vector)
– Negation
Broader context – back to RNNs!
Neural Machine Translation: Seq2Seq Models (Sutskever et al., 2014)
The resulting LSTM has 384M parameters; 64M are pure recurrent connections
BUT: Longer contexts – lower quality (vanishing gradient)
Long Short-Term Memory will solve it!
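For reference, the standard LSTM cell (Hochreiter & Schmidhuber, 1997) replaces the plain recurrent update with gated ones; the additive cell-state update is what mitigates vanishing gradients:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$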
Neural Machine Translation: Seq2Seq Models (Sutskever et al., 2014)
PCA projection of the LSTM hidden states of the corresponding sequences
We can also use both directions (to encode source language)
Neural Machine Translation: Seq2Seq Models w/Attention (Bahdanau et al., 2014)
A whole sentence shouldn't be compressed into a single vector! Use Attention!
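A minimal sketch of Bahdanau-style additive attention (toy dimensions): the decoder scores each encoder state against its previous state and takes a weighted sum:

```python
import numpy as np

def additive_attention(s_prev, enc_states, W_a, U_a, v_a):
    """Bahdanau-style attention (simplified sketch).

    s_prev: previous decoder state (d,); enc_states: (T, d) encoder states.
    e_j = v_a . tanh(W_a s_prev + U_a h_j); weights via softmax.
    """
    scores = np.tanh(s_prev @ W_a + enc_states @ U_a) @ v_a   # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                                  # softmax
    context = weights @ enc_states                            # weighted sum
    return context, weights

# Toy shapes: 4 encoder states of dimension 6, attention dimension 5.
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 6))
W_a, U_a, v_a = rng.normal(size=(6, 5)), rng.normal(size=(6, 5)), rng.normal(size=5)
ctx, w = additive_attention(rng.normal(size=6), enc, W_a, U_a, v_a)
print(w)   # attention weights over the 4 source positions; they sum to 1
```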
Neural Machine Translation: Seq2Seq Models w/Attention (Bahdanau et al., 2014)
It learns alignment, and it can be visualized!
Neural Machine Translation: Seq2Seq Models w/Attention (Bahdanau et al., 2014)
What do the models learn?
Belinkov et al., 2018a, 2018b
– Higher layers are better at learning semantics, while lower layers tend to be better for part-of-speech tagging
– Lower layers of the neural network are better at capturing morphology
Linzen et al., 2018, 2020
English subject–verb agreement:
– LSTMs were able to learn the verb-number agreement task in most cases, although their error rate increased on particularly difficult sentences
– the LM objective is not by itself sufficient for learning structure-sensitive dependencies; a joint training objective is suggested
Neural Machine Translation: Seq2Seq Models w/Attention (Bahdanau et al., 2014)
What do the models learn?
Vylomova et al., 2019
– Contextual inflection in 10 languages: Three little kittens were _sit_ on the mat. Predict: sitting
– Agreement: adjective–noun is fine, subject–verb is more challenging
– Morphological complexity matters (Uralic languages are more challenging than Germanic)
– Inherent vs. contextual categories: inherent ones (e.g., tense, noun number), which lack agreement or any other contextual signal, cannot be predicted
Back to the Past Tense Debate: Seq2Seq Models w/Attention
Kirov & Cotterell, 2018: The model obviates most of Pinker and Prince's criticisms
SIGMORPHON 2016 Shared Task
Task 1: run + V;PRES;3SG → runs
On Arabic, Finnish, Georgian, German, Hungarian, Maltese, Navajo, Russian, Spanish
Lake et al., 2018: Compositionality of RNNs
Simplified version of the CommAI Navigation tasks
Zero-shot generalization succeeds when the differences between training and test commands are small
Trained on "run", "jump", and "run twice", the model fails on "jump twice"
Contextualized Embeddings: Addressing the Polysemy Problem!
Context matters! ELMo: Let's make context-specific embeddings!
Features
– Two independent(!) LSTMs (a forward and a backward language model)
– Pre-trained embeddings
– Task-specific weighted sum of embeddings (two hidden states + word vector)
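The task-specific combination can be written compactly (following the ELMo paper: the $s_j$ are softmax-normalized layer weights, $\gamma$ is a scalar, and layer $j=0$ is the context-independent word vector):

$$\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}$$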
Self-Attention (Cheng et al., 2016)
Relate parts of a single sequence to compute its representation
Shows similarity to other parts!
Helpful for coreference resolution!
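A minimal single-head, scaled dot-product variant (the form later used in the Transformer, not Cheng et al.'s exact LSTM formulation) makes the idea concrete; shapes are illustrative:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (sketch).

    Each position attends to every position of the same sequence:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (T, d_v)

# Toy sequence: 5 tokens with 8-dim representations.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (5, 8)
```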
Contextualized Embeddings
Transformer: Attention Is All You Need
Features
– No recurrence, but a wide context window (somewhat similar to CNNs)
– Positional embeddings (to access token positions)
– Self-attention with several heads (matrices) and separate key, query, and value projections (with masking)
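As an aside, the original Transformer used fixed sinusoidal positional encodings (learned positional embeddings are an alternative); a minimal sketch:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings from 'Attention Is All You Need':
    PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(...)."""
    pos = np.arange(max_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Added to token embeddings so the model can access token positions.
print(positional_encoding(max_len=512, d_model=64).shape)  # (512, 64)
```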
Contextualized Embeddings
BERT: Deep Bidirectional Transformers
Features
– Trained on masked token prediction + next-sentence prediction (binary)
– WordPiece (BPE-style) tokenization
– Context window: 512 tokens; the special CLS token is used for classification
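To see masked-token prediction in action, a quick sketch assuming the Hugging Face transformers package is installed (the model name is the standard pre-trained checkpoint):

```python
from transformers import pipeline

# Fill-mask pipeline: BERT predicts the [MASK] token from both sides.
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("Two cats are [MASK] on the mat."):
    print(pred["token_str"], round(pred["score"], 3))
```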
Contextualized Embeddings
BERT: Deep Bidirectional Transformers
Contextualized Embeddings
BERTs
BERT-Base: L=12, H=768, A=12, total parameters = 110M
BERT-Large: L=24, H=1024, A=16, total parameters = 340M
Contextualized Embeddings: BERT
Contextualized Embeddings: Word Sense Disambiguation
Word Sense Disambiguation
"A mouse consists of an object held in one’s hand, with one or more buttons."
"Mouse" – an electronic device
Contextualized Embeddings: Coreference Resolution
Coreference resolution task
The secretary called the physician and told _him_ about a new patient.
him → physician
Contextualized Embeddings: Coreference Resolution
Gender Bias in Coreference Resolution
WinoBias: Winograd-schema-style sentences with entities corresponding to people referred to by their occupation
Contextualized Embeddings: Bias, bias, bias
Zhao et al., 2019
– a SOTA coreference system that depends on ELMo inherits its bias and demonstrates significant bias on WinoBias
– the training data for ELMo contains significantly more male than female entities
– the trained ELMo embeddings systematically encode gender information
– ELMo unequally encodes gender information about male and female entities
Contextualized Embeddings: What does BERT know (Rogers et al., 2020)?
Syntax
– Representations are hierarchical rather than linear and encode POS and syntactic roles (Liu et al., 2019a,b)
– Does not "understand" negation and is insensitive to malformed input (Ettinger, 2019)
Semantics
– Has some knowledge of semantic roles (Ettinger, 2019)
– Struggles with representations of numbers (floating point; Wallace et al., 2019b)
World Knowledge
– Cannot reason over its world knowledge ("A dog entered the room" does not yield that "the room is larger than the dog")
Extra resources
NLP Progress
Hugging Face – Models
"Embeddings in Natural Language Processing" book
"Dive into Deep Learning" interactive book
Thank you! Questions?