Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?
Fréderic Godin, Kris Demuynck, Joni Dambre, Wesley De Neve and Thomas Demeester
Department of Electronics and Information Systems
Ghent University, Belgium
Introduction
Fréderic Godin - Explaining Character-Aware Neural Networks for Word-Level Prediction
Example: Rule-based tagger for PoS tagging
Brill (1994)’s transformation-based error-driven tagger
Template: Change the most-likely tag X to Y if the last (1,2,3,4) characters of the word are x
Rule: Change the tag common noun to plural common noun if the word has suffix -s
Easily interpretable
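A rule of this kind is small enough to write out directly. The sketch below is only an illustration of the template, not Brill's implementation; the class name SuffixRule and the tag names NN and NNS are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuffixRule:
    """One instantiation of the template: change tag X to Y if the word ends in `suffix`."""
    from_tag: str
    to_tag: str
    suffix: str

    def apply(self, word: str, tag: str) -> str:
        # The template only looks at the last 1 to 4 characters of the word.
        if not 1 <= len(self.suffix) <= 4:
            raise ValueError("suffix must have 1 to 4 characters")
        if tag == self.from_tag and word.endswith(self.suffix):
            return self.to_tag
        return tag


# Hypothetical tag names: NN = common noun, NNS = plural common noun.
plural_rule = SuffixRule(from_tag="NN", to_tag="NNS", suffix="s")

print(plural_rule.apply("rules", "NN"))  # the suffix -s triggers the rule
print(plural_rule.apply("rule", "NN"))   # no suffix -s, tag is left unchanged
```

The whole decision is visible in one condition, which is what makes such a tagger easy to interpret.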
Interpretability in NLP used to be easy
Rule-based/Tree-based models
Shallow statistical models (e.g., logistic regression, CRF)
Very transparent: follow the trace
Essentially: weight + feature
Current NLP interpretability...
Our proposed method
We present contextual decomposition (CD) for CNNs
- Extends CD for LSTMs (Murdoch et al. 2018)
- White box approach to interpretability
We trace morphological tagging decisions back to the character level
- Which characters are important?
- Do the patterns match known linguistic rules?
- Do the CNN and the BiLSTM differ?
Contextual decomposition for CNNs
Contextual decomposition
Idea: every output value can be “decomposed” into
- Relevant contributions originating from the input we are interested in (e.g., some characters)
- Irrelevant contributions originating from all the other inputs (e.g., all the other characters in a word)
[Figure: a CNN maps the word "economicas" to the label "plural"; the characters of the word are marked as relevant or irrelevant]
Contextual decomposition for CNNs
Three main components of the CNN:
- Convolution
- Activation function
- Max-over-time pooling
Classification layer
[Figure: CNN architecture. Input characters "^ e c o n o m i c a s $", then CNN filters, max-over-time pooling and a fully connected (FC) layer that predicts Gender = feminine]
Contextual decomposition for CNNs: Convolution
Output of a single convolutional filter at timestep t, split into a relevant and an irrelevant part:
n = filter size
S = indexes of the relevant inputs
Wi = i-th column of filter W
Example: ^ e c o n o m i c a s $
Indexes covered by the filter: 8, 9, 10, 11
Relevant: 9
Irrelevant: 8, 10, 11
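The split can be sketched in a few lines of Python. This is a minimal illustration, assuming the filter output has the usual form z_t = sum over i of Wi · x_(t+i), plus a bias b, that positions are 0-based, and that the bias is kept as a separate term. The function name conv_split, the toy embeddings and the filter values are made up for the example.

```python
def conv_split(x, W, b, t, S):
    """Split one filter output z_t into (relevant, irrelevant, bias).

    x: list of character vectors, x[k] is the embedding at position k
    W: list of n filter columns, W[i] has the same length as x[k]
    b: scalar bias
    t: first position covered by the filter
    S: set of relevant position indexes
    """
    relevant = 0.0
    irrelevant = 0.0
    for i, w_i in enumerate(W):
        term = sum(w * v for w, v in zip(w_i, x[t + i]))  # W_i . x_{t+i}
        if (t + i) in S:
            relevant += term
        else:
            irrelevant += term
    return relevant, irrelevant, b


chars = "^ e c o n o m i c a s $".split()
# Toy 2-dimensional embeddings, made up for illustration only.
x = [[float(k), 1.0] for k in range(len(chars))]
W = [[0.5, -1.0], [1.0, 0.0], [-0.5, 2.0], [0.25, 0.25]]  # filter size n = 4
b = 0.1

# The filter covers positions 8..11; only position 9 is marked as relevant.
beta, gamma, bias = conv_split(x, W, b, t=8, S={9})
# Undecomposed filter output, computed independently as a check.
z_t = sum(sum(w * v for w, v in zip(W[i], x[8 + i])) for i in range(4)) + b
print(chars[9], beta, gamma, bias, z_t)
```

With 0-based positions, index 9 is the character "a", and the three parts add up to the undecomposed filter output.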
Contextual decomposition for CNNs: Activation function
Goal: Linearize activation function to be able to split output.
Linearization formula:
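One possible linearization, in the spirit of the CD for LSTMs that this work extends, attributes to each summand its marginal effect on the activation, averaged over every order in which the summands can be added. The sketch below is an assumption about the general shape of such a scheme, not a transcription of the slide's formula; the ReLU activation, the function name linearize and the choice to group the bias share with the irrelevant part are all assumptions.

```python
from itertools import permutations


def relu(v):
    return max(0.0, v)


def linearize(f, terms):
    """Attribute f(sum(terms)) to each term by averaging marginal effects over all orderings."""
    shares = [0.0] * len(terms)
    orders = list(permutations(range(len(terms))))
    for order in orders:
        running = 0.0
        for k in order:
            before = f(running)
            running += terms[k]
            shares[k] += f(running) - before
    return [s / len(orders) for s in shares]


# Toy pre-activation split: relevant part, irrelevant part, bias (made-up numbers).
beta, gamma, bias = 9.0, 3.0, 0.1
l_beta, l_gamma, l_bias = linearize(relu, [beta, gamma, bias])
relevant = l_beta
irrelevant = l_gamma + l_bias  # assumption: the bias share is grouped with the irrelevant part
print(relevant, irrelevant, relu(beta + gamma + bias))
```

Because the marginal effects telescope, the shares always sum to f(total) - f(0), so for an activation with f(0) = 0 the split stays exact after the non-linearity.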
Contextual decomposition for CNNs: Max pooling
Max-over-time pooling:
Determine t first and just copy that split:
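A minimal sketch of this step, assuming each timestep already carries a relevant and an irrelevant share whose sum is the filter activation at that timestep. The function name and the numbers are made up for the example.

```python
def max_over_time_split(relevant, irrelevant):
    """Pick the timestep with the largest total activation and copy its split."""
    totals = [r + g for r, g in zip(relevant, irrelevant)]
    t_max = max(range(len(totals)), key=totals.__getitem__)
    return t_max, relevant[t_max], irrelevant[t_max]


# Toy per-timestep splits of one filter's activations (made-up numbers).
relevant = [0.0, 0.2, 1.5, 0.1]
irrelevant = [0.3, 0.9, 0.4, 0.0]

t_max, beta_pool, gamma_pool = max_over_time_split(relevant, irrelevant)
print(t_max, beta_pool, gamma_pool)
```

The maximum is taken over the total activation, not over the relevant part alone, so the pooled value is the same as in the undecomposed network.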
Contextual decomposition of classification layer
Probability of a certain class:
We simplify:
Relevant contribution to class j
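Written out, the two steps could look as follows. This is a sketch that assumes a standard softmax over a fully connected layer. The symbols are assumptions: x is the pooled feature vector with x = β + γ, W_j and b_j are the FC weights and bias of class j (not the convolutional filter W), and C is the number of classes.

```latex
p_j = \frac{\exp(W_j \cdot x + b_j)}{\sum_{k=1}^{C} \exp(W_k \cdot x + b_k)}
\qquad
W_j \cdot x + b_j
  = \underbrace{W_j \cdot \beta}_{\text{relevant}}
  + \underbrace{W_j \cdot \gamma}_{\text{irrelevant}}
  + b_j
```

Under these assumptions the simplification drops the softmax normalization and looks only at the pre-activation score, where the relevant contribution to class j is the single term W_j · β.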
Experiments
Task
Morphological tagging: predict morphological labels for a word (gender, tense, singular/plural, ...)
Example: economicas
lemma=económico
gender=feminine
number=plural
For a subset of words, we have manual segmentations and annotations
Datasets
Universal Dependencies 1.4:
- Finnish, Spanish and Swedish
- Select all unique words and their morphological labels
Manual annotations and segmentations of 300 test set words
Architectures: CNN vs BiLSTM
[Figure: the two architectures side by side. CNN: characters "^ e c o n o m i c a s $", then CNN filters, max-over-time pooling and an FC layer that predicts Gender = feminine. BiLSTM: the same characters, then a BiLSTM and an FC layer that predicts Gender = feminine]
Do the NN patterns follow manual segmentations?
All = every possible combination of characters
Cons = all consecutive character n-grams
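The two candidate sets can be enumerated as follows. This is a minimal sketch; the function names are made up, and whether the boundary symbols ^ and $ count as characters is left out of the example.

```python
from itertools import combinations


def all_combinations(word):
    """Every non-empty set of character positions ("All")."""
    positions = range(len(word))
    return [c for k in range(1, len(word) + 1) for c in combinations(positions, k)]


def consecutive_ngrams(word):
    """Every contiguous span of character positions ("Cons")."""
    return [tuple(range(i, j)) for i in range(len(word)) for j in range(i + 1, len(word) + 1)]


word = "kronor"
print(len(all_combinations(word)), len(consecutive_ngrams(word)))
```

For a six-character word this gives 63 combinations against 21 consecutive n-grams, so "All" grows exponentially with word length while "Cons" grows quadratically.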
Visualizing contributions: 1 character
Spanish
^ g r a t u i t a $
Label: Gender=feminine
Visualizing contributions: 2 characters (Swedish)
[Figure: heatmaps of the contributions of character pairs in the Swedish word "^ k r o n o r $", one for the CNN and one for the BiLSTM; both axes list the characters of the word. Label: number=plural]
Most important patterns per language: Spanish
Linguistic rules for feminine gender:
- Feminine adjectives often end with “a”
- Nouns ending with “dad” or “ión” are often feminine
Found patterns:
- “a” is a very important pattern
- “dad” and “sió” are important trigrams
Most important patterns per language: Swedish
Linguistic rules for plural form:
- 5 suffixes: or, ar, (e)r, n, and no ending
- “na” is the definite article in plural forms
Found patterns:
- “or” and “ar”
- But also “na” and “rn”
Interactions/compositions of patterns
How do positive and negative patterns interact?
Consider the Spanish verb “gusta”
- Gender=Not Applicable (NA)
- We know that the suffix “a” is an indicator for gender=feminine
Consider most positive/negative set of characters per class:
The stem provides counterevidence for gender=feminine
Conclusion
Summary
We introduced a white box approach to understanding CNNs
We showed that:
- BiLSTMs and CNNs sometimes choose different patterns
- The learned patterns coincide with our linguistic knowledge
- Sometimes other plausible patterns are used
Questions?
Fréderic Godin
Ph.D. Researcher Deep Learning and NLP
IDLab
Email: frederic.godin@ugent.be
@frederic_godin
www.fredericgodin.com
idlab.technology / idlab.ugent.be
