Harnessing Deep Neural Networks with Logic Rules
Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric P. Xing
16/09/12, 8th Cutting-Edge NLP Study Group (第8回最先端NLP勉強会)
Figures and tables in the slides are taken from [Hu+ 16]
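•  We want to bring general rules and human intuition into neural networks
–  In sentiment analysis, the polarity of an "A but B" sentence matches that of B
–  In named entity recognition, I-ORG cannot follow B-PER
•  Express the rules and intuition in first-order logic
–  equal(y_{i-1}, B-PER) → ¬ equal(y_i, I-ORG)
•  Use the logic rules as constraints during training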
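•  pθ(y|x): output of the model (any neural network, e.g., a CNN)
•  q(y|x): output of the model after the constraints are imposed
•  Alternate between computing q(y|x) and learning the parameters (updating θ)
–  This performs posterior regularization [Ganchev+ 10] with a neural network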
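•  What we want to achieve
•  Objective function
–  Weight terms 1 and 2 by π and add them
–  loss is the loss function (cross-entropy here)
$(1 - \pi)\,\mathrm{loss}(y_n, \sigma_\theta(x_n)) + \pi\,\mathrm{loss}(q(y|x_n), \sigma_\theta(x_n))$
(the first term corresponds to 1, the second term to 2)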
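The weighted objective above can be sketched for a single example as follows. This is a minimal illustration, assuming cross-entropy as the loss (as the slide states) and NumPy arrays for the label, the teacher output q, and the student output; the function names and the numeric values are hypothetical.

```python
import numpy as np

def cross_entropy(target, pred, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted distribution."""
    return float(-np.sum(target * np.log(pred + eps)))

def weighted_objective(y_onehot, q_soft, p_pred, pi):
    """(1 - pi) * loss(y, p) + pi * loss(q, p) for one example."""
    hard_term = cross_entropy(y_onehot, p_pred)   # term 1: fit the true label
    soft_term = cross_entropy(q_soft, p_pred)     # term 2: imitate q(y|x)
    return (1.0 - pi) * hard_term + pi * soft_term

# Illustrative values only (not taken from the paper).
y = np.array([1.0, 0.0])   # true label, one-hot
q = np.array([0.8, 0.2])   # constrained output q(y|x)
p = np.array([0.6, 0.4])   # model output
loss_value = weighted_objective(y, q, p, pi=0.5)
```

With pi = 0 the objective reduces to ordinary supervised training on the labels; with pi = 1 the model only imitates q.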
… expectation operator. That is, for each rule (indexed by l) and each of its groundings (indexed by g) on (X, Y), we expect $\mathbb{E}_{q(Y|X)}[r_{l,g}(X, Y)] = 1$, with confidence $\lambda_l$. The constraints define a rule-regularized space of all valid distributions. For the second property, we measure the closeness between q and $p_\theta$ with KL-divergence, and wish to minimize it. Combining the two factors together and further allowing slackness for the constraints, we finally get the following optimization problem:
$\min_{q,\ \xi \ge 0} \ \mathrm{KL}(q(Y|X) \,\|\, p_\theta(Y|X)) + C \sum_{l,g_l} \xi_{l,g_l}$
$\text{s.t.} \ \lambda_l (1 - \mathbb{E}_q[r_{l,g_l}(X, Y)]) \le \xi_{l,g_l}, \quad g_l = 1, \dots, G_l, \ l = 1, \dots, L,$
where $\xi_{l,g_l} \ge 0$ is the slack variable for respective logic constraint; and C is the regularization parameter. The problem can be seen as projecting $p_\theta$ into the constrained subspace.
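•  q(y|x) can be solved analytically (Lagrangian dual problem)
–  Solved in the same way as posterior regularization [Ganchev+ 10]
•  The strength of the constraints is set by C (a constant) and λ (a per-rule value)
•  When a rule is not satisfied, q(y|x) becomes small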
… The problem is convex and can be efficiently solved in its dual form with closed-form solutions. We provide the detailed derivation in the supplementary materials and directly give the solution here:
$q^*(Y|X) \propto p_\theta(Y|X) \exp\Big\{ -\sum_{l,g_l} C \lambda_l (1 - r_{l,g_l}(X, Y)) \Big\}$
Intuitively, a strong rule with large $\lambda_l$ will lead to low probabilities of predictions that fail to meet the constraints.
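The closed-form solution for q can be sketched numerically as below. This is a minimal illustration for a single input with K candidate labels, assuming one grounding per rule; the function name `teacher_distribution`, the array layout, and the values of C and λ are assumptions made for the example, not settings from the paper.

```python
import numpy as np

def teacher_distribution(p, rule_values, lambdas, C):
    """q(y) proportional to p(y) * exp(-sum_l C * lambda_l * (1 - r_l(y))).

    p:           shape (K,), model probabilities p_theta(y|x) over K labels
    rule_values: shape (L, K), soft truth value r_l(x, y) in [0, 1] per rule and label
    lambdas:     shape (L,), confidence of each rule
    C:           regularization strength
    """
    penalty = C * np.sum(lambdas[:, None] * (1.0 - rule_values), axis=0)
    unnormalized = p * np.exp(-penalty)
    return unnormalized / unnormalized.sum()

# Illustrative values only: the rule is fully satisfied by label 0
# and mostly violated by label 1.
p = np.array([0.5, 0.5])
r = np.array([[1.0, 0.2]])
q = teacher_distribution(p, r, lambdas=np.array([1.0]), C=6.0)
```

Labels that violate a rule are down-weighted exponentially, so probability mass moves toward labels consistent with the rules; larger C or λ makes the shift stronger.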
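•  Rules are expressed in first-order logic
–  The polarity of an "A but B" sentence matches that of B
•  Truth values are continuous in [0, 1], following the probabilistic soft logic framework
–  The logical operators are defined as follows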
… is typically relevant to only a single or subset of examples, though here we give the most general form on the entire set.
We encode the FOL rules using soft logic (Bach et al., 2015) for flexible encoding and stable optimization. Specifically, soft logic allows continuous truth values from the interval [0, 1] instead of {0, 1}, and the Boolean logic operators are reformulated as:
$A \,\&\, B = \max\{A + B - 1, 0\}$
$A \lor B = \min\{A + B, 1\}$
$A_1 \land \cdots \land A_N = \sum_i A_i / N$
$\lnot A = 1 - A$
Here & and ∧ are two different approximations …
… The conjunction word “but” is one of the strong indicators for such sentiment changes in a sentence, where the sentiment of clauses following “but” generally dominates. We thus consider sentences S with an “A-but-B” structure, and expect the sentiment of the whole sentence to be consistent with the sentiment of clause B. The logic rule is written as:
$\text{has-`A-but-B'-structure}(S) \Rightarrow \big( \mathbf{1}(y = +) \Rightarrow \sigma_\theta(B)_+ \ \land\ \sigma_\theta(B)_+ \Rightarrow \mathbf{1}(y = +) \big),$
•  The polarity of an "A but B" sentence matches that of B
has-'A-but-B'-structure(S): sentence S has the structure "A but B"
$\big( \mathbf{1}(y = +) \Rightarrow \sigma_\theta(B)_+ \ \land\ \sigma_\theta(B)_+ \Rightarrow \mathbf{1}(y = +) \big)$
$\Leftrightarrow \big( \lnot \mathbf{1}(y = +) \lor \sigma_\theta(B)_+ \ \land\ \lnot \sigma_\theta(B)_+ \lor \mathbf{1}(y = +) \big)$
$\Leftrightarrow \big( 1 - \mathbf{1}(y = +) \lor \sigma_\theta(B)_+ \ \land\ 1 - \sigma_\theta(B)_+ \lor \mathbf{1}(y = +) \big)$
$\Leftrightarrow \big( \min\{1 - \mathbf{1}(y = +) + \sigma_\theta(B)_+, 1\} \ \land\ \min\{1 - \sigma_\theta(B)_+ + \mathbf{1}(y = +), 1\} \big)$
•  When the sentence is positive: $(1 + \sigma_\theta(B)_+)/2$
•  When the sentence is negative: $(2 - \sigma_\theta(B)_+)/2$
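The derivation above can be checked numerically with a few lines of Python. This is a small sketch, assuming the soft-logic operators as the slides define them (∨ as a truncated sum, ¬ as 1 − A, ∧ as an average, and A ⇒ B rewritten as ¬A ∨ B); the function names are hypothetical.

```python
def soft_or(a, b):
    """Soft disjunction: min{A + B, 1}."""
    return min(a + b, 1.0)

def soft_not(a):
    """Soft negation: 1 - A."""
    return 1.0 - a

def soft_and_avg(*values):
    """Averaging conjunction: (A_1 + ... + A_N) / N."""
    return sum(values) / len(values)

def soft_implies(a, b):
    """A => B, rewritten as (not A) or B."""
    return soft_or(soft_not(a), b)

def a_but_b_rule_value(y_is_positive, sigma_b_pos):
    """Truth value of (1(y=+) => s) and (s => 1(y=+)), with s = sigma_theta(B)_+."""
    indicator = 1.0 if y_is_positive else 0.0
    return soft_and_avg(
        soft_implies(indicator, sigma_b_pos),
        soft_implies(sigma_b_pos, indicator),
    )

# Illustrative value: the model gives clause B a positive probability of 0.8.
value_if_positive = a_but_b_rule_value(True, 0.8)   # matches (1 + 0.8) / 2
value_if_negative = a_but_b_rule_value(False, 0.8)  # matches (2 - 0.8) / 2
```

The rule value is high when the sentence label agrees with the model's prediction for clause B and lower when it disagrees.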
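•  Training with the constraints improves performance
•  Experiments on sentiment analysis and named entity recognition
–  The performance of both p and q is evaluated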
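•  Binary positive/negative classification task
•  Datasets:
–  Stanford Sentiment Treebank (SST2)
–  Movie Review (MR)
–  Customer Review (CR)
•  Baseline: a (simple) CNN
–  Same method as [Kim+ 14]
•  Constraint (rule) applied
–  The polarity of an "A but B" sentence matches that of B
•  Importance λ = 1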
Algorithm 1 Harnessing NN with Rules
Input: The training data $D = \{(x_n, y_n)\}_{n=1}^{N}$,
The rule set $R = \{(R_l, \lambda_l)\}_{l=1}^{L}$,
Parameters: π – imitation parameter
C – regularization strength
1: Initialize neural network parameter θ
2: repeat
3: Sample a minibatch (X, Y) ⊂ D
4: Construct teacher network q with Eq.(4)
5: Transfer knowledge into $p_\theta$ by updating θ with Eq.(2)
6: until convergence
Output: Distill student network $p_\theta$ and teacher network q
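The loop in Algorithm 1 can be sketched with a toy student model. The following is only an illustration under strong assumptions: the student is a linear softmax classifier rather than a CNN, there is a single rule with one grounding per example, and `rule_fn`, the toy data, and all hyperparameter values are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(X, Y, rule_fn, lam=1.0, C=1.0, pi=0.5, lr=0.1, epochs=50, batch=8, seed=0):
    """Alternate teacher construction and student update.

    X: (N, D) features; Y: (N, K) one-hot labels.
    rule_fn(Xb, Pb) -> (B, K) soft truth values in [0, 1] (hypothetical hook).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    K = Y.shape[1]
    W = np.zeros((D, K))                           # step 1: initialize theta
    for _ in range(epochs):                        # step 2: repeat
        order = rng.permutation(N)
        for start in range(0, N, batch):           # step 3: sample a minibatch
            b = order[start:start + batch]
            P = softmax(X[b] @ W)                  # student output p_theta(y|x)
            R = rule_fn(X[b], P)
            Q = P * np.exp(-C * lam * (1.0 - R))   # step 4: teacher q
            Q /= Q.sum(axis=1, keepdims=True)
            target = (1.0 - pi) * Y[b] + pi * Q    # mix hard labels and teacher output
            grad = X[b].T @ (P - target) / len(b)  # cross-entropy gradient, q held fixed
            W -= lr * grad                         # step 5: update theta
    return W

# Toy data: the label depends on the sign of the first feature.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(64, 2))
Y = np.eye(2)[(X[:, 0] > 0).astype(int)]

def rule_fn(Xb, Pb):
    """Hypothetical rule: when the second feature exceeds 1, class 0 violates the rule."""
    R = np.ones_like(Pb)
    R[Xb[:, 1] > 1.0, 0] = 0.0
    return R

W = train(X, Y, rule_fn)
```

Because cross-entropy is linear in its target, mixing the one-hot label and q into a single target gives the same gradient as weighting the two loss terms by (1 − π) and π.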
… the constraints spanning over multiple examples), requiring joint inference. In contrast, as mentioned above, p is more lightweight and efficient, and useful when rule evaluation is expensive or impossible at prediction time. Our experiments compare the performance of p and q extensively.
Imitation Strength π. The imitation parameter π in Eq.(2) balances between emulating the teacher soft predictions and predicting the true hard labels. Since the teacher network is constructed from $p_\theta$, which, at the beginning of training, would produce low-quality predictions, we thus favor pre…
[Figure 2: The CNN architecture for sentence-level sentiment analysis (example input "I like this book store a lot" with padding, followed by max pooling). The sentence representation vector is followed by a fully-connected layer with softmax output activation, to output sentiment predictions.]
4.1 Sentiment Classification
Sentence-level sentiment analysis is to identify the sentiment (e.g., positive or negative) underlying an individual sentence. The task is crucial for many opinion mining applications. One challenging point of the task is to capture the contrastive …
… windows. Multiple filters with varying window sizes are used to obtain multiple features. Figure 2 shows the network architecture.
Logic Rules. One difficulty for the plain neural network is to identify contrastive sense in order to capture the dominant sentiment precisely. The conjunction word “but” is one of the strong indicators for such sentiment changes in a sentence, where the sentiment of clauses following “but” generally dominates. We thus consider sentences S with an “A-but-B” structure, and expect the sentiment of the whole sentence to be consistent with the sentiment of clause B. The logic rule is written as:
$\text{has-`A-but-B'-structure}(S) \Rightarrow \big( \mathbf{1}(y = +) \Rightarrow \sigma_\theta(B)_+ \ \land\ \sigma_\theta(B)_+ \Rightarrow \mathbf{1}(y = +) \big),$
•  Performance improves over the baseline [Kim+ 14]
•  State-of-the-art on MR and CR
•  MVCNN (uses multiple word vectors + a complex CNN …
Row | Model | SST2 | MR | CR
1 | CNN (Kim, 2014) | 87.2 | 81.3±0.1 | 84.3±0.2
2 | CNN-Rule-p | 88.8 | 81.6±0.1 | 85.0±0.3
3 | CNN-Rule-q | 89.3 | 81.7±0.1 | 85.3±0.3
4 | MGNC-CNN (Zhang et al., 2016) | 88.4 | – | –
5 | MVCNN (Yin and Schütze, 2015) | 89.4 | – | –
6 | CNN-multichannel (Kim, 2014) | 88.1 | 81.1 | 85.0
7 | Paragraph-Vec (Le and Mikolov, 2014) | 87.8 | – | –
8 | CRF-PR (Yang and Cardie, 2014) | – | – | 82.7
9 | RNTN (Socher et al., 2013) | 85.4 | – | –
10 | G-Dropout (Wang and Manning, 2013) | – | 79.0 | 82.1
Table 1: Accuracy (%) of Sentiment Classification. Row 1, CNN (Kim, 2014) is the base network corresponding to the “CNN-non-static” model in (Kim, 2014). Rows 2-3 are the networks enhanced by our framework: CNN-Rule-p is the student network and CNN-Rule-q is the teacher network. For MR and CR, we report the average accuracy ± one standard deviation using 10-fold cross validation.
… the base networks, we obtain substantial improvements on both tasks and achieve state-of-the-art or comparable results to previous best-performing systems. Comparison with a diverse set of other …
… or positive sentiment. 3) CR (Hu and Liu, 2004), customer reviews of various products, containing 2 classes and 3,775 instances. For MR and CR, we use 10-fold cross validation as in previous work. In …
•  Is it necessary to compute pθ(y|x) and q(y|x) alternately?
–  Compared with optimizing only one of them, and with optimizing one first and then the other
Row | Model | Accuracy (%)
1 | CNN (Kim, 2014) | 87.2
2 | -but-clause | 87.3
3 | -ℓ2-reg | 87.5
4 | -project | 87.9
5 | -opt-project | 88.3
6 | -pipeline | 87.9
7 | -Rule-p | 88.8
8 | -Rule-q | 89.3
Table 2: Performance of different rule integration …
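•  Effect of the amount of data, and use of unlabeled data
•  Using unlabeled data improves performance
–  The constraints make it possible to exploit unlabeled data effectively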
Row | Data size | 5% | 10% | 30% | 100%
1 | CNN | 79.9 | 81.6 | 83.6 | 87.2
2 | -Rule-p | 81.5 | 83.2 | 84.5 | 88.8
3 | -Rule-q | 82.5 | 83.9 | 85.6 | 89.3
4 | -semi-PR | 81.5 | 83.1 | 84.6 | –
5 | -semi-Rule-p | 81.7 | 83.3 | 84.7 | –
6 | -semi-Rule-q | 82.7 | 84.2 | 85.7 | –
Table 3: Accuracy (%) on SST2 with varying sizes of labeled data and semi-supervised learning. The header row is the percentage of labeled examples …
(slide annotation, truncated: … data and (100 − ◯)% of …)
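•  Task: recognizing four types of named entities (PER, ORG, LOC, Misc)
•  Dataset: CoNLL-2003
–  Uses BIOES tags (as in [Lample+ 16] and others)
•  Baseline: bidirectional LSTM
–  [Chiu and Nichols, 15] with the CNN removed
•  Constraints (rules) applied
–  The sequence of output tags must be well-formed
•  Importance λ = ∞ (hard constraint)
–  Items in a list receive the same type of tag
•  In "1. Juventus, 2. Barcelona, 3. …", Juventus and Barcelona have the same tag type
•  Importance λ = 1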
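The first constraint (a well-formed tag sequence) can be sketched as a check on adjacent tags. The slides give only one example (I-ORG cannot follow B-PER), so the full set of transitions below is an assumption based on the usual BIOES convention, and the function name is hypothetical.

```python
def violates_transition(prev_tag, tag):
    """Return True if `tag` cannot follow `prev_tag` under an assumed BIOES scheme."""
    prev_pos, _, prev_type = prev_tag.partition("-")
    pos, _, typ = tag.partition("-")
    if prev_pos in ("B", "I"):
        # An entity is still open: it must continue with I- or E- of the same type.
        return pos not in ("I", "E") or typ != prev_type
    # After O, E-, or S- no entity is open, so I- and E- cannot appear.
    return pos in ("I", "E")

def first_violation(tags):
    """Index of the first invalid transition in a tag sequence, or None."""
    for i in range(1, len(tags)):
        if violates_transition(tags[i - 1], tags[i]):
            return i
    return None
```

With λ = ∞ such a rule acts as a hard constraint: any tag sequence containing an invalid transition receives zero probability under q.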
where $\mathbf{1}(\cdot)$ is an indicator function that takes 1 when its argument is true, and 0 otherwise; class ‘+’ represents ‘positive’; and $\sigma_\theta(B)_+$ is the element of $\sigma_\theta(B)$ for class ‘+’. By Eq.(1), when S has the ‘A-but-B’ structure, the truth value of the above logic rule equals $(1 + \sigma_\theta(B)_+)/2$ when y = +, and $(2 - \sigma_\theta(B)_+)/2$ otherwise. Note that here we assume two-way classification (i.e., positive and negative), though it is straightforward to design rules for finer grained sentiment classification.
4.2 Named Entity Recognition
NER is to locate and classify elements in text into entity categories such as “persons” and “organizations”. It is an essential first step for downstream language understanding applications. The task assigns to each word a named entity tag in an “X-Y” format where X is one of BIEOS (Beginning, Inside, End, Outside, and Singleton) and Y is the entity category. A valid tag sequence has to follow …
[Figure 3: The architecture of the bidirectional LSTM recurrent network for NER (example input "NYC locates in USA"). The CNN for extracting character representation is omitted.]
e.g., $\mathrm{equal}(y_{i-1}, \text{B-PER}) \Rightarrow \lnot\, \mathrm{equal}(y_i, \text{I-ORG})$
… The confidence levels are set to $\infty$ to prevent any …
We further leverage the list structures within and across sentences of the same documents. Specifically, named entities at corresponding positions in a list are likely to be in the same categories. For instance, in “1. Juventus, 2. Barcelona, 3. ...” we know “Barcelona” must be an organization rather than a location, since its counterpart entity “Juventus” is an organization. We describe our simple procedure for identifying lists and counterparts in the supplementary materials. The logic rule is encoded as:
$\text{is-counterpart}(X, A) \Rightarrow 1 - \| c(e_y) - c(\sigma_\theta(A)) \|_2, \quad (7)$
where $e_y$ is the one-hot encoding of y (the class prediction of X); c(·) collapses the probability mass …
•  Performance improves over the baseline
•  On par with a method that uses massive external resources [Luo+ 15] and a neural network with many parameters [Ma and Hovy, 16]
Row | Model | F1
1 | BLSTM | 89.55
2 | BLSTM-Rule-trans | p: 89.80, q: 91.11
3 | BLSTM-Rules | p: 89.93, q: 91.18
4 | NN-lex (Collobert et al., 2011) | 89.59
5 | S-LSTM (Lample et al., 2016) | 90.33
6 | BLSTM-lex (Chiu and Nichols, 2015) | 90.77
7 | BLSTM-CRF1 (Lample et al., 2016) | 90.94
8 | Joint-NER-EL (Luo et al., 2015) | 91.20
9 | BLSTM-CRF2 (Ma and Hovy, 2016) | 91.21
Table 4: Performance of NER on CoNLL-2003. Row 2, BLSTM-Rule-trans imposes the transition rules (Eq.(6)) on the base BLSTM. Row 3, BLSTM-Rules further incorporates the list rule (Eq.(7)). We report the performance of both the student model p …
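•  Bring general rules and human intuition into neural networks
–  Express the rules in first-order logic
–  Map them to continuous values in [0, 1] with probabilistic soft logic
–  Use them as constraints during training
•  Experiments on sentiment analysis and named entity recognition
–  Introducing the constraints improves performance
–  Performance is on par with more complex networks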

Harnessing Deep Neural Networks with Logic Rules

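  • 1. Harnessing Deep Neural Networks with Logic Rules. Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, and Eric P. Xing. ACL 2016. Presenter: Sho Takase (高瀬翔), Tohoku University. 16/09/12, 8th Cutting-Edge NLP Study Group (第8回最先端NLP勉強会). Figures and tables in the slides are taken from [Hu+ 16].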
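  • 2. Goal
      •  Introduce general rules and human intuition into neural networks
         –  Sentiment analysis: the polarity of a sentence of the form "A but B" agrees with the polarity of B
         –  Named entity recognition: I-ORG can never follow B-PER
      •  Express the rules and intuitions in first-order logic
         –  $\mathrm{equal}(y_{i-1}, \text{B-PER}) \Rightarrow \neg\,\mathrm{equal}(y_i, \text{I-ORG})$
      •  Use the logic rules as constraints during learning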
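The NER rule on this slide can be checked directly on a tag sequence. The following is a minimal sketch, assuming tags are plain strings; the function name `violates_transition_rule` is hypothetical and does not come from the slides or the paper.

```python
def violates_transition_rule(tags):
    """Return True if some I-ORG tag directly follows a B-PER tag."""
    for prev, curr in zip(tags, tags[1:]):
        # Rule: equal(y_{i-1}, B-PER) => not equal(y_i, I-ORG)
        if prev == "B-PER" and curr == "I-ORG":
            return True
    return False


valid = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG"]
invalid = ["B-PER", "I-ORG", "O"]
print(violates_transition_rule(valid))    # False
print(violates_transition_rule(invalid))  # True
```

A Boolean check like this only detects violations; the method in the slides instead uses the rule as a soft constraint during training.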
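  • 3. Method overview
      •  pθ(y|x): the output of the model (any neural network, e.g. a CNN)
      •  q(y|x): the model output after the constraints are imposed
      •  Computing q(y|x) and learning the parameters (updating θ) are alternated
         –  This performs posterior regularization [Ganchev+ 10] with a neural network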
  • 4. Method overview (same bullets as slide 3, with labels added to the figure)
      [Figure labels: "Explanation 1", "θ update", "Explanation 2", "q(y|x)", "Explanation 3"]
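  • 5. Parameter learning
      •  Goals
         1. Output the correct label for the training examples
         2. Produce outputs similar to the constrained model q(y|x)
      •  Objective function
         –  Goals 1 and 2 are weighted by π and summed
         –  loss is the loss function (cross entropy in this work)
         $$\min_{\theta} \frac{1}{N} \sum_{n=1}^{N} (1 - \pi)\,\mathrm{loss}\big(y_n, \sigma_\theta(x_n)\big) + \pi\,\mathrm{loss}\big(q(y|x_n), \sigma_\theta(x_n)\big)$$
         –  The first term corresponds to goal 1 and the second term to goal 2
         –  σθ(x_n) is the model's prediction for x_n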
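The objective on this slide can be evaluated for a single example as follows. This is a sketch under stated assumptions: labels and predictions are plain probability lists, and the names `cross_entropy` and `mixed_loss` are hypothetical, chosen only for illustration.

```python
import math


def cross_entropy(target, pred):
    """Cross entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)


def mixed_loss(y_onehot, q, pred, pi):
    """(1 - pi) * loss(y, pred) + pi * loss(q, pred) for one example."""
    return (1 - pi) * cross_entropy(y_onehot, pred) + pi * cross_entropy(q, pred)


y = [1.0, 0.0]     # true label (goal 1)
q = [0.8, 0.2]     # constrained model output (goal 2), assumed given
pred = [0.6, 0.4]  # model prediction sigma_theta(x_n)
print(mixed_loss(y, q, pred, pi=0.0))  # only the true-label term
print(mixed_loss(y, q, pred, pi=0.5))  # equal weight on both terms
```

Setting `pi=0` recovers ordinary supervised training, while `pi=1` trains the model purely to imitate q.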
  • 6. Computing q(y|x) (1/2)
      •  Goals
         1. Satisfy the constraints (logic rules)
         2. Stay close to the model output pθ(y|x)
      •  Objective function (excerpt from [Hu+ 16])
         "[...] expectation operator. That is, for each rule (indexed by l) and each of its groundings (indexed by g) on (X, Y), we expect $E_{q(Y|X)}[r_{l,g}(X, Y)] = 1$, with confidence $\lambda_l$. The constraints define a rule-regularized space of all valid distributions. For the second property, we measure the closeness between q and pθ with KL-divergence, and wish to minimize it. Combining the two factors together and further allowing slackness for the constraints, we finally get the following optimization problem:
         $$\min_{q,\ \xi \ge 0}\ \mathrm{KL}\big(q(Y|X)\,\|\,p_\theta(Y|X)\big) + C \sum_{l, g_l} \xi_{l, g_l}$$
         $$\text{s.t.}\quad \lambda_l \big(1 - E_q[r_{l, g_l}(X, Y)]\big) \le \xi_{l, g_l}, \qquad g_l = 1, \dots, G_l,\quad l = 1, \dots, L, \qquad (3)$$
         where $\xi_{l, g_l} \ge 0$ is the slack variable for respective logic constraint; and C is the regularization parameter."
      •  Slide annotations
         –  The KL term corresponds to goal 2; the constraint corresponds to goal 1
         –  ξ relaxes the constraints
         –  $r_{l, g_l}$ is a continuous value between 0 and 1; it is 1 when grounding $g_l$ satisfies rule $r_l$
         –  $\lambda_l$ is the strength of rule $r_l$ (a large λ marks a rule that should be satisfied)
  • 7. Computing q(y|x) (2/2)
      •  q(y|x) can be solved analytically (via the Lagrangian dual problem)
         –  The solution method is the same as in posterior regularization [Ganchev+ 10]
      •  The strength of the constraints is determined by C (a constant) and λ (one value per rule)
      •  When a rule is not satisfied, q(y|x) becomes small
      •  Excerpt from [Hu+ 16]
         "The problem can be seen as projecting pθ into the constrained subspace. The problem is convex and can be efficiently solved in its dual form with closed-form solutions. We provide the detailed derivation in the supplementary materials and directly give the solution here:
         $$q^*(Y|X) \propto p_\theta(Y|X)\, \exp\Big\{ -\sum_{l, g_l} C \lambda_l \big(1 - r_{l, g_l}(X, Y)\big) \Big\} \qquad (4)$$
         Intuitively, a strong rule with large $\lambda_l$ will lead to low probabilities of predictions that fail to meet the constraints [...]"
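The closed-form solution in Eq.(4) is a reweighting of pθ followed by normalization. The sketch below assumes a small discrete label set and precomputed soft truth values; the function name `project` and the `rule_values[y][l]` layout are hypothetical choices for illustration, not the authors' code.

```python
import math


def project(p, rule_values, C, lambdas):
    """q*(y) proportional to p(y) * exp(-sum_l C * lambda_l * (1 - r_l(y)))."""
    unnormalized = []
    for p_y, r_y in zip(p, rule_values):
        # r_y[l] is the soft truth value in [0, 1] of rule l for label y.
        penalty = sum(C * lam * (1.0 - r) for lam, r in zip(lambdas, r_y))
        unnormalized.append(p_y * math.exp(-penalty))
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


p = [0.5, 0.5]                # model is undecided between two labels
rule_values = [[1.0], [0.0]]  # one rule: satisfied by label 0, violated by label 1
q = project(p, rule_values, C=1.0, lambdas=[1.0])
print(q)  # probability mass moves toward label 0
```

Increasing either `C` or the rule's λ shrinks the probability of the violating label further, which matches the bullet stating that q(y|x) becomes small when a rule is not satisfied.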
  • 8. About the rules
      •  Rules are expressed in first-order logic
         –  Example: the polarity of a sentence "A but B" agrees with the polarity of B
      •  Truth values are converted to continuous values between 0 and 1 in the probabilistic soft logic framework
         –  The logical operators are defined in Eq.(1) below
      •  Excerpts from [Hu+ 16]
         "[...] is typically relevant to only a single or subset of examples, though here we give the most general form on the entire set. We encode the FOL rules using soft logic (Bach et al., 2015) for flexible encoding and stable optimization. Specifically, soft logic allows continuous truth values from the interval [0, 1] instead of {0, 1}, and the Boolean logic operators are reformulated as:
         $$A \,\&\, B = \max\{A + B - 1,\ 0\}$$
         $$A \vee B = \min\{A + B,\ 1\}$$
         $$A_1 \wedge \dots \wedge A_N = \sum_i A_i / N$$
         $$\neg A = 1 - A \qquad (1)$$
         Here & and ∧ are two different approximations [...]"
         "[...] The conjunction word "but" is one of the strong indicators for such sentiment changes in a sentence, where the sentiment of clauses following "but" generally dominates. We thus consider sentences S with an "A-but-B" structure, and expect the sentiment of the whole sentence to be consistent with the sentiment of clause B. The logic rule is written as:
         $$\text{has-`A-but-B'-structure}(S) \Rightarrow \big(\mathbf{1}(y = +) \Rightarrow \sigma_\theta(B)_+ \ \wedge\ \sigma_\theta(B)_+ \Rightarrow \mathbf{1}(y = +)\big) \qquad (5)$$"
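The soft logic operators of Eq.(1) translate almost line by line into code. This is a minimal sketch with hypothetical function names; `soft_implies` is an addition that uses the standard rewriting of A ⇒ B as ¬A ∨ B.

```python
def soft_and(a, b):
    """A & B = max{A + B - 1, 0}."""
    return max(a + b - 1.0, 0.0)


def soft_or(a, b):
    """A v B = min{A + B, 1}."""
    return min(a + b, 1.0)


def soft_avg_and(*values):
    """A_1 ^ ... ^ A_N = sum_i A_i / N (the averaging conjunction)."""
    return sum(values) / len(values)


def soft_not(a):
    """not A = 1 - A."""
    return 1.0 - a


def soft_implies(a, b):
    """A => B, rewritten as (not A) v B."""
    return soft_or(soft_not(a), b)


print(soft_and(0.7, 0.6), soft_or(0.7, 0.6))
print(soft_avg_and(1.0, 0.5), soft_not(0.25), soft_implies(1.0, 0.8))
```

On Boolean inputs (0 or 1) `soft_and`, `soft_or` and `soft_not` reduce to the usual truth tables, which is a quick sanity check for any reimplementation.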
  • 9. Example of evaluating a rule
      •  Rule: the polarity of a sentence "A but B" agrees with the polarity of B
         $$\text{has-`A-but-B'-structure}(S) \Rightarrow \big(\mathbf{1}(y = +) \Rightarrow \sigma_\theta(B)_+ \ \wedge\ \sigma_\theta(B)_+ \Rightarrow \mathbf{1}(y = +)\big) \qquad (5)$$
         –  has-'A-but-B'-structure(S): sentence S has the structure "A but B"
         –  $\mathbf{1}(y = +)$: 1 when the sentence is positive, 0 otherwise
         –  $\sigma_\theta(B)_+$: the probability with which the model predicts clause B to be positive
      •  Derivation of the truth value of the right-hand side
         $$\big(\mathbf{1}(y = +) \Rightarrow \sigma_\theta(B)_+\big) \wedge \big(\sigma_\theta(B)_+ \Rightarrow \mathbf{1}(y = +)\big)$$
         $$= \big(\neg \mathbf{1}(y = +) \vee \sigma_\theta(B)_+\big) \wedge \big(\neg \sigma_\theta(B)_+ \vee \mathbf{1}(y = +)\big)$$
         $$= \big((1 - \mathbf{1}(y = +)) \vee \sigma_\theta(B)_+\big) \wedge \big((1 - \sigma_\theta(B)_+) \vee \mathbf{1}(y = +)\big)$$
         $$= \min\{1 - \mathbf{1}(y = +) + \sigma_\theta(B)_+,\ 1\} \wedge \min\{1 - \sigma_\theta(B)_+ + \mathbf{1}(y = +),\ 1\}$$
      •  When the sentence is positive: $(1 + \sigma_\theta(B)_+)/2$
      •  When the sentence is negative: $(2 - \sigma_\theta(B)_+)/2$
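The two closed forms on this slide can be reproduced numerically from the soft logic definitions. The sketch below assumes the sentence is already known to have the "A but B" structure; `but_rule_value` is a hypothetical helper name.

```python
def but_rule_value(y_is_positive, p_b_positive):
    """Soft truth value of (1(y=+) => s) ^ (s => 1(y=+)), with s = sigma_theta(B)_+."""
    indicator = 1.0 if y_is_positive else 0.0
    # Each implication X => Y is evaluated as min{1 - X + Y, 1}.
    left = min(1.0 - indicator + p_b_positive, 1.0)
    right = min(1.0 - p_b_positive + indicator, 1.0)
    # The conjunction ^ is the average of its arguments.
    return (left + right) / 2.0


s = 0.8  # assumed model probability that clause B is positive
print(but_rule_value(True, s), (1 + s) / 2)   # both values agree
print(but_rule_value(False, s), (2 - s) / 2)  # both values agree
```

With a confident positive prediction for clause B, labelling the whole sentence positive gives a higher truth value than labelling it negative, so q shifts probability toward the positive label.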
  • 11. Experimental setup (sentiment analysis)
      •  Task: binary positive/negative classification
      •  Datasets
         –  Stanford Sentiment Treebank (SST2)
         –  Movie Review (MR)
         –  Customer Review (CR)
      •  Baseline: a (simple) CNN
         –  Same method as [Kim+ 14]
      •  Constraint (rule) applied
         –  The polarity of a sentence "A but B" agrees with the polarity of B
            •  Importance λ = 1
      •  Algorithm 1: Harnessing NN with Rules (from [Hu+ 16])
         Input: the training data $D = \{(x_n, y_n)\}_{n=1}^{N}$; the rule set $R = \{(R_l, \lambda_l)\}_{l=1}^{L}$
         Parameters: π (imitation parameter), C (regularization strength)
         1: Initialize neural network parameter θ
         2: repeat
         3:   Sample a minibatch (X, Y) ⊂ D
         4:   Construct teacher network q with Eq.(4)
         5:   Transfer knowledge into pθ by updating θ with Eq.(2)
         6: until convergence
         Output: Distill student network pθ and teacher network q
      •  Excerpts from [Hu+ 16]
         "[...] requiring joint inference. In contrast, as mentioned above, p is more lightweight and efficient, and useful when rule evaluation is expensive or impossible at prediction time. Our experiments compare the performance of p and q extensively."
         "Imitation Strength π: The imitation parameter π in Eq.(2) balances between emulating the teacher soft predictions and predicting the true hard labels. Since the teacher network is constructed from pθ, which, at the beginning of training, would produce low-quality predictions, we thus favor [...]"
         "4.1 Sentiment Classification: Sentence-level sentiment analysis is to identify the sentiment (e.g., positive or negative) underlying an individual sentence. The task is crucial for many opinion mining applications. One challenging point of the task is to capture the contrastive [...]"
         "[...] Multiple filters with varying window sizes are used to obtain multiple features. Figure 2 shows the network architecture. Logic Rules: One difficulty for the plain neural network is to identify contrastive sense in order to capture the dominant sentiment precisely. The conjunction word "but" is one of the strong indicators for such sentiment changes in a sentence, where the sentiment of clauses following "but" generally dominates. We thus consider sentences S with an "A-but-B" structure, and expect the sentiment of the whole sentence to be consistent with the sentiment of clause B. The logic rule is written as:
         $$\text{has-`A-but-B'-structure}(S) \Rightarrow \big(\mathbf{1}(y = +) \Rightarrow \sigma_\theta(B)_+ \ \wedge\ \sigma_\theta(B)_+ \Rightarrow \mathbf{1}(y = +)\big) \qquad (5)$$"
      [Figure 2: The CNN architecture for sentence-level sentiment analysis (word embedding, convolution, max pooling, sentence representation; example input "I like this book store a lot" with padding). The sentence representation vector is followed by a fully-connected layer with softmax output activation, to output sentiment predictions.]
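The alternation in Algorithm 1 can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a single training example with two labels, uses the two logits themselves as the only parameters θ, fixes the rule truth values in advance, and runs a fixed number of steps instead of testing convergence. All function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def teacher(p, rule_truth, C, lam):
    """Eq.(4) for a single rule: q(y) proportional to p(y) * exp(-C * lam * (1 - r(y)))."""
    unnorm = [p_y * math.exp(-C * lam * (1.0 - r)) for p_y, r in zip(p, rule_truth)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def train(y_onehot, rule_truth, pi=0.5, C=1.0, lam=1.0, lr=0.5, steps=200):
    theta = [0.0, 0.0]                      # step 1: initialize parameters
    for _ in range(steps):                  # steps 2-6, with a fixed step count
        p = softmax(theta)
        q = teacher(p, rule_truth, C, lam)  # step 4: construct teacher q from p
        # Step 5: gradient of (1 - pi) * CE(y, p) + pi * CE(q, p) with respect to
        # the logits, holding q fixed, is p - ((1 - pi) * y + pi * q).
        target = [(1 - pi) * y + pi * q_y for y, q_y in zip(y_onehot, q)]
        theta = [t - lr * (p_y - tg) for t, p_y, tg in zip(theta, p, target)]
    p = softmax(theta)
    return p, teacher(p, rule_truth, C, lam)


student, teacher_out = train(y_onehot=[1.0, 0.0], rule_truth=[1.0, 0.2])
print(student, teacher_out)
```

In this toy run the label and the rule agree on the first class, so both the student p and the teacher q concentrate on it, with q slightly sharper than p because it applies the rule penalty on top of p.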
  • 12. Experimental results (sentiment analysis) (1/3)
      •  Performance improves over the baseline [Kim+ 14]
      •  State of the art on MR and CR
      •  Comparable to MVCNN, which uses multiple word embeddings and a complex CNN (multichannel, multilayer)

      Table 1: Accuracy (%) of Sentiment Classification.
           Model                                   SST2   MR         CR
       1   CNN (Kim, 2014)                         87.2   81.3±0.1   84.3±0.2
       2   CNN-Rule-p                              88.8   81.6±0.1   85.0±0.3
       3   CNN-Rule-q                              89.3   81.7±0.1   85.3±0.3
       4   MGNC-CNN (Zhang et al., 2016)           88.4   –          –
       5   MVCNN (Yin and Schutze, 2015)           89.4   –          –
       6   CNN-multichannel (Kim, 2014)            88.1   81.1       85.0
       7   Paragraph-Vec (Le and Mikolov, 2014)    87.8   –          –
       8   CRF-PR (Yang and Cardie, 2014)          –      –          82.7
       9   RNTN (Socher et al., 2013)              85.4   –          –
      10   G-Dropout (Wang and Manning, 2013)      –      79.0       82.1
      Row 1, CNN (Kim, 2014) is the base network corresponding to the "CNN-non-static" model in (Kim, 2014). Rows 2-3 are the networks enhanced by our framework: CNN-Rule-p is the student network and CNN-Rule-q is the teacher network. For MR and CR, we report the average accuracy ± one standard deviation using 10-fold cross validation.

      Excerpt from [Hu+ 16]: "3) CR (Hu and Liu, 2004), customer reviews of various products, containing 2 classes and 3,775 instances. For MR and CR, we use 10-fold cross validation as in previous work."
  • 13. Experimental results (sentiment analysis) (2/3)
      •  Is it necessary to compute pθ(y|x) and q(y|x) alternately?
         –  How well does it work to optimize only one of them, or to compute one after the other has been optimized?
      •  Alternating the two computations gives better performance

      Table 2: Performance of different rule integration [...]
           Model              Accuracy (%)   Slide annotation
       1   CNN (Kim, 2014)    87.2
       2   -but-clause        87.3
       3   -ℓ2-reg            87.5
       4   -project           87.9           train the CNN, then compute q(y|x)
       5   -opt-project       88.3           optimize q(y|x)
       6   -pipeline          87.9           optimize q(y|x), then train the CNN
       7   -Rule-p            88.8
       8   -Rule-q            89.3
  • 14. Experimental results (sentiment analysis) (3/3)
      •  Effect of the amount of data, and use of unlabeled data
      •  Using unlabeled data improves performance
         –  The constraints make it possible to exploit unlabeled data effectively

      Table 3: Accuracy (%) on SST2 with varying sizes of labeled data and semi-supervised learning. The header row is the percentage of labeled examples [...]
           Data size         5%     10%    30%    100%
       1   CNN               79.9   81.6   83.6   87.2
       2   -Rule-p           81.5   83.2   84.5   88.8
       3   -Rule-q           82.5   83.9   85.6   89.3
       4   -semi-PR          81.5   83.1   84.6   –
       5   -semi-Rule-p      81.7   83.3   84.7   –
       6   -semi-Rule-q      82.7   84.2   85.7   –
      Rows 1-3: trained on X% of the labeled data.
      Rows 4-6: trained on X% of the labeled data plus the remaining (100 - X)% as unlabeled data.
  • 15. Experimental setup (named entity recognition)
      •  Task: recognize four types of named entities (PER, ORG, LOC, Misc)
      •  Dataset: CoNLL-2003
         –  BIOES tags are used (as in [Lample+ 16] and others)
      •  Baseline: bidirectional LSTM
         –  [Chiu and Nichols, 15] with the CNN removed
      •  Constraints (rules) applied
         –  The sequence of output tags must be well-formed, e.g. $\mathrm{equal}(y_{i-1}, \text{B-PER}) \Rightarrow \neg\,\mathrm{equal}(y_i, \text{I-ORG})$
            •  Importance λ = ∞ (hard constraint)
         –  Items in a list receive tags of the same type
            •  In "1. Juventus, 2. Barcelona, 3. …", Juventus and Barcelona have the same tag type
            •  Importance λ = 1
      •  Excerpts from [Hu+ 16]
         "where $\mathbf{1}(\cdot)$ is an indicator function that takes 1 when its argument is true, and 0 otherwise; class '+' represents 'positive'; and $\sigma_\theta(B)_+$ is the element of $\sigma_\theta(B)$ for class '+'. By Eq.(1), when S has the 'A-but-B' structure, the truth value of the above logic rule equals to $(1 + \sigma_\theta(B)_+)/2$ when y = +, and $(2 - \sigma_\theta(B)_+)/2$ otherwise. Note that here we assume two-way classification (i.e., positive and negative), though it is straightforward to design rules for finer grained sentiment classification."
         "4.2 Named Entity Recognition: NER is to locate and classify elements in text into entity categories such as "persons" and "organizations". It is an essential first step for downstream language understanding applications. The task assigns to each word a named entity tag in an "X-Y" format where X is one of BIEOS (Beginning, Inside, End, Outside, and Singleton) and Y is the entity category. A valid tag sequence has to follow [...]"
         "The confidence levels are set to ∞ to prevent any violation. We further leverage the list structures within and across sentences of the same documents. Specifically, named entities at corresponding positions in a list are likely to be in the same categories. For instance, in "1. Juventus, 2. Barcelona, 3. ..." we know "Barcelona" must be an organization rather than a location, since its counterpart entity "Juventus" is an organization. We describe our simple procedure for identifying lists and counterparts in the supplementary materials. The logic rule is encoded as:
         $$\text{is-counterpart}(X, A) \Rightarrow 1 - \big\| c(e_y) - c(\sigma_\theta(A)) \big\|_2, \qquad (7)$$
         where $e_y$ is the one-hot encoding of y (the class prediction of X); $c(\cdot)$ collapses the probability mass [...]"
      [Figure 3: The architecture of the bidirectional LSTM recurrent network for NER (char+word representation, forward LSTM, backward LSTM, output representation; example input "NYC locates in USA"). The CNN for extracting character representation is omitted.]
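Setting λ = ∞ for the tag-transition rule turns the soft reweighting of Eq.(4) into a hard filter. The sketch below assumes a tiny tag set and a joint distribution over pairs of adjacent tags that is small enough to enumerate; `TAGS`, `rule_value` and `constrained` are hypothetical names for illustration only.

```python
import math
from itertools import product

TAGS = ["B-PER", "I-PER", "I-ORG", "O"]


def rule_value(prev, curr):
    """Boolean truth value of equal(prev, B-PER) => not equal(curr, I-ORG)."""
    return 0.0 if (prev == "B-PER" and curr == "I-ORG") else 1.0


def constrained(p_pairs, C=1.0, lam=math.inf):
    """q(pair) proportional to p(pair) * exp(-C * lam * (1 - r(pair)))."""
    unnorm = {}
    for pair, prob in p_pairs.items():
        r = rule_value(*pair)
        # Guard: inf * 0 is nan in floating point, so skip the product
        # when the rule is fully satisfied.
        penalty = 0.0 if r == 1.0 else C * lam * (1.0 - r)
        unnorm[pair] = prob * math.exp(-penalty)
    z = sum(unnorm.values())
    return {pair: u / z for pair, u in unnorm.items()}


uniform = {pair: 1.0 / 16 for pair in product(TAGS, repeat=2)}
q = constrained(uniform)
print(q[("B-PER", "I-ORG")])  # 0.0
print(q[("B-PER", "I-PER")])  # mass is renormalized over the 15 valid pairs
```

The explicit guard matters in practice: without it, every valid pair would get a `nan` penalty under an infinite λ. For real tag sequences the joint space is too large to enumerate like this, so the sketch only illustrates the effect of the hard constraint.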
  • 16. Experimental results (named entity recognition)
      •  Performance improves over the baseline
      •  Comparable to a method that uses massive external resources [Luo+ 15] and to a neural network with many parameters [Ma and Hovy, 16]

      Table 4: Performance of NER on CoNLL-2003.
           Model                                  F1                     Slide annotation
       1   BLSTM                                  89.55
       2   BLSTM-Rule-trans                       p: 89.80, q: 91.11     without the list constraint
       3   BLSTM-Rules                            p: 89.93, q: 91.18     with the list constraint
       4   NN-lex (Collobert et al., 2011)        89.59
       5   S-LSTM (Lample et al., 2016)           90.33
       6   BLSTM-lex (Chiu and Nichols, 2015)     90.77
       7   BLSTM-CRF1 (Lample et al., 2016)       90.94
       8   Joint-NER-EL (Luo et al., 2015)        91.20
       9   BLSTM-CRF2 (Ma and Hovy, 2016)         91.21
      Row 2, BLSTM-Rule-trans imposes the transition rules (Eq.(6)) on the base BLSTM. Row 3, BLSTM-Rules further incorporates the list rule (Eq.(7)). We report the performance of both the student model p [...]
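  • 17. Summary
      •  The paper proposes a method for introducing general rules and human intuition into neural networks
         –  Rules are expressed in first-order logic
         –  Probabilistic soft logic maps truth values to continuous values between 0 and 1
         –  The rules are learned as constraints
      •  Experiments on sentiment analysis and named entity recognition
         –  Introducing the constraints improves performance
         –  Performance is comparable to more complex networks and similar systems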