Paper Reading: Learning to Compose Neural Networks for Question Answering

Learning to Compose Neural Networks for Question Answering
Jacob Andreas et al.
NAACL HLT 2016 (Best Paper)
박상현 (ESCA Lab)

Abstract (1/2)
• Dynamic Neural Module Network
  - A question-answering model built from dynamically assembled neural networks, applicable to both images and structured knowledge bases.

Abstract (2/2)
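• Parses the question and dynamically builds a custom neural network from a collection of modules.
• Applies this network to an image or a knowledge base to produce an answer.
• The parameters of each module and the network-layout parameters are learned jointly through reinforcement learning.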
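[Figure: question parsing → generation of candidate networks from the modules → network selection → answer generation. Modules: Lookup, Find, Relate, And, Describe, Exists; worlds: Images, KB.]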

1. Introduction (1/3)
• This paper presents a compositional, attentional model for QA over a variety of world representations (images, knowledge bases).
• The model consists of two components that are trained jointly:
  1) a collection of neural modules, 2) a layout predictor.

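• A module-based neural network for VQA was already presented in the authors' earlier paper (Andreas et al., 2016).
• Compared with that paper, this work makes two improvements:
  1) a trainable predictor of the network layout;
  2) the visual primitives, previously usable only on images, are extended so that they can also reason over a knowledge base.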

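• The training data for this model consists of three parts:
  - world
  - question
  - answer
• No supervision of network layouts is required; the layout is learned from (world, question, answer) triples alone.
• The model achieves state-of-the-art performance on QA over natural images (VQA) and over US geography facts (GeoQA).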

2. Deep networks as functional programs (1/4)
• In their previous paper, the authors proposed a heuristic method for decomposing a VQA task into modular sub-problems.
① Parse the question with the Stanford Parser to obtain a universal dependency representation (a tree).
② Filter the set of dependencies attached to the wh-word or the copula.
  ex) what is standing in the field? → what(stand)
      what color is the cat? → color(cat)
      is there a circle next to a square? → is(circle, next-to(square))
③ Every leaf becomes a find module, every internal node a transform or combine module, and the root a describe or measure module.
  ex) what color is the cat? → describe[color](find[cat])
      where is the truck? → describe[where](find[truck])
In this paper, the final layout is chosen by a learned model instead of being fixed by these rules.
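The three rules of the earlier heuristic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the filtered dependencies have already been extracted into hypothetical `(word, children)` tuples, and it assumes that yes/no roots such as `is` receive a `measure` module while all other roots receive `describe`.

```python
from typing import List, Tuple

# Hypothetical encoding of a filtered dependency fragment: (word, children).
Node = Tuple[str, List["Node"]]

# Assumption: yes/no roots get a measure module, all other roots get describe.
YES_NO_ROOTS = {"is", "are", "does", "do"}


def _join(parts: List[str]) -> str:
    """Merge several sub-layouts with combine[and]; a single one is used as-is."""
    return parts[0] if len(parts) == 1 else f"combine[and]({', '.join(parts)})"


def to_layout(node: Node, is_root: bool = True) -> str:
    word, children = node
    inner = [to_layout(child, is_root=False) for child in children]
    if is_root:
        if not inner:
            raise ValueError("the root must have at least one child")
        head = "measure" if word in YES_NO_ROOTS else "describe"
        return f"{head}[{word}]({_join(inner)})"   # root -> describe / measure
    if not inner:
        return f"find[{word}]"                      # leaf -> find
    return f"transform[{word}]({_join(inner)})"     # internal node -> transform


print(to_layout(("color", [("cat", [])])))  # describe[color](find[cat])
```

The same function maps `is(circle, next-to(square))` to a `measure[is]` root over a `combine[and]` of `find[circle]` and `transform[next-to](find[square])`.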

(2/4)
[Figure: answering "What color is the bird?" with two modules. Attention (find): "Where is the bird?" Labeling (describe): "What color is that part of the image?"]
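The find-then-describe decomposition in the figure can be illustrated with toy numbers. This sketch assumes hand-made 3-dimensional region features and replaces learned scoring with plain dot products; `find`, `describe`, and every weight here are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def find(regions: List[Vector], word_vec: Vector) -> Vector:
    """Attention step: a distribution over image regions (linear scoring, a simplification)."""
    return softmax([dot(r, word_vec) for r in regions])


def describe(regions: List[Vector], attention: Vector,
             label_vecs: Dict[str, Vector]) -> Dict[str, float]:
    """Labeling step: pool region features under the attention, then score each answer."""
    dim = len(regions[0])
    pooled = [sum(h * r[j] for h, r in zip(attention, regions)) for j in range(dim)]
    names = list(label_vecs)
    probs = softmax([dot(pooled, label_vecs[n]) for n in names])
    return dict(zip(names, probs))


# Toy region features: [bird-ness, red-ness, blue-ness] (hypothetical)
regions = [[4.0, 0.0, 1.0],   # a blue bird
           [0.0, 1.0, 0.0],   # a red car
           [0.0, 0.0, 0.0]]   # background
h = find(regions, word_vec=[1.0, 0.0, 0.0])                     # "Where is the bird?"
answer = describe(regions, h, {"red": [0.0, 5.0, 0.0],          # "What color is
                               "blue": [0.0, 0.0, 5.0]})        #  that part?"
print(max(answer, key=answer.get))  # blue
```

The attention concentrates on the bird region, so the pooled feature carries its color rather than the car's.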

(3/4)
[Figure: answering "Are there any states?" over a knowledge base with two modules. Attention (find): "Where are the states?" Labeling (exists): "Does the state exist?"]
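The same pattern over a knowledge base can be sketched as follows. The toy KB, the 0/1 attention produced by `find`, and the weights `a` and `b` in `exists` are assumptions chosen for illustration. Here `exists` looks only at the largest attention value; a two-way softmax over (yes, no) reduces to the sigmoid used below.

```python
import math
from typing import Dict, Set

# Toy knowledge base: entity -> set of categories (hypothetical data)
KB: Dict[str, Set[str]] = {
    "georgia": {"state"},
    "florida": {"state"},
    "atlanta": {"city"},
}


def find(kb: Dict[str, Set[str]], category: str) -> Dict[str, float]:
    """Attention over KB entities: 1.0 if the entity belongs to the category, else 0.0."""
    return {entity: 1.0 if category in cats else 0.0 for entity, cats in kb.items()}


def exists(attention: Dict[str, float], a: float = 10.0, b: float = -5.0) -> float:
    """P(yes) = sigmoid(a * max_k h_k + b); a and b are hypothetical weights."""
    peak = max(attention.values())
    return 1.0 / (1.0 + math.exp(-(a * peak + b)))


print(round(exists(find(KB, "state")), 3))  # "Are there any states?" -> 0.993
print(round(exists(find(KB, "lake")), 3))   # no lakes in the toy KB -> 0.007
```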

(4/4)
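• Two contributions of this paper:
1) Extends and generalizes the attention mechanism so that it can also be applied to knowledge bases.
2) A model that learns to assemble modules into structures:
  - Dynamic Neural Module Network
  - A model that parses the question and dynamically builds a neural network from a collection of composable modules.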

3. Related work
• Database QA
  - Wong & Mooney, 2007; Kwiatkowski et al., 2010; Liang et al., 2011; Andreas et al., 2013
• Neural models for QA
  - Iyyer et al., 2014; Bordes et al., 2014; Yang et al., 2015; Malinowski et al., 2015
• Visual QA
  - Simonyan & Zisserman, 2014; Xu et al., 2015; Yang et al., 2015
• Formal logic and representation learning
  - Beltagy et al., 2013; Lewis & Steedman, 2013; Malinowski & Fritz, 2014
• Fixed tree structures obtained with a universal parser
  - Bottou et al., 1997; Socher et al., 2011; Bottou, 2014

4. Model
• Goal: learn a map from questions and world representations (images, knowledge bases) to answers.
• Layout model
  - Predicts a layout z from a question x: $p(z \mid x; \theta_\ell)$
• Execution model
  - Generates an answer y from a world representation w: $p_z(y \mid w; \theta_e)$

4.1. Evaluating Modules
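• Execution model:
$$p_z(y \mid w) = (\llbracket z \rrbracket_w)_y$$
• When the substructure of z is referenced explicitly, this can also be written as
$$(\llbracket z \rrbracket_w)_y = (\llbracket m(h^1, h^2) \rrbracket_w)_y$$
• The set of layouts z is restricted by the type constraints of the modules; each module outputs one of two types:
  - Attention: a distribution over pixels or entities
  - Labels: a distribution over answers
$\llbracket z \rrbracket_w$: the output of layout z on the input world representation w.
m is the root module; $h^1, h^2$ are the outputs (attentions) of its submodules.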
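To make the type constraints concrete, the sketch below represents a layout as a tree, type-checks it bottom-up, and evaluates $\llbracket z \rrbracket_w$ by first computing the submodule outputs and then applying the root module. The `Layout` class, the `SIGNATURES` table (including the assumption that `and` takes two or more inputs), and the toy `find`/`and`/`exists` behaviors over a small knowledge base are illustrative assumptions; only those three modules are implemented.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Layout:
    module: str                       # module name, e.g. "exists"
    arg: str = ""                     # parameter argument, e.g. "state"
    children: List["Layout"] = field(default_factory=list)


# module -> (output type, number of attention inputs); None means "two or more"
SIGNATURES: Dict[str, Tuple[str, Optional[int]]] = {
    "find": ("attention", 0), "lookup": ("attention", 0),
    "relate": ("attention", 1), "and": ("attention", None),
    "describe": ("labels", 1), "exists": ("labels", 1),
}


def output_type(z: Layout) -> str:
    """Type-check z bottom-up; every module input must be an attention."""
    out, arity = SIGNATURES[z.module]
    n = len(z.children)
    if (arity is None and n < 2) or (arity is not None and n != arity):
        raise TypeError(f"{z.module} cannot take {n} inputs")
    if any(output_type(child) != "attention" for child in z.children):
        raise TypeError(f"{z.module} expects attention inputs")
    return out


World = Dict[str, Set[str]]  # toy knowledge base: entity -> attributes


def run(z: Layout, w: World) -> Dict[str, float]:
    """[[z]]_w: evaluate the submodules to get h^1, h^2, ..., then apply the root module m."""
    hs = [run(child, w) for child in z.children]
    if z.module == "find":
        return {e: 1.0 if z.arg in attrs else 0.0 for e, attrs in w.items()}
    if z.module == "and":
        return {e: math.prod(h[e] for h in hs) for e in w}
    if z.module == "exists":
        p_yes = max(hs[0].values())
        return {"yes": p_yes, "no": 1.0 - p_yes}
    raise NotImplementedError(z.module)  # only the modules used in this toy


world = {"georgia": {"state", "coastal"}, "kansas": {"state"}, "atlanta": {"city"}}
z = Layout("exists", children=[Layout("and", children=[Layout("find", "state"),
                                                      Layout("find", "coastal")])])
assert output_type(z) == "labels"
print(run(z, world)["yes"])  # p_z(yes | w) = 1.0
```

A layout such as `describe[color](exists(find[cat]))` is rejected by `output_type`, because `exists` produces labels where `describe` expects an attention.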

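• Module instances in different networks can share parameters (parameter tying).
• Each module takes parameter arguments and ordinary inputs.
  - Parameter arguments
    - Supplied by the layout; used to specialize the module's behavior to particular lexical items.
    - ex) what color is the cat? → describe[color](find[cat]): color and cat are parameter arguments.
  - Ordinary inputs
    - The results of computations in the subnetworks below the module.
    - ex) what color is the cat? → describe[color](find[cat]): the output of find[cat] is the ordinary input of describe.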
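Parameter tying can be pictured as a lookup table keyed by module and parameter argument: every network that contains `find[cat]` reads the same weight vector. The `ParameterStore` class, the `instantiate` helper, and the random initialization are hypothetical; a real implementation would hold trainable tensors in a deep-learning framework.

```python
import random
from typing import Dict, List, Tuple


class ParameterStore:
    """Hypothetical store holding one weight vector per (module, parameter argument) pair."""

    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.params: Dict[Tuple[str, str], List[float]] = {}

    def get(self, module: str, arg: str) -> List[float]:
        """Return the shared vector for (module, arg), creating it on first use."""
        key = (module, arg)
        if key not in self.params:
            self.params[key] = [self.rng.gauss(0.0, 0.1) for _ in range(self.dim)]
        return self.params[key]


def instantiate(fragments: List[Tuple[str, str]], store: ParameterStore) -> List[List[float]]:
    """Look up the parameter-argument weights for every module instance in one network."""
    return [store.get(module, arg) for module, arg in fragments]


store = ParameterStore(dim=4)
net1 = instantiate([("describe", "color"), ("find", "cat")], store)  # what color is the cat?
net2 = instantiate([("exists", ""), ("find", "cat")], store)         # is there a cat?
print(net1[1] is net2[1])   # True: both networks use the same find[cat] weights
print(len(store.params))    # 3 distinct parameter vectors
```

Ordinary inputs are not stored at all: they are recomputed for each question from the subnetworks below the module.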

• $w^1, w^2, \ldots$: world representation
• $W$: world representation expressed as a matrix
• $\sigma$: ReLU
• $h$: attention
• $w(h) = \sum_k h_k w^k$ ($h_k$ is the k-th element of h)
• $A, a, B, b, \ldots$: global weights
• $u^i, v^i$: weights associated with the parameter argument i
• $i$: parameter argument
All of the weights above make up $\theta_e$.
ex) describe[color](find[cat])
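The notation can be exercised with a pure-Python sketch of two modules. The module equations are not listed on this slide, so the forms used here are assumptions chosen to match the notation: find[i] scores each position k as $a^\top \sigma(B v^i + C w^k + d)$ and normalizes over positions, and describe[i](h) computes $\mathrm{softmax}(A\,\sigma(B\,w(h) + v^i))$. All weights are toy values (identity matrices), not learned parameters.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]


def relu(x: Vec) -> Vec:                 # sigma in the notation above
    return [max(0.0, v) for v in x]


def matvec(M: Mat, x: Vec) -> Vec:
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def add(*vs: Vec) -> Vec:
    return [sum(parts) for parts in zip(*vs)]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(s: Vec) -> Vec:
    m = max(s)
    e = [math.exp(v - m) for v in s]
    total = sum(e)
    return [v / total for v in e]


def find(W: List[Vec], v_i: Vec, a: Vec, B: Mat, C: Mat, d: Vec) -> Vec:
    """find[i]: score_k = a . relu(B v^i + C w^k + d), normalized over positions k."""
    Bv = matvec(B, v_i)
    return softmax([dot(a, relu(add(Bv, matvec(C, w_k), d))) for w_k in W])


def pool(W: List[Vec], h: Vec) -> Vec:
    """w(h) = sum_k h_k w^k."""
    return [sum(h_k * w_k[j] for h_k, w_k in zip(h, W)) for j in range(len(W[0]))]


def describe(W: List[Vec], h: Vec, v_i: Vec, A: Mat, B: Mat) -> Vec:
    """describe[i](h) = softmax(A relu(B w(h) + v^i))."""
    return softmax(matvec(A, relu(add(matvec(B, pool(W, h)), v_i))))


I2 = [[1.0, 0.0], [0.0, 1.0]]                  # toy global weights (identity)
W = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]       # three positions, 2-d features
h = find(W, v_i=[1.0, -1.0], a=[1.0, 1.0], B=I2, C=I2, d=[0.0, 0.0])  # find[cat]
labels = describe(W, h, v_i=[0.0, 0.0], A=I2, B=I2)                  # describe[color]
print([round(x, 2) for x in h], [round(x, 2) for x in labels])
```

Because describe only sees $w(h)$, the attention from find[cat] decides which positions' features reach the color classifier.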

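• If the top-level module of every network layout is a describe or exists module, the fully assembled network corresponds to a distribution over output labels.
• For training with observed layouts z, maximize
$$\sum_{(w, y, z)} \log p_z(y \mid w; \theta_e)$$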
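With the layout z fixed, this objective is ordinary maximum likelihood. The sketch below assumes a toy top module `softmax(U x)`, where `x` stands in for the pooled feature w(h) produced by the subnetwork and is held constant, and runs plain gradient ascent using the fact that the derivative of $\log \mathrm{softmax}_y$ with respect to logit l is $\mathbb{1}[l = y] - p_l$. The data and learning rate are hypothetical.

```python
import math
from typing import List, Tuple

Example = Tuple[List[float], int]   # (pooled feature w(h), index of the correct answer y)


def softmax(s: List[float]) -> List[float]:
    m = max(s)
    e = [math.exp(v - m) for v in s]
    total = sum(e)
    return [v / total for v in e]


def predict(U: List[List[float]], x: List[float]) -> List[float]:
    """p_z(y | w) for a toy top module: softmax(U x)."""
    return softmax([sum(u * v for u, v in zip(row, x)) for row in U])


def objective(U: List[List[float]], data: List[Example]) -> float:
    """Sum over (w, y) of log p_z(y | w; theta_e) for one fixed, observed layout z."""
    return sum(math.log(predict(U, x)[y]) for x, y in data)


def ascent_step(U: List[List[float]], data: List[Example], lr: float) -> List[List[float]]:
    """One gradient-ascent step; d log p_y / d logit_l = 1[l = y] - p_l."""
    grad = [[0.0] * len(row) for row in U]
    for x, y in data:
        p = predict(U, x)
        for l, row in enumerate(grad):
            coeff = (1.0 if l == y else 0.0) - p[l]
            for j, xj in enumerate(x):
                row[j] += coeff * xj
    return [[u + lr * g for u, g in zip(urow, grow)] for urow, grow in zip(U, grad)]


data = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]   # hypothetical (feature, answer) pairs
U = [[0.0, 0.0], [0.0, 0.0]]
before = objective(U, data)
for _ in range(50):
    U = ascent_step(U, data, lr=0.5)
print(before < objective(U, data))  # True
```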

4.2. Assembling networks
• Layout selection proceeds in two steps:
1) Generate a set of candidate layouts.
2) Score each candidate and select the top-scoring one.

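1) Generating the set of candidate layouts
① Represent the input sentence as a dependency tree.
② Collect all nouns, verbs, and prepositional phrases attached to the wh-word or the copula.
③ Associate each word or phrase with a layout fragment:
- common noun (city): find
- proper noun (Georgia): lookup
- prepositional phrase (in): relate
④ Build subsets of the set of layout fragments; for each subset:
- combine all fragments with an and module;
- place a measure or describe module on top.
This appears to be a typo in the paper: the measure module existed in the previous paper but was dropped in this one, so exists should appear in place of measure.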
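Steps ③ and ④ can be sketched as plain string building. The fragment kinds (`common_noun`, `proper_noun`, `preposition`), the choice to feed a prepositional phrase's object into `relate`, and the use of bare `describe`/`exists` as top-level modules (parameter arguments omitted) are assumptions; the parsing in steps ① and ② is not shown.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def fragment(word: str, kind: str, obj: Optional[Tuple[str, str]] = None) -> str:
    """Map a collected word or phrase to a layout fragment (kinds are hypothetical labels)."""
    if kind == "common_noun":
        return f"find[{word}]"
    if kind == "proper_noun":
        return f"lookup[{word}]"
    if kind == "preposition" and obj is not None:
        return f"relate[{word}]({fragment(*obj)})"   # the phrase's object feeds relate
    raise ValueError(f"unsupported fragment: {word}/{kind}")


def candidates(frags: List[str], tops: List[str]) -> List[str]:
    """Every non-empty subset of fragments, joined by and, under each top-level module."""
    layouts = []
    for r in range(1, len(frags) + 1):
        for subset in combinations(frags, r):
            body = subset[0] if r == 1 else f"and({', '.join(subset)})"
            layouts.extend(f"{top}({body})" for top in tops)
    return layouts


# "Is there a city in Georgia?" (toy, hand-written parse)
frags = [fragment("city", "common_noun"),
         fragment("in", "preposition", ("Georgia", "proper_noun"))]
for z in candidates(frags, tops=["describe", "exists"]):
    print(z)
```

With n fragments and two top-level modules this yields $2(2^n - 1)$ candidates, which stays small because only words attached to the wh-word or copula are collected.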

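2) Scoring each candidate and making the final selection
① Build an LSTM representation of the question and a feature-based representation of the query (layout).
② Use the two representations from ① to compute the score $s(z_i \mid x)$:
$$s(z_i \mid x) = a^\top \sigma\big(B\, h_q(x) + C\, f(z_i) + d\big)$$
③ Normalize the scores with a softmax to obtain a probability distribution:
$$p(z_i \mid x; \theta_\ell) = \frac{e^{s(z_i \mid x)}}{\sum_{j=1}^{n} e^{s(z_j \mid x)}}$$
$\theta_\ell = \{a, B, C, d\}$ are the layout parameters.
$h_q(x)$: LSTM representation of the question x; $f(z_i)$: feature embedding of the i-th candidate network $z_i$.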
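The scoring function and the softmax can be sketched directly from these two equations. In this sketch `hq` is a hand-written stand-in for the LSTM encoding $h_q(x)$, and `layout_features` is a hypothetical featurization $f(z)$ that counts module types; the actual features and dimensions are not given on the slide.

```python
import math
import re
from collections import Counter
from typing import List

MODULE_TYPES = ["find", "lookup", "relate", "and", "describe", "exists"]


def layout_features(z: str) -> List[float]:
    """Hypothetical f(z): count of each module type in a layout string like 'exists(find[city])'."""
    counts = Counter(re.findall(r"[a-z]+(?=[\[(])", z))   # module names precede '[' or '('
    return [float(counts[m]) for m in MODULE_TYPES]


def matvec(M: List[List[float]], x: List[float]) -> List[float]:
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def score(hq, fz, a, B, C, d) -> float:
    """s(z | x) = a^T relu(B h_q(x) + C f(z) + d)."""
    hidden = [max(0.0, u + v + w) for u, v, w in zip(matvec(B, hq), matvec(C, fz), d)]
    return sum(ai * hi for ai, hi in zip(a, hidden))


def layout_distribution(hq, layouts, a, B, C, d) -> List[float]:
    """p(z_i | x; theta_l): softmax of the scores over the candidate layouts."""
    s = [score(hq, layout_features(z), a, B, C, d) for z in layouts]
    m = max(s)
    e = [math.exp(v - m) for v in s]
    total = sum(e)
    return [v / total for v in e]


layouts = ["describe(find[city])", "exists(find[city])"]
hq = [1.0, -2.0]                       # stand-in for the LSTM encoding h_q(x)
B = [[1.0, 0.0], [0.0, 1.0]]
C = [[0, 0, 0, 0, 0, 1.0],             # hidden unit 0 counts exists modules
     [0, 0, 0, 0, 1.0, 0]]             # hidden unit 1 counts describe modules
p = layout_distribution(hq, layouts, a=[1.0, 1.0], B=B, C=C, d=[0.0, 0.0])
print(layouts[p.index(max(p))])        # exists(find[city])
```

Only this cheap scorer is evaluated on every candidate; the expensive execution model runs on the selected layout alone.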

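• The authors use reinforcement learning for the following reason.
  - Key constraint: evaluations of the expensive execution model $p_z(y \mid w; \theta_e)$ must be kept to a minimum, whereas evaluating the layout model (computing $p(z \mid x; \theta_\ell)$ for every z, which is where the scoring happens) is cheap.
  - By contrast, in semantic parsing, executing a query is cheap, but the set of possible parses is too large to score exhaustively.
  - The constraints of this model instead resemble the scenario an agent faces in reinforcement learning: scoring an action is cheap, but executing it and observing the reward is expensive.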

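• The authors model the learning process by treating the layout model as a stochastic policy.
① Sample z from $p(z \mid x; \theta_\ell)$.
② Apply the sampled network z to the knowledge source to obtain a distribution over answers $p(y \mid z, w; \theta_e)$.
③ Once a network z has been chosen, the execution model can be trained by maximizing $\log p(y \mid z, w; \theta_e)$. Because sampling from a probability distribution is not differentiable, $p(z \mid x; \theta_\ell)$ is optimized with a policy gradient method:
$$\nabla J(\theta_\ell) = \mathbb{E}\left[\nabla \log p(z \mid x; \theta_\ell) \cdot r\right]$$
$$\nabla J(\theta_\ell) = \mathbb{E}\left[\nabla \log p(z \mid x; \theta_\ell) \cdot \log p(y \mid z, w; \theta_e)\right]$$
r: reward. In the second line the reward is the execution model's log-likelihood $\log p(y \mid z, w; \theta_e)$, while $p(z \mid x; \theta_\ell)$ is the layout model.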
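The sampling loop can be sketched as REINFORCE on a single question. To keep it small, this sketch assumes the layout policy is just a vector of per-candidate logits (instead of $a^\top\sigma(\cdot)$ over LSTM features) and holds the execution model fixed, so each candidate's reward $r = \log p(y \mid z, w; \theta_e)$ comes from a hypothetical table; no baseline is used.

```python
import math
import random
from typing import List


def softmax(s: List[float]) -> List[float]:
    m = max(s)
    e = [math.exp(v - m) for v in s]
    total = sum(e)
    return [v / total for v in e]


def reinforce(rewards: List[float], steps: int = 2000, lr: float = 0.1,
              seed: int = 0) -> List[float]:
    """REINFORCE on per-layout logits theta; returns the final p(z | x)."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        p = softmax(theta)
        z = rng.choices(range(len(theta)), weights=p)[0]   # 1) sample z ~ p(z | x)
        r = rewards[z]                                     # 2) run z: r = log p(y | z, w)
        for j in range(len(theta)):                        # 3) d log p(z) / d theta_j
            theta[j] += lr * r * ((1.0 if j == z else 0.0) - p[j])   # = 1[j = z] - p_j
    return softmax(theta)


# Hypothetical log-likelihoods of the correct answer under three candidate layouts
rewards = [math.log(0.9), math.log(0.2), math.log(0.1)]
p = reinforce(rewards)
print([round(x, 2) for x in p])
```

In expectation the update for logit j is proportional to $p_j (r_j - \bar{r})$, so probability mass moves toward the layout whose execution gives the correct answer the highest log-likelihood.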

6. Conclusion
• Dynamic Neural Module Network:
  - Performs QA over unstructured (e.g., images) or structured (e.g., XML data) knowledge sources.
  - Learns to assemble modules from questions, answers, and world representations alone.
