A joint many task model

This deck introduces a paper on the Joint Many-Task Model for five NLP tasks, accepted at EMNLP 2017. The slides were presented at the Deep Learning Study group in DAVIAN LAB. Paper link: https://arxiv.org/abs/1611.01587

A Joint Many-Task Model:
Growing a Neural Network for Multiple
NLP Tasks
Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher
The University of Tokyo
Accepted at EMNLP 2017
Presented by Choi Seong Jae

Motivation
• Existing approaches have focused on handling a single task
• Even multi-task approaches have mostly learned closely related tasks together (POS tagging, chunking, etc.)
• Zhang and Weiss (2016) showed that jointly learning POS tagging and dependency parsing is effective

Details: Word Representations
• Word embeddings
• Skip-gram
• Character embeddings
• N-gram embeddings
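Example: character n-grams (n = 1, 2, 3) of the word “Cat”
{C, a, t, #B#C, Ca, at, t#E#, #B#Ca, Cat, at#E#}
(#B# and #E# mark the beginning and end of the word)
The character embedding of a word is the average of its unique character n-gram embeddings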
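
The n-gram example above can be reproduced with a short sketch. The function names, the `table` lookup dictionary, and the choice to skip n-grams made only of boundary markers are assumptions made for illustration; only the markers `#B#`/`#E#` and the expected set for “Cat” come from the slide.

```python
def char_ngrams(word, ns=(1, 2, 3), begin="#B#", end="#E#"):
    """Return the unique character n-grams of `word`, in first-seen order."""
    symbols = [begin, *word, end]  # each boundary marker counts as one symbol
    grams = []
    for n in ns:
        for i in range(len(symbols) - n + 1):
            window = symbols[i:i + n]
            if all(s in (begin, end) for s in window):
                continue  # skip n-grams made only of boundary markers
            gram = "".join(window)
            if gram not in grams:
                grams.append(gram)
    return grams


def average_embedding(grams, table, dim):
    """Average the embeddings of the n-grams present in `table` (a hypothetical dict)."""
    vectors = [table[g] for g in grams if g in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


print(char_ngrams("Cat"))
# ['C', 'a', 't', '#B#C', 'Ca', 'at', 't#E#', '#B#Ca', 'Cat', 'at#E#']
```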

Details: Word-Level Task: POS Tagging
• Bi-directional LSTM
Input: the representation of the 𝒕-th word

Details: Word-Level Task: Chunking
• Task of assigning a chunking tag (B-NP, I-VP, etc.) to each word
• Bi-directional LSTM
Input: built from the POS tagging output, using the number of POS tags and the corresponding label embedding of each tag
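
A minimal sketch of how the POS output could enter the chunking input, assuming the label embedding is the probability-weighted sum of the per-tag label embeddings (this form is recalled from the paper, not shown on the slide). The tag count, scores, and embedding values below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_label_embedding(pos_probs, label_embeddings):
    """Sum over the POS tags of p(tag) * embedding(tag)."""
    dim = len(label_embeddings[0])
    return [
        sum(p * emb[k] for p, emb in zip(pos_probs, label_embeddings))
        for k in range(dim)
    ]


# Hypothetical setup: 3 POS tags, 2-dimensional label embeddings.
pos_probs = softmax([2.0, 0.5, -1.0])
label_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_pos = weighted_label_embedding(pos_probs, label_embeddings)
```

Using the full distribution instead of the single best tag keeps the input differentiable with respect to the POS layer.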

Details: Syntactic Task: Dependency Parsing
• Task of finding the syntactic relations between word pairs in a sentence
• Bi-directional LSTM
Matching function for predicting the parent node of 𝑤𝑡
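
The matching function can be sketched as follows, assuming a bilinear score between the hidden state of word t and each candidate parent j, normalized with a softmax over all j ≠ t. The bilinear form, the matrix `W`, and the toy hidden states are assumptions for illustration; the root node is omitted for brevity.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def parent_distribution(t, hidden, W):
    """Probability that each position j != t is the parent of word t."""
    scores = {
        j: dot(hidden[t], matvec(W, h_j))
        for j, h_j in enumerate(hidden)
        if j != t
    }
    m = max(scores.values())
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


# Toy hidden states for a three-word sentence and an identity matrix.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
probs = parent_distribution(0, hidden, W)
```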

Details: Semantic Task: Semantic relatedness
• Task of finding the semantic relationship between two sentences
• The output is a real-valued relatedness score for the two sentences
• Sentence representations: max pooling strategy
• Feature vector representation: the absolute values of the element-wise subtraction, and the element-wise multiplication
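
A minimal sketch of the pieces named on this slide: max pooling over the hidden states of a sentence, then a feature vector built from the absolute element-wise difference and the element-wise product. The function names and toy values are hypothetical.

```python
def max_pool(hidden_states):
    """Element-wise maximum over the time steps of one sentence."""
    return [max(column) for column in zip(*hidden_states)]


def relatedness_features(h_s, h_s2):
    """Concatenate |h_s - h_s2| and h_s * h_s2 (element-wise)."""
    abs_diff = [abs(a - b) for a, b in zip(h_s, h_s2)]
    product = [a * b for a, b in zip(h_s, h_s2)]
    return abs_diff + product


# Two toy sentences, each with two time steps of 2-dimensional hidden states.
sent_a = [[0.1, -0.5], [0.4, 0.2]]
sent_b = [[0.3, 0.0], [-0.2, 0.6]]
h_a = max_pool(sent_a)  # [0.4, 0.2]
h_b = max_pool(sent_b)  # [0.3, 0.6]
features = relatedness_features(h_a, h_b)
```

Both parts of the feature vector are symmetric, so swapping the two sentences gives the same features, which suits a relatedness score.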

Details: Semantic Task: Textual entailment
• Given a sentence s and a hypothesis h, the task is to decide whether h can be inferred from s
• Classified into three classes: entailment, contradiction, and neutral
• Feature vector representation: the element-wise subtraction and the element-wise multiplication
• The subtraction is used without taking absolute values, so that the model can tell which sentence is the hypothesis
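
Entailment is directional, so the features need to distinguish the premise from the hypothesis. A symmetric feature such as an absolute difference cannot do that, while a signed difference can. The sketch below assumes the entailment features use the signed element-wise difference plus the element-wise product; the names and values are hypothetical.

```python
def entailment_features(h_premise, h_hypothesis):
    """Concatenate (h_premise - h_hypothesis) and their element-wise product."""
    diff = [a - b for a, b in zip(h_premise, h_hypothesis)]
    product = [a * b for a, b in zip(h_premise, h_hypothesis)]
    return diff + product


h_s = [0.4, 0.2]  # toy premise representation
h_h = [0.3, 0.6]  # toy hypothesis representation

forward = entailment_features(h_s, h_h)
backward = entailment_features(h_h, h_s)

# Swapping the roles flips the sign of the difference part only.
print(forward != backward)  # True
```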

Training: POS tagging, Chunking, Dependency Parsing Layer
L2-norm regularization
Successive regularization: keeps the model from forgetting what it learned on the previous tasks
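
The two regularizers can be sketched as penalty terms added to a task loss. This assumes the successive term is a squared L2 distance between the current lower-layer parameters and their values before the current task's training pass; the coefficient values `lam` and `delta` are hypothetical.

```python
def squared_l2(values):
    """Squared L2 norm of a flat list of parameters."""
    return sum(v * v for v in values)


def regularized_loss(task_loss, task_weights, lower_params,
                     lower_params_before, lam=1e-4, delta=1e-2):
    """Task loss + L2 penalty on task weights + successive regularization."""
    l2_term = lam * squared_l2(task_weights)
    # Penalize drift of the lower-layer parameters away from their
    # values before training on the current task.
    drift = [p - q for p, q in zip(lower_params, lower_params_before)]
    successive_term = delta * squared_l2(drift)
    return task_loss + l2_term + successive_term


no_drift = regularized_loss(1.0, [0.0], [1.0, 2.0], [1.0, 2.0])
with_drift = regularized_loss(1.0, [0.0], [1.5, 2.0], [1.0, 2.0])
```

When the lower layers do not move, the successive term vanishes; the further they drift, the larger the penalty.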

Training: Relatedness, Textual Entailment Layer
KL-divergence, which measures the difference between two probability distributions
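
For reference, a minimal sketch of the KL-divergence between two discrete distributions. The gold and predicted distributions over five score bins are made-up values, and the `eps` guard against division by zero is an implementation choice, not something taken from the slides.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0  # terms with p_i = 0 contribute nothing
    )


gold = [0.0, 0.0, 0.4, 0.6, 0.0]         # hypothetical target distribution
predicted = [0.05, 0.05, 0.3, 0.5, 0.1]  # hypothetical model output
loss = kl_divergence(gold, predicted)
```

The divergence is zero when the two distributions match and positive otherwise, which is what makes it usable as a training loss.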

Experimental Settings
• POS tagging, Chunking, Dependency Parsing
• Uses the Wall Street Journal (WSJ) portion of the Penn Treebank dataset
• Semantic relatedness, Textual entailment
• Uses the SICK dataset (Marelli et al., 2014)

Conclusion
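• Handled multiple NLP tasks by growing the depth of the network
• When increasing the depth, training succeeded by taking linguistic hierarchies into account and applying shortcut connections
• Beyond the five tasks in the paper, there is ample room for further improvement by adding tasks such as entity detection and relation extraction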

Editor's notes

Unlike existing methods, this approach arranges POS tagging, chunking, dependency parsing, semantic relatedness, and textual entailment according to a linguistic hierarchy and builds progressively more complex models in an end-to-end fashion. The paper shows that, with this hierarchical structure, the low-level layers can improve the performance of the high-level layers.
When related tasks are combined and learned together, performance improves on both the low-level and the high-level tasks.
For POS tagging, the model performed close to the state of the art. The best result is that of Ling et al., which uses a character-based LSTM. Chunking achieved state-of-the-art performance. Søgaard and Goldberg jointly learned POS tagging and chunking at different layers, but improved only chunking. In dependency parsing, the model outperformed the method of Andor et al., which uses beam search. The best result comes from a method that uses a sophisticated attention mechanism (biaffine attention). Semantic relatedness achieved state-of-the-art performance; previous methods used syntactic trees, or trees combined with attention. Textual entailment also achieved state-of-the-art performance; previous methods relied on dataset-specific preprocessing and features and on attention mechanisms.
Shortcut connections mean feeding the word representations into every bi-directional LSTM.
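
A minimal sketch of the shortcut connection idea: whatever else a layer receives, the word representation is always concatenated into its input. The function name, the optional arguments, and the toy vectors are hypothetical, and the recurrent state of the LSTM itself is left out.

```python
def layer_input(word_repr, lower_hidden=None, label_embedding=None):
    """Concatenate the pieces fed to one layer at one time step."""
    parts = []
    if lower_hidden is not None:
        parts += lower_hidden      # hidden state from the layer below
    parts += word_repr             # shortcut connection: always present
    if label_embedding is not None:
        parts += label_embedding   # label embedding from a lower task
    return parts


x_t = [1.0, 2.0]                       # toy word representation
first = layer_input(x_t)               # lowest layer sees only the word
second = layer_input(x_t, lower_hidden=[0.5], label_embedding=[0.1, 0.9])
```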