2. Concept
1. A pre-trained language representation model built on the Transformer architecture, pre-trained on 2 tasks
a. Randomly mask some words within a sequence and let the model predict the masked words (see the masking sketch below)
b. Predict whether the second sequence of a pair actually follows the first in a larger context: “next sentence prediction”
2. Can be used for transfer learning (similar to models pre-trained on ImageNet in computer vision)
a. Pre-train on a large corpus with unsupervised learning to learn the language representation
b. Fine-tune the model for specific tasks: text classification, named entity recognition, SQuAD
Devlin et al., 2018
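A minimal sketch of the masked-word task (a.), assuming a plain whitespace tokenizer, a 15% masking rate, and the 80/10/10 replacement rule described in the BERT paper; the function and vocabulary here are illustrative only:

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """Randomly pick ~15% of tokens as prediction targets (BERT-style masking).

    Of the picked tokens: 80% are replaced by [MASK], 10% by a random token,
    and 10% are left unchanged. Returns the corrupted sequence plus labels
    (None marks positions the model does not have to predict).
    """
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                    # the model must recover this token
            r = random.random()
            if r < 0.8:
                corrupted[i] = "[MASK]"
            elif r < 0.9:
                corrupted[i] = random.choice(vocab)
            # else: keep the original token unchanged
    return corrupted, labels

# Example
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
tokens = "the cat sat on the mat".split()
print(mask_tokens(tokens, vocab))
```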
3. Architecture
1. Encoder: the encoder stack from the Transformer
a. Base model: N=12, hidden dim=768, heads=12
b. Large model: N=24, hidden dim=1024, heads=16
2. Embedding: the input representation is the sum of token, segment (sentence A/B), and positional embeddings (see the sketch below)
[Figure: Transformer encoder block (Multi-Head Attention → Add & Norm → Feed Forward → Add & Norm), stacked N×, with Token + Segment + Positional Embeddings as input and a Linear + Softmax output layer]
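A minimal PyTorch sketch of this embedding scheme, assuming base-model sizes (hidden dim 768, a 30k-word vocabulary, 512 max positions); the class and argument names are illustrative, not the reference implementation:

```python
import torch
import torch.nn as nn

class BertStyleEmbedding(nn.Module):
    """Input representation = token + segment + positional embeddings, then LayerNorm."""

    def __init__(self, vocab_size=30000, hidden_dim=768, max_len=512, num_segments=2):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden_dim)
        self.segment = nn.Embedding(num_segments, hidden_dim)   # sentence A vs sentence B
        self.position = nn.Embedding(max_len, hidden_dim)       # learned positions, not sinusoidal
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device).unsqueeze(0)
        x = self.token(token_ids) + self.segment(segment_ids) + self.position(positions)
        return self.norm(x)                                     # (batch, seq_len, hidden_dim)

# Example: one sequence of 6 tokens, all tagged as sentence A
emb = BertStyleEmbedding()
ids = torch.randint(0, 30000, (1, 6))
seg = torch.zeros(1, 6, dtype=torch.long)
print(emb(ids, seg).shape)   # torch.Size([1, 6, 768])
```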
5. Fine-tuning on SQuAD
Use the output hidden states to predict the start and end of the answer span
Apply one Linear layer (output=2) to the output hidden state vectors T’i
The outputs are predictions of the starting and ending positions of the answer within the input paragraph
The objective function is the log-likelihood of the correct start and end positions (see the sketch below)
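A minimal PyTorch sketch of this span-prediction head, assuming hidden dim 768 and already-computed encoder outputs; the names and stand-in tensors are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanHead(nn.Module):
    """One Linear(hidden_dim -> 2) applied to every token's hidden state T'i:
    column 0 scores the token as the answer start, column 1 as the answer end."""

    def __init__(self, hidden_dim=768):
        super().__init__()
        self.qa_outputs = nn.Linear(hidden_dim, 2)

    def forward(self, hidden_states):                 # (batch, seq_len, hidden_dim)
        logits = self.qa_outputs(hidden_states)       # (batch, seq_len, 2)
        start_logits, end_logits = logits.unbind(dim=-1)
        return start_logits, end_logits               # each (batch, seq_len)

# Objective: (negative) log-likelihood of the correct start and end positions
head = SpanHead()
hidden = torch.randn(2, 128, 768)                     # stand-in encoder output for 2 examples
start_logits, end_logits = head(hidden)
start_pos = torch.tensor([10, 25])                    # gold start token indices
end_pos = torch.tensor([12, 30])                      # gold end token indices
loss = F.cross_entropy(start_logits, start_pos) + F.cross_entropy(end_logits, end_pos)
print(loss.item())
```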
6. Results on SQuAD
SQuAD 1.1: new SOTA
SQuAD 2.0: used as the pre-trained base of the top leaderboard entries
https://rajpurkar.github.io/SQuAD-explorer/
7. Improving on BERT
RoBERTa
1. Train longer, bigger batches, more data
2. Remove next-sentence-prediction task
3. Longer sequences
4. Dynamic masking: the mask pattern is re-sampled each time a sequence is seen
ALBERT
1. Factorized embedding parameterization (see the parameter-count sketch below)
2. Cross-layer parameter sharing
3. Inter-sentence coherence loss (sentence-order prediction)
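A minimal sketch of the factorized embedding parameterization, comparing parameter counts under assumed sizes (vocab V=30k, embedding dim E=128, hidden dim H=768); the numbers and class name are illustrative:

```python
import torch.nn as nn

V, E, H = 30000, 128, 768            # vocab size, embedding dim, hidden dim (assumed)

bert_style = V * H                   # one V x H matrix: 23,040,000 parameters
albert_style = V * E + E * H         # V x E plus an E x H projection: 3,938,304 parameters
print(bert_style, albert_style)      # roughly 5.8x fewer embedding parameters

class FactorizedEmbedding(nn.Module):
    """Token ids -> small E-dim embedding -> projected up to the H-dim hidden size."""

    def __init__(self, vocab_size=V, embed_dim=E, hidden_dim=H):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.project = nn.Linear(embed_dim, hidden_dim, bias=False)

    def forward(self, token_ids):
        return self.project(self.embed(token_ids))
```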
8. References
1. Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
2. https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_finetuning_with_cloud_tpus.ipynb
3. https://github.com/google-research/bert
4. PyTorch version: https://github.com/huggingface/pytorch-pretrained-BERT
5. Liu et al., RoBERTa: A Robustly Optimized BERT Pretraining Approach
6. Lan et al., ALBERT: A Lite BERT for Self-supervised Learning of Language Representations