SlideShare une entreprise Scribd logo
1  sur  8
BERT
Bidirectional Encoder
Representations from Transformer
PHAM QUANG KHANG
Concept
1. A pre-trained language representation utilizing the architecture of Transformer on 2 tasks
a. Randomly mask some words within a sequence and let the model try to predict that masked words
b. Predict if a pair of sequences is actually one next to other in a larger context: “next sentence
prediction”
2. Can be used as transfer learning (similar to pre-trained on ImageNet in Computer Vision)
a. Pre-train on a large corpus as un-supervised learning to learn the language representation
b. Fine-tune the model for specific tasks: text classification, Name entity recognition, SQuAD
2019/12/16 PHAM QUANG KHANG 2
Devlin et al,. 2018
Architecture
1. Encoder: encoder from transformer
a. Base model: N = 12, Hidden dim=768, Heads=12
b. Large model: N=24, Hidden dime=1024, Heads=16
2. Embedding:
2019/12/16 PHAM QUANG KHANG 3
Token Embedding
Multi-head
Attention
Add & Norm
Feed Forward
Add & Norm
N×
Positional Embedding
Segment Embedding
Linear + Softmax
Output
Pre-training tasks
Masked LM Next sentence prediction
2019/12/16 PHAM QUANG KHANG 4
Fine-tuning on SQuAD
 Use output hidden states to predict start and end span
 Apply 1 Linear(output=2) onto output hidden
state vectors T’i
 Output is predictions of starting and ending
positions of answer within input paragraph
 Objective function is log-likelihood of correct
start and end positions
2019/12/16 PHAM QUANG KHANG 5
Result on SQuAD
SQuAD 1.1: new SOTA SQuAD 2.0: being used as pre-trained model
2019/12/16 PHAM QUANG KHANG 6
https://rajpurkar.github.io/SQuAD-explorer/
Improving from BERT
ROBERTA
1. Train longer, bigger batches, more data
2. Remove next-sentence-prediction task
3. Longer sequences
4. Dynamic changing masks
ALBERT
1. Factorized embedding params
2. Cross-layer param sharing
3. Inter-sentence coherence loss
2019/12/16 PHAM QUANG KHANG 7
References
1. Devlin et al,. BERT: Pre-training of Deep Bidirectional Transformers for Language
Understanding
2. https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_fi
netuning_with_cloud_tpus.ipynb
3. https://github.com/google-research/bert
4. Pytorch version: https://github.com/huggingface/pytorch-pretrained-BERT
5. Liu et al,. RoBERTa: A Robustly Optimized BERT Pretraining Approach
6. Lan et al,. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
2019/12/16 PHAM QUANG KHANG 8

Contenu connexe

Tendances

NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERTshaurya uppal
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMDivya Gera
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers Arvind Devaraj
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersLiangqun Lu
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentationbhavesh_physics
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Fwdays
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn
 
GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersYoung Seok Kim
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Databricks
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and TransformerArvind Devaraj
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsOVHcloud
 
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)Shuntaro Yada
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Deep Learning Italia
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingYoung Seok Kim
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Hady Elsahar
 

Tendances (20)

Bert
BertBert
Bert
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
Word embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTMWord embeddings, RNN, GRU and LSTM
Word embeddings, RNN, GRU and LSTM
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
BERT Finetuning Webinar Presentation
BERT Finetuning Webinar PresentationBERT Finetuning Webinar Presentation
BERT Finetuning Webinar Presentation
 
Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"Thomas Wolf "Transfer learning in NLP"
Thomas Wolf "Transfer learning in NLP"
 
Introduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNNIntroduction For seq2seq(sequence to sequence) and RNN
Introduction For seq2seq(sequence to sequence) and RNN
 
GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask Learners
 
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu...
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
 
Word2Vec
Word2VecWord2Vec
Word2Vec
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
Word embedding
Word embedding Word embedding
Word embedding
 
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)
 
A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Bert.pptx
Bert.pptxBert.pptx
Bert.pptx
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
 

Similaire à BERT

Sreerag parallel programming
Sreerag   parallel programmingSreerag   parallel programming
Sreerag parallel programmingSreerag Gopinath
 
What's New In Python 2.4
What's New In Python 2.4What's New In Python 2.4
What's New In Python 2.4Richard Jones
 
Learning spark ch06 - Advanced Spark Programming
Learning spark ch06 - Advanced Spark ProgrammingLearning spark ch06 - Advanced Spark Programming
Learning spark ch06 - Advanced Spark Programmingphanleson
 
BERT QnA System for Airplane Flight Manual
BERT QnA System for Airplane Flight ManualBERT QnA System for Airplane Flight Manual
BERT QnA System for Airplane Flight ManualArkaGhosh65
 
Understanding Large Social Networks | IRE Major Project | Team 57
Understanding Large Social Networks | IRE Major Project | Team 57 Understanding Large Social Networks | IRE Major Project | Team 57
Understanding Large Social Networks | IRE Major Project | Team 57 Raj Patel
 
Ire presentation
Ire presentationIre presentation
Ire presentationRaj Patel
 
Understanding Large Social Networks | IRE Major Project | Team 57 | LINE
Understanding Large Social Networks | IRE Major Project | Team 57 | LINEUnderstanding Large Social Networks | IRE Major Project | Team 57 | LINE
Understanding Large Social Networks | IRE Major Project | Team 57 | LINERaj Patel
 
Unit No. 1 Introduction to Java.pptx
Unit No. 1 Introduction to Java.pptxUnit No. 1 Introduction to Java.pptx
Unit No. 1 Introduction to Java.pptxDrYogeshDeshmukh1
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Yahoo Developer Network
 
NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transf...
NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transf...NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transf...
NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transf...ssuser4b1f48
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Universitat Politècnica de Catalunya
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
PPT - Enhancing the Locality and Breaking the Memory Bottleneck of Transforme...
PPT - Enhancing the Locality and Breaking the Memory Bottleneck of Transforme...PPT - Enhancing the Locality and Breaking the Memory Bottleneck of Transforme...
PPT - Enhancing the Locality and Breaking the Memory Bottleneck of Transforme...Jisang Yoon
 
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...Databricks
 
240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptxthanhdowork
 
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache SparkAttribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache SparkIRJET Journal
 
CSE215_Module_02_Elementary_Programming.ppt
CSE215_Module_02_Elementary_Programming.pptCSE215_Module_02_Elementary_Programming.ppt
CSE215_Module_02_Elementary_Programming.pptRashedurRahman18
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkDatabricks
 
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARNHadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARNJosh Patterson
 

Similaire à BERT (20)

Sreerag parallel programming
Sreerag   parallel programmingSreerag   parallel programming
Sreerag parallel programming
 
What's New In Python 2.4
What's New In Python 2.4What's New In Python 2.4
What's New In Python 2.4
 
Learning spark ch06 - Advanced Spark Programming
Learning spark ch06 - Advanced Spark ProgrammingLearning spark ch06 - Advanced Spark Programming
Learning spark ch06 - Advanced Spark Programming
 
BERT QnA System for Airplane Flight Manual
BERT QnA System for Airplane Flight ManualBERT QnA System for Airplane Flight Manual
BERT QnA System for Airplane Flight Manual
 
Understanding Large Social Networks | IRE Major Project | Team 57
Understanding Large Social Networks | IRE Major Project | Team 57 Understanding Large Social Networks | IRE Major Project | Team 57
Understanding Large Social Networks | IRE Major Project | Team 57
 
Ire presentation
Ire presentationIre presentation
Ire presentation
 
Understanding Large Social Networks | IRE Major Project | Team 57 | LINE
Understanding Large Social Networks | IRE Major Project | Team 57 | LINEUnderstanding Large Social Networks | IRE Major Project | Team 57 | LINE
Understanding Large Social Networks | IRE Major Project | Team 57 | LINE
 
Unit No. 1 Introduction to Java.pptx
Unit No. 1 Introduction to Java.pptxUnit No. 1 Introduction to Java.pptx
Unit No. 1 Introduction to Java.pptx
 
Preparing for Scala 3
Preparing for Scala 3Preparing for Scala 3
Preparing for Scala 3
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
 
NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transf...
NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transf...NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transf...
NS-CUK Seminar: V.T.Hoang, Review on "Relative Molecule Self-Attention Transf...
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
 
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
Transfer Learning and Domain Adaptation - Ramon Morros - UPC Barcelona 2018
 
PPT - Enhancing the Locality and Breaking the Memory Bottleneck of Transforme...
PPT - Enhancing the Locality and Breaking the Memory Bottleneck of Transforme...PPT - Enhancing the Locality and Breaking the Memory Bottleneck of Transforme...
PPT - Enhancing the Locality and Breaking the Memory Bottleneck of Transforme...
 
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
Project Hydrogen: Unifying State-of-the-Art AI and Big Data in Apache Spark w...
 
240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx
 
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache SparkAttribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
Attribute Reduction:An Implementation of Heuristic Algorithm using Apache Spark
 
CSE215_Module_02_Elementary_Programming.ppt
CSE215_Module_02_Elementary_Programming.pptCSE215_Module_02_Elementary_Programming.ppt
CSE215_Module_02_Elementary_Programming.ppt
 
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache SparkProject Hydrogen: State-of-the-Art Deep Learning on Apache Spark
Project Hydrogen: State-of-the-Art Deep Learning on Apache Spark
 
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARNHadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
Hadoop Summit EU 2013: Parallel Linear Regression, IterativeReduce, and YARN
 

Dernier

Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 

Dernier (20)

Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 

BERT

  • 1. BERT Bidirectional Encoder Representations from Transformer PHAM QUANG KHANG
  • 2. Concept 1. A pre-trained language representation utilizing the architecture of Transformer on 2 tasks a. Randomly mask some words within a sequence and let the model try to predict that masked words b. Predict if a pair of sequences is actually one next to other in a larger context: “next sentence prediction” 2. Can be used as transfer learning (similar to pre-trained on ImageNet in Computer Vision) a. Pre-train on a large corpus as un-supervised learning to learn the language representation b. Fine-tune the model for specific tasks: text classification, Name entity recognition, SQuAD 2019/12/16 PHAM QUANG KHANG 2 Devlin et al,. 2018
  • 3. Architecture 1. Encoder: encoder from transformer a. Base model: N = 12, Hidden dim=768, Heads=12 b. Large model: N=24, Hidden dime=1024, Heads=16 2. Embedding: 2019/12/16 PHAM QUANG KHANG 3 Token Embedding Multi-head Attention Add & Norm Feed Forward Add & Norm N× Positional Embedding Segment Embedding Linear + Softmax Output
  • 4. Pre-training tasks Masked LM Next sentence prediction 2019/12/16 PHAM QUANG KHANG 4
  • 5. Fine-tuning on SQuAD  Use output hidden states to predict start and end span  Apply 1 Linear(output=2) onto output hidden state vectors T’i  Output is predictions of starting and ending positions of answer within input paragraph  Objective function is log-likelihood of correct start and end positions 2019/12/16 PHAM QUANG KHANG 5
  • 6. Result on SQuAD SQuAD 1.1: new SOTA SQuAD 2.0: being used as pre-trained model 2019/12/16 PHAM QUANG KHANG 6 https://rajpurkar.github.io/SQuAD-explorer/
  • 7. Improving from BERT ROBERTA 1. Train longer, bigger batches, more data 2. Remove next-sentence-prediction task 3. Longer sequences 4. Dynamic changing masks ALBERT 1. Factorized embedding params 2. Cross-layer param sharing 3. Inter-sentence coherence loss 2019/12/16 PHAM QUANG KHANG 7
  • 8. References 1. Devlin et al,. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 2. https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/bert_fi netuning_with_cloud_tpus.ipynb 3. https://github.com/google-research/bert 4. Pytorch version: https://github.com/huggingface/pytorch-pretrained-BERT 5. Liu et al,. RoBERTa: A Robustly Optimized BERT Pretraining Approach 6. Lan et al,. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations 2019/12/16 PHAM QUANG KHANG 8