BERT – Bird’s Eye View
PART1
Learning Notes of Senthil Kumar
Agenda
Bird’s eye view (PART1)
About Language Models
What is BERT? – A SOTA LM
What does BERT do? – Multi-task Learning
How is BERT built? – BERT Architecture
BERT Fine-tuning and Results
What is powering NLP’s Growth?
Two important trends that power recent NLP growth
• Better representation of the text data (with no supervision)
  • By grasping the 'context' better (e.g., terms used: language models, word embeddings, contextualized word embeddings)
  • By enabling 'transfer learning' (e.g., term used: pre-trained language models)
• Deep Learning models that use these better representations to solve real-world problems like Text Classification, Semantic Role Labelling, Translation, Image Captioning, etc.
Important Terms
Language Model:
A language model (LM) assigns a probability distribution over sequences of words
Equivalently: a model that predicts the next word in a sequence of words
***************************
Encoder:
Converts variable-length text (or any sequence input) into a fixed-length vector (a minimal sketch follows this list of terms)
***************************
Transformer:
- A sequence model that forgoes the recurrent structure of RNNs in favour of an attention-based approach
- Transformer vs Seq2Seq Model
- { Encoder + (attention-mechanism) + Decoder } vs { Encoder + Decoder }
***************************
Recurrent Structure: Processes all input elements sequentially
Attention-based Approach: Processes all input elements SIMULTANEOUSLY
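To make the Encoder definition concrete, here is a minimal sketch (my own illustration, not from the slides): it mean-pools toy word vectors so that a sentence of any length maps to one fixed-length vector. The vocabulary and embedding size are arbitrary assumptions; real encoders (RNNs, Transformers) learn this mapping instead of averaging.

```python
import numpy as np

EMBED_DIM = 8
rng = np.random.default_rng(0)
# Toy vocabulary with random word vectors (an assumption for illustration)
vocab = {w: rng.normal(size=EMBED_DIM)
         for w in "bert is a bidirectional encoder".split()}

def encode(sentence: str) -> np.ndarray:
    """Mean-pool word vectors: any sentence length -> one EMBED_DIM vector."""
    vectors = [vocab[w] for w in sentence.split() if w in vocab]
    return np.mean(vectors, axis=0)

print(encode("bert is bidirectional").shape)  # (8,) regardless of sentence length
```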
Intro – Count-based N-gram LM (1/5)
Sources:
NLP course on Coursera – National Higher School of Economics
Advanced NLP and Deep Learning course on Udemy (by LazyProgrammer)
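To make the count-based idea concrete before moving on, here is a minimal bigram LM sketch (my own toy example, not taken from the courses above): P(next word | current word) is estimated directly from bigram counts.

```python
from collections import Counter, defaultdict

# Toy corpus (an invented example)
corpus = "the cat sat on the mat . the cat ate".split()

bigram_counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def p_next(current: str, nxt: str) -> float:
    """Maximum-likelihood estimate of P(nxt | current) from bigram counts."""
    total = sum(bigram_counts[current].values())
    return bigram_counts[current][nxt] / total if total else 0.0

print(p_next("the", "cat"))  # 2/3 in this toy corpus
```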
Intro – Context-Prediction based LMs (2/5)
Logistic Bigram Model:
P(y|x) = softmax(Wᵀx)
P(y = j | x) = exp(w_jᵀx + b_j) / Σ_{k ∈ K} exp(w_kᵀx + b_k)

Feed-Forward Neural Network LM (proposed by Bengio in 2001):
P(y|x) = softmax(tanh(Wᵀx))
P(y = j | x) = exp(tanh(w_jᵀx + b_j)) / Σ_{k ∈ K} exp(tanh(w_kᵀx + b_k))

Word2Vec (proposed by Mikolov in 2013):
P(y|x) = neg_sampling_softmax(Wᵀx)

Sources:
Advanced NLP and Deep Learning course on Udemy (by LazyProgrammer); Idea: http://www.marekrei.com/blog/dont-count-predict/
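A small numpy sketch of the two prediction-based models above (the vocabulary size, weights, and shapes are illustrative assumptions; x is a one-hot vector for the current word):

```python
import numpy as np

V = 5                          # toy vocabulary size (assumption)
rng = np.random.default_rng(1)
W = rng.normal(size=(V, V))    # one weight column per output word
b = rng.normal(size=V)

def softmax(z):
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = np.eye(V)[2]                          # one-hot encoding of word index 2
p_logistic = softmax(W.T @ x + b)         # Logistic Bigram Model
p_ffnn = softmax(np.tanh(W.T @ x + b))    # Feed-Forward NN LM adds a nonlinearity
print(p_logistic.sum(), p_ffnn.sum())     # both are valid distributions: 1.0 1.0
```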
Higher Form of LMs (3/5)
Typical LM
Source:
https://indico.io/blog/sequence-modeling-neuralnets-part1/
Seq2Seq – A higher form of LM
Sequence-to-sequence models build on top of language models by adding an encoder step and a decoder step. Because the decoder sees an encoded representation of the input sequence as well as the translation generated so far, it can make more intelligent predictions about future words based on the current word.
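A compact PyTorch sketch of this encoder + decoder idea (a toy with assumed sizes, not a production translation model): the encoder compresses the input sequence into a hidden state, which then conditions every prediction the decoder makes.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 100, 16, 32   # toy sizes (assumptions)

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.encoder = nn.GRU(EMB, HID, batch_first=True)
        self.decoder = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, src, tgt):
        _, h = self.encoder(self.embed(src))           # h encodes the whole input
        dec_out, _ = self.decoder(self.embed(tgt), h)  # decoder conditioned on h
        return self.out(dec_out)                       # next-word logits per step

model = Seq2Seq()
logits = model(torch.randint(0, VOCAB, (2, 7)),   # batch of 2 source sequences
               torch.randint(0, VOCAB, (2, 5)))   # target prefixes seen so far
print(logits.shape)  # torch.Size([2, 5, 100])
```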
Higher Form of LMs (4/5)
Typical Seq2Seq Model:
{ Encoder + Decoder }
Recurrent Structure: processes all input elements sequentially
Transformer Model:
{ Encoder + (attention-mechanism) + Decoder }
Attention-based Approach: processes all input elements SIMULTANEOUSLY
Sources:
https://towardsdatascience.com/nlp-sequence-to-sequence-networks-part-2-seq2seq-model-encoderdecoder-model-6c22e29fd7e1
Pre-trained Language Models (5/5)
Sources: BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
What is BERT?
• Bidirectional Encoder Representations
from Transformers
• SOTA Language Model
• ULMFiT, ELMo, BERT, XLNet: NLP's
ImageNet Moment, the era of
contextualized word embeddings and
Transfer Learning
Sources:
https://slideplayer.com/slide/17025344/
BERT is Bidirectional
• BERT is bidirectional: it reads text in both directions, left-to-right as well as right-to-left, to gain a better understanding of the text
• For example, the sentence above is read both from "BERT is bidirectional …" onwards and from "… understanding of the text" backwards
• BERT is bidirectional because it uses the Transformer Encoder (made bidirectional by the self-attention mechanism)
• The OpenAI Generative Pre-trained Transformer is unidirectional because it employs the Transformer Decoder (whose masked self-attention permits only left-to-right attention); the sketch below contrasts the two masks
Sources:
https://github.com/google-research/bert/issues/319
https://github.com/google-research/bert/issues/83
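A quick numpy illustration of the masks behind this distinction (my sketch of the standard picture): the encoder's self-attention lets every token attend to every token, while the decoder's masked self-attention only lets a token attend to itself and the tokens to its left.

```python
import numpy as np

seq_len = 4
# BERT-style encoder: full attention, every token sees every token
bidirectional_mask = np.ones((seq_len, seq_len))
# GPT-style decoder: causal mask, token i sees only tokens 0..i
causal_mask = np.tril(np.ones((seq_len, seq_len)))

print(causal_mask)
# [[1. 0. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 1.]]
```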
BERT uses Encoder portion of Transformer
Sources: Attention Is All You Need: https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
BERT Architecture
Source: BERT GitHub Repository: https://github.com/google-research/bert
Pretrained Tasks
BERT's core methodology:
- Applying bidirectional training of the Transformer to Language Modeling and Next Sentence Prediction

BERT is originally pretrained on 2 tasks:
• Masked Language Model: the masking technique allows bidirectional training for (masked) word prediction
• Next Sentence Prediction: BERT predicts whether the second sentence of a pair actually follows the first
Multi-task Learning
Sources:
Idea of Multi-task learning – Karate Kid reference:
https://towardsdatascience.com/whats-new-in-deep-learning-research-how-deep-builds-multi-task-reinforcement-learning-algorithms-f0d868684c45
Pretrained Tasks - Finetuning
Sources:
BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
http://jalammar.github.io/illustrated-bert/
Task1 - Masked Language Model
• BERT (with the Masked LM technique) converges more slowly than left-to-right prediction, but after a small number of training steps its accuracy beats the unidirectional model
• Multi-genre Natural Language Inference (MNLI) is a text classification task
• MNLI is a large-scale, crowdsourced entailment classification task (Williams et al., 2018). Given a pair of sentences, the goal is to predict whether the second sentence is an entailment, contradiction, or neutral with respect to the first one.
Source: https://medium.com/dissecting-bert/dissecting-bert-part-1-d3c3d495cdb3
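The masked-LM objective is easy to probe with the Hugging Face transformers library (assuming it is installed; bert-base-uncased is the standard public checkpoint):

```python
from transformers import pipeline

# Load a pretrained BERT with its masked-LM head
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT fills in the [MASK] token using context from BOTH directions
for pred in unmasker("The goal of a language model is to predict the next [MASK]."):
    print(f"{pred['token_str']:>10}  {pred['score']:.3f}")
```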
Task2 - Next Sentence Prediction
BERT learns to predict whether one sentence follows another.
Two types of input sentence pairs are fed to the model:
i. 50% of the sentence pairs are in the right order
ii. In the other 50%, the second sentence in the pair is randomly chosen
Source: https://medium.com/dissecting-bert/dissecting-bert-part-1-d3c3d495cdb3
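A minimal sketch of how such 50/50 sentence pairs could be constructed (illustrative only; a real pipeline would also respect document boundaries and avoid sampling the true next sentence as the "random" one):

```python
import random

def make_nsp_pairs(sentences, n_pairs, seed=0):
    """Yield (sentence_a, sentence_b, is_next) triples: ~half true, ~half random."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        i = rng.randrange(len(sentences) - 1)
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], True))  # right order
        else:
            j = rng.randrange(len(sentences))                     # random sentence
            pairs.append((sentences[i], sentences[j], False))
    return pairs

docs = ["He went to the store.", "He bought a gallon of milk.",
        "Penguins are flightless birds."]
for a, b, is_next in make_nsp_pairs(docs, 4):
    print(is_next, "|", a, "->", b)
```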
BERT Fine-tuning - Classification
- Stanford Sentiment Treebank 2 (SST-2)
- Corpus of Linguistic Acceptability (CoLA)
- Multi-genre Natural Language Inference (MNLI)
- The General Language Understanding Evaluation (GLUE) benchmark is a collection
of diverse natural language understanding tasks.
Sources: BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
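A hedged sketch of what fine-tuning BERT for one of these classification tasks looks like with Hugging Face transformers (the example texts and labels are invented; a real run would loop over a dataset with an optimizer):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)    # fresh classification head on [CLS]

batch = tokenizer(["a gripping, beautifully shot film",
                   "a dull and lifeless mess"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])             # 1 = positive, 0 = negative (assumed)

outputs = model(**batch, labels=labels)   # returns loss and logits
outputs.loss.backward()                   # one fine-tuning gradient step
print(outputs.logits.shape)               # torch.Size([2, 2])
```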
BERT Fine-tuning – Semantic Role Labeling
- CoNLL-2003 (Conference on Natural Language Learning 2003) NER task
Sources: BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
BERT Fine-tuning – Q&A
The Stanford Question Answering Dataset (SQuAD)
Sources: BERT Paper - https://arxiv.org/pdf/1810.04805.pdf
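For SQuAD-style extractive Q&A, a fine-tuned checkpoint can be tried directly through the transformers pipeline (a sketch; the model name below is one publicly released SQuAD-fine-tuned BERT):

```python
from transformers import pipeline

qa = pipeline("question-answering",
              model="bert-large-uncased-whole-word-masking-finetuned-squad")

result = qa(question="What does BERT stand for?",
            context="BERT stands for Bidirectional Encoder Representations "
                    "from Transformers, a pretrained language model.")
print(result["answer"], round(result["score"], 3))
```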
Thank You