Large Language Model Meta AI - LLaMA 2
© Rk Rahul
LLaMA - Overview
● LLaMA is a family of large language models (LLMs)
● LLaMA was trained in four model sizes: 7, 13, 33, and 65 billion parameters
● LLaMA was developed by Meta
● First released in February 2023
LLaMA 2 - Overview
● LLaMA 2 is a family of large language models (LLMs)
● LLaMA 2 is an auto-regressive language model
● First released on July 18, 2023, by Meta in partnership with Microsoft, as an open-source large language model
● LLaMA 2 pretrained models are trained on 2 trillion tokens and have double the context length of LLaMA 1
● Three model sizes were trained: 7, 13, and 70 billion parameters
● LLaMA 2 is available for free for research and commercial use
LLaMA 2 – Can Do
● Generate different creative text formats, like poems, code, scripts, musical pieces, emails, letters, etc.
● Translate languages.
● Write different kinds of creative content.
● Answer your questions in an informative way, even if they are open ended, challenging, or strange.
● Help you with coding tasks.
● Generate dialogue for chatbots and other conversational AI systems.
LLaMA 2 - Improvements
● Increased Training Tokens: Llama 2 is trained on 40% more tokens than LLaMA 1.
● Longer Context Length: The context length is doubled to 4k tokens.
● Fine-Tuning for Dialogues: The versions of Llama 2 that are fine-tuned (labelled Llama 2-Chat) are optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF).
Fine-Tuning Process and LLaMA-2-Chat
LLaMA 2 Building Process
1. Pre-Training
2. Supervised Fine-Tuning
3. Reward Model
4. Reinforcement Learning from Human Feedback (RLHF)
LLaMA 2 Pre-Training
● The pretraining approach uses an optimized auto-regressive transformer, with several changes to improve performance
● Grouped-query attention (GQA) is used to improve inference scalability (a sketch follows this list)
● Trained on 2 trillion tokens of data for good performance
● The model uses a standard transformer architecture
● Pre-normalization using RMSNorm (Root Mean Square Layer Normalization)
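To make the GQA bullet concrete, here is a minimal, illustrative PyTorch sketch of grouped-query attention. It is not the official LLaMA 2 code: the dimensions, head counts, and class name are placeholders, and the paper applies GQA to its larger model variants.

```python
# Minimal grouped-query attention (GQA) sketch, assuming PyTorch >= 2.0.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seqlen, _ = x.shape
        q = self.wq(x).view(bsz, seqlen, self.n_heads, self.head_dim)
        k = self.wk(x).view(bsz, seqlen, self.n_kv_heads, self.head_dim)
        v = self.wv(x).view(bsz, seqlen, self.n_kv_heads, self.head_dim)
        # Each group of query heads shares one key/value head, which shrinks
        # the KV cache and improves inference scalability.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=2)
        v = v.repeat_interleave(repeat, dim=2)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (bsz, heads, seq, head_dim)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(bsz, seqlen, -1)
        return self.wo(out)

# Toy usage: 8 query heads sharing 2 key/value heads.
x = torch.randn(2, 16, 512)
print(GroupedQueryAttention(512, n_heads=8, n_kv_heads=2)(x).shape)
```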
LLaMA 2 Pre-Training Normalization
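Expanding on the pre-normalization bullet above, this is a minimal RMSNorm sketch in PyTorch. It is illustrative only; the epsilon value and class layout are assumptions, not the exact LLaMA 2 implementation.

```python
# Minimal RMSNorm (Root Mean Square Layer Normalization) sketch, assuming PyTorch.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root mean square of the features (no mean
        # subtraction, unlike LayerNorm), then rescale with the learned gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

# Toy usage.
x = torch.randn(2, 8, 512)
print(RMSNorm(512)(x).shape)
```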
LLaMA 2 - Pretraining Functionality
● Trained using the AdamW optimizer (β1 = 0.9, β2 = 0.95, eps = 10⁻⁵)
● Uses the SwiGLU activation function
● Uses a cosine learning rate schedule (warmup of 2000 steps), decaying the final learning rate to 10% of the peak learning rate
● Weight decay of 0.1 and gradient clipping of 1.0 (see the sketch below)
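A minimal sketch of that optimizer and schedule setup in PyTorch. The tiny Linear module stands in for the transformer, and the peak learning rate and total step count are illustrative values, not the published training configuration.

```python
# Optimizer setup sketch: AdamW with the stated betas/eps/weight decay, linear
# warmup for 2000 steps, then cosine decay down to 10% of the peak rate.
import math

import torch

model = torch.nn.Linear(16, 16)            # placeholder for the transformer
peak_lr, warmup_steps, total_steps = 3e-4, 2000, 500_000  # illustrative

optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr,
    betas=(0.9, 0.95), eps=1e-5, weight_decay=0.1,
)

def lr_lambda(step: int) -> float:
    # Linear warmup, then cosine decay from 100% to 10% of the peak LR.
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.1 + 0.9 * 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# In the training loop, clip gradients to 1.0 before each step:
#   torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```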
LLaMA 2 - Training Hardware
● LLaMA 2 was pre-trained on Meta's Research Super Cluster (RSC) as well as internal production clusters.
● Both clusters use NVIDIA A100 GPUs.
● RSC uses NVIDIA Quantum InfiniBand, while the production clusters use RoCE (RDMA over Converged Ethernet).
LLaMA 2 - Supervised Fine-Tuning (SFT)
● SFT uses a next-token prediction objective that is nearly identical to pre-training.
● Text is encoded for LLaMA 2 using its tokenizer (described on the next slides).
● Supervised fine-tuning uses a cosine learning rate schedule with an initial learning rate of 2 × 10⁻⁵, a weight decay of 0.1, a batch size of 64, and a sequence length of 4096 tokens (see the sketch below).
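A minimal sketch of the SFT objective described above: plain next-token prediction via a shifted cross-entropy loss, the same objective used during pre-training. The logits and token IDs here are random toy tensors, not real model outputs.

```python
# Next-token prediction loss sketch, assuming PyTorch.
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # Position t predicts token t+1, so drop the last logit and the first label.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

# Toy shapes; the real setup uses batch size 64 and sequence length 4096.
vocab_size, batch, seq_len = 32_000, 2, 16
logits = torch.randn(batch, seq_len, vocab_size)
input_ids = torch.randint(0, vocab_size, (batch, seq_len))
print(sft_loss(logits, input_ids))
```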
LLaMA 2 - Tokenizer
● To encode text, the tokenizer first splits all numbers into individual digits. LLaMA 2 is a subword language model, so it can learn to represent numbers using a small number of subwords.
● LLaMA 2 uses a byte-pair encoding (BPE) tokenizer based on the SentencePiece implementation.
● The total vocabulary size is 32k tokens.
LLaMA 2 - Tokenizer
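A small example of exercising the LLaMA 2 tokenizer via Hugging Face transformers. It assumes the transformers library is installed and that you have been granted access to the gated meta-llama/Llama-2-7b-hf repository.

```python
# Load the LLaMA 2 tokenizer (SentencePiece-based BPE, 32k vocabulary).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print(tok.vocab_size)                    # 32000
print(tok.tokenize("Price: 1234 USD"))   # digits are split individually: '1', '2', '3', '4'
```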
LLaMA 2 - RLHF
● Reinforcement learning from human feedback (RLHF) is a model training procedure that is applied to a fine-tuned language model to further align model behavior with human preferences and instruction following.
● RLHF collects data that represents sampled human preferences: human annotators select which of two model outputs they prefer, so the preference signal comes directly from human feedback on the model's output (see the sketch below).
● Safety-focused data is also collected during RLHF.
● This human feedback is subsequently used to train a reward model, which learns patterns in the preferences of the human annotators and can then automate preference decisions.
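To make the data-collection bullet concrete, here is an illustrative shape for one pairwise preference record. The field names are hypothetical, not Meta's actual annotation schema.

```python
# Illustrative pairwise preference record for RLHF data collection.
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str                       # response the annotator preferred
    rejected: str                     # response the annotator did not prefer
    is_safety_example: bool = False   # flag for safety-focused collection

example = PreferencePair(
    prompt="Explain what a reward model does.",
    chosen="A reward model assigns a scalar score to a response based on how helpful and safe it is.",
    rejected="It just rewards the model.",
)
print(example)
```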
LLaMA 2 - Reward Model
● The reward model is responsible for telling the language model what constitutes a good response, scoring each response based on how helpful and safe it is.
● The reward model takes a model response and its corresponding prompt as inputs and outputs a scalar score to indicate the quality of the model generation (see the sketch below).
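A minimal sketch of how such a reward model can be trained from preference pairs, using a pairwise ranking loss of the form -log sigmoid(r_chosen - r_rejected - margin). The random tensors stand in for the scalar scores a transformer with a regression head would produce; the margin term is optional.

```python
# Pairwise ranking loss sketch for reward-model training, assuming PyTorch.
from typing import Optional

import torch
import torch.nn.functional as F

def ranking_loss(chosen_scores: torch.Tensor,
                 rejected_scores: torch.Tensor,
                 margin: Optional[torch.Tensor] = None) -> torch.Tensor:
    # Push the chosen response's score above the rejected one's,
    # optionally by at least the given margin.
    diff = chosen_scores - rejected_scores
    if margin is not None:
        diff = diff - margin
    return -F.logsigmoid(diff).mean()

# Toy usage with a batch of four preference pairs.
chosen = torch.randn(4)
rejected = torch.randn(4)
print(ranking_loss(chosen, rejected))
print(ranking_loss(chosen, rejected, margin=torch.full((4,), 0.5)))
```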
LLaMA 2 - Model Evaluations
Reference
● Deep (Learning) Focus -
https://cameronrwolfe.substack.com/p/llama-2-from-the-ground-up
● Meta AI - https://ai.meta.com/
● Research Article - Llama 2: Open Foundation and Fine-Tuned
Chat Models
Thanks!