SigOpt Machine Learning Engineer Meghana Ravikumar explains how she used knowledge distillation, optimized with SigOpt's Experiment Management functionality, to shrink a BERT natural-language model fine-tuned on the SQuAD 2.0 question-answering dataset while maintaining its performance.
Slide 3: Two main questions
Can we understand the trade-offs made during model compression?
Can we find a model architecture that fits our needs?
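These two questions map naturally onto a multimetric hyperparameter search. As a rough illustration, here is a minimal sketch of how such a trade-off experiment might be set up with SigOpt's Python client; the parameter names, bounds, metric names, and the train_and_evaluate helper are illustrative assumptions, not the configuration used in the talk.

```python
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")

# Multimetric experiment: maximize accuracy while minimizing model size.
# Parameters and bounds below are hypothetical architecture knobs.
experiment = conn.experiments().create(
    name="Distilled BERT architecture search",
    parameters=[
        dict(name="n_layers", type="int", bounds=dict(min=2, max=8)),
        dict(name="n_heads", type="int", bounds=dict(min=2, max=12)),
    ],
    metrics=[
        dict(name="f1", objective="maximize"),
        dict(name="model_size", objective="minimize"),
    ],
    observation_budget=50,
)

while experiment.progress.observation_count < experiment.observation_budget:
    suggestion = conn.experiments(experiment.id).suggestions().create()
    # train_and_evaluate is a hypothetical helper that trains a student
    # with the suggested architecture and reports both metrics.
    f1, size = train_and_evaluate(suggestion.assignments)
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[dict(name="f1", value=f1), dict(name="model_size", value=size)],
    )
    experiment = conn.experiments(experiment.id).fetch()
```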
Slide 5: Distilling BERT for Question Answering
[Diagram: BERT, pre-trained for language modeling, is fine-tuned on SQuAD 2.0 to serve as the teacher. The student model trains on SQuAD 2.0 with two losses: a soft-target loss against the teacher's outputs and a hard-target loss against the ground-truth labels, yielding the trained student model.]
For more on distillation: Hinton et al. (2015); DistilBERT (Sanh et al., 2019).
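To make the two losses concrete, here is a minimal PyTorch sketch of the combined distillation objective in the style of Hinton et al. (2015). It is shown for a generic classification head (for SQuAD the same loss is applied to the start- and end-position logits), and the temperature T and mixing weight alpha are illustrative hyperparameters, not values from the talk.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target loss: KL divergence between temperature-softened
    # student and teacher distributions (scaled by T^2, per Hinton et al.).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target loss: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    # Weighted combination of the two losses.
    return alpha * soft + (1.0 - alpha) * hard
```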
Slide 10: Establishing a Baseline
[Diagram: the same distillation pipeline, annotated with question marks marking the open choices that a baseline must pin down.]
Slide 11: Establishing a Baseline: Training from scratch
[Diagram: the distillation pipeline with DistilBERT as the student, trained from scratch on SQuAD 2.0 using the standard soft-target and hard-target losses against a teacher BERT fine-tuned on SQuAD 2.0; question marks remain over the still-open choices.]
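For this baseline the student is a DistilBERT whose weights are randomly initialized rather than loaded from pre-training. A minimal sketch of that setup, assuming the Hugging Face transformers library (a tooling assumption; the slides do not specify the implementation):

```python
from transformers import DistilBertConfig, DistilBertForQuestionAnswering

# "From scratch": build the student from a bare config, which gives
# randomly initialized weights with the default DistilBERT architecture.
config = DistilBertConfig()
student = DistilBertForQuestionAnswering(config)
```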
Slide 13: Establishing a Baseline: Warm starting the model
[Diagram: the same pipeline, but the DistilBERT student is warm-started from the pretrained weights of DistilBERT pre-trained for language modeling, then distilled on SQuAD 2.0 with the standard soft-target and hard-target losses against the fine-tuned BERT teacher.]
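In contrast, the warm-started baseline loads DistilBERT's pre-trained language-modeling weights into the student before distillation. A minimal sketch, again assuming the Hugging Face transformers library:

```python
from transformers import DistilBertForQuestionAnswering

# Warm start: load DistilBERT's pre-trained language-modeling weights;
# only the question-answering head on top is newly initialized.
student = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```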
Slide 28: Learn more about SigOpt
Read our research and product blog.
Check out our YouTube channel to see more videos.
Sign up to try out SigOpt for free.
Join the Experiment Management beta.
Read the full work on Nvidia’s dev blog.