2022.01.07
MusicBERT:
Symbolic Music Understanding
with Large-Scale Pre-Training
Mingliang Zeng, Xu Tan, Rui Wang, Zeqian Ju, Tao Qin, Tie-Yan Liu
ACL 2021
Hyeshin Chu
Contents
• Overview
• Introduction
• Related Work
• Methodology
• Experiments & Results
• Conclusion
3
Overview
Suggest novel methods to apply NLP approaches to the music domain
Introduce MusicBERT, a large-scale pretrained model for symbolic music understanding
Evaluate the performance on four tasks
5
Contributions
Construct a large-scale symbolic music corpus
– Million MIDI Dataset (MMD)
Design mechanisms to enhance pre-training with symbolic music data
(OctupleMIDI Encoding & Masking Strategies)
Achieve state-of-the-art results on four music understanding tasks:
Melody Completion, Accompaniment Suggestion, Genre Classification, and Style Classification
6
Related Work
Symbolic Music Understanding
• Word2vec models for music: Huang et al., 2016; Madjiheurem et al., 2016
• Divide music pieces → fixed-duration music slices: Herremans et al., 2017; Chuan et al., 2020
• Small NN models & only a few music tokens as inputs
Symbolic Music Encoding
• MIDI-based: MIDI; REMI (Huang and Yang, 2020); CP (Hsiao et al., 2021)
• Pianoroll-based: Brunner et al., 2018; Ji et al., 2020
• Still need long input tokens
Masking Strategies in Pre-training
• Application of masking strategies for music domain: MASS (Song et al., 2019); SpanBERT (Joshi et al., 2020)
• Not considering the difference between NLP & music
7
Model Overview
MusicBERT, a large-scale Transformer model for symbolic music understanding
• Based on Transformer encoder (Vaswani et al., 2017)
• A novel encoding method, OctupleMIDI, to encode the music sequence more efficiently
• Predict music tokens as output
11
OctupleMIDI Encoding
Figure 2. Different encoding methods for symbolic music
12
OctupleMIDI Encoding
Previous MIDI-based representations: sequences are still too long for the Transformer structure
(computational complexity & learning inefficiency)
13
OctupleMIDI Encoding
OctupleMIDI, a compact symbolic music encoding method
• Encode 6 notes into 6 tokens
• Much shorter than REMI & CP
• Apply to various kinds of music
Each Octuple token:
• Corresponds to a note
• Contains 8 elements
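A minimal sketch of one octuple token as a data structure may help here. The eight field names follow the elements named on the encoding slides (time signature, tempo, bar, position, instrument, pitch, duration, velocity); the class name `OctupleToken`, the helper `encode`, and the choice to store every element as an integer index are assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class OctupleToken:
    """One note described by eight elements (hypothetical layout)."""
    time_signature: int  # index into a time-signature vocabulary (assumed)
    tempo: int           # index into a quantized BPM vocabulary (assumed)
    bar: int             # bar index
    position: int        # onset inside the bar, in 1/64-note steps
    instrument: int      # MIDI-style instrument id
    pitch: int           # pitch id, or percussion type for drums
    duration: int        # duration token
    velocity: int        # quantized velocity token


def encode(notes):
    """Turn a list of notes into a sequence of 8-tuples, one per note."""
    return [astuple(note) for note in notes]


# Six notes give six tokens, each with eight elements.
six_notes = [OctupleToken(0, 0, 0, 16 * i, 0, 60 + i, 16, 20) for i in range(6)]
sequence = encode(six_notes)
```

The sequence length equals the number of notes, which is the property the slide contrasts with REMI and CP.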
16
OctupleMIDI Encoding
Time Signature: a fraction (e.g., 2/4)
• Length of a beat (a note duration, e.g., a quarter note in 2/4)
• Number of beats in a bar (e.g., 2 beats in 2/4)
Tempo: beats per minute (BPM)
• Pace of music
• From 16 to 256 for OctupleMIDI
Bar and Position: onset time of a note
• 256 bars in a music piece (0 to 255)
• 1/64 note to represent the onset time of a note (from 0)
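As a rough illustration of the bar and position elements, the sketch below converts an onset time into a (bar, position) pair at 1/64-note resolution. It assumes a constant time signature for the whole piece and an onset already measured in 1/64-note steps; the function names are hypothetical.

```python
def positions_per_bar(numerator: int, denominator: int) -> int:
    """Number of 1/64-note steps in one bar of numerator/denominator time."""
    if denominator <= 0 or 64 % denominator != 0:
        raise ValueError("denominator must divide 64")
    return numerator * (64 // denominator)


def bar_and_position(onset_64th: int, numerator: int, denominator: int):
    """Split an onset (in 1/64-note steps from the start) into bar and position."""
    steps = positions_per_bar(numerator, denominator)
    bar, position = divmod(onset_64th, steps)
    if not 0 <= bar <= 255:
        raise ValueError("bar index outside the 0..255 range")
    return bar, position
```

In 2/4, a bar holds two quarter notes, so it spans 2 × 16 = 32 steps; an onset at step 40 therefore falls in bar 1 at position 8.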
17
OctupleMIDI Encoding
Instrument: follow MIDI format
• 129 tokens to represent instruments
• 0 to 127: different general instruments (e.g., piano and bass)
• 128: special percussion instrument (e.g., drum)
Pitch
• Note pitches for general instruments: 128 tokens to represent pitch values (follow MIDI format)
• Note pitches for percussion instruments: 128 pitch tokens to represent percussion type
Duration: note duration
• 128 tokens (percussion: all set to 0)
Velocity: quantize the velocity of a note into 32 different values
• Interval of 4 (e.g., 2, 6, 10, …, 122, 126)
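The velocity quantization and the percussion rule can be sketched as follows. The slide only gives the 32 representative values (2, 6, ..., 126); mapping a raw MIDI velocity to a bin with integer division by 4 is an assumption, as are all function names.

```python
PERCUSSION = 128  # instrument id reserved for percussion on the slide


def quantize_velocity(velocity: int) -> int:
    """Map a MIDI velocity (0..127) to one of 32 bins of width 4 (assumed rule)."""
    if not 0 <= velocity <= 127:
        raise ValueError("MIDI velocity must be in 0..127")
    return velocity // 4


def bin_center(velocity_token: int) -> int:
    """Representative velocity of a bin: 2, 6, 10, ..., 126."""
    return 4 * velocity_token + 2


def duration_token(instrument: int, duration: int) -> int:
    """Percussion durations are all set to 0, as stated on the slide."""
    return 0 if instrument == PERCUSSION else duration
```

With this rule every raw velocity from 0 to 127 lands on one of the 32 values listed on the slide.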
18
Masking Strategy
Bar-level masking strategy:
Mask elements of the same type in the same bar simultaneously
→ Avoid information leakage & learn the contextual representation well
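A minimal sketch of bar-level masking, assuming each token is an 8-tuple of integers with the bar index at position 2. The mask id `MASK`, the index `BAR`, the function name, and the default masking probability are illustrative assumptions; the deck does not specify them.

```python
import random

MASK = -1  # hypothetical mask id, distinct from every real element value
BAR = 2    # index of the bar element inside each 8-tuple (assumed ordering)


def bar_level_mask(tokens, mask_prob=0.15, rng=None):
    """Mask whole (bar, element type) groups instead of single elements."""
    rng = rng or random.Random()
    decisions = {}  # (bar, element index) -> masked or not, decided once
    masked = []
    for token in tokens:
        bar = token[BAR]
        new_token = list(token)
        for elem in range(len(token)):
            key = (bar, elem)
            if key not in decisions:
                decisions[key] = rng.random() < mask_prob
            if decisions[key]:
                new_token[elem] = MASK
        masked.append(tuple(new_token))
    return masked
```

Because the decision is made once per (bar, element type) pair, a masked tempo or time signature cannot be read off a neighbouring note in the same bar, which is the leakage the slide refers to.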
19
Pre-training Corpus
Table 2. Size of different music datasets
OctupleMIDI encoding is universal
→ Most MIDI files can be converted without noticeable loss of musical information
→ Cleaning and deduplication
→ Obtain Million MIDI Dataset (MMD): 1.5 million songs with 2 billion octuple tokens (musical notes)
20
Experiments & Results
Pre-training Setup | Fine-tuning MusicBERT | Method Analysis
Table 4. Model configurations of MusicBERT
Small MusicBERT: to compare with baselines (similar data size)
Base MusicBERT: to achieve the SOTA results
21
Experiments & Results
Four downstream tasks:
• Melody Completion
• Accompaniment Suggestion
• Genre & Style Classification
Table 3. Results of different models on the four downstream tasks
22
Experiments & Results
Melody Completion
Task: Find the most matched consecutive phrase in a given set of candidates for a given melodic phrase
Evaluation: The rate of correctly chosen phrases in the top k candidates
Best Performance: MusicBERT_small, MusicBERT_base
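The evaluation described above (how often the correct phrase appears among the top k candidates) can be sketched as a simple hit rate. This assumes one correct candidate per query and rankings already sorted from best to worst; `hits_at_k` is a hypothetical name, and the paper's exact metric definition may differ.

```python
def hits_at_k(rankings, correct, k):
    """Fraction of queries whose correct candidate is ranked in the top k."""
    if not correct or len(rankings) != len(correct):
        raise ValueError("need one non-empty ranking per query")
    hits = sum(1 for ranking, gold in zip(rankings, correct) if gold in ranking[:k])
    return hits / len(correct)


# Three queries, each ranking the same three candidate phrases.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
gold = ["a", "a", "a"]
```

Here the correct phrase is ranked first once, second once, and third once, so the hit rate grows from 1/3 at k = 1 to 1.0 at k = 3.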
23
Experiments & Results
Accompaniment Suggestion
Task: Find the most related accompaniment phrase in a given set of harmonic phrase candidates for a given melodic phrase
Evaluation: The rate of correctly chosen phrases in the top k candidates
Best Performance: MusicBERT_small, MusicBERT_base
24
Experiments & Results
Genre & Style Classification
Task: Classify the genre and style
Dataset: TOP-MAGD for genre, MASD for style
Evaluation: F1-micro score
Best Performance: MusicBERT_small, MusicBERT_base
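For reference, a micro-averaged F1 score pools true positives, false positives, and false negatives over all examples before computing a single F1. The sketch below assumes each song carries a set of labels; the function name and the example labels are made up for illustration.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over per-example label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


truth = [{"rock"}, {"pop", "jazz"}]
pred = [{"rock"}, {"pop"}]
score = micro_f1(truth, pred)
```

In this toy case there are 2 true positives, 0 false positives, and 1 false negative, so the score is 2·2 / (2·2 + 0 + 1) = 0.8.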
25
Experiments & Results
Experiment on MusicBERT_small
Effectiveness of OctupleMIDI
OctupleMIDI significantly outperforms REMI and CP:
the compact OctupleMIDI encoding lets the model learn from a larger proportion of a music song
Table 5. Results of different encoding methods
26
Experiments & Results
Experiment on MusicBERT_small
Effectiveness of Bar-Level Masking
Random: Randomly mask the elements in the octuple token
Octuple: Randomly mask some octuple tokens (mask all the elements in an octuple token)
Bar: The elements with the same type in the same bar are masked simultaneously
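The two baseline strategies compared against bar-level masking can be sketched side by side. Tokens are assumed to be 8-tuples of non-negative integers; `MASK` and both function names are hypothetical.

```python
import random

MASK = -1  # hypothetical mask id


def random_element_mask(tokens, mask_prob, rng):
    """Random: decide independently for every element of every token."""
    return [
        tuple(MASK if rng.random() < mask_prob else elem for elem in token)
        for token in tokens
    ]


def octuple_mask(tokens, mask_prob, rng):
    """Octuple: decide once per token and mask all of its elements together."""
    return [
        tuple(MASK for _ in token) if rng.random() < mask_prob else token
        for token in tokens
    ]
```

Under random masking, a hidden element can often be copied from an unmasked neighbour in the same bar, while octuple masking hides a whole note at once.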
27
Experiments & Results
Experiment on MusicBERT_small
Effectiveness of Pre-training
Pre-training is critical for symbolic music understanding
28
Conclusion
Propose OctupleMIDI encoding & bar-level masking strategy for the music domain
Develop MusicBERT, a large-scale pre-trained model for symbolic music understanding
Achieve state-of-the-art performance on all four evaluated symbolic music understanding tasks
29
For my research
Acquire some baseline models & datasets to review
Understand a new symbolic music representation method
Learn how to design experiments to measure each feature of a model
Thank you
