Machine Learning for
Creative AI Applications
in Music
Music and AI Lab,
Research Center for IT Innovation,
Academia Sinica
Yi-Hsuan Yang Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
About Us
• Academia Sinica
 National academy of Taiwan, founded in 1928
 About 1,000 Full/Associate/Assistant Researchers
 Located at Nangang District, Taipei City
• Music and AI Lab (musicai)
Members: Research Assistants, (co-advised) PhD/master
students
Application-oriented research: machine learning & music
2
3
Music transcription (audio2score)
• audio → note (pitch, onset, offset)
• audio → instrument (flute, cello)
• audio → meter (4/4)
• audio → key (E-flat major)
audio score
ML in Music: “Music Info Retrieval/Analysis”
(existing
song)
ML in Music: “Music Info Retrieval/Analysis”
4
Music semantic labeling
• audio → genre (classical)
• audio → emotion (yearning)
• audio → other attributes (slow/fast)
labels
applications in
music retrieval,
education,
archival, etc
(existing
song)
AI listener
ML in Music: “Music Generation/Synthesis”
5
audio score
(new
song)
AI composer
random seed
AI performer (score2audio)
ML in Music: “Music Generation/Synthesis”
6
audio
features
(existing
songs)
AI listener
score
AI DJ
audio
(a new
song)
remix, mashup, etc
(image from the Internet)
Recap
• ML in Music
 Music information retrieval/analysis
 AI listener
 Music transcription (audio → score)
 Music semantic labeling (audio → label)
 For analyzing and indexing existing songs
 Music generation/synthesis
 AI composer (random seed → score)
 AI performer (score → audio)
 AI DJ (existing songs → new song)
 For creating new music
7
ML for Creative AI Applications in Music
• AI Listener
• AI DJ
• AI Composer
• AI Performer
8
AI Listener: Source Separation
• “Demix” the music signal
 input: audio mixture
output: individual tracks
9
(image from the Internet)
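The input/output contract of "demixing" can be sketched with a toy example. The snippet below is not the lab's denoising auto-encoder; it swaps in an oracle ratio mask (which needs the true sources, so it only works as an upper-bound reference) and assumes, for simplicity, that magnitude spectrograms add up exactly. The function name `ratio_mask_separate` and the random data are hypothetical.

```python
import numpy as np

def ratio_mask_separate(mixture_mag, source_mags, eps=1e-8):
    """Split a mixture magnitude spectrogram with oracle ratio masks.

    mixture_mag: array (freq, time), magnitude of the audio mixture.
    source_mags: dict name -> array (freq, time), true source magnitudes
                 (available only in an oracle setting).
    """
    total = sum(source_mags.values()) + eps
    # Each source gets the share of the mixture energy it contributed.
    return {name: mixture_mag * (mag / total) for name, mag in source_mags.items()}

rng = np.random.default_rng(0)
# Toy sources on a 4-bin x 6-frame grid (assumption: magnitudes are additive).
sources = {name: rng.random((4, 6)) for name in ("vocals", "drums", "bass", "others")}
mixture = sum(sources.values())          # input: audio mixture
estimates = ratio_mask_separate(mixture, sources)  # output: individual tracks
```

A learned separator replaces the oracle mask with a network that predicts each track from the mixture alone.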
Our Result
http://ss.ciaua.com/
ICMLA’18a
Denoising auto-encoder with recurrent skip
connections and residual regression for
music source separation
12
AI Listener: Source Separation
• “Demix” the music signal
• Application
 Music production, DJ related skills
 Singing voice processing, karaoke, soundtracks for movies
 Smart headset, smart loudspeaker
 Education
• Extension
Multi-instrument separation, speech voice separation
Melody extraction, beat estimation
13
Algorithm 1/4: Background
• Autoencoder (AE)
• Denoising autoencoder (DAE)
14
(image from the Internet)
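As a rough illustration of the difference between the two, a denoising autoencoder is trained to map a corrupted input to the clean target, rather than to reproduce its own input. The sketch below is a minimal linear version in NumPy under assumed toy settings (8-dimensional data on a 2-dimensional subspace, Gaussian noise, plain gradient descent); none of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "clean" data living on a 2-D subspace of an 8-D space.
basis = rng.normal(size=(2, 8))
clean = rng.normal(size=(256, 2)) @ basis
noisy = clean + 0.3 * rng.normal(size=clean.shape)   # corrupted input

# Linear autoencoder 8 -> 2 -> 8 (a real DAE would add nonlinearities).
W_enc = 0.1 * rng.normal(size=(8, 2))
W_dec = 0.1 * rng.normal(size=(2, 8))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

initial_loss = mse(noisy @ W_enc @ W_dec, clean)
lr = 0.005
for _ in range(2000):
    code = noisy @ W_enc
    recon = code @ W_dec
    err = (recon - clean) / len(clean)    # target is the CLEAN signal, not the input
    grad_dec = code.T @ err
    grad_enc = noisy.T @ (err @ W_dec.T)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_loss = mse(noisy @ W_enc @ W_dec, clean)
```

Replacing `clean` with `noisy` in the error term turns this into a plain autoencoder.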
Algorithm 2/4: Main Idea
mixture vocal
mixture drum
mixture others
• Training data
 Demixing Secrets Dataset (DSD): 100 Western pop
songs with multi-track version (vocals, drums, bass, others)
 No Chinese or Japanese pop songs at all in DSD
15
DAE1
DAE2
DAE3
Algorithm 3/4: Input, Output
16
Algorithm 4/4: Architecture
• U-net
 Encoder: Conv2D
 Decoder: Deconv2D
 Skip connections
• allow low-level information to flow directly from the high-resolution input to the high-resolution output (at the corresponding level of the hierarchy)
17
[1] “U-Net: Convolutional networks for biomedical image segmentation,” arXiv 2015
[2] “Singing voice separation with deep u-net convolutional networks,” ISMIR 2017
(figure from [2])
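The role of the skip connections can be sketched without any learned weights. In the toy below, average pooling stands in for the strided Conv2D encoder and nearest-neighbour repetition stands in for the Deconv2D decoder; both substitutions, the two-level depth, and the function names are assumptions made for illustration only.

```python
import numpy as np

def downsample(x):
    """Halve both axes by 2x2 average pooling (stand-in for a strided Conv2D)."""
    f, t = x.shape
    return x.reshape(f // 2, 2, t // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Double both axes by nearest-neighbour repetition (stand-in for a Deconv2D)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def tiny_unet(spec):
    e1 = downsample(spec)          # encoder level 1
    e2 = downsample(e1)            # encoder level 2 (bottleneck)
    d1 = upsample(e2)              # decoder output at the resolution of e1
    d1 = np.stack([d1, e1])        # skip connection: encoder features join as a channel
    d0 = upsample(d1.mean(axis=0)) # channel mixing stands in for a learned convolution
    return np.stack([d0, spec])    # skip connection at the input resolution

spec = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 "spectrogram"
out = tiny_unet(spec)
```

The second output channel carries the untouched high-resolution input, which is the detail the pooled path alone would have lost.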
Evaluation Campaign: SiSEC 2018
18
Ours
Sony
Oracle
Ours
The “Sony” guys:
Naoya Takahashi,
Nabarun Goswami,
Yuki Mitsufuji
Stefan Uhlich
et al.
AI Listener: Sound Event Detection
• Applications
Surveillance
Self-driving cars
Industry 4.0
Healthcare
AIoT
Smart city
• Strength
Sound (ears) is complementary to vision (eyes)
Can work well even in a dim environment, or when the event is at a distance from the camera
19
20
Sound Event Detection
• Competitor: Samsung Galaxy S7
• Link: https://youtu.be/4fhJp3tIptI
IJCAI’18
Learning to recognize transient sound
events using attentional supervision
ML for Creative AI Applications in Music
• AI Listener
• AI DJ
• AI Composer
• AI Performer
21
audio
features
(existing
songs)
AI listener AI DJ
audio
(a new
song)
AI DJ
• Smart speaker + recommendation + DJ skills
22
DJ Skill #1: Music Thumbnailing
• Extract music highlights
• Application: music browsing, ringtone generation
• Related papers published by NAVER Corp
 “Automatic DJ mix generation using highlight
detection,” ISMIR 2017
 “Automatic music highlight extraction using
convolutional recurrent attention networks,”
arxiv 1712.05901
23
↙30 sec highlight
“A song”
Algorithm
• CNN for emotion prediction + attention
(predicting weights of different parts of a song)
• Transfer learning: no need for structural (chorus) labels
24
TISMIR’18
Pop music highlighter:
Marking the emotion
keypoints
Open source!
https://remyhuang.github.io/music_thumbnailing/
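Once an attention layer has assigned a weight to each part of a song, one plausible way to turn those weights into a 30-second highlight is to pick the contiguous window with the largest total attention. The sketch below assumes fixed-length chunks and made-up weights; `pick_highlight` is a hypothetical helper, not a function from the open-source release.

```python
def pick_highlight(attention, chunk_seconds, highlight_seconds=30):
    """Return (start_sec, end_sec) of the window with the largest total attention."""
    n = max(1, highlight_seconds // chunk_seconds)   # chunks per highlight
    n = min(n, len(attention))
    best_start, best_score = 0, float("-inf")
    for start in range(len(attention) - n + 1):
        score = sum(attention[start:start + n])
        if score > best_score:
            best_start, best_score = start, score
    return best_start * chunk_seconds, (best_start + n) * chunk_seconds

# Made-up attention weights for ten 10-second chunks of one song.
weights = [0.02, 0.03, 0.05, 0.20, 0.25, 0.22, 0.08, 0.05, 0.05, 0.05]
window = pick_highlight(weights, chunk_seconds=10)
```

Because the weights come from an emotion classifier, no chorus annotations are needed to score the windows.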
Demo
25
周杰倫 - 稻香
光良 - 童話
胡夏 - 那些年
Linkin Park - Burn It Down
Adam Lambert - Whataya Want from Me
DJ Skill #2: Music Sequencing
• Find an ordering of music pieces
 “Automatic playlist sequencing and transitions,”
Proc. ISMIR 2017 (from )
 “Generating music medleys via playing music puzzle games by
unsupervised similarity embedding,” (from MAC Lab)
29
https://remyhuang.github.io/
► Demo:
Algorithm 1/2: Music Puzzle Games
• Divide a song into non-overlapping chunks
• Learn to order them by a Siamese CNN network
 Positive pair: R1R2, R2R3
 Negative pair: R2R1, R3R2, R1R3, R3R1
 Unsupervised (self-supervised) learning
30
AAAI’18a
Generating music
medleys via playing
music puzzle games
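The pair construction for the puzzle game can be sketched as follows: consecutive chunks in the original order are positive pairs, and every other ordered pair is negative. For three chunks this reproduces the pairs listed on the slide. The helper name `make_pairs` is hypothetical, and chunk labels are assumed to be distinct.

```python
from itertools import permutations

def make_pairs(chunks):
    """Build self-supervised training pairs from chunks given in their true order."""
    # Positive: chunk i immediately followed by chunk i + 1.
    positives = [(chunks[i], chunks[i + 1]) for i in range(len(chunks) - 1)]
    # Negative: any other ordered pair of distinct chunks.
    negatives = [p for p in permutations(chunks, 2) if p not in positives]
    return positives, negatives

pos, neg = make_pairs(["R1", "R2", "R3"])
```

No human labels are involved: the song's own ordering supplies the supervision.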
Algorithm 2/2: Similarity Embedding Net
• Divide a song into non-overlapping chunks
• Siamese CNN + similarity embedding
[a b c d], [a b c d],
[a b c d] [d c a b]
31
AAAI’18a
Generating music medleys via playing music puzzle games
► Demo: https://remyhuang.github.io/
Open source!
Result
• For 8-piece puzzle games, our model
reaches 99.0% pairwise accuracy (PA)
and 96.1% overall accuracy (OA)
• For medley, we have 75.0% OA
33
(our method)
(baseline 1)
(baseline 2)
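The slide does not define the two metrics. One reading, assumed here, is that pairwise accuracy counts how many adjacent pairs in a predicted ordering are truly adjacent in that order, while overall accuracy counts how many puzzles are solved with every piece in place. The sketch below implements that assumed reading with hypothetical function names.

```python
def pairwise_accuracy(predicted, truth):
    """Fraction of adjacent predicted pairs that are adjacent, in order, in the truth."""
    true_pairs = set(zip(truth, truth[1:]))
    pred_pairs = list(zip(predicted, predicted[1:]))
    return sum(p in true_pairs for p in pred_pairs) / len(pred_pairs)

def overall_accuracy(predictions, truths):
    """Fraction of puzzles whose whole predicted ordering matches the truth."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)

truth = list("abcdefgh")   # an 8-piece puzzle in its correct order
pred = list("abcdefhg")    # last two pieces swapped
pa = pairwise_accuracy(pred, truth)
oa = overall_accuracy([pred, truth], [truth, truth])
```

Under this reading a single swap costs two adjacent pairs but fails the whole puzzle, which is why overall accuracy is the stricter number.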
DJ Skill #3: Music Mash-up
34
ML for Creative AI Applications in Music
• AI Listener
• AI DJ
• AI Composer
• AI Performer
41
AI Composer
42
IBM Watson Beats Sony ->
“Create unique, royalty-free soundtracks
for your videos”
AI Composer
• Create music
• Why?
Make musicians’ lives easier
Create copyright-free music (for films, ads, games)
Classic AI problem
43
Eminem - When I'm Gone
Our Research on “AI Composer”
• Collaboration with musicians & producers from KKBOX
• Projects
 MidiNet: melody generation (ISMIR’17) --- already cited by 53
 Melody harmonization
 Lead sheet generation
 Drum VAE
 MuseGAN: multitrack music generation (AAAI’18) --- 206 stars
 Multi-track music generation using binary neurons (ISMIR’18)
 Lead sheet arrangement and interpolation (ICMLA’18)
 Automatic instrumentation arrangement
 Emotion-based music generation
44
Lead Sheet Generation
• Lead sheet
 melody
 chord
• Given chord, generate melody
• Given melody, generate chord (a.k.a., harmonization)
• Or, from scratch
45
Melody Generation by RNN
Models compared: Google MelodyRNN | C-RNN-GAN | Song from PI | DeepBach | Google WaveNet
core model: RNN | RNN | RNN | RNN | CNN
data type: symbolic | symbolic | symbolic | symbolic | audio
genre specificity: ─ | ─ | ─ | Bach chorale | ─
mandatory prior knowledge: priming melody | ─ | music scale & melody profile | melody of one part | priming wave
follow a priming melody: V V V
follow a chord sequence: (no model marked)
generate from scratch: V
generate multi-part music: V V V
open source: V V
47
Melody Generation by CNN+GAN
Models compared: Google MelodyRNN | MidiNet | Google WaveNet
core model: RNN | CNN | CNN
data type: symbolic | symbolic | audio
genre specificity: ─ | ─ | ─
mandatory prior knowledge: priming melody | ─ | priming wave
follow a priming melody: V | V | V
follow a chord sequence: V
generate from scratch: V
generate multi-part music: V V
open source: V V
48
• By Google
• RNN
• Trained with thousands of melodies
• By MAC Lab
• CNN
• 526 tabs (4,208 bars)
• One GPU (GTX 1080)
• <30 mins
ISMIR’17
MidiNet
Algorithm: Desired Output
• Generate the melody of a bar at a time
• Use a matrix to represent the music of a bar
• Condition on the previous bar (the history)
49
[Figure: an 84-note × 96-time-step matrix for the current bar, shown between the previous bar and the next bar]
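The bar-level matrix can be sketched as a binary piano-roll with 84 pitch rows and 96 time-step columns. The pitch offset (`LOWEST_PITCH = 24`) and the note tuple format are assumptions for illustration; the slides only give the matrix size.

```python
import numpy as np

N_STEPS, N_PITCHES = 96, 84   # sizes stated on the slide
LOWEST_PITCH = 24             # assumption: MIDI pitch that maps to row 0

def bar_to_matrix(notes):
    """notes: list of (midi_pitch, start_step, end_step) within one bar."""
    roll = np.zeros((N_PITCHES, N_STEPS), dtype=np.uint8)
    for pitch, start, end in notes:
        roll[pitch - LOWEST_PITCH, start:end] = 1   # mark the note as sounding
    return roll

# A toy melody: C4 and D4 for a quarter of the bar each, then E4 for half a bar.
bar = bar_to_matrix([(60, 0, 24), (62, 24, 48), (64, 48, 96)])
```

Generating one such matrix per bar, conditioned on the matrix of the previous bar, gives the bar-by-bar scheme described above.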
Algorithm: Main Idea
51
• Generative adversarial nets (GAN)
Discriminator: tell real from fake
Generator: fool the discriminator
• Generate from scratch
• Or, given chord, generate melody
real or
fake?
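The two-player game can be written down with the standard GAN losses. The sketch below uses the common non-saturating generator loss on single probabilities; it illustrates the general GAN objective and is not taken from the MidiNet code.

```python
import math

def discriminator_loss(d_real, d_fake):
    """d_real, d_fake: discriminator's probability of 'real' for a real and a fake sample."""
    # The discriminator wants d_real near 1 and d_fake near 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating loss: the generator wants the discriminator to say 'real'."""
    return -math.log(d_fake)

confused = discriminator_loss(0.5, 0.5)    # discriminator cannot tell real from fake
confident = discriminator_loss(0.9, 0.1)   # discriminator separates them well
```

For chord-conditioned generation, both networks would also receive the chord as an extra input, with the same losses.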
Algorithm: Temporal Model
52
• Conditioner: provide 2-D conditions
 use the same filter shapes as the generator CNN
 so that their intermediate outputs are “compatible”
real or
fake?
real or
fake?
Algorithm
53
• Generative adversarial nets (GAN)
• Don’t know what the “desired output” should be
(for example, what should be played next)
Only know whether it “sounds like real”
It learns the mapping between two spaces
MidiNet: Examples
• Variants of MidiNet
54
1 2 3
• Google Magenta
vs. MidiNet
• With drums
Lead Sheet Generation & Interpolation
• Given a reference melody, generate variations of the
melody + chords
• Lead sheet interpolation
http://vibertthio.com/leadsheet-vae-client/
58
Unpublished
Amazing Grace
original melody
Amazing Grace
+chord (version 1)
Amazing Grace
+chord (version 2)
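Interpolation between two lead sheets is usually done in the latent space of the model. A minimal sketch, assuming a VAE-style model (as the demo URL suggests) with a hypothetical 4-dimensional latent code, is plain linear interpolation between the two codes; decoding each intermediate code back to a lead sheet is left to the model and not shown.

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps):
    """Return `steps` latent codes evenly spaced on the line from z_a to z_b."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z_a + a * z_b for a in alphas])

z_a = np.zeros(4)   # hypothetical latent code of lead sheet A
z_b = np.ones(4)    # hypothetical latent code of lead sheet B
path = interpolate_latents(z_a, z_b, 5)
```

The first and last rows of `path` are the two original codes; the rows in between would decode to gradual blends.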
Drum VAE
• http://vibertthio.com/drum-vae-client/v3/
59
Unpublished
Multi-track Generation
• Multi-track
(not only melody
and chord)
60
Begin Again (2013)
► Demo: https://salu133445.github.io/musegan/
AAAI’18b
MuseGAN
Algorithm: Data
• LPD dataset: 128K MIDIs (piano-rolls) from LMD
61
http://colinraffel.com/projects/lmd/
https://salu133445.github.io/musegan/dataset
Algorithm: Intra- & Inter-tracks
• Multi-track
piano, guitar, bass,
strings, drums
• Hybrid model
 one “shared” (inter) z
 five “private” (intra) zi
 five generators
 one discriminator
62
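The hybrid latent layout can be sketched as follows: one shared vector is reused by every track, and each track's generator also gets its own private vector. The latent size (`z_dim=32`) and the concatenation are assumptions for illustration; the generators themselves are omitted.

```python
import numpy as np

TRACKS = ("piano", "guitar", "bass", "strings", "drums")

def track_inputs(rng, z_dim=32):
    """Build one generator input per track from a shared z and a private z_i."""
    shared = rng.normal(size=z_dim)   # inter-track code: coordinates the tracks
    return {
        track: np.concatenate([shared, rng.normal(size=z_dim)])  # [z, z_i]
        for track in TRACKS
    }

inputs = track_inputs(np.random.default_rng(0))
```

A single discriminator then judges all five generated tracks together, which pushes the generators toward tracks that fit one another.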
Algorithm: Temporal Model
63
(a) generation from scratch
(b) track-conditional generation
Algorithm: Combined Model
64
generation from scratch
Algorithm: WGAN-gp
• Beginning – 500:1 D/G updates
• Later – 5:1 D/G updates
• Training time:
<24 hours
65
[Plot: negative D loss over training]
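The update schedule and the gradient-penalty term can be sketched in a few lines. The length of the warm-up phase (`warmup_steps`) is an assumption, since the slide only says "beginning" and "later". The penalty is shown for a toy linear critic D(x) = w·x, whose gradient with respect to the input is w everywhere; the weight 10 is the default from the WGAN-GP paper.

```python
import numpy as np

def critic_steps(generator_step, warmup_steps=10):
    """Number of discriminator (critic) updates per generator update."""
    # Assumption: the 500:1 ratio applies for the first `warmup_steps` generator updates.
    return 500 if generator_step < warmup_steps else 5

def gradient_penalty_linear(w, lam=10.0):
    """WGAN-GP penalty for a linear critic D(x) = w.x (input gradient is w)."""
    return lam * (np.linalg.norm(w) - 1.0) ** 2

unit_norm_penalty = gradient_penalty_linear(np.array([0.6, 0.8]))
large_norm_penalty = gradient_penalty_linear(np.array([3.0, 4.0]))
```

The penalty vanishes when the critic's gradient norm is 1 and grows as the norm drifts away from it.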
Lead Sheet Arrangement
• RNN-based lead sheet generator
+ chord feature extraction
+ chord-conditioned multi-track generation
72
ICMLA’18b
Lead sheet generation and
arrangement by conditional
generative adversarial
network
Instrumentation Generation
• Pitch/timbre disentanglement
Zt: timbre code
Skip connections: pitch information
74
arXiv: 1811.03271
Learning disentangled
representations for timber
and pitch in music audio
AI Composer: Open Source Code
• MidiNet:
https://github.com/RichardYang40148/MidiNet
• DrumVAE:
https://github.com/vibertthio/drum_vae_server
• MuseGAN: https://github.com/salu133445/musegan
• BMuseGAN: https://salu133445.github.io/bmusegan/
• Pypianoroll:
https://github.com/salu133445/pypianoroll
• Lead sheet arrangement:
https://github.com/liuhaumin/LeadsheetArrangement
76
Music and AI
• AI Listener
• AI DJ
• AI Composer
• AI Performer
78
AI Performer
• Generate expressive music audio from score
• Performance “brings music to life”
• Existing work mainly focuses on the piano
79
AI Performer
• Direct score-to-waveform synthesis is hard
• Another idea: score-to-spec correspondence
81
2D to 1D
2D to 2D
PerformanceNet
• ContourNet + TextureNet
(‘translation’ → ‘super resolution’)
https://github.com/bwang514/PerformanceNet
83
AAAI’19
PerformanceNet: Score-to-
audio music generation with
multi-band convolutional
residual network
ContourNet
• Challenge 1: different input/output dimension
Use asymmetric U-net
• Challenge 2: hard to control note duration
Encode additional onset/offset information
84
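One way to encode the additional onset/offset information is to derive two extra binary rolls from the piano-roll, marking the first and last frame of each note. This is a sketch of the idea, not the PerformanceNet implementation; it assumes a binary roll in which back-to-back repeated notes are not distinguished.

```python
import numpy as np

def onset_offset_rolls(roll):
    """roll: binary array (pitches, time). Returns (onsets, offsets) of the same shape."""
    padded = np.pad(roll.astype(int), ((0, 0), (1, 1)))   # silence before and after
    diff = np.diff(padded, axis=1)                        # shape (pitches, time + 1)
    onsets = (diff[:, :-1] == 1).astype(np.uint8)    # frame where a note starts
    offsets = (diff[:, 1:] == -1).astype(np.uint8)   # last frame where a note sounds
    return onsets, offsets

roll = np.array([[0, 1, 1, 0],
                 [1, 1, 1, 1]])
onsets, offsets = onset_offset_rolls(roll)
```

Feeding these rolls alongside the score gives the network explicit cues about where each note begins and ends.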
ContourNet
85
PerformanceNet
• Challenge 3: realistic
timbre and overtone
Enhance sound texture
with multi-band residual learning
86
PerformanceNet - Sound Examples
87
Cello
(Soft Synth)
Cello
(Logic Pro)
Cello
(Ours)
PerformanceNet - Bonus
88
愛江山更愛美人 – Violin
(Ours)
https://www.youtube.com/watch?v=k0-cT6GxS3g&feature=youtu.be
Wrap-Up
• AI Listener
 Create the singing-only version of songs
 Sound event detection
• AI DJ
 Create DJ skills such as thumbnailing and sequencing
• AI Composer
 Create lead sheets or multi-track piano-rolls
• AI Performer
 From score to audio
90
91
audio features
(existing
songs)
AI listener AI DJ audio
(a new
song)
Music transcription (audio2score)
• audio → note (pitch, onset, offset)
• audio → instrument (flute, cello)
• audio → meter (4/4)
• audio → key (E-flat major)
audio score
Music semantic labeling
• audio → genre (classical)
• audio → emotion (yearning)
• audio → other attributes (slow/fast)
labels
(a new
song)
AI composer
random seed
AI performer (score2audio)
Conclusion

Contenu connexe

Tendances

Array implementation and linked list as datat structure
Array implementation and linked list as datat structureArray implementation and linked list as datat structure
Array implementation and linked list as datat structure
Tushar Aneyrao
 
Data Structures and Algorithms
Data Structures and AlgorithmsData Structures and Algorithms
Data Structures and Algorithms
Pierre Vigneras
 

Tendances (20)

Music Recommendation Tutorial
Music Recommendation TutorialMusic Recommendation Tutorial
Music Recommendation Tutorial
 
Merge sort
Merge sortMerge sort
Merge sort
 
Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition
 
heap Sort Algorithm
heap  Sort Algorithmheap  Sort Algorithm
heap Sort Algorithm
 
Introduction to numpy
Introduction to numpyIntroduction to numpy
Introduction to numpy
 
Nonrecursive predictive parsing
Nonrecursive predictive parsingNonrecursive predictive parsing
Nonrecursive predictive parsing
 
Html / CSS Presentation
Html / CSS PresentationHtml / CSS Presentation
Html / CSS Presentation
 
Parsing
ParsingParsing
Parsing
 
Introduction to HTML and CSS
Introduction to HTML and CSSIntroduction to HTML and CSS
Introduction to HTML and CSS
 
Css ppt
Css pptCss ppt
Css ppt
 
Queue Data Structure
Queue Data StructureQueue Data Structure
Queue Data Structure
 
PHP HTML CSS Notes
PHP HTML CSS  NotesPHP HTML CSS  Notes
PHP HTML CSS Notes
 
Text to-speech & voice recognition
Text to-speech & voice recognitionText to-speech & voice recognition
Text to-speech & voice recognition
 
HTML5 Web storage
HTML5 Web storageHTML5 Web storage
HTML5 Web storage
 
Array implementation and linked list as datat structure
Array implementation and linked list as datat structureArray implementation and linked list as datat structure
Array implementation and linked list as datat structure
 
Python NumPy Tutorial | NumPy Array | Edureka
Python NumPy Tutorial | NumPy Array | EdurekaPython NumPy Tutorial | NumPy Array | Edureka
Python NumPy Tutorial | NumPy Array | Edureka
 
Tokenization using nlp | NLP Course
Tokenization using nlp | NLP CourseTokenization using nlp | NLP Course
Tokenization using nlp | NLP Course
 
Data Structures and Algorithms
Data Structures and AlgorithmsData Structures and Algorithms
Data Structures and Algorithms
 
Quick sort-Data Structure
Quick sort-Data StructureQuick sort-Data Structure
Quick sort-Data Structure
 
Vector class in C++
Vector class in C++Vector class in C++
Vector class in C++
 

Similaire à Machine learning for creative AI applications in music (2018 nov)

QMUL C4DM API Presentation @ BCN Music Hack Day
QMUL C4DM API Presentation @ BCN Music Hack DayQMUL C4DM API Presentation @ BCN Music Hack Day
QMUL C4DM API Presentation @ BCN Music Hack Day
Amélie Anglade
 
Electronic Music and Software Craftsmanship: analogue patterns.
Electronic Music and Software Craftsmanship: analogue patterns.Electronic Music and Software Craftsmanship: analogue patterns.
Electronic Music and Software Craftsmanship: analogue patterns.
Guillaume Saint Etienne
 
Music discovery on the net
Music discovery on the netMusic discovery on the net
Music discovery on the net
guestbf080
 
Shaun warburton ig2 task 1 work sheet improved
Shaun warburton ig2 task 1 work sheet improvedShaun warburton ig2 task 1 work sheet improved
Shaun warburton ig2 task 1 work sheet improved
warburton9191
 
Ig2 task 1 work sheet - JS
Ig2 task 1 work sheet - JSIg2 task 1 work sheet - JS
Ig2 task 1 work sheet - JS
JamieShepherd
 
Gracenote API - MusicHackDay
Gracenote API - MusicHackDayGracenote API - MusicHackDay
Gracenote API - MusicHackDay
Oscar Celma
 

Similaire à Machine learning for creative AI applications in music (2018 nov) (20)

Research at MAC Lab, Academia Sincia, in 2017
Research at MAC Lab, Academia Sincia, in 2017Research at MAC Lab, Academia Sincia, in 2017
Research at MAC Lab, Academia Sincia, in 2017
 
Machine Learning for Creative AI Applications in Music (2018 May)
Machine Learning for Creative AI Applications in Music (2018 May)Machine Learning for Creative AI Applications in Music (2018 May)
Machine Learning for Creative AI Applications in Music (2018 May)
 
20211026 taicca 2 music generation
20211026 taicca 2 music generation20211026 taicca 2 music generation
20211026 taicca 2 music generation
 
Deep Learning Meetup #5
Deep Learning Meetup #5Deep Learning Meetup #5
Deep Learning Meetup #5
 
Deep dive into Android’s audio latency problem
Deep dive into Android’s audio latency problemDeep dive into Android’s audio latency problem
Deep dive into Android’s audio latency problem
 
QMUL C4DM API Presentation @ BCN Music Hack Day
QMUL C4DM API Presentation @ BCN Music Hack DayQMUL C4DM API Presentation @ BCN Music Hack Day
QMUL C4DM API Presentation @ BCN Music Hack Day
 
Electronic Music and Software Craftsmanship: analogue patterns.
Electronic Music and Software Craftsmanship: analogue patterns.Electronic Music and Software Craftsmanship: analogue patterns.
Electronic Music and Software Craftsmanship: analogue patterns.
 
audio-production-1231352387673755-2.ppt
audio-production-1231352387673755-2.pptaudio-production-1231352387673755-2.ppt
audio-production-1231352387673755-2.ppt
 
Research on Automatic Music Composition at the Taiwan AI Labs, April 2020
Research on Automatic Music Composition at the Taiwan AI Labs, April 2020Research on Automatic Music Composition at the Taiwan AI Labs, April 2020
Research on Automatic Music Composition at the Taiwan AI Labs, April 2020
 
Digital Hymnals: Exploring the Capabilities of QuickTime Pro
Digital Hymnals: Exploring the Capabilities of QuickTime ProDigital Hymnals: Exploring the Capabilities of QuickTime Pro
Digital Hymnals: Exploring the Capabilities of QuickTime Pro
 
Automatic Music Composition with Transformers, Jan 2021
Automatic Music Composition with Transformers, Jan 2021Automatic Music Composition with Transformers, Jan 2021
Automatic Music Composition with Transformers, Jan 2021
 
ScoReader: A Mobile Computer Vision System for Optical Music Recognition
ScoReader: A Mobile Computer Vision System for Optical Music RecognitionScoReader: A Mobile Computer Vision System for Optical Music Recognition
ScoReader: A Mobile Computer Vision System for Optical Music Recognition
 
Music discovery on the net
Music discovery on the netMusic discovery on the net
Music discovery on the net
 
The Next-Gen Dynamic Sound System of Killzone Shadow Fall
The Next-Gen Dynamic Sound System of Killzone Shadow FallThe Next-Gen Dynamic Sound System of Killzone Shadow Fall
The Next-Gen Dynamic Sound System of Killzone Shadow Fall
 
Shaun warburton ig2 task 1 work sheet improved
Shaun warburton ig2 task 1 work sheet improvedShaun warburton ig2 task 1 work sheet improved
Shaun warburton ig2 task 1 work sheet improved
 
Ig2 task 1 work sheet - JS
Ig2 task 1 work sheet - JSIg2 task 1 work sheet - JS
Ig2 task 1 work sheet - JS
 
Two-step Melody Harmonious Generator
Two-step Melody Harmonious GeneratorTwo-step Melody Harmonious Generator
Two-step Melody Harmonious Generator
 
Mining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorialMining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorial
 
Mining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorialMining the social web for music-related data: a hands-on tutorial
Mining the social web for music-related data: a hands-on tutorial
 
Gracenote API - MusicHackDay
Gracenote API - MusicHackDayGracenote API - MusicHackDay
Gracenote API - MusicHackDay
 

Dernier

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Dernier (20)

VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 

Machine learning for creative AI applications in music (2018 nov)

  • 1. Machine Learning for Creative AI Applications in Music Music and AI Lab, Research Center for IT Innovation, Academia Sinica Yi-Hsuan Yang Ph.D. http://www.citi.sinica.edu.tw/pages/yang/ yang@citi.sinica.edu.tw
  • 2. About Us • Academia Sinica  National academy of Taiwan, founded in 1928  About 1,000 Full/Associate/Assistant Researchers  Located at Nangang District, Taipei City • Music and AI Lab (musicai) Members: Research Assistants, (co-advised) PhD/master students Application-oriented research: machine learning & music 2
  • 3. 3 Music transcription (audio2score) • audio → note (pitch, onset, offset) • audio → instrument (flute, cello) • audio → meter (4/4) • audio → key (E-flat major) audio score ML in Music: “Music Info Retrieval/Analysis” (existing song)
  • 4. ML in Music: “Music Info Retrieval/Analysis” 4 Music transcription (audio2score) • audio → note (pitch, onset, offset) • audio → instrument (flute, cello) • audio → meter (4/4) • audio → key (E-flat major) audio score Music semantic labeling • audio → genre (classical) • audio → emotion (yearning) • audio → other attributes (slow/fast) labels applications in music retrieval, education, archival, etc (existing song) AI listener
  • 5. Music transcription (audio2score) • audio → note (pitch, onset, offset) • audio → instrument (flute, cello) • audio → meter (4/4) • audio → key (E-flat major) ML in Music: “Music Generation/Synthesis” 5 audio score Music semantic labeling • audio → genre (classical) • audio → emotion (yearning) • audio → other attributes (slow/fast) labels (new song) AI composer random seed AI performer (score2audio)
  • 6. Music transcription (audio2score) • audio → note (pitch, onset, offset) • audio → instrument (flute, cello) • audio → meter (4/4) • audio → key (E-flat major) ML in Music: “Music Generation/Synthesis” 6 audio features Music semantic labeling • audio → genre (classical) • audio → emotion (yearning) • audio → other attributes (slow/fast) labels (existing songs) AI listener score AI DJ audio (a new song) remix, mashup, etc (image from the Internet)
  • 7. Recap • ML in Music  Music information retrieval/analysis  AI listener  Music transcription (audio → score)  Music semantic labeling (audio → label)  For analyzing and indexing existing songs  Music generation/synthesis  AI composer (random seed → score)  AI performer (score → audio)  AI DJ (exis ng songs → new song)  For creating new music 7
  • 8. ML for Creative AI Applications in Music • AI Listener • AI DJ • AI Composer • AI Performer 8
  • 9. AI Listener: Source Separation • “Demix” the music signal  input: audio mixture output: individual tracks 9 (image from the Internet)
  • 10. Our Result http://ss.ciaua.com/ ICMLA’18aICMLA’18a Denoising auto-encoder with recurrent skip connections and residual regression for music source separation 12
  • 11. AI Listener: Source Separation • “Demix” the music signal • Application  Music production, DJ related skills  Singing voice processing, karaoke, soundtracks for movies  Smart headset, smart loudspeaker  Education • Extension Multi-instrument separation, speech voice separation Melody extraction, beat estimation 13
  • 12. Algorithm 1/4: Background • Autoencoder (AE) • Denoising autoencoder (DAE) 14 (image from the Internet)
  • 13. Algorithm 2/4: Main Idea mixture vocal mixture drum mixture others • Training data  Demixing Secrets Dataset (DSD): 100 Western pop songs with multi-track version (vocals, drums, bass, others)  No Chinese or Japanese pop songs at all in DSD 15 DAE1 DAE2 DAE3
  • 15. Algorithm 4/4: Architecture • U-net  Encoder: Conv2D  Decoder: Deconv2D  Skip connections • allows low-level information to flow directly from the high resolution input to the high-resolution output (at the corresponding hierarchy) 17 [1] “U-Net: Convolutional networks for biomedical image segmentation,” arXiv 2015 [2] “Singing voice separation with deep u-net convolutional networks,” ISMIR 2017 (figure from [2])
  • 16. Evaluation Campaign: SiSEC 2018 18 Ours Sony Oracle Ours The “Sony” guys: Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji Stefan Uhlich et al.
  • 17. AI Listener: Sound Event Detection
      • Applications: surveillance, self-driving cars, Industry 4.0, healthcare, AIoT, smart cities
      • Strengths
          - Sound (ears) is complementary to vision (eyes)
          - Works well even in a dim environment, or when the event is far from the camera
  • 18. Sound Event Detection
      • Competitor: Samsung Galaxy S7
      • Link: https://youtu.be/4fhJp3tIptI
      • Paper [IJCAI’18]: “Learning to recognize transient sound events using attentional supervision”
  • 19. ML for Creative AI Applications in Music: AI Listener, AI DJ, AI Composer, AI Performer
      • Diagram: audio (existing songs) → AI listener → features → AI DJ → audio (a new song)
  • 20. AI DJ: smart speaker + recommendation + DJ skills
  • 21. DJ Skill #1: Music Thumbnailing
      • Extract music highlights (e.g., a 30-second highlight of a song)
      • Applications: music browsing, ringtone generation
      • Related papers published by NAVER Corp.
          - “Automatic DJ mix generation using highlight detection,” ISMIR 2017
          - “Automatic music highlight extraction using convolutional recurrent attention networks,” arXiv:1712.05901
  • 22. Algorithm
      • CNN for emotion prediction + attention (predicting the weights of different parts of a song)
      • Transfer learning: no need for structural (chorus) labels
      • Paper [TISMIR’18]: “Pop music highlighter: Marking the emotion keypoints” (open source)
  • 23. Demo: https://remyhuang.github.io/music_thumbnailing/
      • 周杰倫 - 稻香
      • 光良 - 童話
      • 胡夏 - 那些年
      • Linkin Park - Burn It Down
      • Adam Lambert - Whataya Want from Me
  • 24. DJ Skill #2: Music Sequencing
      • Find an ordering of music pieces
          - “Automatic playlist sequencing and transitions,” Proc. ISMIR 2017
  • 25. DJ Skill #2: Music Sequencing
      • Find an ordering of music pieces
          - “Automatic playlist sequencing and transitions,” Proc. ISMIR 2017
          - “Generating music medleys via playing music puzzle games by unsupervised similarity embedding” (from MAC Lab)
      • Demo: https://remyhuang.github.io/
  • 26. Algorithm 1/2: Music Puzzle Games
      • Divide a song into non-overlapping chunks
      • Learn to order them with a Siamese CNN
          - Positive pairs: R1R2, R2R3
          - Negative pairs: R2R1, R3R2, R1R3, R3R1
          - Unsupervised (self-supervised) learning
      • Paper [AAAI’18a]: “Generating music medleys via playing music puzzle games”
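The pair construction on the music puzzle slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `make_training_pairs` and the string labels standing in for audio chunks are hypothetical. Only the rule itself comes from the slide: an ordered pair is positive when the second chunk directly follows the first, and every other ordered pair is negative.

```python
from itertools import permutations


def make_training_pairs(chunks):
    """Split all ordered chunk pairs into positives and negatives.

    A pair (a, b) is positive only if b immediately follows a in the
    original song; all remaining ordered pairs are negatives.
    """
    positives, negatives = [], []
    for i, j in permutations(range(len(chunks)), 2):
        pair = (chunks[i], chunks[j])
        if j == i + 1:
            positives.append(pair)
        else:
            negatives.append(pair)
    return positives, negatives


# Hypothetical chunk labels; real inputs would be audio features.
positives, negatives = make_training_pairs(["R1", "R2", "R3"])
print(positives)
print(negatives)
```

Because the labels come from the chunk order itself, no manual annotation is needed, which is why the slide calls the setup self-supervised.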
  • 27. Algorithm 2/2: Similarity Embedding Net
      • Divide a song into non-overlapping chunks
      • Siamese CNN + similarity embedding
      • Figure: example chunk orderings, e.g., [a b c d] and [d c a b]
      • Paper [AAAI’18a]: “Generating music medleys via playing music puzzle games” (open source)
      • Demo: https://remyhuang.github.io/
  • 28. Result
      • For 8-piece puzzle games, our model reaches 99.0% pairwise accuracy (PA) and 96.1% overall accuracy (OA)
      • For medleys, our model reaches 75.0% OA
      • Compared: our method, baseline 1, baseline 2
  • 29. DJ Skill #3: Music Mash-up
  • 30. ML for Creative AI Applications in Music: AI Listener, AI DJ, AI Composer, AI Performer
  • 31. AI Composer
      • Examples: IBM Watson Beat, Sony
      • “Create unique, royalty-free soundtracks for your videos”
  • 32. AI Composer
      • Create music
      • Why?
          - Make musicians’ lives easier
          - Create copyright-free music (for films, ads, games)
          - Classic AI problem
      • Example shown: Eminem - When I'm Gone
  • 33. Our Research on “AI Composer”
      • Collaboration with musicians and producers from KKBOX
      • Projects
          - MidiNet: melody generation (ISMIR’17), already cited 53 times
          - Melody harmonization
          - Lead sheet generation
          - Drum VAE
          - MuseGAN: multi-track music generation (AAAI’18), 206 stars
          - Multi-track music generation using binary neurons (ISMIR’18)
          - Lead sheet arrangement and interpolation (ICMLA’18)
          - Automatic instrumentation arrangement
          - Emotion-based music generation
  • 34. Lead Sheet Generation
      • Lead sheet = melody + chords
      • Given the chords, generate the melody
      • Given the melody, generate the chords (a.k.a. harmonization)
      • Or generate both from scratch
  • 35. Melody Generation by RNN
      |                           | Google MelodyRNN | C-RNN-GAN | Song from PI                 | DeepBach           | Google WaveNet |
      | core model                | RNN              | RNN       | RNN                          | RNN                | CNN            |
      | data type                 | symbolic         | symbolic  | symbolic                     | symbolic           | audio          |
      | genre specificity         | ─                | ─         | ─                            | Bach chorale       | ─              |
      | mandatory prior knowledge | priming melody   | ─         | music scale & melody profile | melody of one part | priming wave   |
      • Capability rows (check marks “V” as extracted; their column positions were lost)
          - follow a priming melody: V V V
          - follow a chord sequence:
          - generate from scratch: V
          - generate multi-part music: V V V
          - open source: V V
  • 36. Melody Generation by CNN+GAN
      |                           | Google MelodyRNN | MidiNet  | Google WaveNet |
      | core model                | RNN              | CNN      | CNN            |
      | data type                 | symbolic         | symbolic | audio          |
      | genre specificity         | ─                | ─        | ─              |
      | mandatory prior knowledge | priming melody   | ─        | priming wave   |
      | follow a priming melody   | V                | V        | V              |
      • Remaining capability rows (check marks “V” as extracted; their column positions were lost)
          - follow a chord sequence: V
          - generate from scratch: V
          - generate multi-part music: V V
          - open source: V V
      • Google MelodyRNN: by Google; RNN; trained with thousands of melodies
      • MidiNet [ISMIR’17]: by MAC Lab; CNN; 526 tabs (4,208 bars); one GPU (GTX 1080); <30 mins
  • 37. Algorithm: Desired Output
      • Generate the melody one bar at a time
      • Use a matrix to represent the music of a bar (figure: 96 time steps × 84 notes)
      • Condition on the previous bar (the history)
      • Figure labels: previous bar, current bar, next bar
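The bar-as-matrix representation can be illustrated with a small pure-Python sketch. The dimensions (96 time steps, 84 notes) are taken from the figure labels on the slide. Everything else is an assumption made for illustration: the helper name `bar_to_matrix`, the `(pitch, start, end)` note format, and the choice of MIDI pitch 24 as the lowest row.

```python
N_STEPS = 96        # time steps per bar (from the slide)
N_PITCHES = 84      # note rows per bar (from the slide)
LOWEST_PITCH = 24   # assumed MIDI pitch of the first note column


def bar_to_matrix(notes):
    """Convert (midi_pitch, start_step, end_step) notes to a binary matrix.

    matrix[t][p] is 1 when pitch LOWEST_PITCH + p sounds at time step t.
    Notes outside the pitch range are skipped; times are clipped to the bar.
    """
    matrix = [[0] * N_PITCHES for _ in range(N_STEPS)]
    for pitch, start, end in notes:
        column = pitch - LOWEST_PITCH
        if not 0 <= column < N_PITCHES:
            continue
        for t in range(max(start, 0), min(end, N_STEPS)):
            matrix[t][column] = 1
    return matrix


# A hypothetical bar: C4 for the first quarter, then E4 for the second.
bar = bar_to_matrix([(60, 0, 24), (64, 24, 48)])
print(sum(map(sum, bar)))  # number of active cells
```

Conditioning on the previous bar then amounts to feeding one such matrix as extra input when generating the next one.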
  • 38. Algorithm: Main Idea
      • Generative adversarial nets (GAN)
          - Discriminator: tell real from fake (“real or fake?”)
          - Generator: fool the discriminator
      • Generate from scratch
  • 39. Algorithm: Main Idea
      • Generative adversarial nets (GAN)
          - Discriminator: tell real from fake (“real or fake?”)
          - Generator: fool the discriminator
      • Generate from scratch
      • Or, given the chords, generate the melody
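The two roles described on this slide correspond to two loss functions. The sketch below shows the standard GAN losses (with the commonly used non-saturating generator loss) on scalar discriminator outputs. It is an assumption that this plain formulation is a fair stand-in here; the slides do not state the exact loss used, and the function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Loss for the discriminator on one real and one generated sample.

    d_real and d_fake are the discriminator's probabilities (in (0, 1))
    that the real sample and the generated sample are real.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the fake is judged real."""
    return -math.log(d_fake)


# The discriminator is rewarded for confident, correct judgments ...
print(discriminator_loss(d_real=0.9, d_fake=0.1))
# ... while the generator is rewarded when its output fools the discriminator.
print(generator_loss(d_fake=0.9))
```

Neither loss refers to a target melody: the generator only receives the discriminator's judgment of whether its output looks real.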
  • 40. Algorithm: Temporal Model
      • Conditioner: provides 2-D conditions
          - Uses the same filter shapes as the generator CNN
          - So that their intermediate outputs are “compatible”
  • 41. Algorithm
      • Generative adversarial nets (GAN)
      • We don’t know what the “desired output” should be (for example, what should be played next)
          - We only know whether it “sounds real”
          - The model learns the mapping between two spaces
  • 42. MidiNet: Examples
      • Variants of MidiNet (1, 2, 3)
      • Google Magenta vs. MidiNet
      • With drums
  • 43. Lead Sheet Generation & Interpolation (unpublished)
      • Given a reference melody, generate variations of the melody + chords
          - Examples: Amazing Grace (original melody); Amazing Grace + chords (version 1); Amazing Grace + chords (version 2)
      • Lead sheet interpolation: http://vibertthio.com/leadsheet-vae-client/
  • 45. Multi-track Generation
      • Multi-track (not only melody and chords)
      • Example shown: Begin Again (2013)
      • Demo: https://salu133445.github.io/musegan/
      • Paper [AAAI’18b]: MuseGAN
  • 46. Algorithm: Data
      • LPD dataset: 128K MIDIs (piano-rolls) from LMD
          - http://colinraffel.com/projects/lmd/
          - https://salu133445.github.io/musegan/dataset
  • 47. Algorithm: Intra- & Inter-tracks
      • Multi-track: piano, guitar, bass, strings, drums
      • Hybrid model
          - One “shared” (inter-track) z
          - Five “private” (intra-track) z_i
          - Five generators
          - One discriminator
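The hybrid design on this slide (one shared latent vector plus one private vector per track) can be sketched as follows. This is a hedged illustration only: the function name `sample_generator_inputs`, the latent dimension, and the use of simple concatenation to combine the shared and private vectors are assumptions, and the generators themselves are omitted.

```python
import random

TRACKS = ["piano", "guitar", "bass", "strings", "drums"]


def sample_generator_inputs(latent_dim, rng):
    """Sample one shared vector and one private vector per track.

    Returns a dict mapping each track name to the input of that track's
    generator: the shared (inter-track) vector followed by the track's
    own private (intra-track) vector.
    """
    shared = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    inputs = {}
    for track in TRACKS:
        private = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        inputs[track] = shared + private
    return inputs


inputs = sample_generator_inputs(latent_dim=4, rng=random.Random(0))
for track, vector in inputs.items():
    print(track, len(vector))
```

The shared half gives all five generators a common source of variation so the tracks can cohere, while the private half lets each track vary independently. A single discriminator then judges the five tracks jointly.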
  • 48. Algorithm: Temporal Model
      • (a) Generation from scratch
      • (b) Track-conditional generation
  • 50. Algorithm: WGAN-GP
      • Beginning: 500:1 D/G updates
      • Later: 5:1 D/G updates
      • Training time: <24 hours
      • Figure: negative D loss
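The update schedule on this slide (many discriminator updates per generator update at first, fewer later) can be written as a small scheduling function. The 500:1 and 5:1 ratios are from the slide. The point at which training switches from “beginning” to “later” is not given, so `warmup_gen_steps` below is a hypothetical parameter, as are the function names.

```python
def discriminator_updates(gen_step, warmup_gen_steps):
    """Number of D updates to run before generator update `gen_step`."""
    return 500 if gen_step < warmup_gen_steps else 5


def update_sequence(total_gen_steps, warmup_gen_steps):
    """List the order of updates: 'D' for discriminator, 'G' for generator."""
    sequence = []
    for gen_step in range(total_gen_steps):
        n_d = discriminator_updates(gen_step, warmup_gen_steps)
        sequence.extend(["D"] * n_d)
        sequence.append("G")
    return sequence


# Hypothetical run: 1 warm-up generator step, then 2 regular ones.
seq = update_sequence(total_gen_steps=3, warmup_gen_steps=1)
print(seq.count("D"), seq.count("G"))
```

In a real training loop each "D" and "G" entry would be replaced by the corresponding optimizer step.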
  • 52. Lead Sheet Arrangement
      • RNN-based lead sheet generator + chord feature extraction + chord-conditioned multi-track generation
      • Paper [ICMLA’18b]: “Lead sheet generation and arrangement by conditional generative adversarial network”
  • 53. Instrumentation Generation
      • Pitch/timbre disentanglement
          - Zt: timbre code
          - Skip connections: pitch information
      • Paper [arXiv:1811.03271]: “Learning disentangled representations for timbre and pitch in music audio”
  • 55. AI Composer: Open Source Code
      • MidiNet: https://github.com/RichardYang40148/MidiNet
      • DrumVAE: https://github.com/vibertthio/drum_vae_server
      • MuseGAN: https://github.com/salu133445/musegan
      • BMuseGAN: https://salu133445.github.io/bmusegan/
      • Pypianoroll: https://github.com/salu133445/pypianoroll
      • Lead sheet arrangement: https://github.com/liuhaumin/LeadsheetArrangement
  • 56. Music and AI: AI Listener, AI DJ, AI Composer, AI Performer
  • 57. AI Performer
      • Generate expressive music audio from a score
      • Performance “brings music to life”
      • Existing work focuses mainly on the piano
  • 58. AI Performer
      • Direct score-to-waveform synthesis is hard (2D to 1D)
      • Another idea: score-to-spectrogram correspondence (2D to 2D)
  • 59. PerformanceNet
      • ContourNet + TextureNet (‘translation’ → ‘super resolution’)
      • Code: https://github.com/bwang514/PerformanceNet
      • Paper [AAAI’19]: “PerformanceNet: Score-to-audio music generation with multi-band convolutional residual network”
  • 60. ContourNet
      • Challenge 1: different input/output dimensions
          - Use an asymmetric U-net
      • Challenge 2: hard to control note duration
          - Encode additional onset/offset information
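One way to picture “additional onset/offset information” is to derive explicit onset and offset markers from a binary piano-roll row. The sketch below does this for a single pitch. It is an assumption-laden illustration: the slides do not say how PerformanceNet encodes onsets and offsets, and the function name `onsets_and_offsets` is hypothetical.

```python
def onsets_and_offsets(roll_row):
    """Mark note starts and note ends in a binary piano-roll row.

    roll_row[t] is 1 while the pitch is sounding at time step t.
    An onset is the first active step of a note; an offset is its last.
    """
    n = len(roll_row)
    onsets = [0] * n
    offsets = [0] * n
    for t, active in enumerate(roll_row):
        if not active:
            continue
        if t == 0 or not roll_row[t - 1]:
            onsets[t] = 1
        if t == n - 1 or not roll_row[t + 1]:
            offsets[t] = 1
    return onsets, offsets


# Hypothetical row: a three-step note followed by a one-step note.
onsets, offsets = onsets_and_offsets([0, 1, 1, 1, 0, 1, 0])
print(onsets)
print(offsets)
```

Note that a plain binary piano-roll cannot tell one long note from two repeated notes on adjacent steps, so onset markers would have to come from the score's note list in that case. Explicit markers make note boundaries visible to the model, which relates to the note-duration challenge named on the slide.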
  • 62. PerformanceNet
      • Challenge 3: realistic timbre and overtones
          - Enhance sound texture with multi-band residual learning
  • 63. PerformanceNet - Sound Examples
      • Cello (Soft Synth)
      • Cello (Logic Pro)
      • Cello (Ours)
  • 64. PerformanceNet - Bonus
      • 愛江山更愛美人 - Violin (Ours): https://www.youtube.com/watch?v=k0-cT6GxS3g&feature=youtu.be
  • 65. Wrap-Up
      • AI Listener
          - Create the singing-only version of songs
          - Sound event detection
      • AI DJ
          - Create DJ skills such as thumbnailing and sequencing
      • AI Composer
          - Create lead sheets or multi-track piano-rolls
      • AI Performer
          - From score to audio
  • 66. Conclusion
      • AI listener: audio (existing songs) → features
          - Music transcription (audio2score), audio → score: note (pitch, onset, offset), instrument (flute, cello), meter (4/4), key (E-flat major)
          - Music semantic labeling, audio → labels: genre (classical), emotion (yearning), other attributes (slow/fast)
      • AI DJ: existing songs → audio (a new song)
      • AI composer: random seed → score (a new song)
      • AI performer (score2audio): score → audio