An up-to-date overview of our recent research on music/audio and AI. It contains four parts:
* AI Listener: source separation (ICMLA'18a) and sound event detection (IJCAI'18)
* AI DJ: music thumbnailing (TISMIR'18) and music sequencing (AAAI'18a)
* AI Composer: melody generation (ISMIR'17), lead sheet generation (ICMLA'18b), multitrack pianoroll generation (AAAI'18b), and instrumentation generation (arXiv)
* AI Performer: CNN-based score-to-audio generation (AAAI'19)
Machine Learning for Creative AI Applications in Music (Nov 2018)
1. Machine Learning for
Creative AI Applications
in Music
Music and AI Lab,
Research Center for IT Innovation,
Academia Sinica
Yi-Hsuan Yang Ph.D.
http://www.citi.sinica.edu.tw/pages/yang/
yang@citi.sinica.edu.tw
2. About Us
• Academia Sinica
The national academy of Taiwan, founded in 1928
About 1,000 full/associate/assistant researchers
Located in Nangang District, Taipei City
• Music and AI Lab (musicai)
Members: research assistants and (co-advised) PhD/master students
Application-oriented research: machine learning & music
3. ML in Music: “Music Info Retrieval/Analysis”
Music transcription (audio2score): from the audio of an existing song to its score
• audio → note (pitch, onset, offset)
• audio → instrument (flute, cello)
• audio → meter (4/4)
• audio → key (E-flat major)
4. ML in Music: “Music Info Retrieval/Analysis”
Music semantic labeling: from the audio of an existing song to labels
• audio → genre (classical)
• audio → emotion (yearning)
• audio → other attributes (slow/fast)
Transcription and semantic labeling together form the “AI listener,” with applications in music retrieval, education, archival, etc.
5. ML in Music: “Music Generation/Synthesis”
• AI composer: random seed → score (a new song)
• AI performer (score2audio): score → audio
6. ML in Music: “Music Generation/Synthesis”
• AI DJ: from the audio features of existing songs (extracted by the AI listener) to the audio of a new song: remix, mashup, etc.
(image from the Internet)
7. Recap
• ML in Music
Music information retrieval/analysis
AI listener
Music transcription (audio → score)
Music semantic labeling (audio → label)
For analyzing and indexing existing songs
Music generation/synthesis
AI composer (random seed → score)
AI performer (score → audio)
AI DJ (existing songs → new song)
For creating new music
8. ML for Creative AI Applications in Music
• AI Listener
• AI DJ
• AI Composer
• AI Performer
9. AI Listener: Source Separation
• “Demix” the music signal
input: audio mixture
output: individual tracks
(image from the Internet)
11. AI Listener: Source Separation
• “Demix” the music signal
• Applications
Music production, DJ-related skills
Singing voice processing, karaoke, soundtracks for movies
Smart headsets, smart loudspeakers
Education
• Extensions
Multi-instrument separation, speech separation
Melody extraction, beat estimation
13. Algorithm 2/4: Main Idea
• Train one denoising autoencoder (DAE) per target source
DAE1: mixture → vocal
DAE2: mixture → drum
DAE3: mixture → others
• Training data
Demixing Secrets Dataset (DSD): 100 Western pop songs with multi-track versions (vocals, drums, bass, others)
No Chinese or Japanese pop songs at all in DSD
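The per-source idea above can be sketched as spectrogram masking: each trained model estimates its source's magnitude, and Wiener-style soft masks recombine the estimates so the separated sources stay consistent with the mixture. A minimal numpy sketch, with toy scaling lambdas standing in for the trained DAEs (the shapes and stand-in models are illustrative, not the actual system):

```python
import numpy as np

def soft_masks(source_estimates, eps=1e-8):
    """Wiener-style soft masks from per-source magnitude estimates.

    source_estimates: array of shape (n_sources, freq, time).
    Returns masks of the same shape that sum to ~1 at every TF bin."""
    total = source_estimates.sum(axis=0, keepdims=True) + eps
    return source_estimates / total

def separate(mixture_mag, estimators):
    """Apply each per-source estimator (e.g. a trained DAE) to the
    mixture spectrogram, then mask the mixture with the soft masks."""
    est = np.stack([f(mixture_mag) for f in estimators])
    return soft_masks(est) * mixture_mag   # (n_sources, freq, time)

# Toy stand-ins for trained DAEs (hypothetical; real ones are neural nets)
rng = np.random.default_rng(0)
mix = rng.random((513, 100))           # |STFT| of the mixture
dae_vocal  = lambda m: 0.5 * m          # pretend vocals take half the energy
dae_drums  = lambda m: 0.3 * m
dae_others = lambda m: 0.2 * m

sources = separate(mix, [dae_vocal, dae_drums, dae_others])
print(np.allclose(sources.sum(axis=0), mix))  # masked sources sum to mixture
```

Because the masks sum to one per time-frequency bin, the separated tracks always add back up to the input mixture, which is one reason masking is preferred over predicting each source freely.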
15. Algorithm 4/4: Architecture
• U-net
Encoder: Conv2D
Decoder: Deconv2D
Skip connections
• allow low-level information to flow directly from the high-resolution input to the high-resolution output (at the corresponding hierarchy)
[1] “U-Net: Convolutional networks for biomedical image segmentation,” arXiv 2015
[2] “Singing voice separation with deep u-net convolutional networks,” ISMIR 2017
(figure from [2])
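A shapes-only sketch of the skip-connection idea, with average pooling and nearest-neighbour upsampling standing in for the Conv2D/Deconv2D layers (no learning involved; `down` and `up` are illustrative helpers, not the papers' code):

```python
import numpy as np

def down(x):                       # 2x2 average pooling (encoder step)
    f, t = x.shape
    return x.reshape(f // 2, 2, t // 2, 2).mean(axis=(1, 3))

def up(x):                         # nearest-neighbour upsampling (decoder step)
    return x.repeat(2, axis=0).repeat(2, axis=1)

x0 = np.random.rand(64, 64)        # input spectrogram patch
x1 = down(x0)                      # 32x32 encoder activation
x2 = down(x1)                      # 16x16 bottleneck

d1 = up(x2)                        # decode back to 32x32
d1 = np.stack([d1, x1])            # skip connection: concat encoder activation
d0 = up(d1.mean(axis=0))           # decode back to 64x64 ("conv" = mean here)
d0 = np.stack([d0, x0])            # skip connection at full resolution

print(d0.shape)  # (2, 64, 64): decoder output carries the high-res input
```

The key point is the `np.stack` lines: at each resolution the decoder sees the encoder's activation of the same size, so fine detail never has to squeeze through the bottleneck.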
16. Evaluation Campaign: SiSEC 2018
(figure: SiSEC 2018 results comparing our submissions, the Sony team’s, and the oracle)
The “Sony” team: Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji, Stefan Uhlich, et al.
17. AI Listener: Sound Event Detection
• Applications
Surveillance
Self-driving cars
Industry 4.0
Healthcare
AIoT
Smart city
• Strength
Sound (ears) is complementary to vision (eyes)
Can work well even in a dim environment, or when the event is far from the camera
19. ML for Creative AI Applications in Music
• AI Listener
• AI DJ
• AI Composer
• AI Performer
(diagram: the AI listener extracts audio features from existing songs; the AI DJ turns them into the audio of a new song)
20. AI DJ
• Smart speaker + recommendation + DJ skills
21. DJ Skill #1: Music Thumbnailing
• Extract music highlights, e.g., a 30-sec highlight of a song
• Applications: music browsing, ringtone generation
• Related papers published by NAVER Corp
“Automatic DJ mix generation using highlight detection,” ISMIR 2017
“Automatic music highlight extraction using convolutional recurrent attention networks,” arXiv:1712.05901
22. Algorithm
• CNN for emotion prediction + attention (predicting the weights of different parts of a song)
• Transfer learning: no need for structural (chorus) labels
TISMIR’18: “Pop music highlighter: Marking the emotion keypoints.” Open source!
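Once attention scores are predicted, extracting the highlight reduces to a sliding-window maximization: pick the 30-second span with the largest total score. A hedged sketch with made-up scores (`pick_highlight` is an illustrative helper, not the released code):

```python
import numpy as np

def pick_highlight(scores, length=30):
    """Return (start, end) of the `length`-second window maximizing the
    summed attention score, via a cumulative-sum sliding window."""
    c = np.concatenate([[0.0], np.cumsum(scores)])
    window_sums = c[length:] - c[:-length]   # sum of scores in each window
    start = int(np.argmax(window_sums))
    return start, start + length

scores = np.zeros(240)          # a 4-minute song, one score per second
scores[95:125] = 1.0            # pretend the chorus drew all the attention
print(pick_highlight(scores))   # → (95, 125)
```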
24. DJ Skill #2: Music Sequencing
• Find an ordering of music pieces
“Automatic playlist sequencing and transitions,” Proc. ISMIR 2017 (from )
25. DJ Skill #2: Music Sequencing
• Find an ordering of music pieces
“Automatic playlist sequencing and transitions,” Proc. ISMIR 2017 (from )
“Generating music medleys via playing music puzzle games by unsupervised similarity embedding” (from MAC Lab)
► Demo: https://remyhuang.github.io/
26. Algorithm 1/2: Music Puzzle Games
• Divide a song into non-overlapping chunks
• Learn to order them with a Siamese CNN
Positive pairs: R1R2, R2R3
Negative pairs: R2R1, R3R2, R1R3, R3R1
Unsupervised (self-supervised) learning
AAAI’18a: “Generating music medleys via playing music puzzle games”
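The pair construction above can be sketched directly: adjacent in-order chunks are positives, every other ordered pair is a negative, and the original song order is the only supervision needed. Chunk names follow the slide; `make_pairs` is an illustrative helper, not the paper's code:

```python
def make_pairs(chunks):
    """Adjacent in-order chunk pairs are positives; every other ordered
    pair is a negative. No human labels are needed: the song's own
    chunk order is the supervision signal."""
    pos = [(chunks[i], chunks[i + 1]) for i in range(len(chunks) - 1)]
    all_pairs = [(a, b) for a in chunks for b in chunks if a != b]
    neg = [p for p in all_pairs if p not in pos]
    return pos, neg

pos, neg = make_pairs(["R1", "R2", "R3"])
print(pos)  # [('R1', 'R2'), ('R2', 'R3')]
print(neg)  # [('R1', 'R3'), ('R2', 'R1'), ('R3', 'R1'), ('R3', 'R2')]
```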
27. Algorithm 2/2: Similarity Embedding Net
• Divide a song into non-overlapping chunks
• Siamese CNN + similarity embedding, e.g., recovering the correct order [a b c d] from a shuffled [d c a b]
AAAI’18a: “Generating music medleys via playing music puzzle games.” Open source!
► Demo: https://remyhuang.github.io/
28. Result
• For 8-piece puzzle games, our model reaches 99.0% pairwise accuracy (PA) and 96.1% overall accuracy (OA)
• For medleys, we reach 75.0% OA
(figure: our method vs. two baselines)
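A sketch of how pairwise accuracy might be computed; the definition below (fraction of chunk pairs placed in the correct relative order) is a common one and may differ in detail from the paper's exact metric:

```python
from itertools import combinations

def pairwise_accuracy(pred, truth):
    """Fraction of chunk pairs whose relative order in `pred` matches
    their relative order in `truth` (a hypothetical PA definition)."""
    rank = {c: i for i, c in enumerate(pred)}
    pairs = list(combinations(truth, 2))    # (earlier, later) in true order
    good = sum(rank[a] < rank[b] for a, b in pairs)
    return good / len(pairs)

truth = [1, 2, 3, 4]
print(pairwise_accuracy([1, 2, 3, 4], truth))  # 1.0: perfect ordering
print(pairwise_accuracy([2, 1, 3, 4], truth))  # 5/6: one swapped pair
```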
32. AI Composer
• Create music
• Why?
Make musicians’ lives easier
Create copyright-free music (for films, ads, games)
A classic AI problem
(demo: Eminem - When I’m Gone)
33. Our Research on “AI Composer”
• Collaboration with musicians & producers from KKBOX
• Projects
MidiNet: melody generation (ISMIR’17) --- already cited 53 times
Melody harmonization
Lead sheet generation
Drum VAE
MuseGAN: multitrack music generation (AAAI’18) --- 206 GitHub stars
Multi-track music generation using binary neurons (ISMIR’18)
Lead sheet arrangement and interpolation (ICMLA’18)
Automatic instrumentation arrangement
Emotion-based music generation
34. Lead Sheet Generation
• A lead sheet = melody + chords
• Given the chords, generate a melody
• Given a melody, generate the chords (a.k.a. harmonization)
• Or generate both from scratch
35. Melody Generation by RNN

                          Google      C-RNN-GAN   Song from PI    DeepBach       Google
                          MelodyRNN                                              WaveNet
core model                RNN         RNN         RNN             RNN            CNN
data type                 symbolic    symbolic    symbolic        symbolic       audio
genre specificity         ─           ─           ─               Bach chorale   ─
mandatory prior           priming     ─           music scale &   melody of      priming
knowledge                 melody                  melody profile  one part       wave
follow a priming melody   V           ─           ─               V              V
follow a chord sequence   ─           ─           ─               ─              ─
generate from scratch     ─           V           ─               ─              ─
generate multi-part music ─           ─           V               V              V
open source               V           ─           ─               V              ─
36. Melody Generation by CNN+GAN

                          Google      MidiNet     Google
                          MelodyRNN               WaveNet
core model                RNN         CNN         CNN
data type                 symbolic    symbolic    audio
genre specificity         ─           ─           ─
mandatory prior           priming     ─           priming
knowledge                 melody                  wave
follow a priming melody   V           V           V
follow a chord sequence   ─           V           ─
generate from scratch     ─           V           ─
generate multi-part music ─           V           V
open source               V           V           ─
• Google MelodyRNN: by Google; an RNN trained with thousands of melodies
• MidiNet (ISMIR’17): by MAC Lab; a CNN trained on 526 tabs (4,208 bars) with one GPU (GTX 1080) in <30 mins
37. Algorithm: Desired Output
• Generate the melody one bar at a time
• Use a matrix to represent the music of a bar: 84 notes × 96 time steps
• Condition on the previous bar (the history)
(figure: piano-roll matrices for the previous, current, and next bars)
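The 84 × 96 bar matrix can be sketched as follows; the pitch offset and helper name are illustrative assumptions, not the authors' code:

```python
import numpy as np

N_PITCH, N_STEP = 84, 96
PITCH_OFFSET = 24   # assume MIDI pitches 24..107 map to rows 0..83

def bar_to_matrix(notes):
    """Build the per-bar piano-roll matrix described above.

    notes: list of (midi_pitch, onset_step, offset_step) within one bar;
    cells are 1 while a note is active, 0 otherwise."""
    m = np.zeros((N_PITCH, N_STEP), dtype=np.int8)
    for pitch, on, off in notes:
        m[pitch - PITCH_OFFSET, on:off] = 1
    return m

# a C4 quarter note followed by an E4 quarter note (24 steps each)
bar = bar_to_matrix([(60, 0, 24), (64, 24, 48)])
print(bar.shape, int(bar.sum()))  # (84, 96) 48
```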
38. Algorithm: Main Idea
• Generative adversarial nets (GAN)
Discriminator: tell real from fake (“real or fake?”)
Generator: fool the discriminator
• Generate from scratch
• Or, given the chords, generate a melody
40. Algorithm: Temporal Model
• Conditioner: provides 2-D conditions
Uses the same filter shapes as the generator CNN, so that their intermediate outputs are “compatible”
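A shapes-only sketch of why matching filter shapes make the conditioner's intermediate outputs "compatible" with the generator's: at every depth the two maps have the same spatial size and can be concatenated. Pooling and upsampling stand in for the actual conv layers; all names and sizes here are illustrative:

```python
import numpy as np

def halve(x):   # stand-in for one conv layer that halves each dimension
    p, t = x.shape
    return x.reshape(p // 2, 2, t // 2, 2).mean(axis=(1, 3))

prev_bar = np.random.rand(84, 96)      # condition: the previous bar
cond_maps = [prev_bar]
for _ in range(2):                     # conditioner intermediate outputs:
    cond_maps.append(halve(cond_maps[-1]))   # (84,96) -> (42,48) -> (21,24)

# The generator works upward from a coarse map; at each depth it can
# concatenate the conditioner map of the same spatial size.
gen = np.random.rand(21, 24)           # coarsest generator map
for c in reversed(cond_maps[:-1]):     # (42,48), then (84,96)
    gen = gen.repeat(2, axis=0).repeat(2, axis=1)   # upsample
    gen = np.stack([gen, c]).mean(axis=0)           # "concat + conv" stand-in

print(gen.shape)  # (84, 96): full-resolution bar, conditioned at every level
```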
41. Algorithm
• Generative adversarial nets (GAN)
• We don’t know what the “desired output” should be (for example, what should be played next)
We only know whether it “sounds real”
The model learns the mapping between the two spaces
46. Algorithm: Data
• LPD dataset: 128K MIDI piano-rolls derived from the Lakh MIDI Dataset (LMD)
http://colinraffel.com/projects/lmd/
https://salu133445.github.io/musegan/dataset
47. Algorithm: Intra- & Inter-track
• Multi-track: piano, guitar, bass, strings, drums
• Hybrid model
one “shared” (inter-track) z
five “private” (intra-track) zi
five generators
one discriminator
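The shared/private latent design can be sketched in a few lines: each track's generator input combines one z common to all tracks (inter-track coherence) with a private z of its own (intra-track variety). Dimensions are illustrative, and the generators themselves are omitted:

```python
import numpy as np

TRACKS = ["piano", "guitar", "bass", "strings", "drums"]
DIM = 32
rng = np.random.default_rng(0)

z_shared = rng.normal(size=DIM)                        # one "shared" (inter) z
z_private = {t: rng.normal(size=DIM) for t in TRACKS}  # five "private" (intra) z_i

# Each of the five generators sees [z_shared ; z_private[t]]: the shared
# part keeps the tracks harmonically coherent, the private part lets each
# instrument vary on its own.
track_inputs = {t: np.concatenate([z_shared, z_private[t]]) for t in TRACKS}

print(len(track_inputs), track_inputs["drums"].shape)  # 5 (64,)
```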
55. AI Composer: Open Source Code
• MidiNet:
https://github.com/RichardYang40148/MidiNet
• DrumVAE:
https://github.com/vibertthio/drum_vae_server
• MuseGAN: https://github.com/salu133445/musegan
• BMuseGAN: https://salu133445.github.io/bmusegan/
• Pypianoroll:
https://github.com/salu133445/pypianoroll
• Lead sheet arrangement:
https://github.com/liuhaumin/LeadsheetArrangement
56. Music and AI
• AI Listener
• AI DJ
• AI Composer
• AI Performer
57. AI Performer
• Generate expressive music audio from a score
• Performance “brings music to life”
• Existing work focuses mainly on the piano
58. AI Performer
• Direct score-to-waveform synthesis (2-D score → 1-D waveform) is hard
• Another idea: learn a score-to-spectrogram correspondence (2-D → 2-D)
60. ContourNet
• Challenge 1: different input/output dimensions
Use an asymmetric U-net
• Challenge 2: hard to control note duration
Encode additional onset/offset information
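The onset/offset idea can be sketched as extra input channels: besides the sustain piano-roll, the model receives explicit onset and offset markers, so note boundaries are unambiguous. The exact channel layout below is an illustrative assumption, not ContourNet's actual I/O:

```python
import numpy as np

def encode(notes, n_pitch=84, n_step=96):
    """notes: list of (pitch_row, onset_step, offset_step).
    Returns a 3-channel tensor: [sustain, onset, offset]."""
    x = np.zeros((3, n_pitch, n_step), dtype=np.float32)
    for p, on, off in notes:
        x[0, p, on:off] = 1        # sustain: 1 while the note is held
        x[1, p, on] = 1            # onset marker
        x[2, p, off - 1] = 1       # offset marker
    return x

# Two consecutive same-pitch notes: in the sustain roll alone they would
# merge into one long note; the onset channel keeps them distinct.
x = encode([(40, 0, 24), (40, 24, 48)])
print(x[0, 40, :48].sum(), x[1, 40].sum())  # 48.0 2.0
```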
65. Wrap-Up
• AI Listener
Create the singing-only version of songs
Sound event detection
• AI DJ
Create DJ skills such as thumbnailing and sequencing
• AI Composer
Create lead sheets or multi-track piano-rolls
• AI Performer
From score to audio
66. Conclusion
(diagram: the complete picture: the AI listener analyzes existing songs via music transcription (audio2score) and music semantic labeling; the AI composer (from a random seed), the AI performer (score2audio), and the AI DJ create new songs)