UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED SPEECH SYNTHESIS USING SIMPLE RECURRENT UNIT
Tomoki Koriyama, Hiroshi Saruwatari
The University of Tokyo, Japan
May 7, 2020
TH2.PB.10, SPE-P12, ICASSP 2020
Background: deep learning on speech synthesis
‣ Neural network (NN)-based speech synthesis
•Model the relationship between texts and speech parameters
‣ Differentiable components enable complicated models
•RNN (LSTM, GRU), CNN, Self-attention, Attention
‣ RNN for speech synthesis
•Can capture continuously changing speech parameters
•Was used in the best framework in Blizzard Challenge 2019 [Jiang2019]
•Is included in end-to-end frameworks (e.g. Tacotron [Wang2017])
Background: deep Gaussian process
‣ Deep Gaussian process (DGP) [Damianou2013][Salimbeni2018]
•Multi-layer Gaussian process regressions (GPRs)
•Nonlinear regression by kernel methods
•Bayesian learning considering model complexity
•Differentiable by variational approximation
‣ DGP-based speech synthesis [Koriyama2019]
•Outperformed DNN-based method
•Restricted model to feedforward architecture
[Figure: feedforward DGP drawn as stacked GPR layers: x → GPR → p(h1|x) → GPR → p(h2|x) → GPR → p(y|x)]
Is it possible to apply recurrent architecture to DGP?
Extension of DGPs
‣ Convolutional DGP [Kumar2018], TICK-GP [Dutordoir2019]
•Incorporate CNN architecture into DGP
‣ Probabilistic recurrent state space model (PR-SSM) [Doerr2018]
•Incorporate RNN architecture into DGP
•Perform GPR in each time step
•Require much time for utterance-level training and generation
Purpose of study
To incorporate recurrent architecture into DGP with fast
computation to enable utterance-level sequential modeling
‣ Approach
•Utilize simple recurrent unit (SRU) [Lei2018]
•Separate parallel computation of GPR from recurrent architecture
Simple recurrent unit (SRU) [Lei2018]
SRU does not use the past hidden-layer value h^ℓ_{t−1} to calculate gates or update memory cell c^ℓ_t
[Figure: LSTM and SRU cell diagrams, side by side]
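The slide does not spell out the SRU equations. As a sketch, one commonly cited form of the recurrence from [Lei2018] is written below for a single layer (layer index ℓ dropped). The symbols W, W_f, W_r, b_f, b_r are assumptions introduced here, and published variants differ in details (for example, an activation applied to c_t, or an extra gate term depending on c_{t−1}).

```latex
\begin{aligned}
\tilde{x}_t &= W x_t \\
f_t &= \sigma(W_f x_t + b_f) \\
r_t &= \sigma(W_r x_t + b_r) \\
c_t &= f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{x}_t \\
h_t &= r_t \odot c_t + (1 - r_t) \odot x_t
\end{aligned}
```

In this form only the elementwise update of c_t touches the previous time step, and h_{t−1} never appears.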
Simple recurrent unit (SRU) [Lei2018]
SRU can be decomposed into two blocks:
parallel computation and light recurrent
[Figure: SRU layer diagram. Labels: layer input, Linear (parallel computation block), gate, state, light recurrent block, layer output]
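The two-block decomposition can be sketched in plain Python. This is a minimal illustration under assumptions, not the authors' implementation: the gate form (forget gate f, reset gate r, highway connection to the input) follows one common statement of SRU, and all weights and inputs are toy values chosen so the numbers are easy to check by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def parallel_block(xs, W, Wf, bf, Wr, br):
    """Frame-wise transforms; no dependence between time steps."""
    out = []
    for x in xs:
        cand = matvec(W, x)  # candidate state
        f = [sigmoid(a + b) for a, b in zip(matvec(Wf, x), bf)]  # forget gate
        r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), br)]  # reset gate
        out.append((cand, f, r))
    return out


def light_recurrent_block(xs, pre, c0):
    """Elementwise recurrence over time; uses c_{t-1} only, never h_{t-1}."""
    c, hs = list(c0), []
    for x, (cand, f, r) in zip(xs, pre):
        c = [fi * ci + (1.0 - fi) * ui for fi, ci, ui in zip(f, c, cand)]
        hs.append([ri * ci + (1.0 - ri) * xi for ri, ci, xi in zip(r, c, x)])
    return hs, c


# Toy example (all values hypothetical): 2-dim input, 3 frames.
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
xs = [[1.0, 2.0], [0.5, -1.0], [0.0, 3.0]]
pre = parallel_block(xs, W=I2, Wf=Z2, bf=[0.0, 0.0], Wr=Z2, br=[0.0, 0.0])
hs, c_last = light_recurrent_block(xs, pre, c0=[0.0, 0.0])
```

The loop in `parallel_block` has no cross-frame dependence, so it could be batched over the whole utterance; only `light_recurrent_block` is sequential, and it performs cheap elementwise operations.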
Simple recurrent unit for DGP
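Replace linear transformation by GPR in parallel computation block
[Figure: the same SRU layer diagram with GPR in place of Linear in the parallel computation block. Labels: layer input, GPR (parallel computation block), gate, state, light recurrent block, layer output]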
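The slides do not give the GPR equations. As a hedged sketch, a standard sparse variational GP layer with inducing inputs Z, variational mean m, and variational covariance S (zero mean function assumed; none of these symbols come from the slides) has a frame-wise marginal predictive distribution of the following form:

```latex
\begin{aligned}
q\bigl(f(x_t)\bigr) &= \mathcal{N}\bigl(\tilde{\mu}(x_t),\, \tilde{\sigma}^2(x_t)\bigr) \\
\tilde{\mu}(x_t) &= k(x_t, Z)\, K_{ZZ}^{-1}\, m \\
\tilde{\sigma}^2(x_t) &= k(x_t, x_t) - k(x_t, Z)\, K_{ZZ}^{-1} (K_{ZZ} - S)\, K_{ZZ}^{-1}\, k(Z, x_t)
\end{aligned}
```

Each marginal depends only on the input of its own frame, which is what would let the GPR step run over all frames at once before the light recurrent block.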
SRU-DGP-based speech synthesis
[Figure: SRU-DGP architecture unrolled over frames. For each frame, context features pass through GPR layers and stacked SRU layers with GPR to produce speech parameters. Axis labels: Time t, # of SRU layers]
Utterance-level sampling for training
‣ In the training process of DGP, inference and sampling are repeatedly performed for each layer [Salimbeni19]
‣ Utterance-level predictive distribution is a multivariate Gaussian distribution: 𝒩(h; μ, Σ)
•Hidden-layer values of adjacent frames are correlated
‣ Although the sampling can be performed by using Cholesky decomposition of Σ, this is often unstable
‣ Use random feature expansion [Rahimi2008, Cutajar2017] for stability of training
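The Cholesky route mentioned above can be sketched in a few lines: factor Σ = L Lᵀ and set h = μ + L ε with ε ~ 𝒩(0, I). The code below is an illustration only, with a hand-written factorization and toy covariances that are assumptions of this sketch. It also shows one way the factorization can fail, namely when two frames are perfectly correlated and Σ becomes singular.

```python
import math
import random


def cholesky(A, jitter=0.0):
    """Lower-triangular L with L L^T = A + jitter * I (Cholesky-Banachiewicz)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] + jitter - s
                if d <= 0.0:
                    raise ValueError("matrix is not numerically positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def sample_gaussian(mu, Sigma, rng, jitter=1e-6):
    """Draw h = mu + L eps with eps ~ N(0, I), so that h ~ N(mu, Sigma)."""
    L = cholesky(Sigma, jitter)
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * eps[k] for k in range(i + 1)) for i, m in enumerate(mu)]


# Toy 3-frame covariance with strong correlation between adjacent frames.
mu = [0.0, 0.0, 0.0]
Sigma = [[1.0, 0.9, 0.81], [0.9, 1.0, 0.9], [0.81, 0.9, 1.0]]
h = sample_gaussian(mu, Sigma, random.Random(0))

# A singular covariance (two identical frames) breaks the factorization without jitter.
singular = [[1.0, 1.0], [1.0, 1.0]]
try:
    cholesky(singular)
    failed = False
except ValueError:
    failed = True
```

For an utterance with T frames, Σ is T × T, so the factorization is costly as well as fragile; a feature-based expansion avoids forming and factoring Σ.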
Methods for experiments
                          Architecture
Models   Kernel  Bayes    FeedForward  LSTM      SRU
NN       -       -        FF-NN        LSTM-RNN  SRU-NN
BayesNN  -       ✓        FF-BayesNN   -         SRU-BayesNN
DGP      ✓       ✓        FF-DGP       -         SRU-DGP (proposed)
Experimental conditions: database
Database                        JSUT corpus [Sonobe2017], 1 female speaker, BASIC0001~BASIC2000
Train / valid / test sentences  1788 (1.95 h) / 60 / 60
Input feature                   575-dim. linguistic feature vector
Output feature                  187-dim. acoustic feature vector (Mel-cepstrum, log F0, code aperiodicity, v/uv & Δ, Δ²)
Experimental conditions: model configurations
DGP
  Hidden layer dim.      256
  # of inducing points   1024
  Kernel function        ArcCos [Cho09]
  Optimizer              Adam (learning rate: 0.01)
BayesNN
  Hidden units           1024
  Activation             ReLU
  Optimizer              Adam (learning rate: 10^-5)
NN: Hyperparameters were tuned by Optuna [Akiba2019] with 100 trials.
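The slides name the ArcCos kernel but not its order. Assuming the order-1 arc-cosine kernel of [Cho09], the sketch below evaluates it in closed form and compares it with a ReLU random feature approximation, in the spirit of the random feature expansion the deck cites. The feature count, input dimension, and test vectors are toy values chosen for this illustration.

```python
import math
import random


def arccos_kernel(x, y):
    """Order-1 arc-cosine kernel: (1/pi) |x||y| (sin t + (pi - t) cos t)."""
    nx = math.sqrt(sum(v * v for v in x))
    ny = math.sqrt(sum(v * v for v in y))
    cos_t = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    t = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against rounding
    return nx * ny * (math.sin(t) + (math.pi - t) * math.cos(t)) / math.pi


def relu_features(x, W):
    """Random features phi(x) = sqrt(2/D) * max(0, W x), rows of W ~ N(0, I)."""
    scale = math.sqrt(2.0 / len(W))
    return [scale * max(0.0, sum(w * v for w, v in zip(row, x))) for row in W]


rng = random.Random(0)
D, dim = 20000, 3  # number of random features and input dimension (toy values)
W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(D)]

x, y = [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]
exact = arccos_kernel(x, y)
approx = sum(a * b for a, b in zip(relu_features(x, W), relu_features(y, W)))
```

With features of this kind, a function draw can be written as a weighted sum of the features with Gaussian weights, so one weight sample yields values for every frame of an utterance without factoring a frame-by-frame covariance matrix.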
Objective evaluation: spectral feature distortion
Bayesian and SRU models yield smaller distortions
[Figure: mel-cepstral distortion (MCD [dB], y-axis from 5.5 to 6.1) versus number of layers (1 to 8) for FF-NN (best), LSTM-RNN (best), SRU-NN (best), FF-BayesNN, SRU-BayesNN, FF-DGP, and SRU-DGP]
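The slides do not state how MCD was computed. A commonly used definition, assumed here, is (10 / ln 10) · sqrt(2 Σ_d (c_d − ĉ_d)²) per frame, averaged over frames, with the 0th (energy) coefficient left out. The frames below are made-up numbers, not data from the experiment.

```python
import math


def mel_cepstral_distortion(ref, syn):
    """Mean MCD in dB over frames; each frame is a list of mel-cepstral coefficients.

    The 0th (energy) coefficient is assumed to be excluded by the caller.
    """
    const = 10.0 / math.log(10.0)
    total = 0.0
    for r, s in zip(ref, syn):
        total += const * math.sqrt(2.0 * sum((a - b) ** 2 for a, b in zip(r, s)))
    return total / len(ref)


# Two toy frames with two coefficients each (hypothetical values).
ref = [[0.0, 0.0], [1.0, 1.0]]
syn = [[0.0, 0.0], [1.0, 1.5]]
mcd = mel_cepstral_distortion(ref, syn)
```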
Subjective evaluation
Proposed SRU-DGP gave a higher score than the other methods
Method        MOS (score scale: 1 to 5)
LSTM-RNN      2.99
SRU-NN        2.98
SRU-BayesNN   3.09
FF-DGP        3.01
SRU-DGP       3.19
ORIG          3.97
Computation time
SRU-DGP can generate speech faster than LSTM-RNN
[Figure: real-time factor (y-axis from 0.00 to 0.10) versus number of layers (1 to 8) for SRU-DGP, FF-DGP, LSTM-RNN, and SRU-NN]
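Real-time factor is usually taken as generation time divided by the duration of the generated speech, so values below 1 mean faster than real time. The sketch below measures it for a stand-in generator; the 5 ms frame shift and the `dummy_generate` function are assumptions for illustration and are not taken from the slides.

```python
import time


def real_time_factor(generate, n_frames, frame_shift_s=0.005):
    """Wall-clock generation time divided by the duration of the generated speech."""
    start = time.perf_counter()
    generate(n_frames)
    elapsed = time.perf_counter() - start
    return elapsed / (n_frames * frame_shift_s)


def dummy_generate(n_frames):
    """Stand-in for a synthesis model (hypothetical); returns one value per frame."""
    return [0.0] * n_frames


rtf = real_time_factor(dummy_generate, n_frames=1000)  # 1000 frames = 5 s of speech
```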
Conclusions
‣ Incorporate simple recurrent unit (SRU) into DGP
‣ Achieve utterance-level sequential modeling
‣ The proposed SRU-DGP
•Outperformed feedforward (FF)-DGP and LSTM-RNN
•Achieved faster generation than LSTM-RNN
‣ Future work
•Investigate other differentiable components in DGP
- attention, self-attention
Additional speech samples
https://hyama5.github.io/demo_SRU_DGP_TTS/
Thank you for listening!

Contenu connexe

Similaire à UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED
 SPEECH SYNTHESIS USING
 SIMPLE RECURRENT UNIT

A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...Tomoki Koriyama
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFJayavardhan Reddy Peddamail
 
Parallel WaveGAN review
Parallel WaveGAN reviewParallel WaveGAN review
Parallel WaveGAN reviewJune-Woo Kim
 
Non autoregressive neural text-to-speech review
Non autoregressive neural text-to-speech reviewNon autoregressive neural text-to-speech review
Non autoregressive neural text-to-speech reviewJune-Woo Kim
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesScott Edmunds
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Deep Learning Italia
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupLINAGORA
 
GDG DevFest Xiamen 2017
GDG DevFest Xiamen 2017GDG DevFest Xiamen 2017
GDG DevFest Xiamen 2017Taegyun Jeon
 
Speech Separation under Reverberant Condition.pdf
Speech Separation under Reverberant Condition.pdfSpeech Separation under Reverberant Condition.pdf
Speech Separation under Reverberant Condition.pdfssuser849b73
 
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...kevig
 
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...kevig
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Thomas Keane
 
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Te...
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Te...ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Te...
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Te...Tomoki Hayashi
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ijnlc
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...kevig
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...kevig
 

Similaire à UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED
 SPEECH SYNTHESIS USING
 SIMPLE RECURRENT UNIT (20)

A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
A TRAINING METHOD USING
 DNN-GUIDED LAYERWISE PRETRAINING
 FOR DEEP GAUSSIAN ...
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
 
Parallel WaveGAN review
Parallel WaveGAN reviewParallel WaveGAN review
Parallel WaveGAN review
 
Non autoregressive neural text-to-speech review
Non autoregressive neural text-to-speech reviewNon autoregressive neural text-to-speech review
Non autoregressive neural text-to-speech review
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challenges
 
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Deep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - MeetupDeep Learning in practice : Speech recognition and beyond - Meetup
Deep Learning in practice : Speech recognition and beyond - Meetup
 
GDG DevFest Xiamen 2017
GDG DevFest Xiamen 2017GDG DevFest Xiamen 2017
GDG DevFest Xiamen 2017
 
Machine Learning - Supervised Learning
Machine Learning - Supervised LearningMachine Learning - Supervised Learning
Machine Learning - Supervised Learning
 
Rajat CV
Rajat CVRajat CV
Rajat CV
 
Speech Separation under Reverberant Condition.pdf
Speech Separation under Reverberant Condition.pdfSpeech Separation under Reverberant Condition.pdf
Speech Separation under Reverberant Condition.pdf
 
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
 
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
 
Conv-TasNet.pdf
Conv-TasNet.pdfConv-TasNet.pdf
Conv-TasNet.pdf
 
Pegasus
PegasusPegasus
Pegasus
 
Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1Wellcome Trust Advances Course: NGS Course - Lecture1
Wellcome Trust Advances Course: NGS Course - Lecture1
 
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Te...
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Te...ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Te...
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Te...
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
 

Plus de Tomoki Koriyama

深層ガウス過程に基づく音声合成におけるリカレント構造を用いた系列モデリングの検討
深層ガウス過程に基づく音声合成におけるリカレント構造を用いた系列モデリングの検討深層ガウス過程に基づく音声合成におけるリカレント構造を用いた系列モデリングの検討
深層ガウス過程に基づく音声合成におけるリカレント構造を用いた系列モデリングの検討Tomoki Koriyama
 
Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Sparse Approximation of Gram Matrices for GMMN-based Speech SynthesisSparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Sparse Approximation of Gram Matrices for GMMN-based Speech SynthesisTomoki Koriyama
 
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
 Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable... Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...Tomoki Koriyama
 
ICASSP2019音声&音響論文読み会 論文紹介(合成系) #icassp2019jp
ICASSP2019音声&音響論文読み会 論文紹介(合成系) #icassp2019jpICASSP2019音声&音響論文読み会 論文紹介(合成系) #icassp2019jp
ICASSP2019音声&音響論文読み会 論文紹介(合成系) #icassp2019jpTomoki Koriyama
 
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討Tomoki Koriyama
 
深層ガウス過程とアクセントの潜在変数表現に基づく音声合成の検討
深層ガウス過程とアクセントの潜在変数表現に基づく音声合成の検討深層ガウス過程とアクセントの潜在変数表現に基づく音声合成の検討
深層ガウス過程とアクセントの潜在変数表現に基づく音声合成の検討Tomoki Koriyama
 
グラム行列のスパース近似を用いた生成的モーメントマッチングネットに基づく音声合成の検討
グラム行列のスパース近似を用いた生成的モーメントマッチングネットに基づく音声合成の検討グラム行列のスパース近似を用いた生成的モーメントマッチングネットに基づく音声合成の検討
グラム行列のスパース近似を用いた生成的モーメントマッチングネットに基づく音声合成の検討Tomoki Koriyama
 
深層ガウス過程に基づく音声合成のための
事前学習の検討
深層ガウス過程に基づく音声合成のための
事前学習の検討深層ガウス過程に基づく音声合成のための
事前学習の検討
深層ガウス過程に基づく音声合成のための
事前学習の検討Tomoki Koriyama
 
GPR音声合成における深層ガウス過程の利用の検討
GPR音声合成における深層ガウス過程の利用の検討GPR音声合成における深層ガウス過程の利用の検討
GPR音声合成における深層ガウス過程の利用の検討Tomoki Koriyama
 
GP-DNNハイブリッドモデルに基づく統計的音声合成の検討
GP-DNNハイブリッドモデルに基づく統計的音声合成の検討GP-DNNハイブリッドモデルに基づく統計的音声合成の検討
GP-DNNハイブリッドモデルに基づく統計的音声合成の検討Tomoki Koriyama
 
GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討
GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討
GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討Tomoki Koriyama
 
ICASSP2017読み会(Speech Synthesis)
ICASSP2017読み会(Speech Synthesis)ICASSP2017読み会(Speech Synthesis)
ICASSP2017読み会(Speech Synthesis)Tomoki Koriyama
 

Plus de Tomoki Koriyama (12)

深層ガウス過程に基づく音声合成におけるリカレント構造を用いた系列モデリングの検討
深層ガウス過程に基づく音声合成におけるリカレント構造を用いた系列モデリングの検討深層ガウス過程に基づく音声合成におけるリカレント構造を用いた系列モデリングの検討
深層ガウス過程に基づく音声合成におけるリカレント構造を用いた系列モデリングの検討
 
Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Sparse Approximation of Gram Matrices for GMMN-based Speech SynthesisSparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
Sparse Approximation of Gram Matrices for GMMN-based Speech Synthesis
 
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
 Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable... Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable...
 
ICASSP2019音声&音響論文読み会 論文紹介(合成系) #icassp2019jp
ICASSP2019音声&音響論文読み会 論文紹介(合成系) #icassp2019jpICASSP2019音声&音響論文読み会 論文紹介(合成系) #icassp2019jp
ICASSP2019音声&音響論文読み会 論文紹介(合成系) #icassp2019jp
 
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
GMMNに基づく音声合成におけるグラム行列の
スパース近似の検討
 
深層ガウス過程とアクセントの潜在変数表現に基づく音声合成の検討
深層ガウス過程とアクセントの潜在変数表現に基づく音声合成の検討深層ガウス過程とアクセントの潜在変数表現に基づく音声合成の検討
深層ガウス過程とアクセントの潜在変数表現に基づく音声合成の検討
 
グラム行列のスパース近似を用いた生成的モーメントマッチングネットに基づく音声合成の検討
グラム行列のスパース近似を用いた生成的モーメントマッチングネットに基づく音声合成の検討グラム行列のスパース近似を用いた生成的モーメントマッチングネットに基づく音声合成の検討
グラム行列のスパース近似を用いた生成的モーメントマッチングネットに基づく音声合成の検討
 
深層ガウス過程に基づく音声合成のための
事前学習の検討
深層ガウス過程に基づく音声合成のための
事前学習の検討深層ガウス過程に基づく音声合成のための
事前学習の検討
深層ガウス過程に基づく音声合成のための
事前学習の検討
 
GPR音声合成における深層ガウス過程の利用の検討
GPR音声合成における深層ガウス過程の利用の検討GPR音声合成における深層ガウス過程の利用の検討
GPR音声合成における深層ガウス過程の利用の検討
 
GP-DNNハイブリッドモデルに基づく統計的音声合成の検討
GP-DNNハイブリッドモデルに基づく統計的音声合成の検討GP-DNNハイブリッドモデルに基づく統計的音声合成の検討
GP-DNNハイブリッドモデルに基づく統計的音声合成の検討
 
GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討
GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討
GPR音声合成のためのフレームコンテキストカーネルに基づく決定木構築の検討
 
ICASSP2017読み会(Speech Synthesis)
ICASSP2017読み会(Speech Synthesis)ICASSP2017読み会(Speech Synthesis)
ICASSP2017読み会(Speech Synthesis)
 

Dernier

IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicinesherlingomez2
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oManavSingh202607
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Mohammad Khajehpour
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxBhagirath Gogikar
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedDelhi Call girls
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 

Dernier (20)

IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicine
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
Dopamine neurotransmitter determination using graphite sheet- graphene nano-s...
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 

UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED
 SPEECH SYNTHESIS USING
 SIMPLE RECURRENT UNIT

  • 1. UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED
 SPEECH SYNTHESIS USING
 SIMPLE RECURRENT UNIT Tomoki Koriyama, Hiroshi Saruwatari The University of Tokyo, Japan May 7, 2020 TH2.PB.10, SPE-P12, ICASSP 2020
  • 2. Background: deep learning on speech synthesis ‣ Neural network (NN)-based speech synthesis •Model the relationship between texts and speech parameters ‣ Differentiable components enable complicated models •RNN (LSTM, GRU), CNN, Self-attention, Attention ‣ RNN for speech synthesis •Can capture continuously changing speech parameters •Was used in the best framework in Blizzard Challenge 2019 [Jiang2019] •Is included in end-to-end frameworks (e.g. Tacotron [Wang2017])
  • 3. Background: deep Gaussian process ‣ Deep Gaussian process (DGP) [Damianou2013][Salimbeni2018] •Multi-layer Gaussian process regressions (GPRs) •Nonlinear regression by kernel methods •Bayesian learning considering model complexity •Differentiable by variational approximation ‣ DGP-based speech synthesis [Koriyama2019] Outperformed DNN-based method Restricted model to feedforward architecture p(y|x) x p(h1 |x) p(h2 |x) GPR GPR GPR Is it possible to apply recurrent architecture to DGP?
  • 4. Extension of DGPs
    ‣ Convolutional DGP [Kumar2018], TICK-GP [Dutordoir2019]
      • Incorporate CNN architecture into DGP
    ‣ Probabilistic recurrent state space model (PR-SSM) [Doerr2018]
      • Incorporate RNN architecture into DGP
      • Perform GPR in each time step
      • Require much time for utterance-level training and generation
  • 5. Purpose of study
    To incorporate recurrent architecture into DGP with fast computation to enable utterance-level sequential modeling
    ‣ Approach
      • Utilize simple recurrent unit (SRU) [Lei2018]
      • Separate parallel computation of GPR from recurrent architecture
  • 6. Simple recurrent unit (SRU) [Lei2018]
    SRU does not use the past hidden-layer value $h^{\ell}_{t-1}$ to calculate gates or update memory cell $c^{\ell}_{t}$
    [Figure: side-by-side block diagrams of LSTM and SRU]
  • 7. Simple recurrent unit (SRU) [Lei2018]
    SRU can be decomposed into two blocks: parallel computation and light recurrent
    [Figure: SRU layer. The layer input passes through the parallel computation block (Linear) and then the light recurrent block (state, state gate, gate) to give the layer output]
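The two-block decomposition on this slide can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes the simplified SRU recurrence (gates computed from the layer input only, without the extra cell-state term that some SRU versions add), equal input and hidden dimensions, and a tanh on the cell state. The names `sru_layer`, `W`, `W_f`, `b_f`, `W_r`, and `b_r` are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_layer(x, W, W_f, b_f, W_r, b_r):
    """Simplified SRU layer (assumed form). x: (T, D); weights: (D, D); biases: (D,)."""
    # Parallel computation block: matrix products over all T frames at once,
    # with no dependence on any previous time step.
    x_tilde = x @ W                  # candidate state
    f = sigmoid(x @ W_f + b_f)       # state (forget) gate
    r = sigmoid(x @ W_r + b_r)       # output (reset) gate

    # Light recurrent block: only element-wise operations remain sequential.
    c = np.zeros(x.shape[1])
    h = np.empty_like(x)
    for t in range(x.shape[0]):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]   # highway connection to the input
    return h
```

Because the loop contains no matrix products, the expensive part of the layer runs once per utterance rather than once per frame.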
  • 8. Simple recurrent unit for DGP
    Replace linear transformation by GPR in parallel computation block
    [Figure: same SRU layer as on the previous slide, with GPR in place of Linear in the parallel computation block]
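As a rough sketch of this substitution, the code below swaps the linear map for the predictive mean of a sparse GP evaluated on all frames at once, then runs the same element-wise recurrence. Several things here are assumptions for illustration and are not taken from the slides: an RBF kernel (the experiments use ArcCos), a mean-only prediction with no sampling or variational training, and the hypothetical names `gpr_mean`, `sru_gp_layer`, `Z` (inducing inputs), and `m` (inducing outputs).

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def gpr_mean(X, Z, m, jitter=1e-6):
    """Predictive mean K_xz K_zz^{-1} m for all T frames in one shot.
    X: (T, D) inputs, Z: (M, D) inducing inputs, m: (M, Q) inducing outputs."""
    Kzz = rbf(Z, Z) + jitter * np.eye(Z.shape[0])
    return rbf(X, Z) @ np.linalg.solve(Kzz, m)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_gp_layer(X, Z, m):
    """m has 3*D columns: candidate state and the two gate pre-activations."""
    T, D = X.shape
    u = gpr_mean(X, Z, m)                       # parallel computation block (GPR)
    x_tilde = u[:, :D]
    f = sigmoid(u[:, D:2 * D])
    r = sigmoid(u[:, 2 * D:])
    c = np.zeros(D)
    H = np.empty((T, D))
    for t in range(T):                          # light recurrent block (element-wise)
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        H[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * X[t]
    return H
```

The point of the sketch is only the data flow: the kernel computation is done once for the whole utterance, so no GPR is evaluated inside the time loop.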
  • 9. SRU-DGP-based speech synthesis
    [Figure: network unrolled over time t. At each frame the context passes through GPR layers and a stack of SRU layers w/ GPR (depth: # of SRU layers) to produce the speech parameters]
  • 10. Utterance-level sampling for training
    ‣ In the training process of DGP, inference and sampling are repeatedly performed for each layer [Salimbeni19]
    ‣ Utterance-level predictive distribution is a multivariate Gaussian distribution: 𝒩(h; μ, Σ)
      • Hidden-layer values of adjacent frames are correlated
    ‣ Although the sampling can be performed by using Cholesky decomposition of Σ, this is often unstable
    ‣ Use random feature expansion [Rahimi2008, Cutajar2017] for stability of training
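The two sampling routes can be contrasted in a small NumPy sketch. It is illustrative only: it draws from a zero-mean Gaussian with an RBF covariance over frame positions, uses random Fourier features for that RBF kernel, and adds a diagonal jitter so the Cholesky route succeeds. The kernel choice, the jitter value, and the function names are assumptions, not details from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X, lengthscale=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def sample_cholesky(mu, Sigma, jitter=1e-6):
    # h = mu + L eps with Sigma = L L^T; without jitter this can fail when
    # adjacent frames are so correlated that Sigma is numerically singular.
    L = np.linalg.cholesky(Sigma + jitter * np.eye(len(mu)))
    return mu + L @ rng.standard_normal(len(mu))

def random_fourier_features(X, num_features=2000, lengthscale=1.0):
    # phi(x) such that phi(x) . phi(x') approximates the RBF kernel.
    Omega = rng.standard_normal((X.shape[1], num_features)) / lengthscale
    phase = rng.uniform(0.0, 2.0 * np.pi, num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ Omega + phase)

def sample_random_features(mu, Phi):
    # h = mu + Phi w with w ~ N(0, I) has covariance Phi Phi^T; no factorization needed.
    return mu + Phi @ rng.standard_normal(Phi.shape[1])

T = 50
X = np.linspace(0.0, 5.0, T)[:, None]      # frame positions within one utterance
mu = np.zeros(T)
Sigma = rbf_kernel(X)
Phi = random_fourier_features(X)

h_chol = sample_cholesky(mu, Sigma)
h_rf = sample_random_features(mu, Phi)
approx_error = np.max(np.abs(Phi @ Phi.T - Sigma))   # shrinks as num_features grows
```

The random-feature route replaces a matrix factorization with a matrix-vector product, which is why it avoids the numerical failure mode of the Cholesky route.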
  • 11. Methods for experiments
                                 Architecture
    Models    Kernel  Bayes  FeedForward  LSTM      SRU
    NN        -       -      FF-NN        LSTM-RNN  SRU-NN
    BayesNN   -       ✓      FF-BayesNN   -         SRU-BayesNN
    DGP       ✓       ✓      FF-DGP       -         SRU-DGP (proposed)
  • 12. Experimental conditions: database
    Database: JSUT corpus [Sonobe2017], 1 female speaker, BASIC0001~BASIC2000
    Train / valid / test sentences: 1788 (1.95 h) / 60 / 60
    Input feature: 575-dim. linguistic feature vector
    Output feature: 187-dim. acoustic feature vector (Mel-cepstrum, log F0, code aperiodicity, v/uv & Δ, Δ²)
  • 13. Experimental conditions: model configurations
    DGP
      Hidden layer dim.: 256
      # of inducing points: 1024
      Kernel function: ArcCos [Cho09]
      Optimizer: Adam (learning rate: 0.01)
    BayesNN
      Hidden units: 1024
      Activation: ReLU
      Optimizer: Adam (learning rate: 10^-5)
    NN: Hyperparameters were tuned by Optuna [Akiba2019] with 100 trials.
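For readers unfamiliar with the ArcCos kernel named above, here is a sketch of the standard arc-cosine kernel of orders 0 and 1. The slides do not state which order or which kernel hyperparameters were used, so the default `order=1` and the absence of variance or bias terms are assumptions; `arccos_kernel` is an illustrative name. Inputs are assumed to have non-zero norm.

```python
import numpy as np

def arccos_kernel(X, Y, order=1):
    """Arc-cosine kernel between rows of X (N, D) and Y (M, D)."""
    nx = np.linalg.norm(X, axis=1)[:, None]
    ny = np.linalg.norm(Y, axis=1)[None, :]
    # Clip guards against |cos| slightly above 1 from rounding.
    cos_theta = np.clip((X @ Y.T) / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if order == 0:
        return 1.0 - theta / np.pi
    if order == 1:
        # Corresponds to an infinitely wide layer of ReLU units.
        return nx * ny * (np.sin(theta) + (np.pi - theta) * cos_theta) / np.pi
    raise ValueError("only orders 0 and 1 are implemented in this sketch")
```

For order 1, k(x, x) equals the squared norm of x, and two orthogonal unit vectors give 1/π.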
  • 14. Objective evaluation: spectral feature distortion
    Bayesian and SRU models yield smaller distortions
    [Figure: MCD [dB] (axis 5.5 to 6.1) vs. number of layers (1 to 8) for FF-NN (best), LSTM-RNN (best), SRU-NN (best), FF-BayesNN, SRU-BayesNN, FF-DGP, and SRU-DGP]
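The MCD metric plotted here is commonly computed as below. The slides do not give the exact definition used, so this sketch assumes the usual form: frames already time-aligned, the 0th (energy) coefficient excluded, and the per-frame distortion averaged over the utterance. `mel_cepstral_distortion` is an illustrative name.

```python
import numpy as np

def mel_cepstral_distortion(ref, gen):
    """Mean MCD in dB between aligned mel-cepstra ref and gen, each (T, order + 1)."""
    diff = ref[:, 1:] - gen[:, 1:]            # drop the 0th coefficient (assumed convention)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff**2, axis=1))
    return float(np.mean(per_frame))
```

Lower values mean the generated spectral envelope is closer to the natural one, which is how the curves in the figure should be read.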
  • 15. Subjective evaluation
    Proposed SRU-DGP gave a higher score than other methods
    Method        MOS (score scale: 1 to 5)
    LSTM-RNN      2.99
    SRU-NN        2.98
    SRU-BayesNN   3.09
    FF-DGP        3.01
    SRU-DGP       3.19
    ORIG          3.97
  • 16. Computation time
    SRU-DGP can generate speech faster than LSTM-RNN
    [Figure: real-time factor (axis 0.00 to 0.10) vs. number of layers (1 to 8) for SRU-DGP, FF-DGP, LSTM-RNN, and SRU-NN]
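The real-time factor on the vertical axis is the ratio of processing time to the duration of the generated speech; values below 1 mean faster than real time. A small sketch follows. The 5 ms frame shift is an assumed default (the slides do not state it), and `synthesize` is a hypothetical callable standing in for any of the compared models.

```python
import time

def real_time_factor(elapsed_sec, num_frames, frame_shift_sec=0.005):
    """Processing time divided by the duration of the generated audio."""
    return elapsed_sec / (num_frames * frame_shift_sec)

def measure_rtf(synthesize, num_frames, frame_shift_sec=0.005):
    """Time one call of a (hypothetical) synthesize(num_frames) function."""
    start = time.perf_counter()
    synthesize(num_frames)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, num_frames, frame_shift_sec)
```

For example, generating 1000 frames (5 s of speech under the assumed frame shift) in 0.5 s gives a real-time factor of 0.1.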
  • 17. Conclusions
    ‣ Incorporate simple recurrent unit (SRU) into DGP
    ‣ Achieve utterance-level sequential modeling
    ‣ The proposed SRU-DGP
      • Outperformed feedforward (FF)-DGP and LSTM-RNN
      • Achieved faster generation than LSTM-RNN
    ‣ Future work
      • Investigate other differentiable components in DGP: attention, self-attention