1. UTTERANCE-LEVEL SEQUENTIAL MODELING FOR DEEP GAUSSIAN PROCESS BASED SPEECH SYNTHESIS USING SIMPLE RECURRENT UNIT
Tomoki Koriyama, Hiroshi Saruwatari
The University of Tokyo, Japan
May 7, 2020
TH2.PB.10, SPE-P12, ICASSP 2020
2. Background: deep learning for speech synthesis
‣ Neural network (NN)-based speech synthesis
•Model the relationship between text and speech parameters
‣ Differentiable components enable complicated models
•RNN (LSTM, GRU), CNN, Self-attention, Attention
‣ RNN for speech synthesis
•Can capture continuously changing speech parameters
•Was used in the best framework in Blizzard Challenge 2019
[Jiang2019]
•Is included in end-to-end frameworks (e.g. Tacotron [Wang2017])
3. Background: deep Gaussian process
‣ Deep Gaussian process (DGP) [Damianou2013][Salimbeni2018]
•Multi-layer Gaussian process regressions (GPRs)
•Nonlinear regression by kernel methods
•Bayesian learning considering model complexity
•Differentiable by variational approximation
‣ DGP-based speech synthesis [Koriyama2019]
•Outperformed the DNN-based method
•Restricted the model to a feedforward architecture
[Figure: feedforward DGP architecture; stacked GPRs map x → p(h₁|x) → p(h₂|x) → p(y|x)]
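For reference, the stacked-GPR composition shown in the figure can be written in standard DGP notation (not taken verbatim from the slides) as:

\[
  \mathbf{h}_1 = f_1(\mathbf{x}), \quad
  \mathbf{h}_2 = f_2(\mathbf{h}_1), \quad
  \mathbf{y} = f_3(\mathbf{h}_2) + \boldsymbol{\epsilon},
  \qquad f_\ell \sim \mathcal{GP}(0, k_\ell)
\]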
Is it possible to apply a recurrent architecture to DGP?
4. Extension of DGPs
‣ Convolutional DGP [Kumar2018], TICK-GP [Dutordoir2019]
•Incorporate CNN architecture into DGP
‣ Probabilistic recurrent state space model (PR-SSM)
[Doerr2018]
•Incorporate RNN architecture into DGP
•Perform GPR at each time step
•Require a long time for utterance-level training and generation
5. Purpose of study
To incorporate a recurrent architecture into DGP with fast
computation, enabling utterance-level sequential modeling
‣ Approach
•Utilize simple recurrent unit (SRU) [Lei2018]
•Separate the parallel computation of GPR from the recurrent architecture
6. Simple recurrent unit (SRU) [Lei2018]
SRU does not use the past hidden-layer value
to calculate gates or to update the memory cell
[Figure: LSTM vs. SRU update equations; the LSTM gates depend on the past hidden-layer value h^ℓ_{t−1}, whereas the SRU gates depend only on the current input, and only the memory cell c^ℓ_t carries the recurrence]
7. Simple recurrent unit (SRU) [Lei2018]
SRU can be decomposed into two blocks:
a parallel computation block and a light recurrent block (see the equations below)
[Figure: SRU block diagram; the parallel computation block applies a Linear transformation to the layer input to produce states and gates, and the light recurrent block combines them over time into the layer output]
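For reference, a common form of the SRU update from [Lei2018] is sketched below (biases included, the optional elementwise c_{t−1} terms in the gates omitted). The first line contains all matrix products, which have no time dependency and form the parallel computation block; the second line is the elementwise light recurrent block:

\[
  \tilde{\mathbf{x}}_t = \mathbf{W}\mathbf{x}_t, \quad
  \mathbf{f}_t = \sigma(\mathbf{W}_f \mathbf{x}_t + \mathbf{b}_f), \quad
  \mathbf{r}_t = \sigma(\mathbf{W}_r \mathbf{x}_t + \mathbf{b}_r)
\]
\[
  \mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + (1 - \mathbf{f}_t) \odot \tilde{\mathbf{x}}_t, \qquad
  \mathbf{h}_t = \mathbf{r}_t \odot g(\mathbf{c}_t) + (1 - \mathbf{r}_t) \odot \mathbf{x}_t
\]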
8. Simple recurrent unit for DGP
Replace the linear transformation with GPR
in the parallel computation block (a minimal code sketch follows the figure below)
[Figure: GP-SRU block diagram; the Linear transformation in the SRU parallel computation block is replaced with GPR, while the light recurrent block is unchanged]
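A minimal NumPy sketch of this idea, assuming an RBF kernel and a GPR predictive mean computed from inducing points (the paper uses the ArcCos kernel and variationally learned inducing outputs; gp_sru_layer and its arguments are illustrative names, and the highway path is simplified to use the GPR output):

import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Hypothetical stand-in kernel; the paper uses the ArcCos kernel [Cho09].
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gp_sru_layer(X, Z, U):
    # One GP-SRU layer (illustrative only).
    #   X: (T, D)  input frames of one utterance
    #   Z: (M, D)  inducing inputs
    #   U: (M, 3H) inducing outputs for the [state, f-gate, r-gate] pre-activations
    T, H = X.shape[0], U.shape[1] // 3
    # Parallel computation block: GPR predictive mean for all frames at once
    Kzz = rbf_kernel(Z, Z) + 1e-6 * np.eye(Z.shape[0])
    Kxz = rbf_kernel(X, Z)
    pre = Kxz @ np.linalg.solve(Kzz, U)              # (T, 3H), no time dependency
    x_tilde, f_pre, r_pre = np.split(pre, 3, axis=1)
    f, r = sigmoid(f_pre), sigmoid(r_pre)
    # Light recurrent block: cheap elementwise recurrence over time
    c = np.zeros(H)
    h = np.zeros((T, H))
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * x_tilde[t]
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x_tilde[t]
    return h

# Example call with random inputs (T=100 frames, D=64, M=32 inducing points, H=256):
# h = gp_sru_layer(np.random.randn(100, 64), np.random.randn(32, 64), np.random.randn(32, 3 * 256))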
10. Utterance-level sampling for training
‣ In the training process of DGP, inference and sampling are
performed repeatedly for each layer [Salimbeni19]
‣ The utterance-level predictive distribution is a multivariate
Gaussian distribution 𝒩(h; μ, Σ):
•Hidden-layer values of adjacent frames are correlated
‣ Although the sampling can be performed using the
Cholesky decomposition of Σ, this is often unstable
‣ Use random feature expansion [Rahimi2008, Cutajar2017] for
stability of training (a minimal sketch follows)
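A minimal sketch of drawing a correlated utterance-level sample with random feature expansion instead of a Cholesky factor of Σ, assuming an RBF kernel (the experiments use the ArcCos kernel; sample_gp_rff is a hypothetical helper name):

import numpy as np

def sample_gp_rff(X, lengthscale=1.0, variance=1.0, n_features=256, seed=None):
    # Approximate GP sample at inputs X (T x D) using random Fourier
    # features [Rahimi2008]; avoids factorizing the T x T covariance Σ.
    rng = np.random.default_rng(seed)
    T, D = X.shape
    # Spectral frequencies and phases of the RBF kernel approximation
    W = rng.normal(scale=1.0 / lengthscale, size=(D, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Phi = np.sqrt(2.0 * variance / n_features) * np.cos(X @ W + b)
    # A sample is a random linear combination of the features:
    # cov(Phi @ w) = Phi Phi^T ≈ Σ for w ~ N(0, I)
    w = rng.normal(size=n_features)
    return Phi @ w  # one correlated sample across all T frames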
13. Experimental conditions: model configurations
DGP
•Hidden layer dim.: 256
•# of inducing points: 1024
•Kernel function: ArcCos [Cho09]
•Optimizer: Adam (learning rate: 0.01)
BayesNN
•Hidden units: 1024
•Activation: ReLU
•Optimizer: Adam (learning rate: 10⁻⁵)
NN: Hyperparameters were tuned by Optuna [Akiba2019] with 100 trials.
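A minimal Optuna sketch of the 100-trial tuning mentioned above; the search space and training loop are not given on the slides, so the hyperparameters and the placeholder objective below are assumptions:

import optuna

def train_and_validate(learning_rate, hidden_units):
    # Placeholder for training the Bayesian NN and returning a validation loss;
    # the real training loop is not shown on the slides.
    return (learning_rate - 1e-5) ** 2 + abs(hidden_units - 1024) * 1e-9

def objective(trial):
    # Hypothetical search space
    lr = trial.suggest_float("learning_rate", 1e-6, 1e-3, log=True)
    units = trial.suggest_categorical("hidden_units", [256, 512, 1024])
    return train_and_validate(lr, units)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
print(study.best_params)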